__pycache__
.venv/
.vscode
.DS_Store
notebooks
.*.swp
.*.swo
*.pyc
*.so
*.txt
build
/dist
*.egg-info
/env
a.out
boxes.jpg
pandecode.dump
vertex.bin
recognize*
.idea
disassemblers/applegpu
disassemblers/cuda_ioctl_sniffer
*.prof
extra/datasets/cifar-10-python.tar.gz
extra/datasets/librispeech/
extra/datasets/imagenet/
extra/datasets/kits19/
extra/datasets/squad/
extra/datasets/img_align_celeba*
extra/datasets/open-images-v6-mlperf
extra/datasets/kits/
extra/datasets/COCO/
extra/datasets/audio*
extra/weights
venv
examples/**/net.*[js,json]
examples/**/*.safetensors
node_modules
package.json
package-lock.json
temp
*.csv
.coverage
coverage.xml
htmlcov
outputs_yolov8
wandb
model.safetensors
quickstart.py
.hypothesis