Commit Graph

2786 Commits

wozeparrot 4c44d1344b
feat: remove cache_id (#2236) 2023-11-08 08:09:21 -08:00
Rory Clear 553688f12a
update metal matmul and matvec for compile api (#2238) 2023-11-08 08:08:35 -08:00
George Hotz 3042450b4d
diskcache touchups (#2235) 2023-11-07 18:00:04 -08:00
George Hotz 09bdd55acc update debug prints 2023-11-07 17:47:25 -08:00
George Hotz c0a033f01d
remove real_offset (#2234)
* remove real_offset

* pass in numnode

* remove that real_offset

* sample only for variable
2023-11-07 17:30:53 -08:00
George Hotz 4d95e6d070
move cache out of tmp (#2232) 2023-11-07 11:41:00 -08:00
George Hotz a48ccdb359
cleanup deps, no pyyaml, pillow to testing (#2231) 2023-11-07 10:32:23 -08:00
nimlgen ae5d1407ee
Fix mmaped in jit (#2225)
* fix reuse for mmaped buffers in jit

* comment
2023-11-06 14:54:21 -08:00
George Hotz 0c9b4ab885
no to_underlying (#2222)
* no to_underlying

* context is no longer used

* no more optimizing

* update docs
2023-11-05 21:34:20 -08:00
George Hotz fbe7f0c62b metal: unwrap lib write 2023-11-05 21:02:31 -08:00
George Hotz 2f7aab3d13
move optimize_local_size (#2221)
* move optimize_local_size

* interpret_ast
2023-11-05 21:00:52 -08:00
George Hotz c60c3b467a
clean up symlinking in benchmark (#2219)
* clean up symlinking

* make torch deterministic
2023-11-05 16:46:05 -08:00
George Hotz baeb77a403
Make the JIT simple (no batch exec, no cache collector) (#2215)
* remove batch exec

* simple cachecollector

* remove cache collector test

* less lr
2023-11-05 16:23:43 -08:00
chenyu 719a97b337
fix IMAGE=2 failed with NOOPT=1 (#2209)
* IMAGE=2 failed with NOOPT=1

* fix it
2023-11-05 13:16:37 -08:00
chenyu 680cbfdba4
less broken limit_dims_to_max (#2214) 2023-11-04 08:38:06 -07:00
Ahmed Harmouche 265304e7fd
Stable diffusion WebGPU port (#1370)
* WIP: Stable diffusion WebGPU port

* Load whole model: split safetensor to avoid Chrome allocation limit

* Gitignore .DS_Store, remove debug print

* Clip tokenizer in JS

* WIP: Compile model in parts (text model, diffusor, get_x_prev_and_pred_x0, decoder), and recreate forward logic in JS

* e2e stable diffusion flow

* Create initial random latent tensor in JS

* SD working e2e

* Log if some weights were not loaded properly

* Remove latent_tensor.npy used for debugging

* Cleanup, remove useless logs

* Improve UI

* Add progress bar

* Remove .npy files used for debugging

* Add clip tokenizer as external dependency

* Remove alphas_cumprod.js and load it from safetensors

* Refactor

* Simplify a lot

* Dedup base when limiting elementwise merge (webgpu)

* Add return type to safe_load_metadata

* Do not allow run when webgpu is not supported

* Add progress bar, refactor, fix special names

* Add option to chose from local vs huggingface weights

* lowercase tinygrad :)

* fp16 model dl, decompression client side

* Cache f16 model in browser, better progress

* Cache miss recovery

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-03 18:29:16 -07:00
chenyu f582ec56d5
Replace (getenv("CI", "") != "") with helpers.CI (#2213) 2023-11-03 15:20:44 -07:00
George Hotz f17bc16f46
simple runtime args (#2211)
* simple runtime args

* fix some tests

* fix abstractions and triton

* fix search
2023-11-03 12:31:29 -07:00
George Hotz 9ea0448103
compile interpreted to python code (#2208)
* sort of works

* interpreted

* fix flopcounter

* interpreted

* simpler

* type

* functools compile ast

* lose a line

* delete extra file

* no self.method_cache
2023-11-03 09:16:12 -07:00
George Hotz ddbc6eecaf
some refactors in the realization (#2206)
* some refactors

* delete old kernel search
2023-11-02 19:51:28 -07:00
George Hotz 51fd993f1f pin onnx to 1.14.1 2023-11-02 18:03:21 -07:00
George Hotz 6621d2eb98 Revert "Modernize setup.py (#2187)"
This reverts commit 7e8c5f1a0f.
2023-11-03 01:01:15 +00:00
nimlgen 6e06adcb95
fix hip segfault (#2204) 2023-11-02 08:40:56 -07:00
George Hotz 03cf0afa4f
move all to compile api (#2203)
* move metal+clang to compile api

* all to the new style

* remove binary arg

* fix triton

* fixup tests

* fix clang

* diskcache is generic

* __wrapped__

* compile_gpu

* fix thneed

* keep the src in the ASTRunner

* lib

* move compile_gpu

* compile_gpu in device

* put compiler in astrunner

* test reverts

* triton compiler

* ugh, that too
2023-11-01 23:01:32 -07:00
George Hotz 8932816816
remove arm64, caching for cuda (#2201)
* remove arm64, caching for cuda

* caching in llvm

* switch cache_compiled to new cache

* fix clang

* caching for metal

* fix pylint

* cleanups

* perf_counter and binary
2023-11-01 18:44:00 -07:00
George Hotz 7103b716c4
merge kernel and optimizer (#2200)
* merge kernel and optimizer

* linearize is reentrant

* move global/local size

* clean up linearizer copy

* remove unneeded lin copies

* stop linearizing twice

* oops, that should be None
2023-11-01 15:20:01 -07:00
George Hotz 33bb650e94
use mad in opencl (#2198)
Co-authored-by: Comma Device <device@comma.ai>
2023-11-01 10:40:08 -07:00
George Hotz c8b6a811ea
no locals as opt action (#2196)
* switch barrier, add clear_l2

* no locals can be searched

* revert barrier

* fix ci

* put it there
2023-11-01 09:47:44 -07:00
Comma Device 2e9982fe2d fastvits example that's 10% faster 2023-10-31 21:48:23 -07:00
George Hotz 8ba7ced7f9
extract const if it's const (#2193)
* extract const if it's const

* fix if statement

* fast math issue

* fix graphing and casting

* disable flaky copyout test
2023-10-31 18:52:35 -07:00
George Hotz b245f1307e
add exp2 (#2192) 2023-10-31 17:48:42 -07:00
qazal e2428b63a6
external (#2191) 2023-10-31 13:57:24 -07:00
Elias Wahl 7e8c5f1a0f
Modernize setup.py (#2187)
* Added pyproject.toml

* Pin onnx
2023-10-31 13:55:45 -07:00
nimlgen 8c07c73a9b
Fix cl map buffer (#2190)
* fix gpu enqueue_map_buffer out of space

* add test
2023-10-31 12:02:46 -07:00
George Hotz c59ea32f90 prevent over unrolling in optimzer 2023-10-31 11:45:18 -07:00
George Hotz 5aaa8a0cc1 fix shape 2023-10-31 11:36:19 -07:00
George Hotz a27c9f9de5
openpilot compile2 (#2189)
* try compile2

* pass to thneed

* fix tanh onnx
2023-10-31 11:08:58 -07:00
qazal be5f185ac0
Higher test coverage for dtypes (#2156)
* refactor unit tests for dtypes

* add missing dtypes in llvmir.py and lib.py

* skip torch tests

* webgpu

* cleaner skips

* fix llvm bool casting issue using compare

* llvm 100% passing

* llvm segfault

* TEMP decrease timeout mins to 11

debug

* add bf16 to setup

* skip half tests in cuda cpu

* check for CUDACPU insetad

* add int16 to triton dtypes

* u16 for triton

* remove debug - diff is still hard to read

* derive from base class TestDType

* enhance test_upcast and downcast by running on every possible version

* dummy commit to rerun the flakey test

* skip the correct tests for CUDA

* bf16 should be skipped in the common TestDType cases

* re-enable bf16

* more consistent structure

* tiny changes to is_dtype_supported 1

* tiny changes 2

add reason

* fuzz

* fuzzer p2

* run fp32 twice

* remove duplicate fp32 run

* clang: use stdbool

* skip triton on bool casts

* merge and resolve conflicts
2023-10-30 22:38:42 -07:00
forcefieldsovereign f294bdd681
fixed imports (#2185) 2023-10-30 22:07:17 -07:00
Akshay Kashyap 018bd29e37
Enable Multi-Output Export (#2179)
* Enable Multi-Output Export

* Add test

* Update examples and lint

* fix padding

* test ops

* dummy commit to rerun test

* revert cuda lint

* Enforce tuple/list of tensors

* subscripted generics

* put back webgpu test

* Re-enable WebGPU Efficientnet test
2023-10-30 18:42:26 -07:00
qazal a7439af786
Fix llvm int->bool cast (#2164)
* add to ir

* add test case

* minimize diff

* todo

* enable fast math

* added both False and True case
2023-10-30 15:28:23 -07:00
George Hotz 94cf652b6b don't use locals applies to GROUP also 2023-10-30 13:56:43 -07:00
George Hotz 5cc536bcc0 don't use locals applies to LASTLOCAL 2023-10-30 13:53:42 -07:00
chenyu 3c88af5071
use unique table name for each disk_cache test (#2184) 2023-10-30 13:49:49 -07:00
George Hotz 608e3ee800
fix no locals search and search both (#2171)
* fix no locals search and search both

* pretty print

* nolocals default no other search
2023-10-30 10:22:50 -07:00
George Hotz 194e4ad6f8
Revert "optimizer: simplify GROUP and LOCAL to have one of each (#2162)" (#2182)
This reverts commit 8cf0bb9351.
2023-10-30 10:22:26 -07:00
Ahmed Harmouche 95f7183c3a
Reenable global, local limiting (#2095) 2023-10-30 10:17:23 -07:00
chenyu 8548b20b23
fix codellama params and repeat_kv (#2181) 2023-10-30 10:16:26 -07:00
George Hotz c7f4dd6cb0 CACHELEVEL for smaller caches 2023-10-28 07:26:03 -10:00
chenyu 6c58bf3e9c
in time_linearizer, allocate a scratch buffer if output buffer is also input (#2152)
* in time_linearizer, allocate a scratch buffer if output buffer is also input

* move scratch buffer creation outside search
2023-10-28 07:17:41 -10:00