tinygrad

Commit Graph

Author	SHA1	Message	Date
chenyu	5ef8d682e3	clean up attentions in stable diffusion (#2275 )	2023-11-11 14:25:36 -05:00
chenyu	453f48ce02	pad None means (0,0) (#2273 )	2023-11-11 09:50:26 -08:00
jxdv	c5d70c1871	typo (#2271 )	2023-11-11 07:18:04 -08:00
chenyu	880e693207	fix llama n_kv_heads in kvcache (#2267 ) * fix llama n_kv_heads in kvcache * trigger ci	2023-11-10 21:44:39 -05:00
George Hotz	78623ba204	two simple tests	2023-11-10 16:16:06 -08:00
George Hotz	70fb8a259d	hotfix mypy	2023-11-10 15:43:30 -08:00
George Hotz	6ceea02e65	hotfix of onnx	2023-11-10 15:40:30 -08:00
geohotstan	b853e9bb8c	Onnx 1.15.0 gogogo (#2217 ) * lol * lol * add GELULULULUL * onnx 1.50 * fuk torch bool neg * exclude regex tests * exclude dequantizelinear for now * is sunny in philly * damn it affinegrid * fixed auto_pad VALID * skip 0 shape tests * add temporary cast in Reduces * tests should pass now * added comments and cleanup * try moving dequantizelinear to onnx.py * fixed dequantizedlinear? * cleanup * try? * float16 segfaults LLVM CI..??? * cleanup comments * pin to 1.50.0 * remove use of -np.inf cuz numpy is kill * 1.50? lol I'm actually retarded * thx for review, muhbad * moved Gelu higher up	2023-11-10 15:36:48 -08:00
George Hotz	85d26ddc36	uops loop removal (#2262 ) * remove the loop * cleanups * tests failing still * global_loop_ctx wasn't needed * replace_op is cleaner * minor opt * cast opt was wrong * uop_num * uop num was dumb * tuplize_uops * torch tests * fix test_uops	2023-11-10 15:24:47 -08:00
chenyu	a753c8e071	examples of new GPT2 and JIT change (#2261 ) * var_vals are global * working with global ish * better * fix export model * fix tests * better kv cache * does it run? * use where for kvmask * fix excessive var_vals * fix import * how does multigpu use this? * llama kinda work * faster and simpler * cleanup * fix conversation mode * test cleanups * fix one more test * test cleanup --------- Co-authored-by: George Hotz <geohot@gmail.com>	2023-11-10 15:07:02 -05:00
qazal	b6aaf12df7	Internal cast 2 with more tests (#2257 ) * Change linearizer to parse CAST * Oneliner renders for cstyle and triton * LLVM cast and ALU implementation * pylint fixes * cast in gep * remove printbufs * use cast for post-load ops * get rid of parse_cast * partially supported vectorized dtypes for initial dev * render phi as the dtype * Revert "partially supported vectorized dtypes for initial dev" This reverts commit 1bf1a818a3350d74314806f00f5aaacb075bdf51. * Revert "render phi as the dtype" This reverts commit d08cb270b42266f06e4a78b199f9937cb9dc4711. * reenable triton tests * no vstore_half if dtype is already half * upcast max	2023-11-10 10:42:39 -08:00
George Hotz	c0f447d6f7	Inline barrier (#2255 ) * put barrier inline for locals * fix pre-commit on m3 * gate if through barrier	2023-11-10 08:17:10 -08:00
chenyu	75f6e9ab54	one more fuzz linearizer failed example (#2260 )	2023-11-10 09:17:37 -05:00
George Hotz	330484c072	Revert "Internal casting support (#2046 )" (#2256 ) This reverts commit `7e1d08b2ae`.	2023-11-09 21:27:13 -08:00
qazal	7e1d08b2ae	Internal casting support (#2046 ) * Change linearizer to parse CAST * Oneliner renders for cstyle and triton * LLVM cast and ALU implementation * pylint fixes * cast in gep * remove printbufs * use cast for post-load ops * get rid of parse_cast * partially supported vectorized dtypes for initial dev * render phi as the dtype * Revert "partially supported vectorized dtypes for initial dev" This reverts commit 1bf1a818a3350d74314806f00f5aaacb075bdf51. * Revert "render phi as the dtype" This reverts commit d08cb270b42266f06e4a78b199f9937cb9dc4711. * reenable triton tests --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-11-09 21:02:32 -08:00
vish-pr	6051f0ce82	For cuda get current free space from device, and retry alloc failures (#2197 ) * For cuda get current free space from device, and rery alloc failures * type ignore for mypy * add init to get free mem in cuda * Move retry logic in common lib. Fix typo in override _get_cur_free_space * linter error fix in test file * Not catch all, as it will catch KeyboardInterrupt * fix unintened line changes	2023-11-09 15:53:50 -08:00
qazal	2465d5d267	fix ops tests in test_dtype (#2237 ) * fix test ops * decompose the err from test_ops * skipTest skips the entire test, we dont want that * handle cases with the same priority * add int16 to torch map	2023-11-09 15:17:43 -08:00
George Hotz	80bf0b8586	proper wmma (#2245 ) * proper wmma * hip cast * bugfixes * bugfix * that bug is fixed --------- Co-authored-by: George Hotz <george@tinygrad.org>	2023-11-09 15:15:18 -08:00
wozeparrot	b7a31fb708	remove tokei badge from readme (#2251 )	2023-11-09 13:53:31 -05:00
2-5	50bf0703aa	fix sqlite cache path on Windows (#2250 )	2023-11-09 10:32:34 -08:00
chenyu	10d642e174	fuzz linearizer transformation (#2188 ) * fuzz linearizer transformation * no standard normal for fp16 * work * Interpreted start * CPU and TORCH work * fix MemBuffer with same idx * id for failed kernels * no image and variable for Interpreted * symbolic shape * IMAGE only for GPU * Interpreted almost all good * cleanup * fix bufs_from_lin * zero size * some failed examples * just Exception * just test not pass	2023-11-09 08:03:27 -08:00
chenyu	794122781d	Merge pull request #2242 from chenyuxyz/mypy-casts mypy check warn_redundant_casts	2023-11-08 20:04:46 -05:00
George Hotz	38b7f5a7fd	less phi, proper phi (#2241 ) * less phi, proper phi * disable flaky whisper test	2023-11-08 16:13:43 -08:00
chenyu	b9fe133af8	mypy check warn_redundant_casts	2023-11-08 15:06:55 -08:00
wozeparrot	4c44d1344b	feat: remove cache_id (#2236 )	2023-11-08 08:09:21 -08:00
Rory Clear	553688f12a	update metal matmul and matvec for compile api (#2238 )	2023-11-08 08:08:35 -08:00
George Hotz	3042450b4d	diskcache touchups (#2235 )	2023-11-07 18:00:04 -08:00
George Hotz	09bdd55acc	update debug prints	2023-11-07 17:47:25 -08:00
George Hotz	c0a033f01d	remove real_offset (#2234 ) * remove real_offset * pass in numnode * remove that real_offset * sample only for variable	2023-11-07 17:30:53 -08:00
George Hotz	4d95e6d070	move cache out of tmp (#2232 )	2023-11-07 11:41:00 -08:00
George Hotz	a48ccdb359	cleanup deps, no pyyaml, pillow to testing (#2231 )	2023-11-07 10:32:23 -08:00
nimlgen	ae5d1407ee	Fix mmaped in jit (#2225 ) * fix reuse for mmaped buffers in jit * comment	2023-11-06 14:54:21 -08:00
George Hotz	0c9b4ab885	no to_underlying (#2222 ) * no to_underlying * context is no longer used * no more optimizing * update docs	2023-11-05 21:34:20 -08:00
George Hotz	fbe7f0c62b	metal: unwrap lib write	2023-11-05 21:02:31 -08:00
George Hotz	2f7aab3d13	move optimize_local_size (#2221 ) * move optimize_local_size * interpret_ast	2023-11-05 21:00:52 -08:00
George Hotz	c60c3b467a	clean up symlinking in benchmark (#2219 ) * clean up symlinking * make torch deterministic	2023-11-05 16:46:05 -08:00
George Hotz	baeb77a403	Make the JIT simple (no batch exec, no cache collector) (#2215 ) * remove batch exec * simple cachecollector * remove cache collector test * less lr	2023-11-05 16:23:43 -08:00
chenyu	719a97b337	fix IMAGE=2 failed with NOOPT=1 (#2209 ) * IMAGE=2 failed with NOOPT=1 * fix it	2023-11-05 13:16:37 -08:00
chenyu	680cbfdba4	less broken limit_dims_to_max (#2214 )	2023-11-04 08:38:06 -07:00
Ahmed Harmouche	265304e7fd	Stable diffusion WebGPU port (#1370 ) * WIP: Stable diffusion WebGPU port * Load whole model: split safetensor to avoid Chrome allocation limit * Gitignore .DS_Store, remove debug print * Clip tokenizer in JS * WIP: Compile model in parts (text model, diffusor, get_x_prev_and_pred_x0, decoder), and recreate forward logic in JS * e2e stable diffusion flow * Create initial random latent tensor in JS * SD working e2e * Log if some weights were not loaded properly * Remove latent_tensor.npy used for debugging * Cleanup, remove useless logs * Improve UI * Add progress bar * Remove .npy files used for debugging * Add clip tokenizer as external dependency * Remove alphas_cumprod.js and load it from safetensors * Refactor * Simplify a lot * Dedup base when limiting elementwise merge (webgpu) * Add return type to safe_load_metadata * Do not allow run when webgpu is not supported * Add progress bar, refactor, fix special names * Add option to chose from local vs huggingface weights * lowercase tinygrad :) * fp16 model dl, decompression client side * Cache f16 model in browser, better progress * Cache miss recovery --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-11-03 18:29:16 -07:00
chenyu	f582ec56d5	Replace (getenv("CI", "") != "") with helpers.CI (#2213 )	2023-11-03 15:20:44 -07:00
George Hotz	f17bc16f46	simple runtime args (#2211 ) * simple runtime args * fix some tests * fix abstractions and triton * fix search	2023-11-03 12:31:29 -07:00
George Hotz	9ea0448103	compile interpreted to python code (#2208 ) * sort of works * interpreted * fix flopcounter * interpreted * simpler * type * functools compile ast * lose a line * delete extra file * no self.method_cache	2023-11-03 09:16:12 -07:00
George Hotz	ddbc6eecaf	some refactors in the realization (#2206 ) * some refactors * delete old kernel search	2023-11-02 19:51:28 -07:00
George Hotz	51fd993f1f	pin onnx to 1.14.1	2023-11-02 18:03:21 -07:00
George Hotz	6621d2eb98	Revert "Modernize setup.py (#2187 )" This reverts commit `7e8c5f1a0f`.	2023-11-03 01:01:15 +00:00
nimlgen	6e06adcb95	fix hip segfault (#2204 )	2023-11-02 08:40:56 -07:00
George Hotz	03cf0afa4f	move all to compile api (#2203 ) * move metal+clang to compile api * all to the new style * remove binary arg * fix triton * fixup tests * fix clang * diskcache is generic * __wrapped__ * compile_gpu * fix thneed * keep the src in the ASTRunner * lib * move compile_gpu * compile_gpu in device * put compiler in astrunner * test reverts * triton compiler * ugh, that too	2023-11-01 23:01:32 -07:00
George Hotz	8932816816	remove arm64, caching for cuda (#2201 ) * remove arm64, caching for cuda * caching in llvm * switch cache_compiled to new cache * fix clang * caching for metal * fix pylint * cleanups * perf_counter and binary	2023-11-01 18:44:00 -07:00
George Hotz	7103b716c4	merge kernel and optimizer (#2200 ) * merge kernel and optimizer * linearize is reentrant * move global/local size * clean up linearizer copy * remove unneeded lin copies * stop linearizing twice * oops, that should be None	2023-11-01 15:20:01 -07:00

1 2 3 4 5 ...

2760 Commits All Branches Search

2760 Commits

All Branches