tinygrad

Commit Graph

Author	SHA1	Message	Date
wozeparrot	4c44d1344b	feat: remove cache_id (#2236)	2023-11-08 08:09:21 -08:00
Rory Clear	553688f12a	update metal matmul and matvec for compile api (#2238)	2023-11-08 08:08:35 -08:00
George Hotz	3042450b4d	diskcache touchups (#2235)	2023-11-07 18:00:04 -08:00
George Hotz	09bdd55acc	update debug prints	2023-11-07 17:47:25 -08:00
George Hotz	c0a033f01d	remove real_offset (#2234) * remove real_offset * pass in numnode * remove that real_offset * sample only for variable	2023-11-07 17:30:53 -08:00
George Hotz	4d95e6d070	move cache out of tmp (#2232)	2023-11-07 11:41:00 -08:00
George Hotz	a48ccdb359	cleanup deps, no pyyaml, pillow to testing (#2231)	2023-11-07 10:32:23 -08:00
nimlgen	ae5d1407ee	Fix mmaped in jit (#2225) * fix reuse for mmaped buffers in jit * comment	2023-11-06 14:54:21 -08:00
George Hotz	0c9b4ab885	no to_underlying (#2222) * no to_underlying * context is no longer used * no more optimizing * update docs	2023-11-05 21:34:20 -08:00
George Hotz	fbe7f0c62b	metal: unwrap lib write	2023-11-05 21:02:31 -08:00
George Hotz	2f7aab3d13	move optimize_local_size (#2221) * move optimize_local_size * interpret_ast	2023-11-05 21:00:52 -08:00
George Hotz	c60c3b467a	clean up symlinking in benchmark (#2219) * clean up symlinking * make torch deterministic	2023-11-05 16:46:05 -08:00
George Hotz	baeb77a403	Make the JIT simple (no batch exec, no cache collector) (#2215) * remove batch exec * simple cachecollector * remove cache collector test * less lr	2023-11-05 16:23:43 -08:00
chenyu	719a97b337	fix IMAGE=2 failed with NOOPT=1 (#2209) * IMAGE=2 failed with NOOPT=1 * fix it	2023-11-05 13:16:37 -08:00
chenyu	680cbfdba4	less broken limit_dims_to_max (#2214)	2023-11-04 08:38:06 -07:00
Ahmed Harmouche	265304e7fd	Stable diffusion WebGPU port (#1370) * WIP: Stable diffusion WebGPU port * Load whole model: split safetensor to avoid Chrome allocation limit * Gitignore .DS_Store, remove debug print * Clip tokenizer in JS * WIP: Compile model in parts (text model, diffusor, get_x_prev_and_pred_x0, decoder), and recreate forward logic in JS * e2e stable diffusion flow * Create initial random latent tensor in JS * SD working e2e * Log if some weights were not loaded properly * Remove latent_tensor.npy used for debugging * Cleanup, remove useless logs * Improve UI * Add progress bar * Remove .npy files used for debugging * Add clip tokenizer as external dependency * Remove alphas_cumprod.js and load it from safetensors * Refactor * Simplify a lot * Dedup base when limiting elementwise merge (webgpu) * Add return type to safe_load_metadata * Do not allow run when webgpu is not supported * Add progress bar, refactor, fix special names * Add option to chose from local vs huggingface weights * lowercase tinygrad :) * fp16 model dl, decompression client side * Cache f16 model in browser, better progress * Cache miss recovery --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-11-03 18:29:16 -07:00
chenyu	f582ec56d5	Replace (getenv("CI", "") != "") with helpers.CI (#2213)	2023-11-03 15:20:44 -07:00
George Hotz	f17bc16f46	simple runtime args (#2211) * simple runtime args * fix some tests * fix abstractions and triton * fix search	2023-11-03 12:31:29 -07:00
George Hotz	9ea0448103	compile interpreted to python code (#2208) * sort of works * interpreted * fix flopcounter * interpreted * simpler * type * functools compile ast * lose a line * delete extra file * no self.method_cache	2023-11-03 09:16:12 -07:00
George Hotz	ddbc6eecaf	some refactors in the realization (#2206) * some refactors * delete old kernel search	2023-11-02 19:51:28 -07:00
George Hotz	51fd993f1f	pin onnx to 1.14.1	2023-11-02 18:03:21 -07:00
George Hotz	6621d2eb98	Revert "Modernize setup.py (#2187)" This reverts commit `7e8c5f1a0f`.	2023-11-03 01:01:15 +00:00
nimlgen	6e06adcb95	fix hip segfault (#2204)	2023-11-02 08:40:56 -07:00
George Hotz	03cf0afa4f	move all to compile api (#2203) * move metal+clang to compile api * all to the new style * remove binary arg * fix triton * fixup tests * fix clang * diskcache is generic * __wrapped__ * compile_gpu * fix thneed * keep the src in the ASTRunner * lib * move compile_gpu * compile_gpu in device * put compiler in astrunner * test reverts * triton compiler * ugh, that too	2023-11-01 23:01:32 -07:00
George Hotz	8932816816	remove arm64, caching for cuda (#2201) * remove arm64, caching for cuda * caching in llvm * switch cache_compiled to new cache * fix clang * caching for metal * fix pylint * cleanups * perf_counter and binary	2023-11-01 18:44:00 -07:00
George Hotz	7103b716c4	merge kernel and optimizer (#2200) * merge kernel and optimizer * linearize is reentrant * move global/local size * clean up linearizer copy * remove unneeded lin copies * stop linearizing twice * oops, that should be None	2023-11-01 15:20:01 -07:00
George Hotz	33bb650e94	use mad in opencl (#2198) Co-authored-by: Comma Device <device@comma.ai>	2023-11-01 10:40:08 -07:00
George Hotz	c8b6a811ea	no locals as opt action (#2196) * switch barrier, add clear_l2 * no locals can be searched * revert barrier * fix ci * put it there	2023-11-01 09:47:44 -07:00
Comma Device	2e9982fe2d	fastvits example that's 10% faster	2023-10-31 21:48:23 -07:00
George Hotz	8ba7ced7f9	extract const if it's const (#2193) * extract const if it's const * fix if statement * fast math issue * fix graphing and casting * disable flaky copyout test	2023-10-31 18:52:35 -07:00
George Hotz	b245f1307e	add exp2 (#2192)	2023-10-31 17:48:42 -07:00
qazal	e2428b63a6	external (#2191)	2023-10-31 13:57:24 -07:00
Elias Wahl	7e8c5f1a0f	Modernize setup.py (#2187) * Added pyproject.toml * Pin onnx	2023-10-31 13:55:45 -07:00
nimlgen	8c07c73a9b	Fix cl map buffer (#2190) * fix gpu enqueue_map_buffer out of space * add test	2023-10-31 12:02:46 -07:00
George Hotz	c59ea32f90	prevent over unrolling in optimzer	2023-10-31 11:45:18 -07:00
George Hotz	5aaa8a0cc1	fix shape	2023-10-31 11:36:19 -07:00
George Hotz	a27c9f9de5	openpilot compile2 (#2189) * try compile2 * pass to thneed * fix tanh onnx	2023-10-31 11:08:58 -07:00
qazal	be5f185ac0	Higher test coverage for dtypes (#2156) * refactor unit tests for dtypes * add missing dtypes in llvmir.py and lib.py * skip torch tests * webgpu * cleaner skips * fix llvm bool casting issue using compare * llvm 100% passing * llvm segfault * TEMP decrease timeout mins to 11 debug * add bf16 to setup * skip half tests in cuda cpu * check for CUDACPU insetad * add int16 to triton dtypes * u16 for triton * remove debug - diff is still hard to read * derive from base class TestDType * enhance test_upcast and downcast by running on every possible version * dummy commit to rerun the flakey test * skip the correct tests for CUDA * bf16 should be skipped in the common TestDType cases * re-enable bf16 * more consistent structure * tiny changes to is_dtype_supported 1 * tiny changes 2 add reason * fuzz * fuzzer p2 * run fp32 twice * remove duplicate fp32 run * clang: use stdbool * skip triton on bool casts * merge and resolve conflicts	2023-10-30 22:38:42 -07:00
forcefieldsovereign	f294bdd681	fixed imports (#2185)	2023-10-30 22:07:17 -07:00
Akshay Kashyap	018bd29e37	Enable Multi-Output Export (#2179) * Enable Multi-Output Export * Add test * Update examples and lint * fix padding * test ops * dummy commit to rerun test * revert cuda lint * Enforce tuple/list of tensors * subscripted generics * put back webgpu test * Re-enable WebGPU Efficientnet test	2023-10-30 18:42:26 -07:00
qazal	a7439af786	Fix llvm int->bool cast (#2164) * add to ir * add test case * minimize diff * todo * enable fast math * added both False and True case	2023-10-30 15:28:23 -07:00
George Hotz	94cf652b6b	don't use locals applies to GROUP also	2023-10-30 13:56:43 -07:00
George Hotz	5cc536bcc0	don't use locals applies to LASTLOCAL	2023-10-30 13:53:42 -07:00
chenyu	3c88af5071	use unique table name for each disk_cache test (#2184)	2023-10-30 13:49:49 -07:00
George Hotz	608e3ee800	fix no locals search and search both (#2171) * fix no locals search and search both * pretty print * nolocals default no other search	2023-10-30 10:22:50 -07:00
George Hotz	194e4ad6f8	Revert "optimizer: simplify GROUP and LOCAL to have one of each (#2162)" (#2182) This reverts commit `8cf0bb9351`.	2023-10-30 10:22:26 -07:00
Ahmed Harmouche	95f7183c3a	Reenable global, local limiting (#2095)	2023-10-30 10:17:23 -07:00
chenyu	8548b20b23	fix codellama params and repeat_kv (#2181)	2023-10-30 10:16:26 -07:00
George Hotz	c7f4dd6cb0	CACHELEVEL for smaller caches	2023-10-28 07:26:03 -10:00
chenyu	6c58bf3e9c	in time_linearizer, allocate a scratch buffer if output buffer is also input (#2152) * in time_linearizer, allocate a scratch buffer if output buffer is also input * move scratch buffer creation outside search	2023-10-28 07:17:41 -10:00

2786 Commits (All Branches)