Commit Graph

4151 Commits

Author SHA1 Message Date
Francis Lata 3644077a42
[MLPerf][UNet3D] Add DICE loss + metrics (#4204)
* add DICE loss and metrics

* update dice to include reference implementation's link

* remove unused imports

* remove unnecessary test file and update pred + label for metrics and losses test

* add tests to CI + add exclusion of mlperf_unet3d

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-17 20:09:33 -04:00
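The Dice score used by the MLPerf UNet3D benchmark measures overlap between prediction and label, and the loss is one minus the score. A minimal pure-Python sketch of the common soft-Dice formulation (illustrative only, not the tinygrad implementation; the smoothing constants are assumptions):

```python
def dice_score(pred, label, smooth_nr=1e-6, smooth_dr=1e-6):
    # Soft Dice: 2*|P∩L| / (|P| + |L|), with small smoothing terms
    # in numerator and denominator to avoid division by zero.
    intersection = sum(p * l for p, l in zip(pred, label))
    denom = sum(pred) + sum(label)
    return (2.0 * intersection + smooth_nr) / (denom + smooth_dr)

def dice_loss(pred, label):
    # Perfect overlap gives score ~1.0, so the loss goes to ~0.0.
    return 1.0 - dice_score(pred, label)
```

For identical prediction and label the score is 1.0; for disjoint masks it approaches 0.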
chenyu cd801a15f3
scipy.signal.gaussian -> scipy.signal.windows.gaussian (#4205)
fixed unet3d model_eval, will add to CI after merging new dice loss
2024-04-17 19:15:37 -04:00
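The rename reflects SciPy moving its window functions into the `scipy.signal.windows` namespace; the old `scipy.signal.gaussian` alias was deprecated and later removed, so current code must call `scipy.signal.windows.gaussian(M, std)`. A dependency-free sketch of what that symmetric window computes:

```python
import math

def gaussian_window(M, std):
    # Sampled symmetric Gaussian, matching the formula behind
    # scipy.signal.windows.gaussian: w[n] = exp(-0.5 * ((n - c) / std)^2)
    # with c the center of the M-point window.
    center = (M - 1) / 2.0
    return [math.exp(-0.5 * ((n - center) / std) ** 2) for n in range(M)]
```

The peak is 1.0 at the center and the window is mirror-symmetric.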
Elias Wahl 6eef8ee22a
Wikipedia download script for MLPerf BERT training (#4202)
* wikipedia download script

* add link

* checksum valueError

* ops
2024-04-17 16:34:57 -04:00
qazal f75020a903
minimal diff for multioutput reduce pairs (#4030)
* simple fusion

* compiler cache patch

* Revert "compiler cache patch"

This reverts commit fa180495974456a1748a64865c4d329eae0a55e9.

* Revert "Revert "compiler cache patch""

This reverts commit 57f8d41f985ac8acfff997136024b0b43577f195.

* delete that

* early sort

* teeny renames

* spec

* .empty is great

* delete sort

* Update test_schedule.py

* this is one kernel now

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-04-17 10:55:44 -04:00
George Hotz 8564e28a1b
new memory scheduler with explicit refcounts (#4198)
* new memory scheduler with explicit refcounts

* move central memory planner

* typo + use central memory planner in openpilot

* cleanups

* include lb_refcount in pickle

* replace PlaceHolder with memory planner

* cleaner
2024-04-17 08:46:47 +04:00
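A sketch of what an explicit-refcount memory planner does: each logical buffer carries a count of remaining consumers, and once the last consumer runs, its backing allocation goes on a free list keyed by size for reuse. This is illustrative only (the class and method names are assumptions, not tinygrad's actual planner):

```python
class MemoryPlanner:
    """Assign allocation ids, reusing a freed same-size allocation
    before creating a fresh one."""
    def __init__(self):
        self.next_id = 0
        self.free = {}       # size -> list of reusable allocation ids
        self.refcount = {}   # allocation id -> remaining uses

    def _fresh(self):
        self.next_id += 1
        return self.next_id

    def alloc(self, size, refs):
        # Reuse a freed allocation of the same size if one exists.
        pool = self.free.get(size, [])
        buf = pool.pop() if pool else self._fresh()
        self.refcount[buf] = refs
        return buf

    def use(self, buf, size):
        # Called once per scheduled read; on the last use the
        # allocation becomes available for reuse.
        self.refcount[buf] -= 1
        if self.refcount[buf] == 0:
            self.free.setdefault(size, []).append(buf)
```

The payoff is that a schedule's peak memory is bounded by live buffers rather than total buffers created.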
Francis Lam c91b7b1739
test: add fuzz_matmul and better debugging for simple_matmul (#4199)
also show unoptimized shape in verify_kernel
2024-04-16 23:40:31 -04:00
qazal ba8602612b
Fuzz all permutations of schedule (#4136)
* simple toposort

* fuzzer

* init in_degree

* move to tests

* same seed

* configure paths

* internal graph

* compare LazyBuffers

* simpler

* simple graph

* assign works

* simpler

* fix JIT

* upstream ci

* move ci

* fix the path

* DEBUG=1

* limit max paths

* launch a cmp kernel

* Revert "launch a cmp kernel"

This reverts commit 791c6089922fa7d800456f28fc167842f188ac7e.

* exec ground truth

* better perf

* copy ground truth once

* gpu allclose ast try1

* Revert "gpu allclose ast try1"

This reverts commit 1f82103af3a7bfedb9f858b6c58b0b94f1c7e6b0.

* prerealized bufs freezing

* teeny cleanups

* reuse Buffers

* Revert "reuse Buffers"

This reverts commit a71de94b035bd5ceb1ec257f6b2529b166bcd30b.

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-04-17 05:03:21 +04:00
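Fuzzing all permutations of a schedule amounts to enumerating every topological ordering of the kernel dependency graph and checking each ordering against a ground truth. A minimal sketch of that enumeration, mirroring the "simple toposort" / "init in_degree" steps above (Kahn's algorithm with backtracking; not tinygrad's exact fuzzer):

```python
def all_toposorts(graph):
    # graph: node -> list of dependents (edges point dependency -> dependent)
    in_degree = {n: 0 for n in graph}
    for n in graph:
        for child in graph[n]:
            in_degree[child] += 1
    order, results = [], []

    def walk():
        # Nodes whose dependencies are all scheduled and that aren't scheduled yet.
        ready = [n for n in graph if in_degree[n] == 0 and n not in order]
        if not ready and len(order) == len(graph):
            results.append(list(order))
        for n in ready:
            order.append(n)
            for child in graph[n]: in_degree[child] -= 1
            walk()
            # Undo, so sibling branches explore the other orderings.
            for child in graph[n]: in_degree[child] += 1
            order.pop()

    walk()
    return results
```

A diamond graph has exactly two valid orderings; the fuzzer's job is then to run each and compare outputs (with a cap like the "limit max paths" step, since the count grows factorially).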
nimlgen 4ed6b42a8a
fix kernargs check in kfd (#4194) 2024-04-17 00:44:50 +03:00
David Hou 97d846dd67
in forced_realize, unchase last op if it is upcast (#4185)
* in forced_realize, unchase last op if it is upcast

* start on test

* flesh out test

* more test

* comment

* comment out parallel reduce test

* reorder

* unused
2024-04-16 17:15:17 -04:00
Francis Lam e9c1616b27
logging: change LOGKERN to LOGKERNS to match LOGOPS (#4193)
also add printing of ast and applied_opts during verify_kernel
to more easily debug errors if they come up
2024-04-16 16:08:32 -04:00
David Hou 7fb220a567
touchup resnet_layer_bench (#4191) 2024-04-16 14:43:00 -04:00
David Hou 1dbf3b2b19
Benchmarks for individual resnet layers (#4182)
* resnet individual layer benchmarks!

* small

* 1 and 2

* mem_used

* no ci

* better conv print

* defaults

* prints

* adjust

* adjust

* adjust

* benchmark only one layer example

* tensor.training, zero_grad, sum instead of mean, last mem, last kernel count

* default jitcnt=1

* scale flops/kernels with jitcnt

* add note about jitcnt memory

* touchup
2024-04-16 13:53:18 -04:00
George Hotz d49d4324a3
update docs (#4189) 2024-04-16 16:07:02 +04:00
George Hotz 55ae73e951
Replicate llm.c in tinygrad (#4179)
* write llm.c and add a few new methods to tensor

* training works

* add jit

* tests for new functions

* test tolist

* simple fix for onnx test failures (#4186)

* write llm.c and add a few new methods to tensor

* training works

* add jit

* tests for new functions

* bump line count to 7500

* simplest fix

* safenumpy tolist for now

---------

Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>

---------

Co-authored-by: geohotstan <135171913+geohotstan@users.noreply.github.com>
2024-04-16 15:40:48 +04:00
George Hotz b6e7243bfa hotfix: skip slow pre-commit test 2024-04-16 11:48:43 +04:00
George Hotz cda0010020 hotfix: docs-legacy 2024-04-16 11:06:56 +04:00
George Hotz 8f749ae0eb
New docs are in mkdocs (#4178)
* start mkdocs

* simple docs for tensor

* more docs

* move those back

* more docs

* copy markdown extensions

* docs legacy

* docs building workflow

* fix showcase links

* only that?

* install tinygrad

* add docs to setup.py

* Delete examples/llm.c/data
2024-04-16 10:59:51 +04:00
chenyu aa093efa43
fix handcode_resnet50_opt flops count (#4184) 2024-04-15 22:13:45 -04:00
chenyu d5b67c1ca3
log resnet TRAIN_BEAM / EVAL_BEAM (#4181)
also run eval in benchmark mode if either one is positive
2024-04-15 19:29:08 -04:00
Francis Lam 9d2273235c
search: BEAM_UOPS_MAX to prune candidates with too many uops (#4088)
* search: add better default settings for fast search

not the highest possible performance, but adequate for most usage

* search: revert BEAM_MIN_PROGRESS and BEAM_UPCAST_MAX default changes

also sneak in a link to .gitignore for the unet3d dataset

* revert BEAM_MAX_TASKS_PER_CHILD change and fix uops max condition
2024-04-15 18:56:22 -04:00
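BEAM_UOPS_MAX prunes beam-search candidates whose lowered kernel would have too many uops, keeping search fast without losing reasonable optima. A hedged sketch of the idea (the function name and the zero-disables convention are assumptions, not tinygrad's exact code):

```python
import os

# Mirror of the BEAM_UOPS_MAX env knob: 0 disables the check,
# a positive value drops candidates whose kernel is too large.
BEAM_UOPS_MAX = int(os.environ.get("BEAM_UOPS_MAX", "0"))

def prune_candidates(candidates, uop_count, max_uops=None):
    """candidates: optimization candidates for one beam step;
    uop_count: callable giving the lowered kernel's uop count."""
    limit = BEAM_UOPS_MAX if max_uops is None else max_uops
    if limit <= 0:
        return candidates
    return [c for c in candidates if uop_count(c) <= limit]
```

Oversized candidates are dropped before the (expensive) compile-and-time step, which is where the search savings come from.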
qazal 286ea697f3
keep order in realizes (#4180) 2024-04-16 01:25:50 +04:00
George Hotz e14a9bca0c hotfix: bump line count to 7500 for NV backend 2024-04-15 23:18:46 +04:00
chenyu 6a2168e698
TRAIN_BEAM and EVAL_BEAM for resnet (#4177)
working on measuring compile time
2024-04-15 14:57:21 -04:00
Timmy 4592fc8fe7
Multireduce Kernels - prereq refactor (#4173)
* refactor rendering a reduceop into its own function (will help for kernels with multiple reduceops)

* linters

* addressing concerns
2024-04-14 20:16:54 -04:00
David Hou 593c90d7d6
Resnet fp16 training with fp32 master weight copy (#4144)
* add casts to layers

* FLOAT flag

* detach

* no_grad for eval

* whitespace

* explicit fp32 initialization

* oops

* whitespace

* put back config['DEFAULT_FLOAT']

* bad

* live dangerously (don't hide bugs)

* don't bundle changes

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-14 11:25:08 -04:00
chenyu e20d6f9221
correct resnet estimate time (#4169)
7.99 hours was rendered as 7h0m.
2024-04-14 02:21:46 -04:00
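The bug described above (7.99 hours rendered as 7h0m) is what happens when the fractional hours are dropped before computing minutes. A minimal sketch of a correct conversion (hypothetical helper name, not the script's actual function):

```python
def fmt_hours(hours: float) -> str:
    # Convert fractional hours to whole hours and minutes via total
    # minutes, so the fractional part is never silently discarded.
    total_minutes = int(round(hours * 60))
    h, m = divmod(total_minutes, 60)
    return f"{h}h{m}m"
```

With this, 7.99 hours renders as 7h59m rather than 7h0m.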
George Hotz ea18d28253 some overview docs 2024-04-13 17:01:09 -07:00
George Hotz 50e780a588
multitensor shouldn't recompile (#4164)
* multitensor shouldn't recompile

* type annotations

* fix tests

* outcount in reduce
2024-04-13 00:03:48 -07:00
George Hotz 599eb266b1
optionally use a copy kernel instead of SDMA (#4116)
* optionally use a copy kernel

* lazyops in copied kernels

* add sync

* no sdma at all

* work

* copy_ast
2024-04-12 23:10:41 -07:00
George Hotz ba7314c26b
cleanup lbs (#4163) 2024-04-12 22:32:16 -07:00
chenyu a7c6864260
remove CAST_BEFORE_VIEW (#4152)
* remove CAST_BEFORE_VIEW

testing perf, also this might have issue with assign?

* remove all
2024-04-13 01:05:08 -04:00
George Hotz ebc94c9d6c
rewrite the jit in the context of new schedule (#4162)
* rewrite the jit in the context of new schedule

* mypy better

* fix placeholder

* tests

* all functionality should work

* fix tests

* no CacheCollector
2024-04-12 21:54:36 -07:00
George Hotz b67f759780
abstractions3 is currently wishful thinking (#4124)
* abstractions3 is currently wishful thinking

* a3

* work

* minor

* progress on a3

* more

* update abstractions3

* cleaner
2024-04-12 16:46:01 -07:00
MaximilianEmel 27a98aaecc
Rewritten SVG Logos (#4150)
* rewrote the svg logos to use polygons and render better

* changed self-closing tags' style to better conform to the original
2024-04-12 14:09:57 -07:00
chenyu 63eb0a68af
fix return dtype of gather (#4159) 2024-04-12 16:25:12 -04:00
chenyu d9c5a2b1bb
fix return dtype of getitem Tensor indexing (#4158)
the use of sum can auto-upcast the result. fixed by using the data dtype as the acc_dtype
2024-04-12 15:55:02 -04:00
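The same accumulator-upcasting behavior is easy to see in NumPy, used here only as an analogy for the fix (which sets acc_dtype to the data dtype in tinygrad's getitem, not NumPy code):

```python
import numpy as np

x = np.array([1, 2, 3], dtype=np.int8)

# By default the sum accumulates in a wider integer type, so the
# result dtype no longer matches the input's dtype.
default_sum = x.sum()

# Passing an explicit accumulator dtype keeps the input's dtype,
# analogous to using the data dtype as the acc_dtype.
same_dtype_sum = x.sum(dtype=np.int8)
```

The value is the same either way; only the result dtype differs.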
chenyu f6c8032e5d
assert if expr_idxs return might be outside of int32 (#4157) 2024-04-12 14:18:35 -04:00
nimlgen 24a27a01a9
hotfix: CUDA_P2P works (#4155) 2024-04-12 18:20:12 +03:00
nimlgen 5a57b48134
cuda p2p enable when available (#4153) 2024-04-12 16:21:54 +03:00
chenyu 380f27d629
move sum acc_dtype into lazy so it applies to backward (#4149)
* move sum acc_dtype into lazy so it applies to backward

* unit test
2024-04-11 14:43:56 -04:00
George Hotz bbda20c0db
CompiledASTRunner -> CompiledRunner (#4148) 2024-04-11 08:49:52 -07:00
George Hotz 0f16709c00 hotfix: remove test speed vs torch 2024-04-11 08:37:57 -07:00
qazal c0796374e4
refactor membufs (#4147) 2024-04-11 08:30:44 -07:00
George Hotz b7e281cf10
JitItem -> ExecItem (#4146)
* JitItem -> ExecItem

* execitem in realize

* cleaner

* JITRunner -> Runner
2024-04-11 08:24:57 -07:00
George Hotz e79a11b99c hotfix: revert llama change 2024-04-10 20:13:15 -07:00
George Hotz 2e6c39b0b2
Do less realizes (#4141)
* less realize

* corealize jit inputs

* prints

* print before we run
2024-04-10 19:50:50 -07:00
chenyu 06bcae13b4
PADTO SUM if parents of sum are all zero-preserving (#4140)
* PADTO SUM if parents of sum are all zero-preserving

* test case unsafe ops after sum is fine

* reuse UNSAFE_PAD_OPS

* update db version
2024-04-10 22:16:12 -04:00
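PADTO on a SUM is safe when every op feeding the sum maps zero to zero, because then the padded region contributes nothing to the result. A toy sketch of that predicate (hypothetical helper; the real check walks the parents of the sum against UNSAFE_PAD_OPS rather than probing functions):

```python
import math

def is_zero_preserving(f):
    # An elementwise op is safe to pad through if f(0) == 0, so a
    # padded zero stays zero on its way into the SUM.
    return f(0.0) == 0.0

relu = lambda x: max(x, 0.0)      # relu(0) == 0: safe
square = lambda x: x * x          # 0*0 == 0: safe
exp = math.exp                    # exp(0) == 1: padding corrupts the sum
```

Ops like exp (and log or reciprocal, which blow up at zero) are exactly why an unsafe-ops list is needed; the commit's insight is that unsafe ops *after* the sum are still fine.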
George Hotz 081dd1573f hotfix: keep CUDA D2D copy behind the CUDA_P2P flag 2024-04-10 21:36:48 +00:00
George Hotz af5984df43
cudagraph memcpy through host (#4137) 2024-04-10 13:17:17 -07:00
terafo 5e6d2155e4
Add driving monitoring model to benchmarks (#4134)
* add driving monitoring model to benchmarks

* handle crash
2024-04-10 14:27:03 -04:00