Francis Lata
3644077a42
[MLPerf][UNet3D] Add DICE loss + metrics ( #4204 )
* add DICE loss and metrics
* update dice to include reference implementation's link
* remove unused imports
* remove unnecessary test file and update pred + label for metrics and losses test
* add tests to CI + add exclusion of mlperf_unet3d
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-17 20:09:33 -04:00
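The Dice loss added here is, in its usual form, 1 - 2|P∩G| / (|P| + |G|) over predicted and ground-truth masks. A minimal numpy sketch of that formula (an illustration, not the MLPerf reference code the commit links to):

```python
import numpy as np

def dice_loss(pred, label, eps=1e-6):
  # pred/label: float arrays of the same shape, pred in [0, 1]
  intersection = (pred * label).sum()
  denominator = pred.sum() + label.sum()
  return 1.0 - (2.0 * intersection + eps) / (denominator + eps)

print(dice_loss(np.array([1.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.0])))  # ~0.333
```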
chenyu
cd801a15f3
scipy.signal.gaussian -> scipy.signal.windows.gaussian ( #4205 )
fixed unet3d model_eval, will add to CI after merging new dice loss
2024-04-17 19:15:37 -04:00
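Context for the rename: scipy.signal.gaussian was deprecated in favor of scipy.signal.windows.gaussian and removed in SciPy 1.13, so the old import now fails:

```python
from scipy.signal.windows import gaussian  # old: from scipy.signal import gaussian (removed in SciPy 1.13)

window = gaussian(27, std=3.0)  # 27-point Gaussian window, same values as the old function
```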
Elias Wahl
6eef8ee22a
Wikipedia download script for MLPerf BERT training ( #4202 )
* wikipedia download script
* add link
* checksum ValueError
* oops
2024-04-17 16:34:57 -04:00
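The "checksum ValueError" bullet suggests the download script raises ValueError on a bad file. A hedged sketch of that pattern with hashlib (the path and expected digest are placeholders, not values from the script):

```python
import hashlib

def verify_sha256(path, expected_hex):
  h = hashlib.sha256()
  with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
      h.update(chunk)
  if h.hexdigest() != expected_hex:
    raise ValueError(f"checksum mismatch for {path}: got {h.hexdigest()}, expected {expected_hex}")
```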
qazal
f75020a903
minimal diff for multioutput reduce pairs ( #4030 )
* simple fusion
* compiler cache patch
* Revert "compiler cache patch"
This reverts commit fa180495974456a1748a64865c4d329eae0a55e9.
* Revert "Revert "compiler cache patch""
This reverts commit 57f8d41f985ac8acfff997136024b0b43577f195.
* delete that
* early sort
* teeny renames
* spec
* .empty is great
* delete sort
* Update test_schedule.py
* this is one kernel now
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-04-17 10:55:44 -04:00
George Hotz
8564e28a1b
new memory scheduler with explicit refcounts ( #4198 )
* new memory scheduler with explicit refcounts
* move central memory planner
* typo + use central memory planner in openpilot
* cleanups
* include lb_refcount in pickle
* replace PlaceHolder with memory planner
* cleaner
2024-04-17 08:46:47 +04:00
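For a sense of what refcount-based memory planning looks like, here is a toy sketch assuming a (output, inputs, size) schedule format; an illustration of the idea, not tinygrad's planner:

```python
from collections import defaultdict

def plan_memory(schedule):
  # schedule: list of (out_name, in_names, size); returns out_name -> slot id
  refcount, size = defaultdict(int), {}
  for out, ins, sz in schedule:
    size[out] = sz
    for name in ins: refcount[name] += 1
  free, slot, next_slot = defaultdict(list), {}, 0
  for out, ins, sz in schedule:
    if free[sz]: slot[out] = free[sz].pop()        # reuse a buffer whose refcount hit zero
    else: slot[out], next_slot = next_slot, next_slot + 1
    for name in ins:
      refcount[name] -= 1
      if refcount[name] == 0 and name in slot:     # dead intermediate: return its slot to the pool
        free[size[name]].append(slot[name])
  return slot

# c reuses a's slot once a's last consumer (b) has run
print(plan_memory([("a", [], 4), ("b", ["a"], 4), ("c", ["b"], 4)]))  # {'a': 0, 'b': 1, 'c': 0}
```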
Francis Lam
c91b7b1739
test: add fuzz_matmul and better debugging for simple_matmul ( #4199 )
also show unoptimized shape in verify_kernel
2024-04-16 23:40:31 -04:00
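In the spirit of fuzz_matmul, a sketch that checks random matmuls against numpy (shapes, seed, and tolerances are arbitrary choices, not the test's):

```python
import numpy as np
from tinygrad import Tensor

rng = np.random.default_rng(0)
for _ in range(10):
  m, k, n = rng.integers(1, 64, size=3)
  a = rng.standard_normal((m, k), dtype=np.float32)
  b = rng.standard_normal((k, n), dtype=np.float32)
  # compare the device result against numpy's reference matmul
  np.testing.assert_allclose((Tensor(a) @ Tensor(b)).numpy(), a @ b, rtol=1e-3, atol=1e-3)
```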
qazal
ba8602612b
Fuzz all permutations of schedule ( #4136 )
* simple toposort
* fuzzer
* init in_degree
* move to tests
* same seed
* configure paths
* internal graph
* compare LazyBuffers
* simpler
* simple graph
* assign works
* simpler
* fix JIT
* upstream ci
* move ci
* fix the path
* DEBUG=1
* limit max paths
* launch a cmp kernel
* Revert "launch a cmp kernel"
This reverts commit 791c6089922fa7d800456f28fc167842f188ac7e.
* exec ground truth
* better perf
* copy ground truth once
* gpu allclose ast try1
* Revert "gpu allclose ast try1"
This reverts commit 1f82103af3a7bfedb9f858b6c58b0b94f1c7e6b0.
* prerealized bufs freezing
* teeny cleanups
* reuse Buffers
* Revert "reuse Buffers"
This reverts commit a71de94b035bd5ceb1ec257f6b2529b166bcd30b.
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-04-17 05:03:21 +04:00
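The fuzzer's core idea, per the bullets ("simple toposort", "init in_degree", "limit max paths"), is enumerating topological orderings of the schedule graph and checking they all compute the same thing. A self-contained sketch using Kahn's algorithm with backtracking, assuming a node -> children dict:

```python
def all_toposorts(graph, limit=100):
  # graph: node -> list of children; every node must appear as a key
  in_degree = {n: 0 for n in graph}
  for n in graph:
    for child in graph[n]: in_degree[child] += 1
  paths, path, placed = [], [], set()
  def walk():
    if len(paths) >= limit: return          # cap the number of enumerated paths
    if len(path) == len(graph):
      paths.append(path[:]); return
    for n in graph:
      if n in placed or in_degree[n] != 0: continue
      placed.add(n); path.append(n)
      for c in graph[n]: in_degree[c] -= 1
      walk()                                 # recurse, then undo to try the next choice
      for c in graph[n]: in_degree[c] += 1
      placed.discard(n); path.pop()
  walk()
  return paths

# two valid orders for a diamond: a-b-c-d and a-c-b-d
print(all_toposorts({"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}))
```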
nimlgen
4ed6b42a8a
fix kernargs check in kfd ( #4194 )
2024-04-17 00:44:50 +03:00
David Hou
97d846dd67
in forced_realize, unchase last op if it is upcast ( #4185 )
* in forced_realize, unchase last op if it is upcast
* start on test
* flesh out test
* more test
* comment
* comment out parallel reduce test
* reorder
* unused
2024-04-16 17:15:17 -04:00
Francis Lam
e9c1616b27
logging: change LOGKERN to LOGKERNS to match LOGOPS ( #4193 )
also add printing of ast and applied_opts during verify_kernel
to more easily debug errors if they come up
2024-04-16 16:08:32 -04:00
David Hou
7fb220a567
touchup resnet_layer_bench ( #4191 )
2024-04-16 14:43:00 -04:00
David Hou
1dbf3b2b19
Benchmarks for individual resnet layers ( #4182 )
* resnet individual layer benchmarks!
* small
* 1 and 2
* mem_used
* no ci
* better conv print
* defaults
* prints
* adjust
* adjust
* adjust
* benchmark only one layer example
* tensor.training, zero_grad, sum instead of mean, last mem, last kernel count
* default jitcnt=1
* scale flops/kernels with jitcnt
* add note about jitcnt memory
* touchup
2024-04-16 13:53:18 -04:00
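A rough shape for such a per-layer harness, including the flops/kernels-scale-with-jitcnt bookkeeping the bullets mention; this is a guess at the structure, not the benchmark's code:

```python
import time

def bench_layer(layer_fn, x, jitcnt=1, warmup=3, iters=10):
  # layer_fn must run the layer eagerly (with tinygrad, realize the output inside it)
  for _ in range(warmup): layer_fn(x)
  t0 = time.perf_counter()
  for _ in range(iters * jitcnt): layer_fn(x)
  return (time.perf_counter() - t0) / (iters * jitcnt)  # seconds per layer call
```

Reported flops and kernel counts are then divided by jitcnt as well, so the per-call numbers stay comparable across jitcnt settings.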
George Hotz
d49d4324a3
update docs ( #4189 )
2024-04-16 16:07:02 +04:00
George Hotz
55ae73e951
Replicate llm.c in tinygrad ( #4179 )
* write llm.c and add a few new methods to tensor
* training works
* add jit
* tests for new functions
* test tolist
* simple fix for onnx test failures (#4186 )
* write llm.c and add a few new methods to tensor
* training works
* add jit
* tests for new functions
* bump line count to 7500
* simplest fix
* safenumpy tolist for now
---------
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
---------
Co-authored-by: geohotstan <135171913+geohotstan@users.noreply.github.com>
2024-04-16 15:40:48 +04:00
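One of the new Tensor methods exercised here is tolist, which round-trips a Tensor back to nested Python lists:

```python
from tinygrad import Tensor

assert Tensor([[1, 2], [3, 4]]).tolist() == [[1, 2], [3, 4]]
```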
George Hotz
b6e7243bfa
hotfix: skip slow pre-commit test
2024-04-16 11:48:43 +04:00
George Hotz
cda0010020
hotfix: docs-legacy
2024-04-16 11:06:56 +04:00
George Hotz
8f749ae0eb
New docs are in mkdocs ( #4178 )
* start mkdocs
* simple docs for tensor
* more docs
* move those back
* more docs
* copy markdown extensions
* docs legacy
* docs building workflow
* fix showcase links
* only that?
* install tinygrad
* add docs to setup.py
* Delete examples/llm.c/data
2024-04-16 10:59:51 +04:00
chenyu
aa093efa43
fix handcode_resnet50_opt flops count ( #4184 )
2024-04-15 22:13:45 -04:00
chenyu
d5b67c1ca3
log resnet TRAIN_BEAM / EVAL_BEAM ( #4181 )
also run eval in benchmark mode if either one is positive
2024-04-15 19:29:08 -04:00
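Paraphrasing how such flags are typically read (tinygrad's getenv helper defaults to 0); the exact resnet wiring may differ:

```python
from tinygrad.helpers import getenv

TRAIN_BEAM, EVAL_BEAM = getenv("TRAIN_BEAM"), getenv("EVAL_BEAM")  # both default to 0
if TRAIN_BEAM or EVAL_BEAM:  # "run eval in benchmark mode if either one is positive"
  print(f"benchmark-mode eval: TRAIN_BEAM={TRAIN_BEAM} EVAL_BEAM={EVAL_BEAM}")
```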
Francis Lam
9d2273235c
search: BEAM_UOPS_MAX to prune candidates with too many uops ( #4088 )
* search: add better default settings for fast search
not the highest possible performance, but adequate for most usage
* search: revert BEAM_MIN_PROGRESS and BEAM_UPCAST_MAX default changes
also sneak in a link to .gitignore for the unet3d dataset
* revert BEAM_MAX_TASKS_PER_CHILD change and fix uops max condition
2024-04-15 18:56:22 -04:00
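The pruning itself is a simple predicate over candidate kernels; an illustrative sketch with assumed names, not search.py's exact code:

```python
def prune_candidates(candidates, count_uops, beam_uops_max):
  # count_uops(k): number of uops after lowering candidate k
  if beam_uops_max <= 0: return candidates           # 0 disables the limit
  kept = [k for k in candidates if count_uops(k) <= beam_uops_max]
  return kept or candidates[:1]                      # never prune everything away
```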
qazal
286ea697f3
keep order in realizes ( #4180 )
2024-04-16 01:25:50 +04:00
George Hotz
e14a9bca0c
hotfix: bump line count to 7500 for NV backend
2024-04-15 23:18:46 +04:00
chenyu
6a2168e698
TRAIN_BEAM and EVAL_BEAM for resnet ( #4177 )
working on measuring compile time
2024-04-15 14:57:21 -04:00
Timmy
4592fc8fe7
Multireduce Kernels - prereq refactor ( #4173 )
* refactor rendering a reduceop into its own function (will help for kernels with multiple reduceops)
* linters
* addressing concerns
2024-04-14 20:16:54 -04:00
David Hou
593c90d7d6
Resnet fp16 training with fp32 master weight copy ( #4144 )
* add casts to layers
* FLOAT flag
* detach
* no_grad for eval
* whitespace
* explicit fp32 initialization
* oops
* whitespace
* put back config['DEFAULT_FLOAT']
* bad
* live dangerously (don't hide bugs)
* don't bundle changes
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-14 11:25:08 -04:00
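The technique: keep an fp32 master copy of the weights, run forward/backward in fp16, and apply updates to the master copy so small updates aren't lost to fp16 rounding. A generic plain-SGD sketch of the pattern, not the resnet trainer itself:

```python
from tinygrad import dtypes

def optimizer_step(master_w, grads, lr=0.01):
  # master_w: fp32 Tensors kept across steps; grads: fp16 grads from the half-precision backward
  master_w = [w - g.cast(dtypes.float32) * lr for w, g in zip(master_w, grads)]
  half_w = [w.cast(dtypes.half) for w in master_w]  # fp16 copies feed the next forward pass
  return master_w, half_w
```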
chenyu
e20d6f9221
correct resnet estimate time ( #4169 )
7.99 hours was rendered as 7h0m.
2024-04-14 02:21:46 -04:00
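A plausible way 7.99 hours becomes "7h0m" is truncating the hours before converting the fractional part to minutes (int(hours % 1) * 60 == 0); converting to total minutes first avoids it:

```python
def fmt(hours):
  h, m = divmod(int(hours * 60), 60)  # total minutes first, then split
  return f"{h}h{m}m"

print(fmt(7.99))  # 7h59m
```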
George Hotz
ea18d28253
some overview docs
2024-04-13 17:01:09 -07:00
George Hotz
50e780a588
multitensor shouldn't recompile ( #4164 )
* multitensor shouldn't recompile
* type annotations
* fix tests
* outcount in reduce
2024-04-13 00:03:48 -07:00
George Hotz
599eb266b1
optionally use a copy kernel instead of SDMA ( #4116 )
* optionally use a copy kernel
* lazyops in copied kernels
* add sync
* no sdma at all
* work
* copy_ast
2024-04-12 23:10:41 -07:00
George Hotz
ba7314c26b
cleanup lbs ( #4163 )
2024-04-12 22:32:16 -07:00
chenyu
a7c6864260
remove CAST_BEFORE_VIEW ( #4152 )
* remove CAST_BEFORE_VIEW
testing perf, also this might have an issue with assign?
* remove all
2024-04-13 01:05:08 -04:00
George Hotz
ebc94c9d6c
rewrite the jit in the context of new schedule ( #4162 )
* rewrite the jit in the context of new schedule
* mypy better
* fix placeholder
* tests
* all functionality should work
* fix tests
* no CacheCollector
2024-04-12 21:54:36 -07:00
George Hotz
b67f759780
abstractions3 is currently wishful thinking ( #4124 )
* abstractions3 is currently wishful thinking
* a3
* work
* minor
* progress on a3
* more
* update abstractions3
* cleaner
2024-04-12 16:46:01 -07:00
MaximilianEmel
27a98aaecc
Rewritten SVG Logos ( #4150 )
* rewrote the svg logos to use polygons and render better
* changed self-closing tags' style to better conform to the original
2024-04-12 14:09:57 -07:00
chenyu
63eb0a68af
fix return dtype of gather ( #4159 )
2024-04-12 16:25:12 -04:00
chenyu
d9c5a2b1bb
fix return dtype of getitem Tensor indexing ( #4158 )
the use of sum can auto-upcast the result. fixed by using the data dtype as the acc_dtype
2024-04-12 15:55:02 -04:00
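A numpy analogue of the fix, showing the default accumulator upcast versus pinning it to the data dtype:

```python
import numpy as np

x = np.array([1, 0, 1], dtype=np.uint8)
print(x.sum().dtype)                # a wider platform integer: sum upcasts by default
print(x.sum(dtype=np.uint8).dtype)  # uint8: accumulating in the data dtype keeps it, as in the fix
```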
chenyu
f6c8032e5d
assert if expr_idxs return might be outside of int32 ( #4157 )
2024-04-12 14:18:35 -04:00
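The shape of such a guard, stripped of the symbolic index machinery (the real check operates on expr_idxs bounds, not a plain int):

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def check_int32(v: int):
  # an index expression outside this range would overflow a 32-bit index register
  assert INT32_MIN <= v <= INT32_MAX, f"index expression {v} does not fit in int32"
```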
nimlgen
24a27a01a9
hotfix: CUDA_P2P works ( #4155 )
2024-04-12 18:20:12 +03:00
nimlgen
5a57b48134
cuda p2p enable when available ( #4153 )
2024-04-12 16:21:54 +03:00
chenyu
380f27d629
move sum acc_dtype into lazy so it applies to backward ( #4149 )
* move sum acc_dtype into lazy so it applies to backward
* unit test
2024-04-11 14:43:56 -04:00
George Hotz
bbda20c0db
CompiledASTRunner -> CompiledRunner ( #4148 )
2024-04-11 08:49:52 -07:00
George Hotz
0f16709c00
hotfix: remove test speed vs torch
2024-04-11 08:37:57 -07:00
qazal
c0796374e4
refactor membufs ( #4147 )
2024-04-11 08:30:44 -07:00
George Hotz
b7e281cf10
JitItem -> ExecItem ( #4146 )
* JitItem -> ExecItem
* execitem in realize
* cleaner
* JITRunner -> Runner
2024-04-11 08:24:57 -07:00
George Hotz
e79a11b99c
hotfix: revert llama change
2024-04-10 20:13:15 -07:00
George Hotz
2e6c39b0b2
Do less realizes ( #4141 )
* less realize
* corealize jit inputs
* prints
* print before we run
2024-04-10 19:50:50 -07:00
chenyu
06bcae13b4
PADTO SUM if parents of sum are all zero-preserving ( #4140 )
* PADTO SUM if parents of sum are all zero-preserving
* test case unsafe ops after sum is fine
* reuse UNSAFE_PAD_OPS
* update db version
2024-04-10 22:16:12 -04:00
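Why zero-preserving matters: padding the input of a SUM with zeros leaves the result unchanged only if every op between the pad and the reduce maps 0 to 0 (relu does; exp does not, since exp(0) = 1). A tiny illustration of the criterion that UNSAFE_PAD_OPS encodes:

```python
import math

def zero_preserving(f):
  return f(0.0) == 0.0

print(zero_preserving(lambda x: max(x, 0.0)))  # True: relu keeps padded zeros at zero
print(zero_preserving(math.exp))               # False: each padded zero would add 1 to the sum
```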
George Hotz
081dd1573f
hotfix: keep CUDA D2D copy behind the CUDA_P2P flag
2024-04-10 21:36:48 +00:00
George Hotz
af5984df43
cudagraph memcpy through host ( #4137 )
2024-04-10 13:17:17 -07:00
terafo
5e6d2155e4
Add driving monitoring model to benchmarks ( #4134 )
* add driving monitoring model to benchmarks
* handle crash
2024-04-10 14:27:03 -04:00