Commit Graph

4151 Commits

Author SHA1 Message Date
Francis Lata 3644077a42
[MLPerf][UNet3D] Add DICE loss + metrics (#4204)
* add DICE loss and metrics

* update dice to include reference implementation's link

* remove unused imports

* remove unnecessary test file and update pred + label for metrics and losses test

* add tests to CI + add exclusion of mlperf_unet3d

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-17 20:09:33 -04:00
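The Dice score used by the MLPerf UNet3D benchmark measures overlap between prediction and label, and the loss is one minus the score. A minimal pure-Python sketch of the common soft-Dice formulation (illustrative only, not the tinygrad implementation; the smoothing constants are assumptions):

```python
def dice_score(pred, label, smooth_nr=1e-6, smooth_dr=1e-6):
    # Soft Dice: 2*|P∩L| / (|P| + |L|), with small smoothing terms
    # in numerator and denominator to avoid division by zero.
    intersection = sum(p * l for p, l in zip(pred, label))
    denom = sum(pred) + sum(label)
    return (2.0 * intersection + smooth_nr) / (denom + smooth_dr)

def dice_loss(pred, label):
    # Perfect overlap gives score ~1.0, so the loss goes to ~0.0.
    return 1.0 - dice_score(pred, label)
```

For identical prediction and label the score is 1.0; for disjoint masks it approaches 0.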
chenyu cd801a15f3
scipy.signal.gaussian -> scipy.signal.windows.gaussian (#4205)
fixed unet3d model_eval, will add to CI after merging new dice loss
2024-04-17 19:15:37 -04:00
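The rename reflects SciPy moving its window functions into the `scipy.signal.windows` namespace; the old `scipy.signal.gaussian` alias was deprecated and later removed, so current code must call `scipy.signal.windows.gaussian(M, std)`. A dependency-free sketch of what that symmetric window computes:

```python
import math

def gaussian_window(M, std):
    # Sampled symmetric Gaussian, matching the formula behind
    # scipy.signal.windows.gaussian: w[n] = exp(-0.5 * ((n - c) / std)^2)
    # with c the center of the M-point window.
    center = (M - 1) / 2.0
    return [math.exp(-0.5 * ((n - center) / std) ** 2) for n in range(M)]
```

The peak is 1.0 at the center and the window is mirror-symmetric.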
Elias Wahl 6eef8ee22a
Wikipedia download script for MLPerf BERT training (#4202)
* wikipedia download script

* add link

* checksum valueError

* ops
2024-04-17 16:34:57 -04:00
qazal f75020a903
minimal diff for multioutput reduce pairs (#4030)
* simple fusion

* compiler cache patch

* Revert "compiler cache patch"

This reverts commit fa180495974456a1748a64865c4d329eae0a55e9.

* Revert "Revert "compiler cache patch""

This reverts commit 57f8d41f985ac8acfff997136024b0b43577f195.

* delete that

* early sort

* teeny renames

* spec

* .empty is great

* delete sort

* Update test_schedule.py

* this is one kernel now

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-04-17 10:55:44 -04:00
George Hotz 8564e28a1b
new memory scheduler with explicit refcounts (#4198)
* new memory scheduler with explicit refcounts

* move central memory planner

* typo + use central memory planner in openpilot

* cleanups

* include lb_refcount in pickle

* replace PlaceHolder with memory planner

* cleaner
2024-04-17 08:46:47 +04:00
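A sketch of what an explicit-refcount memory planner does: each logical buffer carries a count of remaining consumers, and once the last consumer runs, its backing allocation goes on a free list keyed by size for reuse. This is illustrative only (the class and method names are assumptions, not tinygrad's actual planner):

```python
class MemoryPlanner:
    """Assign allocation ids, reusing a freed same-size allocation
    before creating a fresh one."""
    def __init__(self):
        self.next_id = 0
        self.free = {}       # size -> list of reusable allocation ids
        self.refcount = {}   # allocation id -> remaining uses

    def _fresh(self):
        self.next_id += 1
        return self.next_id

    def alloc(self, size, refs):
        # Reuse a freed allocation of the same size if one exists.
        pool = self.free.get(size, [])
        buf = pool.pop() if pool else self._fresh()
        self.refcount[buf] = refs
        return buf

    def use(self, buf, size):
        # Called once per scheduled read; on the last use the
        # allocation becomes available for reuse.
        self.refcount[buf] -= 1
        if self.refcount[buf] == 0:
            self.free.setdefault(size, []).append(buf)
```

The payoff is that a schedule's peak memory is bounded by live buffers rather than total buffers created.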
Francis Lam c91b7b1739
test: add fuzz_matmul and better debugging for simple_matmul (#4199)
also show unoptimized shape in verify_kernel
2024-04-16 23:40:31 -04:00
qazal ba8602612b
Fuzz all permutations of schedule (#4136)
* simple toposort

* fuzzer

* init in_degree

* move to tests

* same seed

* configure paths

* internal graph

* compare LazyBuffers

* simpler

* simple graph

* assign works

* simpler

* fix JIT

* upstream ci

* move ci

* fix the path

* DEBUG=1

* limit max paths

* launch a cmp kernel

* Revert "launch a cmp kernel"

This reverts commit 791c6089922fa7d800456f28fc167842f188ac7e.

* exec ground truth

* better perf

* copy ground truth once

* gpu allclose ast try1

* Revert "gpu allclose ast try1"

This reverts commit 1f82103af3a7bfedb9f858b6c58b0b94f1c7e6b0.

* prerealized bufs freezing

* teeny cleanups

* reuse Buffers

* Revert "reuse Buffers"

This reverts commit a71de94b035bd5ceb1ec257f6b2529b166bcd30b.

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-04-17 05:03:21 +04:00
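Fuzzing all permutations of a schedule amounts to enumerating every topological ordering of the kernel dependency graph and checking each ordering against a ground truth. A minimal sketch of that enumeration, mirroring the "simple toposort" / "init in_degree" steps above (Kahn's algorithm with backtracking; not tinygrad's exact fuzzer):

```python
def all_toposorts(graph):
    # graph: node -> list of dependents (edges point dependency -> dependent)
    in_degree = {n: 0 for n in graph}
    for n in graph:
        for child in graph[n]:
            in_degree[child] += 1
    order, results = [], []

    def walk():
        # Nodes whose dependencies are all scheduled and that aren't scheduled yet.
        ready = [n for n in graph if in_degree[n] == 0 and n not in order]
        if not ready and len(order) == len(graph):
            results.append(list(order))
        for n in ready:
            order.append(n)
            for child in graph[n]: in_degree[child] -= 1
            walk()
            # Undo, so sibling branches explore the other orderings.
            for child in graph[n]: in_degree[child] += 1
            order.pop()

    walk()
    return results
```

A diamond graph has exactly two valid orderings; the fuzzer's job is then to run each and compare outputs (with a cap like the "limit max paths" step, since the count grows factorially).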
nimlgen 4ed6b42a8a
fix kernargs check in kfd (#4194) 2024-04-17 00:44:50 +03:00
David Hou 97d846dd67
in forced_realize, unchase last op if it is upcast (#4185)
* in forced_realize, unchase last op if it is upcast

* start on test

* flesh out test

* more test

* comment

* comment out parallel reduce test

* reorder

* unused
2024-04-16 17:15:17 -04:00
Francis Lam e9c1616b27
logging: change LOGKERN to LOGKERNS to match LOGOPS (#4193)
also add printing of ast and applied_opts during verify_kernel
to more easily debug errors if they come up
2024-04-16 16:08:32 -04:00
David Hou 7fb220a567
touchup resnet_layer_bench (#4191) 2024-04-16 14:43:00 -04:00
David Hou 1dbf3b2b19
Benchmarks for individual resnet layers (#4182)
* resnet individual layer benchmarks!

* small

* 1 and 2

* mem_used

* no ci

* better conv print

* defaults

* prints

* adjust

* adjust

* adjust

* benchmark only one layer example

* tensor.training, zero_grad, sum instead of mean, last mem, last kernel count

* default jitcnt=1

* scale flops/kernels with jitcnt

* add note about jitcnt memory

* touchup
2024-04-16 13:53:18 -04:00
George Hotz d49d4324a3
update docs (#4189) 2024-04-16 16:07:02 +04:00
George Hotz 55ae73e951
Replicate llm.c in tinygrad (#4179)
* write llm.c and add a few new methods to tensor

* training works

* add jit

* tests for new functions

* test tolist

* simple fix for onnx test failures (#4186)

* write llm.c and add a few new methods to tensor

* training works

* add jit

* tests for new functions

* bump line count to 7500

* simplest fix

* safenumpy tolist for now

---------

Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>

---------

Co-authored-by: geohotstan <135171913+geohotstan@users.noreply.github.com>
2024-04-16 15:40:48 +04:00
George Hotz b6e7243bfa hotfix: skip slow pre-commit test 2024-04-16 11:48:43 +04:00
George Hotz cda0010020 hotfix: docs-legacy 2024-04-16 11:06:56 +04:00
George Hotz 8f749ae0eb
New docs are in mkdocs (#4178)
* start mkdocs

* simple docs for tensor

* more docs

* move those back

* more docs

* copy markdown extensions

* docs legacy

* docs building workflow

* fix showcase links

* only that?

* install tinygrad

* add docs to setup.py

* Delete examples/llm.c/data
2024-04-16 10:59:51 +04:00
chenyu aa093efa43
fix handcode_resnet50_opt flops count (#4184) 2024-04-15 22:13:45 -04:00
chenyu d5b67c1ca3
log resnet TRAIN_BEAM / EVAL_BEAM (#4181)
also run eval in benchmark mode if either one is positive
2024-04-15 19:29:08 -04:00
Francis Lam 9d2273235c
search: BEAM_UOPS_MAX to prune candidates with too many uops (#4088)
* search: add better default settings for fast search

not the highest possible performance, but adequate for most usage

* search: revert BEAM_MIN_PROGRESS and BEAM_UPCAST_MAX default changes

also sneak in a link to .gitignore for the unet3d dataset

* revert BEAM_MAX_TASKS_PER_CHILD change and fix uops max condition
2024-04-15 18:56:22 -04:00
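BEAM_UOPS_MAX prunes beam-search candidates whose lowered kernel would have too many uops, keeping search fast without losing reasonable optima. A hedged sketch of the idea (the function name and the zero-disables convention are assumptions, not tinygrad's exact code):

```python
import os

# Mirror of the BEAM_UOPS_MAX env knob: 0 disables the check,
# a positive value drops candidates whose kernel is too large.
BEAM_UOPS_MAX = int(os.environ.get("BEAM_UOPS_MAX", "0"))

def prune_candidates(candidates, uop_count, max_uops=None):
    """candidates: optimization candidates for one beam step;
    uop_count: callable giving the lowered kernel's uop count."""
    limit = BEAM_UOPS_MAX if max_uops is None else max_uops
    if limit <= 0:
        return candidates
    return [c for c in candidates if uop_count(c) <= limit]
```

Oversized candidates are dropped before the (expensive) compile-and-time step, which is where the search savings come from.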
qazal 286ea697f3
keep order in realizes (#4180) 2024-04-16 01:25:50 +04:00
George Hotz e14a9bca0c hotfix: bump line count to 7500 for NV backend 2024-04-15 23:18:46 +04:00
chenyu 6a2168e698
TRAIN_BEAM and EVAL_BEAM for resnet (#4177)
working on measuring compile time
2024-04-15 14:57:21 -04:00
Timmy 4592fc8fe7
Multireduce Kernels - prereq refactor (#4173)
* refactor rendering a reduceop into its own function (will help for kernels with multiple reduceops)

* linters

* addressing concerns
2024-04-14 20:16:54 -04:00
David Hou 593c90d7d6
Resnet fp16 training with fp32 master weight copy (#4144)
* add casts to layers

* FLOAT flag

* detach

* no_grad for eval

* whitespace

* explicit fp32 initialization

* oops

* whitespace

* put back config['DEFAULT_FLOAT']

* bad

* live dangerously (don't hide bugs)

* don't bundle changes

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-14 11:25:08 -04:00
chenyu e20d6f9221
correct resnet estimate time (#4169)
7.99 hours was rendered as 7h0m.
2024-04-14 02:21:46 -04:00
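The bug described above (7.99 hours rendered as 7h0m) is what happens when the fractional hours are dropped before computing minutes. A minimal sketch of a correct conversion (hypothetical helper name, not the script's actual function):

```python
def fmt_hours(hours: float) -> str:
    # Convert fractional hours to whole hours and minutes via total
    # minutes, so the fractional part is never silently discarded.
    total_minutes = int(round(hours * 60))
    h, m = divmod(total_minutes, 60)
    return f"{h}h{m}m"
```

With this, 7.99 hours renders as 7h59m rather than 7h0m.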
George Hotz ea18d28253 some overview docs 2024-04-13 17:01:09 -07:00
George Hotz 50e780a588
multitensor shouldn't recompile (#4164)
* multitensor shouldn't recompile

* type annotations

* fix tests

* outcount in reduce
2024-04-13 00:03:48 -07:00
George Hotz 599eb266b1
optionally use a copy kernel instead of SDMA (#4116)
* optionally use a copy kernel

* lazyops in copied kernels

* add sync

* no sdma at all

* work

* copy_ast
2024-04-12 23:10:41 -07:00
George Hotz ba7314c26b
cleanup lbs (#4163) 2024-04-12 22:32:16 -07:00
chenyu a7c6864260
remove CAST_BEFORE_VIEW (#4152)
* remove CAST_BEFORE_VIEW

testing perf, also this might have issue with assign?

* remove all
2024-04-13 01:05:08 -04:00
George Hotz ebc94c9d6c
rewrite the jit in the context of new schedule (#4162)
* rewrite the jit in the context of new schedule

* mypy better

* fix placeholder

* tests

* all functionality should work

* fix tests

* no CacheCollector
2024-04-12 21:54:36 -07:00
George Hotz b67f759780
abstractions3 is currently wishful thinking (#4124)
* abstractions3 is currently wishful thinking

* a3

* work

* minor

* progress on a3

* more

* update abstractions3

* cleaner
2024-04-12 16:46:01 -07:00
MaximilianEmel 27a98aaecc
Rewritten SVG Logos (#4150)
* rewrote the svg logos to use polygons and render better

* changed self-closing tags' style to better conform to the original
2024-04-12 14:09:57 -07:00
chenyu 63eb0a68af
fix return dtype of gather (#4159) 2024-04-12 16:25:12 -04:00
chenyu d9c5a2b1bb
fix return dtype of getitem Tensor indexing (#4158)
the use of sum can auto-upcast the result. fixed by using the data dtype as the acc_dtype
2024-04-12 15:55:02 -04:00
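The same accumulator-upcasting behavior is easy to see in NumPy, used here only as an analogy for the fix (which sets acc_dtype to the data dtype in tinygrad's getitem, not NumPy code):

```python
import numpy as np

x = np.array([1, 2, 3], dtype=np.int8)

# By default the sum accumulates in a wider integer type, so the
# result dtype no longer matches the input's dtype.
default_sum = x.sum()

# Passing an explicit accumulator dtype keeps the input's dtype,
# analogous to using the data dtype as the acc_dtype.
same_dtype_sum = x.sum(dtype=np.int8)
```

The value is the same either way; only the result dtype differs.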
chenyu f6c8032e5d
assert if expr_idxs return might be outside of int32 (#4157) 2024-04-12 14:18:35 -04:00
nimlgen 24a27a01a9
hotfix: CUDA_P2P works (#4155) 2024-04-12 18:20:12 +03:00
nimlgen 5a57b48134
cuda p2p enable when available (#4153) 2024-04-12 16:21:54 +03:00
chenyu 380f27d629
move sum acc_dtype into lazy so it applies to backward (#4149)
* move sum acc_dtype into lazy so it applies to backward

* unit test
2024-04-11 14:43:56 -04:00
George Hotz bbda20c0db
CompiledASTRunner -> CompiledRunner (#4148) 2024-04-11 08:49:52 -07:00
George Hotz 0f16709c00 hotfix: remove test speed vs torch 2024-04-11 08:37:57 -07:00
qazal c0796374e4
refactor membufs (#4147) 2024-04-11 08:30:44 -07:00
George Hotz b7e281cf10
JitItem -> ExecItem (#4146)
* JitItem -> ExecItem

* execitem in realize

* cleaner

* JITRunner -> Runner
2024-04-11 08:24:57 -07:00
George Hotz e79a11b99c hotfix: revert llama change 2024-04-10 20:13:15 -07:00
George Hotz 2e6c39b0b2
Do less realizes (#4141)
* less realize

* corealize jit inputs

* prints

* print before we run
2024-04-10 19:50:50 -07:00
chenyu 06bcae13b4
PADTO SUM if parents of sum are all zero-preserving (#4140)
* PADTO SUM if parents of sum are all zero-preserving

* test case unsafe ops after sum is fine

* reuse UNSAFE_PAD_OPS

* update db version
2024-04-10 22:16:12 -04:00
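PADTO on a SUM is safe when every op feeding the sum maps zero to zero, because then the padded region contributes nothing to the result. A toy sketch of that predicate (hypothetical helper; the real check walks the parents of the sum against UNSAFE_PAD_OPS rather than probing functions):

```python
import math

def is_zero_preserving(f):
    # An elementwise op is safe to pad through if f(0) == 0, so a
    # padded zero stays zero on its way into the SUM.
    return f(0.0) == 0.0

relu = lambda x: max(x, 0.0)      # relu(0) == 0: safe
square = lambda x: x * x          # 0*0 == 0: safe
exp = math.exp                    # exp(0) == 1: padding corrupts the sum
```

Ops like exp (and log or reciprocal, which blow up at zero) are exactly why an unsafe-ops list is needed; the commit's insight is that unsafe ops *after* the sum are still fine.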
George Hotz 081dd1573f hotfix: keep CUDA D2D copy behind the CUDA_P2P flag 2024-04-10 21:36:48 +00:00
George Hotz af5984df43
cudagraph memcpy through host (#4137) 2024-04-10 13:17:17 -07:00
terafo 5e6d2155e4
Add driving monitoring model to benchmarks (#4134)
* add driving monitoring model to benchmarks

* handle crash
2024-04-10 14:27:03 -04:00