qazal
6dbe5585b0
batchnorm + conv backward in test_schedule ( #4420 )
...
* test both optims
* batchnorm_backward
2024-05-06 16:40:17 +03:00
Timmy
3f3c973022
Multiple Reduce Kernels - kernel properly orders reduceops ( #4418 )
...
* enable kernel with multiple reduceops
* copy self.reduceops
* assert only one reduceop per kernel
* kernel.py dfs order
* linters
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-05-06 13:54:44 +03:00
wozeparrot
603d3a351b
feat: allow keeping multiple cookies ( #4440 )
2024-05-05 19:26:48 -07:00
chenyu
afe020710d
disable PADTO on upcasted axis ( #4444 )
...
fixed test_failure_31. PADTO upcasted is at best a no-op, and might fail at edge cases.
2024-05-05 21:52:03 -04:00
Francis Lam
709410071c
mlperf/resnet: updated BEAM params to increase performance ( #4443 )
2024-05-05 21:49:46 -04:00
Francis Lam
c8595a9655
update sops.gz, fix tests and add new linearizer test ( #4437 )
...
* update sops.gz, fix tests and add new linearizer test
* remove METAL CI skip for test_failure_22
* re-add skip to METAL CI to test_failure_22
2024-05-05 17:31:25 -04:00
wozeparrot
9ad3d0520a
hotfix: npy is also ok ( #4439 )
2024-05-05 13:48:54 -07:00
chenyu
d0eb1540d5
helpers.diskcache_clear ( #4436 )
...
drop all tables in diskcache. added a unit test but disabled it by default because it will drop all cache...
2024-05-05 14:19:01 -04:00
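The `helpers.diskcache_clear` entry above describes dropping every table in the disk cache. A minimal sketch of that idea follows, assuming the cache is an SQLite database. The names `drop_all_tables` and `demo` and the table names are hypothetical and for illustration only; this is not the code from the commit.

```python
from __future__ import annotations
import sqlite3

def drop_all_tables(conn: sqlite3.Connection) -> list[str]:
    """Drop every user table in the database and return the dropped names."""
    cur = conn.cursor()
    # sqlite_master lists all schema objects; collect table names first
    # so we do not modify the schema while iterating over the query.
    names = [row[0] for row in cur.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    # Skip SQLite's internal bookkeeping tables, which cannot be dropped.
    names = [n for n in names if not n.startswith("sqlite_")]
    for name in names:
        # Identifiers cannot be bound as parameters, so quote them by hand.
        cur.execute('DROP TABLE IF EXISTS "{}"'.format(name.replace('"', '""')))
    conn.commit()
    return names

def demo() -> tuple[list[str], int]:
    # An in-memory database stands in for the on-disk cache file.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE kernels (key TEXT PRIMARY KEY, val BLOB)")
    conn.execute("CREATE TABLE beam (key TEXT PRIMARY KEY, val BLOB)")
    dropped = drop_all_tables(conn)
    remaining = conn.execute(
        "SELECT COUNT(*) FROM sqlite_master WHERE type='table'").fetchone()[0]
    conn.close()
    return sorted(dropped), remaining
```

Because a routine like this wipes the whole cache, it makes sense that the commit keeps its unit test disabled by default.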
George Hotz
595a6e3069
test_fold_conv_relu_backward test
2024-05-05 11:13:43 -07:00
George Hotz
cc16f644d0
hotfix: remove FAKE buffer from graph
2024-05-05 10:52:41 -07:00
qazal
760776c59d
merge EfficientNet to C with clang job ( #4426 )
...
* merge ImageNet to C with linters
* add to clang
* delete from linter
2024-05-05 20:33:12 +03:00
chenyu
3b30756cbb
update mlperf submission system ( #4435 )
...
more required fields.
2024-05-05 13:19:07 -04:00
George Hotz
f95658bc3e
hotfix: pickle jit works if you delete the function
2024-05-05 10:14:03 -07:00
George Hotz
12be536c06
Clang graph ( #4424 )
...
* clang graph runner
* render_dtype
* name it ClangGraph
* JIT=2
* JIT=2 goes there
* JIT as context var
2024-05-05 09:54:12 -07:00
David Hou
544431c388
refactor: pass reduceop into global_load ( #4417 )
...
* pass reduceop directly to global_load
* typing
* make mypy happy :/
* cede a line to mypy :(
* fold in acc_const
* add todo
2024-05-05 19:43:48 +03:00
geohotstan
874dfc556c
update setitem tests to test for currently supported cases ( #4334 )
...
* tests, tests, tests
* one more test
* tests tests tests tests
* t e s t
* a few more
2024-05-05 11:59:13 -04:00
chenyu
fc9e58e482
Revert "refactor sparse_categorical_crossentropy ( #4406 )" ( #4429 )
...
This reverts commit c7368515d2.
2024-05-05 02:30:37 -04:00
David Hou
c0a048c044
batchnorm d(var)/d(mean) = 0 ( #4430 )
...
* d(var)/d(mean) = 0
* drop the number in test_schedule!
2024-05-05 00:25:45 -04:00
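The identity in the batchnorm entry above, d(var)/d(mean) = 0, can be checked with a short derivation. This is a sketch under the assumption that the variance is the usual biased batch variance and that the mean is treated as an independent variable evaluated at the batch mean; the symbols $x_i$, $N$, $\mu$ and $\sigma^2$ are notation introduced here, not taken from the commit.

```latex
\sigma^2(\mu) = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2
\quad\Longrightarrow\quad
\frac{\partial \sigma^2}{\partial \mu}
  = -\frac{2}{N}\sum_{i=1}^{N} (x_i - \mu)
  = -2\left(\frac{1}{N}\sum_{i=1}^{N} x_i - \mu\right)
  = 0
\quad\text{when}\quad \mu = \frac{1}{N}\sum_{i=1}^{N} x_i .
```

Under these assumptions, the gradient path from the variance back through the mean contributes nothing, so a backward pass can skip it.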
George Hotz
e2eab9c2b3
hotfix: disk is okay in child process
2024-05-04 18:18:31 +00:00
George Hotz
cf33afa778
don't open devices from children ( #4425 )
...
* don't open devices from children
* correct way to do this
* fix Device.DEFAULT and add back JITBEAM
2024-05-04 10:35:40 -07:00
qazal
fa17dcaf07
Fix llm.c/export.py ( #4423 )
...
* fix headers
* add CI
* add stdio
* merge clang tests
* revert llm.c
* revert ci
* Revert "revert llm.c"
This reverts commit 5fd17e3c8b38dc9549d0548e9515185b7b032573.
2024-05-04 19:37:10 +03:00
George Hotz
cb7289f9c9
remove clang program header ( #4422 )
...
* remove clang program header
* proper max
* bools are numbers
* fix compile enet
2024-05-04 08:38:01 -07:00
qazal
267bbb57f9
Revert "Add `insert_before` to Linearizer Functions ( #4320 )" ( #4421 )
...
This reverts commit 664b563c91.
2024-05-04 17:50:21 +03:00
qazal
5f3bae378f
search children in fusion ( #4322 )
...
* scheduler diff
* tests diff
* new changes
* realizes
* chores
* assign
* kind of r3
* forced_realize wont do it
* with forced_realize
* start with children
* test search
* r3 with parents
* diff cleanup
* add children
* crossing assign
* late fuse descendants
* update kernel counts
* assign diff doesnt belong here
2024-05-04 17:22:15 +03:00
qazal
249cadd106
fusing crossing diamond assign ( #4403 )
...
* refactor scheduler parents search
* assign target
* unit test
* can't chase this
2024-05-04 15:19:48 +03:00
George Hotz
9fc4465557
subbuffer support ( #4397 )
...
* subbuffer support
* diskbuffer offset
* cuda subbuffer works
* use subbuffer
* more subbuffer tests
* consecutive
* cast
* consec
* offset
* view is a better name
* offset is in nbytes
* fix view + memory planner
* delete unused DiskRunner
* reverse order
* no subbuffers on unrealized consts
* only enabled for disk
* don't reverse memory
* view supported devices
* pickle buffer view
* ring jit
* support extra view inputs in jit
* fix JIT=2 issue
* test copy jit
* p2p isn't an option anymore
* fix dep tracking issue
* fix mypy
* fix pickle
* from_nv is contents now
2024-05-03 18:05:57 -07:00
chenyu
c7368515d2
refactor sparse_categorical_crossentropy ( #4406 )
...
factor out the -1 * and / loss_mask.sum() for both smoothing and non-smoothing terms
2024-05-03 14:28:36 -04:00
qazal
3401734e54
infra for scheduler process replay ( #4405 )
...
* use getenv
* capture ast
* fix graph
* replay schedules
* exec
2024-05-03 20:29:13 +03:00
chenyu
473ecb978a
remove SPLIT_REDUCEOP=1 from resnet scripts ( #4404 )
...
SPLIT_REDUCEOP=1 is default
2024-05-03 12:36:23 -04:00
David Hou
b767d59684
resnet trainer: keep old cookie around until next step has been queued ( #4401 )
...
* keep old cookie around until next step has been queued (-10ms 6gpu)
* also for eval
* drop cookie before data_get?
* Revert "drop cookie before data_get?"
This reverts commit b01e6aa2b27f49aeab04b448f09e0ef9e689ea53.
* Revert "Revert "drop cookie before data_get?""
This reverts commit 23464e73d445007c15537c69818fdee89adf0740.
2024-05-03 12:15:21 -04:00
qazal
cf3ccb809f
refactor scheduler parents search ( #4402 )
2024-05-03 17:16:34 +03:00
George-the-1st
0627e26140
Added missing unittest execution code ( #4400 )
...
same code as on every other test file, just missing from this one for some reason.
2024-05-02 22:34:30 -04:00
chenyu
d4062cb6fc
NV tensor_cores in kernel.py ( #4399 )
2024-05-02 22:33:08 -04:00
qazal
0deaaf2bc8
partial fusion spec ( #4398 )
2024-05-03 04:14:23 +03:00
chenyu
2c3b7f8e70
pad resnet training data with training data mean ( #4369 )
...
update model_train resnet to pad training
2024-05-02 20:26:15 -04:00
Francis Lam
3cf8291f2f
mlperf/resnet: update beam params to increase time and quality ( #4396 )
...
* mlperf/resnet: update beam params to increase time and quality
* revert upcast 8 in search space and add rocm setup function
* refactor to independent setup.sh script
2024-05-02 20:14:46 -04:00
nimlgen
ca6c8ae739
factor out resource access logic in multigraph base class ( #4385 )
...
* factor out resource access logic in multigraph base class
* hsa fixes
* clean
* linter
* linter 2
* not need this
2024-05-03 00:38:22 +03:00
chenyu
ab01a9433d
resnet eval 4n+3 if epoch < 33 ( #4391 )
...
the rule is as thoroughly as 4n+k and we can stop the clock as soon as eval hits target. this can save 24 evals or 12 minutes
2024-05-02 16:52:07 -04:00
Francis Lam
7c8401fc65
search: skip timing the unoptimized kernel ( #4395 )
...
* search: skip timing the unoptimized kernel
also ensure we return the unoptimized kernel if no opts are valid
and refactor debugging to a single BEAM_DEBUG variable
* stop early on fast kernels that can't improve enough
2024-05-02 16:48:49 -04:00
Francis Lam
5c5b40880f
search: fix edge cases on screening potential ops ( #4394 )
...
* search: fix edge cases on screening potential ops
won't change correctness, but will save a little python time by
properly deduplicating potential actions
* check for de-duplication instead of exact valid actions
* refactor long line
2024-05-02 14:53:05 -04:00
George Hotz
89030b238a
add consecutive property to shapetracker
2024-05-02 10:41:28 -07:00
George Hotz
2786dff26d
new disk tensor tests ( #4393 )
2024-05-02 08:54:44 -07:00
chenyu
7492e5d3e7
resnet correct log name for red ( #4390 )
2024-05-02 10:58:55 -04:00
chenyu
bf31837e6d
resnet correct steps_in_val_epoch in logging ( #4389 )
...
also added random seed from system in scripts
2024-05-02 10:51:36 -04:00
George Hotz
c8a2047377
testing for all reduce ( #4387 )
2024-05-02 06:34:10 -07:00
ym555
3113785604
Llama 3 Models ( #4339 )
...
* Full Impl
* fix test
* Fix inference loop
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-05-02 06:06:07 -07:00
qazal
0b47818e0f
simpler reduceop children chasing ( #4350 )
...
* simplest case
* midreduce case
* all tests
* pending things
* unify tests
2024-05-02 15:15:30 +03:00
chenyu
22376e53b7
resnet mlperf logging ( #4361 )
...
* resnet mlperf logging
* cropping too much?
2024-05-02 00:00:04 -04:00
George Hotz
f635c4d273
fix define global ( #4383 )
...
* fix define global
* remove name from DEFINE_GLOBAL
* fix fuzzing
* fix ptx
* fix python
2024-05-01 22:32:56 -04:00
chenyu
ad116dc5c6
fill in mlperf system description ( #4381 )
...
it did not ask too many details. will put software versions later with tinygrad commit.
```
python3 -m mlperf_logging.system_desc_checker examples/mlperf/training_submission_v4.0/tinycorp/systems/tinybox_red.json training 4.0.0
INFO - System description checker passed for tinybox red
```
```
python3 -m mlperf_logging.system_desc_checker examples/mlperf/training_submission_v4.0/tinycorp/systems/tinybox_green.json training 4.0.0
INFO - System description checker passed for tinybox green
```
2024-05-01 16:47:45 -04:00