Commit Graph

4422 Commits

Author SHA1 Message Date
qazal 6dbe5585b0
batchnorm + conv backward in test_schedule (#4420)
* test both optims

* batchnorm_backward
2024-05-06 16:40:17 +03:00
Timmy 3f3c973022
Multiple Reduce Kernels - kernel properly orders reduceops (#4418)
* enable kernel with multiple reduceops

* copy self.reduceops

* assert only one reduceop per kernel

* kernel.py dfs order

* linters

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-05-06 13:54:44 +03:00
wozeparrot 603d3a351b
feat: allow keeping multiple cookies (#4440) 2024-05-05 19:26:48 -07:00
chenyu afe020710d
disable PADTO on upcasted axis (#4444)
fixed test_failure_31. PADTO upcasted is at best a no-op, and might fail at edge cases.
2024-05-05 21:52:03 -04:00
Francis Lam 709410071c
mlperf/resnet: updated BEAM params to increase performance (#4443) 2024-05-05 21:49:46 -04:00
Francis Lam c8595a9655
update sops.gz, fix tests and add new linearizer test (#4437)
* update sops.gz, fix tests and add new linearizer test

* remove METAL CI skip for test_failure_22

* re-add skip to METAL CI to test_failure_22
2024-05-05 17:31:25 -04:00
wozeparrot 9ad3d0520a
hotfix: npy is also ok (#4439) 2024-05-05 13:48:54 -07:00
chenyu d0eb1540d5
helpers.diskcache_clear (#4436)
drop all tables in diskcache. added a unit test but disabled it by default because it will drop all cache...
2024-05-05 14:19:01 -04:00
George Hotz 595a6e3069 test_fold_conv_relu_backward test 2024-05-05 11:13:43 -07:00
George Hotz cc16f644d0 hotfix: remove FAKE buffer from graph 2024-05-05 10:52:41 -07:00
qazal 760776c59d
merge EfficientNet to C with clang job (#4426)
* merge ImageNet to C with linters

* add to clang

* delete from linter
2024-05-05 20:33:12 +03:00
chenyu 3b30756cbb
update mlperf submission system (#4435)
more required fields.
2024-05-05 13:19:07 -04:00
George Hotz f95658bc3e hotfix: pickle jit works if you delete the function 2024-05-05 10:14:03 -07:00
George Hotz 12be536c06
Clang graph (#4424)
* clang graph runner

* render_dtype

* name it ClangGraph

* JIT=2

* JIT=2 goes there

* JIT as context var
2024-05-05 09:54:12 -07:00
David Hou 544431c388
refactor: pass reduceop into global_load (#4417)
* pass reduceop directly to global_load

* typing

* make mypy happy :/

* cede a line to mypy :(

* fold in acc_const

* add todo
2024-05-05 19:43:48 +03:00
geohotstan 874dfc556c
update setitem tests to test for currently supported cases (#4334)
* tests, tests, tests

* one more test

* tests tests tests tests

* t e s t

* a few more
2024-05-05 11:59:13 -04:00
chenyu fc9e58e482
Revert "refactor sparse_categorical_crossentropy (#4406)" (#4429)
This reverts commit c7368515d2.
2024-05-05 02:30:37 -04:00
David Hou c0a048c044
batchnorm d(var)/d(mean) = 0 (#4430)
* d(var)/d(mean) = 0

* drop the number in test_schedule!
2024-05-05 00:25:45 -04:00
George Hotz e2eab9c2b3 hotfix: disk is okay in child process 2024-05-04 18:18:31 +00:00
George Hotz cf33afa778
don't open devices from children (#4425)
* don't open devices from children

* correct way to do this

* fix Device.DEFAULT and add back JITBEAM
2024-05-04 10:35:40 -07:00
qazal fa17dcaf07
Fix llm.c/export.py (#4423)
* fix headers

* add CI

* add stdio

* merge clang tests

* revert llm.c

* revert ci

* Revert "revert llm.c"

This reverts commit 5fd17e3c8b38dc9549d0548e9515185b7b032573.
2024-05-04 19:37:10 +03:00
George Hotz cb7289f9c9
remove clang program header (#4422)
* remove clang program header

* proper max

* bools are numbers

* fix compile enet
2024-05-04 08:38:01 -07:00
qazal 267bbb57f9
Revert "Add `insert_before` to Linearizer Functions (#4320)" (#4421)
This reverts commit 664b563c91.
2024-05-04 17:50:21 +03:00
qazal 5f3bae378f
search children in fusion (#4322)
* scheduler diff

* tests diff

* new changes

* realizes

* chores

* assign

* kind of r3

* forced_realize wont do it

* with forced_realize

* start with children

* test search

* r3 with parents

* diff cleanup

* add children

* crossing assign

* late fuse descendants

* update kernel counts

* assign diff doesnt belong here
2024-05-04 17:22:15 +03:00
qazal 249cadd106
fusing crossing diamond assign (#4403)
* refactor scheduler parents search

* assign target

* unit test

* can't chase this
2024-05-04 15:19:48 +03:00
George Hotz 9fc4465557
subbuffer support (#4397)
* subbuffer support

* diskbuffer offset

* cuda subbuffer works

* use subbuffer

* more subbuffer tests

* consecutive

* cast

* consec

* offset

* view is a better name

* offset is in nbytes

* fix view + memory planner

* delete unused DiskRunner

* reverse order

* no subbuffers on unrealized consts

* only enabled for disk

* don't reverse memory

* view supported devices

* pickle buffer view

* ring jit

* support extra view inputs in jit

* fix JIT=2 issue

* test copy jit

* p2p isn't an option anymore

* fix dep tracking issue

* fix mypy

* fix pickle

* from_nv is contents now
2024-05-03 18:05:57 -07:00
chenyu c7368515d2
refactor sparse_categorical_crossentropy (#4406)
factor out the -1 * and / loss_mask.sum() for both smoothing and non-smoothing terms
2024-05-03 14:28:36 -04:00
qazal 3401734e54
infra for scheduler process replay (#4405)
* use getenv

* capture ast

* fix graph

* replay schedules

* exec
2024-05-03 20:29:13 +03:00
chenyu 473ecb978a
remove SPLIT_REDUCEOP=1 from resnet scripts (#4404)
SPLIT_REDUCEOP=1 is default
2024-05-03 12:36:23 -04:00
David Hou b767d59684
resnet trainer: keep old cookie around until next step has been queued (#4401)
* keep old cookie around until next step has been queued (-10ms 6gpu)

* also for eval

* drop cookie before data_get?

* Revert "drop cookie before data_get?"

This reverts commit b01e6aa2b27f49aeab04b448f09e0ef9e689ea53.

* Revert "Revert "drop cookie before data_get?""

This reverts commit 23464e73d445007c15537c69818fdee89adf0740.
2024-05-03 12:15:21 -04:00
qazal cf3ccb809f
refactor scheduler parents search (#4402) 2024-05-03 17:16:34 +03:00
George-the-1st 0627e26140
Added missing unittest execution code (#4400)
same code as on every other test file, just missing from this one for some reason.
2024-05-02 22:34:30 -04:00
chenyu d4062cb6fc
NV tensor_cores in kernel.py (#4399) 2024-05-02 22:33:08 -04:00
qazal 0deaaf2bc8
partial fusion spec (#4398) 2024-05-03 04:14:23 +03:00
chenyu 2c3b7f8e70
pad resnet training data with training data mean (#4369)
update model_train resnet to pad training
2024-05-02 20:26:15 -04:00
Francis Lam 3cf8291f2f
mlperf/resnet: update beam params to increase time and quality (#4396)
* mlperf/resnet: update beam params to increase time and quality

* revert upcast 8 in search space and add rocm setup function

* refactor to independent setup.sh script
2024-05-02 20:14:46 -04:00
nimlgen ca6c8ae739
factor out resource access logic in multigraph base class (#4385)
* factor out resource access logic in multigraph base class

* hsa fixes

* clean

* linter

* linter 2

* not need this
2024-05-03 00:38:22 +03:00
chenyu ab01a9433d
resnet eval 4n+3 if epoch < 33 (#4391)
the rule is as thoroughly as 4n+k and we can stop the clock as soon as eval hits target. this can save 24 evals or 12 minutes
2024-05-02 16:52:07 -04:00
Francis Lam 7c8401fc65
search: skip timing the unoptimized kernel (#4395)
* search: skip timing the unoptimized kernel

also ensure we return the unoptimized kernel if no opts are valid
and refactor debugging to a single BEAM_DEBUG variable

* stop early on fast kernels that can't improve enough
2024-05-02 16:48:49 -04:00
Francis Lam 5c5b40880f
search: fix edge cases on screening potential ops (#4394)
* search: fix edge cases on screening potential ops

won't change correctness, but will save a little python time by
properly deduplicating potential actions

* check for de-duplication instead of exact valid actions

* refactor long line
2024-05-02 14:53:05 -04:00
George Hotz 89030b238a add consecutive property to shapetracker 2024-05-02 10:41:28 -07:00
George Hotz 2786dff26d
new disk tensor tests (#4393) 2024-05-02 08:54:44 -07:00
chenyu 7492e5d3e7
resnet correct log name for red (#4390) 2024-05-02 10:58:55 -04:00
chenyu bf31837e6d
resnet correct steps_in_val_epoch in logging (#4389)
also added random seed from system in scripts
2024-05-02 10:51:36 -04:00
George Hotz c8a2047377
testing for all reduce (#4387) 2024-05-02 06:34:10 -07:00
ym555 3113785604
Llama 3 Models (#4339)
* Full Impl

* fix test

* Fix inference loop

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-05-02 06:06:07 -07:00
qazal 0b47818e0f
simpler reduceop children chasing (#4350)
* simplest case

* midreduce case

* all tests

* pending things

* unify tests
2024-05-02 15:15:30 +03:00
chenyu 22376e53b7
resnet mlperf logging (#4361)
* resnet mlperf logging

* cropping too much?
2024-05-02 00:00:04 -04:00
George Hotz f635c4d273
fix define global (#4383)
* fix define global

* remove name from DEFINE_GLOBAL

* fix fuzzing

* fix ptx

* fix python
2024-05-01 22:32:56 -04:00
chenyu ad116dc5c6
fill in mlperf system description (#4381)
it did not ask for too many details. will put software versions later with tinygrad commit.

```
python3 -m mlperf_logging.system_desc_checker examples/mlperf/training_submission_v4.0/tinycorp/systems/tinybox_red.json training 4.0.0
INFO -   System description checker passed for tinybox red
```

```
python3 -m mlperf_logging.system_desc_checker examples/mlperf/training_submission_v4.0/tinycorp/systems/tinybox_green.json training 4.0.0
INFO -   System description checker passed for tinybox green
```
2024-05-01 16:47:45 -04:00