Commit Graph

4837 Commits

Author SHA1 Message Date
George Hotz c98ca23cb9
test pickle variable (#5150)
* test pickle variable

* fix process replay
2024-06-25 19:49:21 -07:00
David Hou 8fcc41582f
make buffer view optional with a flag (#5120) 2024-06-25 19:13:20 -07:00
George Hotz 63ba2d05d1
uops dfs cleanup (#5147)
* uops dfs cleanup

* Update uops.py
2024-06-25 18:51:42 -07:00
George Hotz 6841ea3baf
don't allow duplicate variables (#5148) 2024-06-25 18:47:29 -07:00
George Hotz cc7fafcd8b
sink folding rule [run_process_replay] (#5145) 2024-06-25 18:34:44 -07:00
Jhenner Tigreros fa78755f19
Add new patterns to unfold division (#5139)
* Add new patterns to unfold division

* Create regression test and fix pattern
2024-06-25 18:07:47 -07:00
qazal c4fdb9c725
second iteration on verify_lazyop (#5140) 2024-06-25 09:44:32 +03:00
chenyu dade7677cf
validate llama3 output only with model "LLaMA-3/8B-SF-DPO" (#5138) 2024-06-24 20:58:25 -04:00
qazal 981afb114f
safely fold NEG in lazy.py (#5135)
* safe

* add test
2024-06-24 19:40:37 -04:00
chenyu 7948b05738
fix uneven shard with shrink and pad args on sharded axis (#5131)
it's incorrect to assume that the first (len(device)-1) shards all have the same size. e.g. size 2 sharded across 4 devices -> (1, 1, 0, 0)
2024-06-24 16:55:50 -04:00
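The uneven-shard pitfall above can be sketched in a few lines (the helper name `shard_sizes` is hypothetical, not tinygrad's API): with ceil division, trailing devices can receive empty shards, so the first len(device)-1 shards are not guaranteed equal.

```python
import math

def shard_sizes(size: int, n: int) -> tuple:
    # ceil-divide: early shards get ceil(size/n) elements; trailing shards may get 0
    step = math.ceil(size / n)
    return tuple(min(step, max(0, size - i * step)) for i in range(n))

print(shard_sizes(2, 4))  # -> (1, 1, 0, 0): the last two devices hold nothing
print(shard_sizes(7, 4))  # -> (2, 2, 2, 1): uneven, but every device holds something
```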
qazal 18e70deec3
verify_lazyop (#5124)
* start verify_lazyop

* bfs order

* assert

* assert shapetrackers 2

* refactor

* more iteration

* skips

* that ast was wrong too
2024-06-24 13:45:35 -07:00
qazal fe707bc968
hotfix: don't use is for comparing dtype (#5128) 2024-06-24 14:12:34 -04:00
Jhenner Tigreros dfa562dbc1
DEFINE_ACC takes UOps.CONST in vin instead of arg (#4975)
* Change DEFINE_ACC to receive UOps.CONST in vin

* Use localtype instead of acc dtype

* Fix idp

* Fix copy list

* Fix warp

* Fix error

* Fix merge

* Fix testing

* Fix merge

* Use deepcopy

* Change to copy of inp

* Fix lint

* Move const to first place

* Fix issue upat

* Fix upat patterns

* Change to list, to test permutations

* Add condition

* Change pm

* Revert change pm

* Remove unused rule

* Fix

* Change of float4 DEFINE_ACC values

* Cast on PM to correct dtype

* Improve assert message

* Move IFs

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-06-24 09:25:33 -07:00
nimlgen d84beaa6dd
tiny profiler cleanups (#5126) 2024-06-24 17:02:31 +03:00
chenyu 4a7d403777
cleanup test_multitensor (#5118)
renamed d_zero, d0, d1, d2, ... to d0, d1, d2, d3 and reused some multi-device tuples
2024-06-23 20:54:22 -04:00
chenyu c0ba5e0dfb
multi copy_to_device return the copy on same device if possible (#5117)
previously it always returned the copy from the first device
2024-06-23 20:25:56 -04:00
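The selection logic described can be sketched as follows (a hypothetical helper, not tinygrad's actual implementation): prefer the shard already resident on the target device, and fall back to the first device only when no shard matches.

```python
def pick_copy_source(shards, devices, target):
    # prefer the shard that already lives on the target device (no cross-device copy)
    for shard, dev in zip(shards, devices):
        if dev == target:
            return shard
    # otherwise fall back to copying from the first device
    return shards[0]

print(pick_copy_source(["s0", "s1"], ["GPU:0", "GPU:1"], "GPU:1"))  # -> s1
print(pick_copy_source(["s0", "s1"], ["GPU:0", "GPU:1"], "CPU"))    # -> s0
```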
Francis Lam b563cd52ed
linearizer: change globals to merge into left axis/gridDims.x first (#5033)
* linearizer: change order of collapse to be left-most

also fixes Variable max size to be correct and adds docs for the off
parameter

* fix multiple global dim oversizes

* add passing variable test and reorganize tests

* use assert RuntimeError for failing test
2024-06-23 18:53:15 -04:00
nimlgen 69f116a7e1
nv/amd profiler (#4718)
* nv/amd profiler

* fix

* fix

* profile copies

* profile logger

* fixes

* more fixes

* less lines and fixes

* fixes

* some linter

* back sync, no related change

* fix gpu2cpu time def

* simpler

* linter

* linter

* docs

* add add_event api
2024-06-23 17:10:12 +03:00
qazal 64a3b7931e
simplify render_ops ctx [run_process_replay] (#5116)
* new ctx

* delete DEFINE_VAR

* lt isnt static
2024-06-23 16:56:32 +03:00
qazal 28bf8d86d8
test_linearizer with multi output ASTs (#5115)
* ast is tuple

* run test_phi_simplification

* update reason

* more tc

* beam

* a few more

* use test_opt directly
2024-06-23 15:41:24 +03:00
chenyu ee0c6dfc15
build Tensor._tri with movements only (#5110)
* build Tensor._tri with movements only

doesn't need arange, saving a kernel in the attention mask

* simpler, more tests
2024-06-23 00:07:36 -04:00
chenyu 20fabd8a5b
update Tensor.triu and Tensor.tril (#5109)
renamed the arg to `diagonal` to match the torch API, and added documentation and examples
2024-06-22 21:59:50 -04:00
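The `diagonal` argument matches torch's semantics; numpy's `k` parameter behaves the same way, so a quick numpy illustration (an analogy, not tinygrad code) shows what the offset does:

```python
import numpy as np

x = np.ones((3, 3), dtype=int)
# numpy's k plays the role of torch's / tinygrad's `diagonal`:
# diagonal=0 keeps the main diagonal and above; diagonal=1 keeps only entries strictly above it
print(np.triu(x, k=0).sum())  # 6 ones: upper triangle including the diagonal
print(np.triu(x, k=1).sum())  # 3 ones: strictly above the diagonal
```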
chenyu 8f6ae84e4a
minor cleanup of conv_transpose2d (#5108)
* minor cleanup of conv_transpose2d

* that
2024-06-22 21:31:47 -04:00
chenyu 33211f356b
fix desc in tqdm (#5107)
per the doc `https://tqdm.github.io/docs/tqdm/`, the user does not need to put `: ` in desc; the separator is appended automatically and removed when desc is empty.

updated test cases and added a test for set_description
2024-06-22 19:00:38 -04:00
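The separator rule from the tqdm docs can be captured in a one-line sketch (this is an illustration of the behavior, not tinygrad's tinytqdm source):

```python
def render_desc(desc: str) -> str:
    # tqdm appends the ": " separator itself, and drops it entirely when desc is empty
    return f"{desc}: " if desc else ""

print(repr(render_desc("loading")))  # 'loading: '
print(repr(render_desc("")))         # ''
```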
chenyu 055e616302
cleanup mnist data load in beautiful_mnist (#5106) 2024-06-22 18:31:51 -04:00
chenyu 5516b790ad
hotfix append colon space to tqdm set_description (#5105) 2024-06-22 18:09:14 -04:00
chenyu e356807696
tinytqdm.set_description and tinytrange (#5101) 2024-06-22 14:45:06 -04:00
chenyu 8080298739
s/tinytqdm/tqdm (#5103)
except in unit test where tqdm is imported
2024-06-22 14:18:26 -04:00
George Hotz 9f875123b6
small changes from lowerer. [run_process_replay] [no_assert] (#5102) 2024-06-22 11:09:35 -07:00
chenyu e468601226
update llama attention casting (#5096)
* update llama attention casting

updated scaled_dot_product_attention middle cast and removed hard-coded half in llama attention.

* fix that
2024-06-22 10:57:17 -04:00
chenyu ca021229e4
fix attention to always return in the same dtype as input (#5100)
the middle cast to default_float does not work as intended when the default is float32 and qkv are in half
2024-06-22 10:34:57 -04:00
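The dtype contract described — intermediates may be upcast, but the output always matches the input dtype — can be sketched in numpy (a simplified stand-in for the tinygrad implementation, without masking or scaling options):

```python
import numpy as np

def attention(q, k, v):
    # upcast intermediates to float32 for a stable softmax,
    # but always return the result in the inputs' dtype
    qf, kf, vf = (t.astype(np.float32) for t in (q, k, v))
    w = qf @ kf.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
    w = np.exp(w - w.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return (w @ vf).astype(q.dtype)

q = k = v = np.ones((2, 4), dtype=np.float16)
print(attention(q, k, v).dtype)  # float16, matching the input
```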
nimlgen 2dcef5a0d7
hcq spec (#5081)
* hcq spec

* small change

* not used import

* fixes

* fix

* signals into base class

* more into base class

* remove imports

* fix wrap timeline

* raise when not implemented

* simpler
2024-06-22 15:32:12 +03:00
chenyu 8bd6cb9511
update llama model RMSNorm casting (#5095)
following the original implementation, cast back to the input dtype before multiplying by the weight. slightly faster
https://github.com/meta-llama/llama/blob/main/llama/model.py
2024-06-21 23:02:04 -04:00
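The casting order matters: the reference llama RMSNorm normalizes in float32 but multiplies by the weight in the input dtype, so a half-precision input stays on the cheaper half-precision path for the multiply. A numpy sketch of that order (not the tinygrad model code):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # normalize in float32 for numerical stability
    h = x.astype(np.float32)
    h = h / np.sqrt((h * h).mean(axis=-1, keepdims=True) + eps)
    # cast back to the input dtype *before* the weight multiply,
    # as in the reference llama implementation
    return h.astype(x.dtype) * weight.astype(x.dtype)

x = np.ones((2, 4), dtype=np.float16)
w = np.full(4, 2.0, dtype=np.float32)
print(rms_norm(x, w).dtype)  # float16: the multiply happens in half
```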
chenyu 0c857ae2d6
some onnx_ops cleanups (#5094) 2024-06-21 22:01:32 -04:00
kormann f4a041af16
Simplify graph_dedup [run_process_replay] (#5084)
* reset master

* remove double default
2024-06-21 22:12:30 +03:00
chenyu 00593d6095
clean the long lines in avg_pool2d and max_pool2d (#5091) 2024-06-21 14:46:56 -04:00
chenyu a971dc6218
argmax(axis=None) is argmax.flatten().argmax(0) (#5090)
removed the alternative code path
2024-06-21 14:17:10 -04:00
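The equivalence is easy to see in numpy (same semantics as the Tensor method described): argmax with no axis reduces over the flattened array in row-major order.

```python
import numpy as np

x = np.array([[3, 7], [9, 1]])
# argmax with axis=None is just argmax of the flattened array
print(x.argmax())             # 2: the 9, in row-major order
print(x.flatten().argmax(0))  # 2: same answer via the explicit path
```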
chenyu 166a2b19b5
fix reduce axis of 0d tensors (#5089)
`x.sum(())` is fine, and `x.sum((1,))` should throw IndexError
2024-06-21 13:51:40 -04:00
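numpy exhibits the same semantics the fix targets (shown here as an analogy; numpy's AxisError subclasses IndexError): reducing a 0-d array over the empty axis tuple is a valid no-op, while naming any axis is out of range.

```python
import numpy as np

x = np.array(5.0)        # 0-d: shape ()
print(x.sum(axis=()))    # 5.0 -- the empty axis tuple reduces nothing
try:
    x.sum(axis=1)        # a 0-d array has no axis 1
except IndexError as e:
    print("IndexError:", e)
```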
chenyu 3ff048b68c
type annotate reduce axis in tensor.py (#5088) 2024-06-21 13:06:10 -04:00
chenyu 36b4a492a1
explicitly check getitem indices can have at most one ellipsis (#5087)
* explicitly check getitem indices can have at most one ellipsis

previous error with multiple `...`:
```
if index_type not in [None, int, slice, Tensor]: raise IndexError(f"{index_type=} not supported")
                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: index_type=<class 'ellipsis'> not supported
```

this pr:
```
if len(ellipsis_idx) > 1: raise IndexError("an index can only have a single ellipsis ('...')")
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: an index can only have a single ellipsis ('...')
```

* oh we have that already

* test that

* test these
2024-06-21 12:33:18 -04:00
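numpy enforces the same single-ellipsis rule with the same error message, which makes for a quick demonstration of the behavior the explicit check provides:

```python
import numpy as np

x = np.zeros((2, 2, 2))
print(x[..., 0].shape)   # (2, 2): one ellipsis is fine
try:
    x[..., 0, ...]       # two ellipses are ambiguous
except IndexError as e:
    print("IndexError:", e)
```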
nimlgen f1e758bacb
graph fuzzer (#5082)
* graph fuzzer

* more options

* mypy

* no underscores for funcs
2024-06-21 18:47:23 +03:00
qazal 5717a54b28
don't use Tensor.empty in kernel opts tests (#5086) 2024-06-21 18:41:03 +03:00
qazal 8aa786232d
docs for running process replay locally (#5083) 2024-06-21 09:55:08 -04:00
nimlgen fb1bf48cfe
io_uring for copies from disk (#5035)
* exp uring

* fixes and old version

* nv

* cleaner

* cmp vs aio

* fix

* no lib

* fix nv

* linter

* disk_speed_test now runs default

* fixes

* uring -> io_uring

* linter happy

* get_temp_buf comment added

* tiny nits

* put wait back

* test runs everywhere

* remove consts

* remove mmap consts

* do not require iouring to run test, they are generic
2024-06-21 11:36:51 +03:00
George Hotz b69afc67d8 tinybox docs typo 2024-06-20 17:58:40 -07:00
George Hotz 6bc5e5f41c start tinybox docs 2024-06-20 17:04:45 -07:00
chenyu f6d6760f71
don't cast tuple to list before creating Tensor (#5071)
the Tensor constructor now supports creating from a tuple
2024-06-20 13:32:56 -04:00
qazal 97f1347dd9
fix check_process_replay for special characters (#5072)
* 'test' [run_process_replay] [no_assert]

* test with ( ) { } '' " "

* remove the log [run_process_replay] '' () { } '{

* helpful echos [run_process_replay] [no_assert] () ''

* test [run_process_replay] [no_assert]

* test2 [run_process_replay] [no_assert]

* test3 [run_process_replay] [no_assert]

* it's also correct this way [run_process_replay] [no_assert]

* remove extras [run_process_replay]
2024-06-20 20:23:29 +03:00
George Hotz 6f6b3b10c9
import from uops, not linearizer (#5064) 2024-06-20 08:08:44 -07:00
chenyu 50700171ef
minor cleanup to reshape arg handling (#5070)
moved the None handling in with argfix, and only resolve -1 when a -1 is present
2024-06-20 10:27:27 -04:00
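The -1 resolution step can be sketched with a hypothetical helper (names are illustrative, not tinygrad's): a single -1 is replaced by whatever size the remaining dims imply, and shapes without a -1 pass through untouched.

```python
from math import prod

def resolve_shape(shape, numel):
    # replace a single -1 with the size implied by the remaining dims
    if -1 not in shape:
        return tuple(shape)
    rest = prod(s for s in shape if s != -1)
    return tuple(numel // rest if s == -1 else s for s in shape)

print(resolve_shape((2, -1), 6))  # -> (2, 3)
print(resolve_shape((2, 3), 6))   # -> (2, 3): no -1, nothing to resolve
```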