George Hotz
c98ca23cb9
test pickle variable (#5150)
* test pickle variable
* fix process replay
2024-06-25 19:49:21 -07:00
David Hou
8fcc41582f
make buffer view optional with a flag (#5120)
2024-06-25 19:13:20 -07:00
George Hotz
63ba2d05d1
uops dfs cleanup (#5147)
* uops dfs cleanup
* Update uops.py
2024-06-25 18:51:42 -07:00
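A uop graph is typically linearized with a post-order DFS so every node is emitted after its sources. A minimal sketch of that idea (an illustration of the general technique, not tinygrad's actual uops.py code):

```python
def toposort(root, srcs):
    # post-order DFS: emit each node after all of its sources,
    # visiting shared nodes only once
    visited, order = set(), []
    def dfs(node):
        if node in visited: return
        visited.add(node)
        for s in srcs(node): dfs(s)
        order.append(node)
    dfs(root)
    return order
```

For a diamond-shaped graph, the shared source appears once and first, and the sink appears last.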
George Hotz
6841ea3baf
don't allow duplicate variables (#5148)
2024-06-25 18:47:29 -07:00
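The shape of such a guard, sketched under the assumption that variables are deduplicated by name (not the actual kernel code):

```python
def assert_no_duplicates(var_names):
    # reject a variable list that names the same variable twice
    seen = set()
    for name in var_names:
        if name in seen:
            raise RuntimeError(f"duplicate variable {name!r}")
        seen.add(name)
    return var_names
```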
George Hotz
cc7fafcd8b
sink folding rule [run_process_replay] (#5145)
2024-06-25 18:34:44 -07:00
Jhenner Tigreros
fa78755f19
Add new patterns to unfold division (#5139)
* Add new patterns to unfold division
* Create regression test and fix pattern
2024-06-25 18:07:47 -07:00
qazal
c4fdb9c725
second iteration on verify_lazyop (#5140)
2024-06-25 09:44:32 +03:00
chenyu
dade7677cf
validate llama3 output only with model "LLaMA-3/8B-SF-DPO" (#5138)
2024-06-24 20:58:25 -04:00
qazal
981afb114f
safely fold NEG in lazy.py (#5135)
* safe
* add test
2024-06-24 19:40:37 -04:00
chenyu
7948b05738
fix uneven shard with shrink and pad args on sharded axis (#5131)
it's incorrect to assume the first (len(device)-1) shards all have the same size, e.g. sharding size 2 across 4 devices gives (1, 1, 0, 0)
2024-06-24 16:55:50 -04:00
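The example above falls out of ceil-division chunking: with ceil(2/4) = 1 per chunk, a size-2 tensor on 4 devices shards to (1, 1, 0, 0), so the trailing shards can be smaller or empty. A hypothetical helper illustrating this (not tinygrad's multi.py code):

```python
import math

def shard_sizes(total: int, ndev: int) -> tuple:
    # split `total` elements across `ndev` devices, giving each shard
    # ceil(total/ndev) elements until the total runs out
    chunk = math.ceil(total / ndev)
    sizes, left = [], total
    for _ in range(ndev):
        sizes.append(min(chunk, left))
        left -= sizes[-1]
    return tuple(sizes)
```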
qazal
18e70deec3
verify_lazyop (#5124)
* start verify_lazyop
* bfs order
* assert
* assert shapetrackers 2
* refactor
* more iteration
* skips
* that ast was wrong too
2024-06-24 13:45:35 -07:00
qazal
fe707bc968
hotfix: don't use is for comparing dtype (#5128)
2024-06-24 14:12:34 -04:00
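The hotfix's point is a general Python pitfall: `is` compares object identity while `==` compares value, so two equal dtype records can still fail an `is` check. A sketch with a hypothetical stand-in class (not tinygrad's actual DType):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MyDType:
    # hypothetical stand-in for a dtype record; not tinygrad's real class
    itemsize: int
    name: str

a = MyDType(4, "float")
b = MyDType(4, "float")
# equal by value, but two distinct objects: an `is` check would wrongly fail
assert a == b
assert a is not b
```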
Jhenner Tigreros
dfa562dbc1
DEFINE_ACC takes UOps.CONST in vin instead of arg (#4975)
* Change DEFINE_ACC to receive UOps.CONST in vin
* Use localtype instead of acc dtype
* Fix idp
* Fix copy list
* Fix warp
* Fix error
* Fix merge
* Fix testing
* Fix merge
* Use deepcopy
* Change to copy of inp
* Fix lint
* Move const to first place
* Fix issue upat
* Fix upat patterns
* Change to list, to test permutations
* Add condition
* Change pm
* Revert change pm
* Remove unused rule
* Fix
* Change of float4 DEFINE_ACC values
* Cast on PM to correct dtype
* Improve assert message
* Move IFs
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-06-24 09:25:33 -07:00
nimlgen
d84beaa6dd
tiny profiler cleanups (#5126)
2024-06-24 17:02:31 +03:00
chenyu
4a7d403777
cleanup test_multitensor (#5118)
renamed d_zero, d0, d1, d2, ... to d0, d1, d2, d3 and reused some multi device tuples
2024-06-23 20:54:22 -04:00
chenyu
c0ba5e0dfb
multi copy_to_device return the copy on same device if possible (#5117)
previously it always returned the copy from the first device
2024-06-23 20:25:56 -04:00
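The behavior change can be pictured as follows, assuming shards are (device, buffer) pairs; the names here are hypothetical, not tinygrad's multi-tensor API:

```python
def pick_copy_source(shards, target_device):
    # prefer the shard that already lives on the target device,
    # falling back to the first shard (the old behavior)
    for device, buf in shards:
        if device == target_device:
            return buf
    return shards[0][1]
```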
Francis Lam
b563cd52ed
linearizer: change globals to merge into left axis/gridDims.x first (#5033)
* linearizer: change order of collapse to be left-most
also fixes Variable max size to be correct and adds docs for the off parameter
* fix multiple global dim oversizes
* add passing variable test and reorganize tests
* use assert RuntimeError for failing test
2024-06-23 18:53:15 -04:00
nimlgen
69f116a7e1
nv/amd profiler (#4718)
* nv/amd profiler
* fix
* fix
* profile copies
* profile logger
* fixes
* more fixes
* less lines and fixes
* fixes
* some linter
* back sync, no related change
* fix gpu2cpu time def
* simpler
* linter
* linter
* docs
* add add_event api
2024-06-23 17:10:12 +03:00
qazal
64a3b7931e
simplify render_ops ctx [run_process_replay] (#5116)
* new ctx
* delete DEFINE_VAR
* lt isn't static
2024-06-23 16:56:32 +03:00
qazal
28bf8d86d8
test_linearizer with multi output ASTs (#5115)
* ast is tuple
* run test_phi_simplification
* update reason
* more tc
* beam
* a few more
* use test_opt directly
2024-06-23 15:41:24 +03:00
chenyu
ee0c6dfc15
build Tensor._tri with movements only (#5110)
* build Tensor._tri with movements only
doesn't need arange, saved a kernel in attention mask
* simpler, more tests
2024-06-23 00:07:36 -04:00
chenyu
20fabd8a5b
update Tensor.triu and Tensor.tril (#5109)
renamed the arg to `diagonal` to match the torch API, and added documentation and examples
2024-06-22 21:59:50 -04:00
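With the torch-style `diagonal` argument, triu keeps the elements on and above the `diagonal`-th diagonal, i.e. those where `col - row >= diagonal`. A plain-Python sketch of that mask (illustrative only, not the tensor implementation):

```python
def triu_mask(rows, cols, diagonal=0):
    # 1 where the element is kept by triu, 0 where it is zeroed out
    return [[1 if c - r >= diagonal else 0 for c in range(cols)]
            for r in range(rows)]
```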
chenyu
8f6ae84e4a
minor cleanup of conv_transpose2d (#5108)
* minor cleanup of conv_transpose2d
* that
2024-06-22 21:31:47 -04:00
chenyu
33211f356b
fix desc in tqdm (#5107)
per the doc `https://tqdm.github.io/docs/tqdm/`, the user does not need to put `: ` in desc; the `: ` after desc is automatically removed when desc is empty.
updated test cases and added a test for set_description
2024-06-22 19:00:38 -04:00
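The documented behavior being matched: tqdm appends `: ` to a non-empty desc itself and drops it when desc is empty, so callers pass a bare prefix. A sketch of that formatting rule (not the tinytqdm source):

```python
def format_desc(desc: str) -> str:
    # tqdm-style rule: append ": " only when desc is non-empty
    return f"{desc}: " if desc else ""
```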
chenyu
055e616302
cleanup mnist data load in beautiful_mnist (#5106)
2024-06-22 18:31:51 -04:00
chenyu
5516b790ad
hotfix append colon space to tqdm set_description (#5105)
2024-06-22 18:09:14 -04:00
chenyu
e356807696
tinytqdm.set_description and tinytrange (#5101)
2024-06-22 14:45:06 -04:00
chenyu
8080298739
s/tinytqdm/tqdm (#5103)
except in the unit test where tqdm is imported
2024-06-22 14:18:26 -04:00
George Hotz
9f875123b6
small changes from lowerer. [run_process_replay] [no_assert] (#5102)
2024-06-22 11:09:35 -07:00
chenyu
e468601226
update llama attention casting (#5096)
* update llama attention casting
updated scaled_dot_product_attention middle cast and removed hard-coded half in llama attention.
* fix that
2024-06-22 10:57:17 -04:00
chenyu
ca021229e4
fix attention to always return in the same dtype as input (#5100)
mid cast to default_float does not work as intended when default is float32 and qkv is in half
2024-06-22 10:34:57 -04:00
nimlgen
2dcef5a0d7
hcq spec (#5081)
* hcq spec
* small change
* not used import
* fixes
* fix
* signals into base class
* more into base class
* remove imports
* fix wrap timeline
* raise when not implemented
* simpler
2024-06-22 15:32:12 +03:00
chenyu
8bd6cb9511
update llama model RMSNorm casting (#5095)
following the original implementation, cast back to the input dtype before multiplying by the weight; slightly faster
https://github.com/meta-llama/llama/blob/main/llama/model.py
2024-06-21 23:02:04 -04:00
chenyu
0c857ae2d6
some onnx_ops cleanups (#5094)
2024-06-21 22:01:32 -04:00
kormann
f4a041af16
Simplify graph_dedup [run_process_replay] (#5084)
* reset master
* remove double default
2024-06-21 22:12:30 +03:00
chenyu
00593d6095
clean the long lines in avg_pool2d and max_pool2d (#5091)
2024-06-21 14:46:56 -04:00
chenyu
a971dc6218
argmax(axis=None) is argmax.flatten().argmax(0) (#5090)
removed the alternative code path
2024-06-21 14:17:10 -04:00
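The identity behind the cleanup: argmax over all elements of any shape equals the argmax of the flattened data along axis 0. A list-based illustration in plain Python (not the tensor implementation):

```python
def argmax_flat(rows):
    # argmax over all elements == argmax of the flattened sequence
    flat = [v for row in rows for v in row]
    return max(range(len(flat)), key=flat.__getitem__)
```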
chenyu
166a2b19b5
fix reduce axis of 0d tensors (#5089)
`x.sum(())` is fine, and `x.sum((1,))` should throw IndexError
2024-06-21 13:51:40 -04:00
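The fixed rule: every requested axis must lie in `[-ndim, ndim)`, which for a 0-d tensor is the empty range, so `x.sum(())` passes and `x.sum((1,))` raises IndexError. A sketch of that validation (illustrative, not the tensor.py code):

```python
def validate_reduce_axes(axes, ndim):
    # each axis must be a valid index into the shape: -ndim <= ax < ndim
    for ax in axes:
        if not -ndim <= ax < ndim:
            raise IndexError(f"axis {ax} out of range for {ndim}-d tensor")
    return tuple(ax % ndim for ax in axes)
```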
chenyu
3ff048b68c
type annotate reduce axis in tensor.py (#5088)
2024-06-21 13:06:10 -04:00
chenyu
36b4a492a1
explicitly check getitem indices can have at most one ellipsis (#5087)
* explicitly check getitem indices can have at most one ellipsis
previous error with multiple `...`:
```
if index_type not in [None, int, slice, Tensor]: raise IndexError(f"{index_type=} not supported")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: index_type=<class 'ellipsis'> not supported
```
this pr:
```
if len(ellipsis_idx) > 1: raise IndexError("an index can only have a single ellipsis ('...')")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: an index can only have a single ellipsis ('...')
```
* oh we have that already
* test that
* test these
2024-06-21 12:33:18 -04:00
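The check itself is a one-pass scan over the index tuple. A sketch matching the error message shown above, with the surrounding getitem machinery omitted:

```python
def find_ellipsis(indices):
    # at most one `...` is allowed in a getitem index tuple
    ellipsis_idx = [i for i, v in enumerate(indices) if v is Ellipsis]
    if len(ellipsis_idx) > 1:
        raise IndexError("an index can only have a single ellipsis ('...')")
    return ellipsis_idx
```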
nimlgen
f1e758bacb
graph fuzzer (#5082)
* graph fuzzer
* more options
* mypy
* no underscores for funcs
2024-06-21 18:47:23 +03:00
qazal
5717a54b28
don't use Tensor.empty in kernel opts tests (#5086)
2024-06-21 18:41:03 +03:00
qazal
8aa786232d
docs for running process replay locally (#5083)
2024-06-21 09:55:08 -04:00
nimlgen
fb1bf48cfe
io_uring for copies from disk (#5035)
* exp uring
* fixes and old version
* nv
* cleaner
* cmp vs aio
* fix
* no lib
* fix nv
* linter
* disk_speed_test now runs default
* fixes
* uring -> io_uring
* linter happy
* get_temp_buf comment added
* tiny nits
* put wait back
* test runs everywhere
* remove consts
* remove mmap consts
* do not require iouring to run test, they are generic
2024-06-21 11:36:51 +03:00
George Hotz
b69afc67d8
tinybox docs typo
2024-06-20 17:58:40 -07:00
George Hotz
6bc5e5f41c
start tinybox docs
2024-06-20 17:04:45 -07:00
chenyu
f6d6760f71
don't cast tuple to list before creating Tensor (#5071)
the Tensor constructor now supports creating from a tuple
2024-06-20 13:32:56 -04:00
qazal
97f1347dd9
fix check_process_replay for special characters (#5072)
* 'test' [run_process_replay] [no_assert]
* test with ( ) { } '' " "
* remove the log [run_process_replay] '' () { } '{
* helpful echos [run_process_replay] [no_assert] () ''
* test [run_process_replay] [no_assert]
* test2 [run_process_replay] [no_assert]
* test3 [run_process_replay] [no_assert]
* it's also correct this way [run_process_replay] [no_assert]
* remove extras [run_process_replay]
2024-06-20 20:23:29 +03:00
George Hotz
6f6b3b10c9
import from uops, not linearizer (#5064)
2024-06-20 08:08:44 -07:00
chenyu
50700171ef
minor cleanup to reshape arg handling (#5070)
moved None handling to be with argfix, and only resolve -1 when a -1 is present
2024-06-20 10:27:27 -04:00
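The -1 resolution being gated: only when a -1 appears does the product of the known dims need computing. A minimal sketch with hypothetical names (not the tensor.py code):

```python
from math import prod

def resolve_shape(shape, numel):
    # replace a single -1 with the size implied by the remaining dims
    if -1 in shape:
        known = prod(d for d in shape if d != -1)
        shape = tuple(numel // known if d == -1 else d for d in shape)
    return tuple(shape)
```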