Roelof van Dijk
9704c7d4d4
ruff rule if-exp-instead-of-or-operator (FURB110) ( #5178 )
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-27 08:22:19 -07:00
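For readers unfamiliar with the rule named in the commit above, here is a minimal sketch of the kind of rewrite FURB110 asks for. The function names and sample values are hypothetical and not taken from the repository.

```python
def first_truthy_ifexp(value, fallback):
    # Form flagged by the rule: `value` is written (and evaluated) twice.
    return value if value else fallback


def first_truthy_or(value, fallback):
    # Equivalent form using the `or` operator; `value` is evaluated once.
    return value or fallback


# Both forms agree on truthy and falsy inputs alike.
samples = [0, 1, "", "x", None, [], [0]]
results = [(first_truthy_ifexp(s, "fb"), first_truthy_or(s, "fb")) for s in samples]
```

The two forms return the same object in every case; the `or` form only differs in evaluating the left operand once.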
chenyu
5b8fda3c65
fix: JIT=0 means no JIT ( #5188 )
2024-06-27 10:31:37 -04:00
qazal
3af17849bf
safely parse quoted titles [run_process_replay] ( #5183 )
2024-06-27 16:39:48 +03:00
Roelof van Dijk
975b811ad9
names shadowing builtins ( #5179 )
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-27 08:15:01 -04:00
Roelof van Dijk
26e254c42b
ruff: else-raise and else-return ( #5175 )
* ruff: enable else-raise and else-return
* ruff: add error names
* fix order
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-27 07:54:59 -04:00
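The else-raise and else-return checks enabled above correspond, as far as I know, to ruff's superfluous-else rules (RET505 and RET506). A small sketch of the before and after shape, using a hypothetical `parse_flag` helper that is not from the repository:

```python
def parse_flag_before(text):
    # Shape the rules flag: `elif`/`else` after a branch that already returned.
    if text == "1":
        return True
    elif text == "0":
        return False
    else:
        raise ValueError(f"unexpected flag {text!r}")


def parse_flag_after(text):
    # Same behavior with the redundant `elif`/`else` removed.
    if text == "1":
        return True
    if text == "0":
        return False
    raise ValueError(f"unexpected flag {text!r}")
```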
Roelof van Dijk
f88f71d73a
ruff: unnecessary-comprehension ( #5174 )
* enable ruff C416 unnecessary-comprehension
* already a list
2024-06-27 07:45:29 -04:00
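A short illustration of what C416 flags, and of the "already a list" case mentioned in the commit body. The variable names are hypothetical; this is a sketch, not code from the repository.

```python
pairs = {"a": 1, "b": 2}

# Flagged by C416: the comprehension only copies its input element by element.
copied_comprehension = [item for item in pairs.items()]

# Preferred form: call the constructor directly.
copied_direct = list(pairs.items())

# sorted() already returns a list, so neither a comprehension nor list(...) is needed.
already_list = sorted(pairs)
```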
reddyn12
f1c7944c44
Fix batchnorm shapes for resnet.load_pretrained ( #5167 )
* Fix batchnorm shapes
* make it general reshape
2024-06-26 18:44:10 -04:00
George Hotz
396ce6cfc9
clean up graph dedup function [run_process_replay] ( #5169 )
2024-06-26 15:07:34 -07:00
kormann
3a04e518ec
print_tree UPat +fix ( #5132 )
* fix and extend print_tree
* typing
* typing
* fix upat
* fix none
* ws
* rm prefix
* mv luop dag
* typo
* test print_tree
2024-06-26 15:02:19 -07:00
chenyu
0ba093dea0
hotfix: only validate stable diffusion when using threefry ( #5166 )
2024-06-26 16:50:38 -04:00
chenyu
e4a5870b36
validate stable_diffusion output ( #5163 )
changed default steps, forgot to update validation
2024-06-26 16:42:21 -04:00
nimlgen
21b225ac45
llama3 download works ( #5160 )
2024-06-26 22:45:13 +03:00
wozeparrot
c91b3c4079
shard llama3 on 0 sometimes ( #5157 )
2024-06-26 11:50:57 -07:00
Roelof van Dijk
294bd1a9ff
refactor: name check [run_process_replay] ( #5158 )
2024-06-26 11:39:41 -07:00
Roelof van Dijk
2c80583e14
perf: cache const UOp creation [run_process_replay] ( #5156 )
2024-06-26 11:13:14 -07:00
George Hotz
eda2824cd8
freeze uop [run_process_replay] ( #5155 )
2024-06-26 10:18:15 -07:00
Elias Wahl
e267f3161d
Add MLLogger ( #5125 )
* add MLPerf logger
* eval steps
* start with step 1
* compliance for 3.1.0 and 4.0.0
* more compliance
* assert, comment and contiguous
2024-06-26 12:23:56 -04:00
nimlgen
16405b973a
fix hcq sync ( #5062 )
* fix hcq sync
* rewrite
* linter + comment
* fix profiler
* no default dict
* correct sync of unjitted transfer
* fix test
2024-06-26 17:50:37 +03:00
David Hou
3604642847
Llama shard axis 0 sometimes ( #5123 )
* make buffer view optional with a flag [run_process_replay]
* do not view when sharding to save memory [run_process_replay]
* llama shard axis=0 sometimes
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-26 10:35:25 -04:00
nimlgen
fd27f19e92
graph tests ( #5153 )
* graph tests
* add test
* cleanup
2024-06-26 16:31:20 +03:00
George Hotz
7b709c3ccd
switch tensorcoreoptions to tuple [run_process_replay] ( #5143 )
* switch tensorcoreoptions to tuple [run_process_replay]
* localbuffer can stay namedtuple for now
* freeze LocalBuffer
* remove NamedTuple
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-06-26 14:12:53 +03:00
qazal
6ca7b13ed1
limit pickled objects [run_process_replay] ( #5154 )
* limit pickled objects
* delete uop from the list
* debug metal
* need self.opts for TC
* dont need device
* [run_process_replay]
* minor
2024-06-26 13:51:32 +03:00
George Hotz
ee4f080a14
rewrite div const [run_process_replay] [no_assert] ( #5151 )
* rewrite div const [run_process_replay] [no_assert]
* Update uops.py
2024-06-25 20:23:14 -07:00
David Hou
666a9c1448
don't view origin buffer when sharding ( #5122 )
* make buffer view optional with a flag
* do not view when sharding to save memory
2024-06-25 20:19:09 -07:00
George Hotz
89e106686a
simpler unmatch [run_process_replay] ( #5149 )
2024-06-25 19:57:40 -07:00
George Hotz
c98ca23cb9
test pickle variable ( #5150 )
* test pickle variable
* fix process replay
2024-06-25 19:49:21 -07:00
David Hou
8fcc41582f
make buffer view optional with a flag ( #5120 )
2024-06-25 19:13:20 -07:00
George Hotz
63ba2d05d1
uops dfs cleanup ( #5147 )
* uops dfs cleanup
* Update uops.py
2024-06-25 18:51:42 -07:00
George Hotz
6841ea3baf
don't allow duplicate variables ( #5148 )
2024-06-25 18:47:29 -07:00
George Hotz
cc7fafcd8b
sink folding rule [run_process_replay] ( #5145 )
2024-06-25 18:34:44 -07:00
Jhenner Tigreros
fa78755f19
Add new patterns to unfold division ( #5139 )
* Add new patterns to unfold division
* Create regression test and fix pattern
2024-06-25 18:07:47 -07:00
qazal
c4fdb9c725
second iteration on verify_lazyop ( #5140 )
2024-06-25 09:44:32 +03:00
chenyu
dade7677cf
validate llama3 output only with model "LLaMA-3/8B-SF-DPO" ( #5138 )
2024-06-24 20:58:25 -04:00
qazal
981afb114f
safely fold NEG in lazy.py ( #5135 )
* safe
* add test
2024-06-24 19:40:37 -04:00
chenyu
7948b05738
fix uneven shard with shrink and pad args on sharded axis ( #5131 )
It is incorrect to assume that the first (len(device)-1) shards all have the same size, e.g. size 2 sharded across 4 devices -> (1, 1, 0, 0).
2024-06-24 16:55:50 -04:00
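The (1, 1, 0, 0) example in the commit above can be reproduced with a small sketch. It assumes each shard takes a ceil-divided chunk until the axis runs out; the function name `shard_sizes` is hypothetical and the real sharding code may compute this differently.

```python
def shard_sizes(size, n_devices):
    """Split an axis of length `size` across `n_devices` shards (assumed scheme)."""
    chunk = -(-size // n_devices)  # ceiling division
    sizes = []
    remaining = size
    for _ in range(n_devices):
        take = min(chunk, remaining)  # later shards may get less, or nothing
        sizes.append(take)
        remaining -= take
    return tuple(sizes)


uneven_small = shard_sizes(2, 4)  # (1, 1, 0, 0), the case from the commit message
uneven_large = shard_sizes(5, 4)  # (2, 2, 1, 0)
```

In both cases the first three shards do not all share one size, which is the assumption the commit says was wrong.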
qazal
18e70deec3
verify_lazyop ( #5124 )
* start verify_lazyop
* bfs order
* assert
* assert shapetrackers 2
* refactor
* more iteration
* skips
* that ast was wrong too
2024-06-24 13:45:35 -07:00
qazal
fe707bc968
hotfix: don't use is for comparing dtype ( #5128 )
2024-06-24 14:12:34 -04:00
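Why `is` can be the wrong comparison for a dtype-like value can be shown with a stand-in class. `FakeDType` below is hypothetical and only mimics a value object; it is not the library's dtype class, and the commit's actual motivation may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeDType:
    # Hypothetical stand-in for a dtype-like value object.
    name: str
    itemsize: int


a = FakeDType("float", 4)
b = FakeDType("float", 4)  # equal contents, but a separate object

same_identity = a is b  # identity check: False for two distinct instances
same_value = a == b     # value check: True, fields match
```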
Jhenner Tigreros
dfa562dbc1
DEFINE_ACC takes UOps.CONST in vin instead of arg ( #4975 )
* Change DEFINE_ACC to receive UOps.CONST in vin
* Use localtype instead of acc dtype
* Fix idp
* Fix copy list
* Fix warp
* Fix error
* Fix merge
* Fix testing
* Fix merge
* Use deepcopy
* Change to copy of inp
* Fix lint
* Move const to first place
* Fix issue upat
* Fix upat patterns
* Change to list, to test permutations
* Add condition
* Change pm
* Revert change pm
* Remove unused rule
* Fix
* Change of float4 DEFINE_ACC values
* Cast on PM to correct dtype
* Improve assert message
* Move IFs
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-06-24 09:25:33 -07:00
nimlgen
d84beaa6dd
tiny profiler cleanups ( #5126 )
2024-06-24 17:02:31 +03:00
chenyu
4a7d403777
cleanup test_multitensor ( #5118 )
renamed d_zero, d0, d1, d2, ... to d0, d1, d2, d3 and reused some multi device tuples
2024-06-23 20:54:22 -04:00
chenyu
c0ba5e0dfb
multi copy_to_device return the copy on same device if possible ( #5117 )
Previously it always returned the copy from the first device.
2024-06-23 20:25:56 -04:00
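The selection behavior described in the commit above might look roughly like the following. This is a sketch under assumptions: `pick_source_device` and the dict-of-shards layout are hypothetical illustrations, not the library's API.

```python
def pick_source_device(shards, target_device):
    """Choose which per-device copy to read from (hypothetical helper).

    `shards` maps a device name to that device's copy of the data.
    """
    if target_device in shards:
        # A copy already lives on the target device, so use it.
        return target_device
    # Otherwise fall back to the first device, which matches the old behavior.
    return next(iter(shards))


copies = {"GPU:0": "buf0", "GPU:1": "buf1"}
local_pick = pick_source_device(copies, "GPU:1")
fallback_pick = pick_source_device(copies, "CPU")
```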
Francis Lam
b563cd52ed
linearizer: change globals to merge into left axis/gridDims.x first ( #5033 )
* linearizer: change order of collapse to be left-most
also fixes Variable max size to be correct and adds docs for the off parameter
* fix multiple global dim oversizes
* add passing variable test and reorganize tests
* use assert RuntimeError for failing test
2024-06-23 18:53:15 -04:00
nimlgen
69f116a7e1
nv/amd profiler ( #4718 )
* nv/amd profiler
* fix
* fix
* profile copies
* profile logger
* fixes
* more fixes
* less lines and fixes
* fixes
* some linter
* back sync, no related change
* fix gpu2cpu time def
* simpler
* linter
* linter
* docs
* add add_event api
2024-06-23 17:10:12 +03:00
qazal
64a3b7931e
simplify render_ops ctx [run_process_replay] ( #5116 )
* new ctx
* delete DEFINE_VAR
* lt isnt static
2024-06-23 16:56:32 +03:00
qazal
28bf8d86d8
test_linearizer with multi output ASTs ( #5115 )
* ast is tuple
* run test_phi_simplification
* update reason
* more tc
* beam
* a few more
* use test_opt directly
2024-06-23 15:41:24 +03:00
chenyu
ee0c6dfc15
build Tensor._tri with movements only ( #5110 )
* build Tensor._tri with movements only
doesn't need arange, saved a kernel in attention mask
* simpler, more tests
2024-06-23 00:07:36 -04:00
chenyu
20fabd8a5b
update Tensor.triu and Tensor.tril ( #5109 )
Renamed the arg to `diagonal` to match the torch API, and added documentation and examples.
2024-06-22 21:59:50 -04:00
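The meaning of the `diagonal` argument, following the torch convention the commit refers to, can be sketched on plain nested lists. These helpers are illustrative only and are not the Tensor implementation.

```python
def triu(rows, diagonal=0):
    # Keep entries on or above the chosen diagonal: column - row >= diagonal.
    return [[v if j - i >= diagonal else 0 for j, v in enumerate(row)]
            for i, row in enumerate(rows)]


def tril(rows, diagonal=0):
    # Keep entries on or below the chosen diagonal: column - row <= diagonal.
    return [[v if j - i <= diagonal else 0 for j, v in enumerate(row)]
            for i, row in enumerate(rows)]


m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
upper_strict = triu(m, diagonal=1)   # excludes the main diagonal
lower_strict = tril(m, diagonal=-1)  # excludes the main diagonal
```

A positive `diagonal` shifts the cut above the main diagonal and a negative one shifts it below, for both functions.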
chenyu
8f6ae84e4a
minor cleanup of conv_transpose2d ( #5108 )
* minor cleanup of conv_transpose2d
* that
2024-06-22 21:31:47 -04:00
chenyu
33211f356b
fix desc in tqdm ( #5107 )
Per the tqdm docs (`https://tqdm.github.io/docs/tqdm/`), the user does not need to put `: ` in desc; the `: ` after desc is automatically removed if desc is empty.
updated test cases and added a test for set_description
2024-06-22 19:00:38 -04:00
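The desc rule described in the commit above amounts to a one-line formatting decision. The helper below is a hypothetical sketch of that rule only, not the repository's tqdm code.

```python
def desc_prefix(desc):
    # The separator is appended for the caller, and dropped when desc is empty.
    return f"{desc}: " if desc else ""


with_desc = desc_prefix("loading")
without_desc = desc_prefix("")
```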
chenyu
055e616302
cleanup mnist data load in beautiful_mnist ( #5106 )
2024-06-22 18:31:51 -04:00