4862 Commits

Author SHA1 Message Date
Roelof van Dijk 9704c7d4d4
ruff rule if-exp-instead-of-or-operator (FURB110) (#5178)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-27 08:22:19 -07:00
chenyu 5b8fda3c65
fix: JIT=0 means no JIT (#5188) 2024-06-27 10:31:37 -04:00
qazal 3af17849bf
safely parse quoted titles [run_process_replay] (#5183) 2024-06-27 16:39:48 +03:00
Roelof van Dijk 975b811ad9
names shadowing builtins (#5179)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-27 08:15:01 -04:00
Roelof van Dijk 26e254c42b
ruff: else-raise and else-return (#5175)
* ruff: enable else-raise and else-return

* ruff: add error names

* fix order

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-27 07:54:59 -04:00
Roelof van Dijk f88f71d73a
ruff: unnecessary-comprehension (#5174)
* enable ruff C416 unnecessary-comprehension

* already a list
2024-06-27 07:45:29 -04:00
reddyn12 f1c7944c44
Fix batchnorm shapes for resnet.load_pretrained (#5167)
* Fix batchnorm shapes

* make it general reshape
2024-06-26 18:44:10 -04:00
George Hotz 396ce6cfc9
clean up graph dedup function [run_process_replay] (#5169) 2024-06-26 15:07:34 -07:00
kormann 3a04e518ec
print_tree UPat +fix (#5132)
* fix and extend print_tree

* typing

* typing

* fix upat

* fix none

* ws

* rm prefix

* mv luop dag

* typo

* test print_tree
2024-06-26 15:02:19 -07:00
chenyu 0ba093dea0
hotfix: only validate stable diffusion when using threefry (#5166) 2024-06-26 16:50:38 -04:00
chenyu e4a5870b36
validate stable_diffusion output (#5163)
changed default steps, forgot to update validation
2024-06-26 16:42:21 -04:00
nimlgen 21b225ac45
llama3 download works (#5160) 2024-06-26 22:45:13 +03:00
wozeparrot c91b3c4079
shard llama3 on 0 sometimes (#5157) 2024-06-26 11:50:57 -07:00
Roelof van Dijk 294bd1a9ff
refactor: name check [run_process_replay] (#5158) 2024-06-26 11:39:41 -07:00
Roelof van Dijk 2c80583e14
perf: cache const UOp creation [run_process_replay] (#5156) 2024-06-26 11:13:14 -07:00
George Hotz eda2824cd8
freeze uop [run_process_replay] (#5155) 2024-06-26 10:18:15 -07:00
Elias Wahl e267f3161d
Add MLLogger (#5125)
* add MLPerf logger

* eval steps

* start with step 1

* compliance for 3.1.0 and 4.0.0

* more compliance

* assert, comment and contiguous
2024-06-26 12:23:56 -04:00
nimlgen 16405b973a
fix hcq sync (#5062)
* fix hcq sync

* rewrite

* linter + comment

* fix profiler

* no default dict

* correct sync of unjitted transfer

* fix test
2024-06-26 17:50:37 +03:00
David Hou 3604642847
Llama shard axis 0 sometimes (#5123)
* make buffer view optional with a flag [run_process_replay]

* do not view when sharding to save memory [run_process_replay]

* llama shard axis=0 sometimes

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-26 10:35:25 -04:00
nimlgen fd27f19e92
graph tests (#5153)
* graph tests

* add test

* cleanup
2024-06-26 16:31:20 +03:00
George Hotz 7b709c3ccd
switch tensorcoreoptions to tuple [run_process_replay] (#5143)
* switch tensorcoreoptions to tuple [run_process_replay]

* localbuffer can stay namedtuple for now

* freeze LocalBuffer

* remove NamedTuple

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-06-26 14:12:53 +03:00
qazal 6ca7b13ed1
limit pickled objects [run_process_replay] (#5154)
* limit pickled objects

* delete uop from the list

* debug metal

* need self.opts for TC

* dont need device

* [run_process_replay]

* minor
2024-06-26 13:51:32 +03:00
George Hotz ee4f080a14
rewrite div const [run_process_replay] [no_assert] (#5151)
* rewrite div const [run_process_replay] [no_assert]

* Update uops.py
2024-06-25 20:23:14 -07:00
David Hou 666a9c1448
don't view origin buffer when sharding (#5122)
* make buffer view optional with a flag

* do not view when sharding to save memory
2024-06-25 20:19:09 -07:00
George Hotz 89e106686a
simpler unmatch [run_process_replay] (#5149) 2024-06-25 19:57:40 -07:00
George Hotz c98ca23cb9
test pickle variable (#5150)
* test pickle variable

* fix process replay
2024-06-25 19:49:21 -07:00
David Hou 8fcc41582f
make buffer view optional with a flag (#5120) 2024-06-25 19:13:20 -07:00
George Hotz 63ba2d05d1
uops dfs cleanup (#5147)
* uops dfs cleanup

* Update uops.py
2024-06-25 18:51:42 -07:00
George Hotz 6841ea3baf
don't allow duplicate variables (#5148) 2024-06-25 18:47:29 -07:00
George Hotz cc7fafcd8b
sink folding rule [run_process_replay] (#5145) 2024-06-25 18:34:44 -07:00
Jhenner Tigreros fa78755f19
Add new patterns to unfold division (#5139)
* Add new patterns to unfold division

* Create regression test and fix pattern
2024-06-25 18:07:47 -07:00
qazal c4fdb9c725
second iteration on verify_lazyop (#5140) 2024-06-25 09:44:32 +03:00
chenyu dade7677cf
validate llama3 output only with model "LLaMA-3/8B-SF-DPO" (#5138) 2024-06-24 20:58:25 -04:00
qazal 981afb114f
safely fold NEG in lazy.py (#5135)
* safe

* add test
2024-06-24 19:40:37 -04:00
chenyu 7948b05738
fix uneven shard with shrink and pad args on sharded axis (#5131)
it's incorrect to assume all first (len(device)-1) shards would have the same size. e.g. size 2 shard 4 -> (1, 1, 0, 0)
2024-06-24 16:55:50 -04:00
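The uneven-shard case described in the message above can be reproduced with a small sketch. It assumes shards are sized by ceiling division along the sharded axis, which is consistent with the `(1, 1, 0, 0)` example but is not confirmed by this log. `shard_sizes` is a hypothetical helper, not a tinygrad function.

```python
import math

def shard_sizes(size: int, n_devices: int) -> tuple[int, ...]:
    """Split `size` elements over `n_devices` (assumed ceil-division scheme)."""
    # Each device takes up to ceil(size / n_devices) elements, in order.
    per_shard = math.ceil(size / n_devices)
    sizes = []
    remaining = size
    for _ in range(n_devices):
        take = min(per_shard, remaining)
        sizes.append(take)
        remaining -= take
    return tuple(sizes)

# Size 2 over 4 devices: the first three shards are NOT all the same size.
print(shard_sizes(2, 4))
# Size 10 over 4 devices: here the first three shards happen to match.
print(shard_sizes(10, 4))
```

Under this assumption, any shrink or pad logic on the sharded axis has to look up each shard's actual size rather than reuse the first shard's size for every shard but the last.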
qazal 18e70deec3
verify_lazyop (#5124)
* start verify_lazyop

* bfs order

* assert

* assert shapetrackers 2

* refactor

* more iteration

* skips

* that ast was wrong too
2024-06-24 13:45:35 -07:00
qazal fe707bc968
hotfix: don't use is for comparing dtype (#5128) 2024-06-24 14:12:34 -04:00
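As a general illustration of why `is` can be the wrong comparison for dtype-like objects, here is a minimal sketch. The `DType` class is a hypothetical stand-in, not tinygrad's own class, and the log does not say which situation triggered the hotfix.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DType:
    # Hypothetical dtype descriptor used only for this illustration
    name: str
    itemsize: int

a = DType("float32", 4)
b = DType("float32", 4)

same_value = a == b    # dataclass-generated __eq__ compares the fields
same_object = a is b   # identity check: a and b are separate instances

print(same_value, same_object)
```

Identity only holds when both names refer to the very same object, so a dtype that was rebuilt (for example by copying or deserializing) may compare equal with `==` while failing an `is` check.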
Jhenner Tigreros dfa562dbc1
DEFINE_ACC takes UOps.CONST in vin instead of arg (#4975)
* Change DEFINE_ACC to receive UOps.CONST in vin

* Use localtype instead of acc dtype

* Fix idp

* Fix copy list

* Fix warp

* Fix error

* Fix merge

* Fix testing

* Fix merge

* Use deepcopy

* Change to copy of inp

* Fix lint

* Move const to first place

* Fix issue upat

* Fix upat patterns

* Change to list, to test permutations

* Add condition

* Change pm

* Revert change pm

* Remove unused rule

* Fix

* Change of float4 DEFINE_ACC values

* Cast on PM to correct dtype

* Improve assert message

* Move IFs

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-06-24 09:25:33 -07:00
nimlgen d84beaa6dd
tiny profiler cleanups (#5126) 2024-06-24 17:02:31 +03:00
chenyu 4a7d403777
cleanup test_multitensor (#5118)
renamed d_zero, d0, d1, d2, ... to d0, d1, d2, d3 and reused some multi device tuples
2024-06-23 20:54:22 -04:00
chenyu c0ba5e0dfb
multi copy_to_device return the copy on same device if possible (#5117)
previously it always returns from the first device
2024-06-23 20:25:56 -04:00
Francis Lam b563cd52ed
linearizer: change globals to merge into left axis/gridDims.x first (#5033)
* linearizer: change order of collapse to be left-most

also fixes Variable max size to be correct and add docs for the off
parameter

* fix multiple global dim oversizes

* add passing variable test and reorganize tests

* use assert RuntimeError for failing test
2024-06-23 18:53:15 -04:00
nimlgen 69f116a7e1
nv/amd profiler (#4718)
* nv/amd profiler

* fix

* fix

* profile copies

* profile logger

* fixes

* more fixes

* less lines and fixes

* fixes

* some linter

* back sync, no related change

* fix gpu2cpu time def

* simpler

* linter

* linter

* docs

* add add_event api
2024-06-23 17:10:12 +03:00
qazal 64a3b7931e
simplify render_ops ctx [run_process_replay] (#5116)
* new ctx

* delete DEFINE_VAR

* lt isnt static
2024-06-23 16:56:32 +03:00
qazal 28bf8d86d8
test_linearizer with multi output ASTs (#5115)
* ast is tuple

* run test_phi_simplification

* update reason

* more tc

* beam

* a few more

* use test_opt directly
2024-06-23 15:41:24 +03:00
chenyu ee0c6dfc15
build Tensor._tri with movements only (#5110)
* build Tensor._tri with movements only

doesn't need arange, saved a kernel in attention mask

* simpler, more tests
2024-06-23 00:07:36 -04:00
chenyu 20fabd8a5b
update Tensor.triu and Tensor.tril (#5109)
renamed arg to `diagonal` that matches torch api, and added document and examples
2024-06-22 21:59:50 -04:00
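The `diagonal` argument mentioned above follows the torch convention, where an element at row `i`, column `j` is kept by `triu` when `j - i >= diagonal` and by `tril` when `j - i <= diagonal`. The sketch below shows that rule on nested lists; `triu` and `tril` here are illustrative pure-Python functions, not the tinygrad implementation.

```python
def triu(rows, diagonal=0):
    # Keep elements on or above the chosen diagonal, zero the rest.
    return [[v if j - i >= diagonal else 0 for j, v in enumerate(row)]
            for i, row in enumerate(rows)]

def tril(rows, diagonal=0):
    # Keep elements on or below the chosen diagonal, zero the rest.
    return [[v if j - i <= diagonal else 0 for j, v in enumerate(row)]
            for i, row in enumerate(rows)]

m = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

print(triu(m))               # main diagonal and above
print(triu(m, diagonal=1))   # strictly above the main diagonal
print(tril(m, diagonal=-1))  # strictly below the main diagonal
```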
chenyu 8f6ae84e4a
minor cleanup of conv_transpose2d (#5108)
* minor cleanup of conv_transpose2d

* that
2024-06-22 21:31:47 -04:00
chenyu 33211f356b
fix desc in tqdm (#5107)
per doc `https://tqdm.github.io/docs/tqdm/`, user does not need to put `: ` in desc, and `: ` is automatically removed after desc if the latter is empty.

updated test cases and added a test for set_description
2024-06-22 19:00:38 -04:00
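The `desc` rule described in the message above can be sketched as a tiny formatting function. `bar_prefix` is a hypothetical name used for illustration; it is not part of tqdm or tinygrad.

```python
def bar_prefix(desc: str = "") -> str:
    # The caller passes a bare description; the ": " separator is added
    # automatically, and omitted entirely when the description is empty.
    return f"{desc}: " if desc else ""

print(repr(bar_prefix("loading")))
print(repr(bar_prefix("")))
```

With this rule a caller writes `desc="loading"` rather than `desc="loading: "`, and an empty description leaves no stray separator in front of the bar.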
chenyu 055e616302
cleanup mnist data load in beautiful_mnist (#5106) 2024-06-22 18:31:51 -04:00