George Hotz
3e1336957d
test arange with all opts ( #5923 )
...
* test arange with all opts
* Update test_arange.py
* Update test_arange.py
* Update test_arange.py
* Update test_arange.py
* Update test_arange.py
2024-08-05 18:38:25 -07:00
George Hotz
5d17f54e3c
fast mnist indexing ( #5921 )
...
* fast mnist indexing
* more tests
* remove those tests, new indexing rule
2024-08-05 13:55:15 -07:00
George Hotz
e81c18f494
make the arange test check correctness [run_process_replay] ( #5920 )
2024-08-05 13:41:06 -07:00
George Hotz
8d1c884e78
capture the const pattern in both directions ( #5919 )
...
* capture the const pattern in both directions
* add regression test
2024-08-05 12:15:38 -07:00
George Hotz
42f599870c
unroll arange is broken ( #5918 )
...
* unroll arange is broken
* fix unrolled arange
* one more test
2024-08-05 12:15:07 -07:00
qazal
70949ea7e6
test cstyle compile error for max with inline const ( #5838 )
...
* test_failure_46
* GPU=1 fails too
* add test_renderer
* add failing platforms
* nv too
* assert return value
2024-08-05 19:02:16 +03:00
qazal
e0c6520138
check arange fusing with VIEW and COPY ( #5912 )
...
* check arange fusing with VIEW and COPY
* gpu and clang
2024-08-05 17:09:21 +03:00
nimlgen
590b9ebb34
hcq copy queue is optional ( #5909 )
...
* hcq copy queue is optional
* one more
* this
2024-08-05 14:03:25 +03:00
George Hotz
159ac06b5b
remove unused reduce rules + improve unparented ( #5908 )
...
* remove unused reduce rules [run_process_replay]
* this work
* those tests are meaningless now
2024-08-04 18:18:27 -07:00
George Hotz
d7387d31bf
remove useless reduce cases [run_process_replay] ( #5907 )
...
* remove useless reduce cases [run_process_replay]
* do_reduce cleanup
* more cleanups + no longer supported tests
* Revert "more cleanups + no longer supported tests"
This reverts commit e9f2f6ba7061f8697a308aacdc3442fa922a77f5.
* no longer supported tests
* switch ReduceOps.SUM -> BinaryOps.ADD
2024-08-04 17:11:08 -07:00
George Hotz
be8958e26b
use CONTRACT before REDUCE ( #5903 )
...
* use CONTRACT before REDUCE [run_process_replay]
* support half expand
* EXPAND GEP
2024-08-04 16:17:33 -07:00
chenyu
4a65010de8
remove CUDACPU flag in tests [run_process_replay] ( #5902 )
...
no longer used
2024-08-04 16:06:38 -04:00
qazal
aad9234e52
test fused precompute_freqs_cis ( #5900 )
...
* test_precompute_freqs_cis
* tiny for ci
2024-08-04 21:01:05 +03:00
chenyu
c67e9887f7
support using str to specify dtype ( #5897 )
...
* support using str to specify dtype
in Tensor creation and args into `cast` and `bitcast`, and acc_dtype
* more tests
2024-08-04 12:56:28 -04:00
qazal
4c5ef2cc4f
setitem with arange fusion 1 ( #5898 )
2024-08-04 16:09:21 +03:00
chenyu
da61dea1b2
simple failed UOp sub symbolic test case ( #5894 )
2024-08-03 14:27:23 -04:00
qazal
56ef9e453e
pad reduceops to the max of each dimension ( #5889 )
...
* early verify
* pad reduceops to the max of each dim
* remove the function
2024-08-03 14:03:30 +03:00
qazal
65fa86901a
indexing fusion 2 ( #5888 )
...
* arange fusion
* kernels that fuse
* tests
2024-08-03 13:13:39 +03:00
qazal
af59b2eea9
tests from the indexing fusion branch ( #5886 )
2024-08-03 11:56:48 +03:00
chenyu
d5de44340e
UOp add mod folding ( #5862 )
...
* UOp add mod folding
* that passes now
2024-08-02 18:31:46 -04:00
chenyu
41bbd3f4c1
update UOp mod reduction patterns ( #5883 )
...
prepare generic mod folding, also some test changes from mod folding pr
2024-08-02 17:43:40 -04:00
wozeparrot
acadccf344
comma benchmark ( #5518 )
2024-08-02 14:36:54 -07:00
Elias Wahl
4a114756f6
New BERT dataloader ( #5881 )
...
* One file == One topic
* update test
* new dataloader
* update train script
* get index is faster
2024-08-02 15:12:23 -04:00
nimlgen
2777784b91
add dependency viewer to hcq profiler ( #5874 )
...
* hcq profiler support deps
* clean up
* cleaner
* cleanup
* revert this
* linter
* mypy
* add test
* sync is strange, need to take the end
* linter + test
2024-08-02 22:07:01 +03:00
George Hotz
23e8c39288
get program fields in __post_init__ [run_process_replay] ( #5878 )
...
* get program fields in __post_init__ [run_process_replay]
* remove print
2024-08-02 09:57:12 -07:00
qazal
8611fa6c99
apply opts.extra_matcher in process replay [run_process_replay] ( #5877 )
2024-08-02 18:07:58 +03:00
qazal
2a791f7924
fuzz uops is simpler with List[UOp] [run_process_replay] ( #5875 )
...
* remove from fuzz_uops
* update fuzz_uops.py
* add to realize.py
2024-08-02 17:28:15 +03:00
George Hotz
877e0b4ba0
define global only has the index [run_process_replay] ( #5869 )
...
* define global only has the index [run_process_replay]
* fix that linearizer test
* fix ptx
* stupid ptx fix
2024-08-01 19:01:15 -07:00
chenyu
f27f949a5d
Revert "revert some UOp IDIV bound ( #5863 )" ( #5871 )
...
This reverts commit 0c8d202348
.
2024-08-01 21:38:31 -04:00
chenyu
df138bc558
Revert "revert a mod pattern ( #5864 )" ( #5870 )
...
This reverts commit 5c8de2d044
.
2024-08-01 20:44:26 -04:00
chenyu
1b0314d9ef
Revert "remove one more UOp mod pattern ( #5865 )" ( #5868 )
...
This reverts commit b03b8e18c2
.
2024-08-01 20:28:35 -04:00
George Hotz
d73bc85ba9
UOpGraph not in renderer or Program [run_process_replay] ( #5867 )
...
* UOpGraph not in renderer or Program [run_process_replay]
* fix some tests
* fix ptx
2024-08-01 16:20:30 -07:00
chenyu
b392b8edc3
increase atol and rtol test_gemm_fp16 ( #5866 )
...
* increase atol and rtol test_gemm_fp16
made it pass with NOOPT which has larger accumulated error
* revert that
2024-08-01 19:09:58 -04:00
chenyu
b03b8e18c2
remove one more UOp mod pattern ( #5865 )
...
fixed UOP_IS_SYMBOLIC=1 test_failure_40
2024-08-01 18:29:04 -04:00
chenyu
5c8de2d044
revert a mod pattern ( #5864 )
...
fixed UOP_IS_SYMBOLIC=1 linearizer failure 47
2024-08-01 17:24:26 -04:00
George Hotz
2d3c7e4d4e
some TestPickleJIT tests ( #5860 )
...
* some TestPickleJIT tests
* hotfix: print which opencl device we are using
2024-08-01 12:39:59 -07:00
chenyu
0c8d202348
revert some UOp IDIV bound ( #5863 )
...
* revert some UOp IDIV bound
breaks conv with UOP_IS_SYMBOLIC, added some conv tests in CI
* those are correct
* skip slow ones
2024-08-01 15:09:06 -04:00
George Hotz
53fcac9e80
hotfix: increase time on flaky NV test
2024-08-01 10:20:07 -07:00
qazal
26d0265d66
test schedule of LazyBuffers [run_process_replay] ( #5859 )
2024-08-01 19:06:29 +03:00
David Hou
eb91423cb4
MLB support reshape for uneven shards ( #5804 )
...
* cleaner uneven reshape
* update test
2024-08-01 02:36:03 -07:00
David González Martínez
0f09b94c43
add failing test for second order derivatives ( #5772 )
...
* add failing test
* fix lint
* fix bad merge
* fix again
* fix test
* more minimal
2024-08-01 02:34:47 -07:00
George Hotz
9d05dfb6f4
move JIT graphing into CapturedJit ( #5852 )
...
* move JIT graphing into CapturedJit
* better
* _jit_cache
* clear inputs cleanup
* test_pickle_jit with graph + cleanup
* 0 is fine to start
* support None in bufs
* alloc real buffers
* cleaner
2024-07-31 20:48:17 -07:00
chenyu
0ec732b494
test lin fail 47 for UOP_IS_SYMBOLIC ( #5853 )
...
failed arange example with UOP_IS_SYMBOLIC
2024-07-31 23:09:22 -04:00
George Hotz
c6a8395f1b
CapturedJit is fun to pickle [run_process_replay] ( #5851 )
...
* CapturedJit is fun to pickle
* export input replace
2024-07-31 17:23:01 -07:00
George Hotz
72621d9e7c
count the specials in uops [run_process_replay] ( #5848 )
...
* count the specials in uops [run_process_replay]
* cleanups
2024-07-31 14:53:18 -07:00
chenyu
c2ffcf6887
remove the wrong mod UOp pattern ( #5847 )
...
don't think we are hitting it because the stride construction, and it's wrong and not needed
2024-07-31 16:24:25 -04:00
qazal
8174c438a3
pad test_failure_45 ( #5846 )
2024-07-31 23:08:48 +03:00
George Hotz
8672a9db3f
add test to validate lazyops dims ( #5845 )
2024-07-31 12:59:38 -07:00
chenyu
4fe5b95568
fix UOp ALU bound ( #5844 )
...
* fix UOp ALU bound
root cause of resnet bug, the ALU bound is only correct for scalar, not vectorized
* it can be nan...
2024-07-31 15:19:31 -04:00
nimlgen
f768935be8
add RING_ALLREDUCE_THRESHOLD ( #5835 )
...
* add RING_ALLREDUCE_THRESHOLD
* becnhmark
* fixes
* fix n_gpus
* unused import
* remove debug=2
2024-07-31 16:13:09 +03:00
chenyu
2e087ca8e4
UOp bound for div negative number ( #5808 )
2024-07-31 02:10:23 -04:00
qazal
bcbd925001
hcopts failing test for fused arange kernel ( #5815 )
...
* add failure_43
* n 45
2024-07-31 09:02:44 +03:00
qazal
ed556c260e
UOps.IF rules more tests ( #5831 )
...
* init tests
* split tests
* assert multiple gates simplicity
2024-07-31 00:11:02 -04:00
David Hou
492a696d14
allow specify splits in shard, handle multiple different splits in MLB.e ( #5599 )
...
* allow specify splits in shard, handle multiple different splits in MLB.e
* line width
* linter
* don't use Device in docstring
* specify size of shards instead of boundaries
* adjust docstring for specify size of shards instead of boundaries
* don't allow splits on symbolic axis?
* just allow sint in splits_to_bounds
* add message for assert
* bounds instead of splits to save lines
* fix types
* reduce diff
* fix
* tuple
* golf :(
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-07-30 19:33:04 -07:00
chenyu
c3da458bc3
UOp if min==max folds to CONST ( #5828 )
...
* UOp if min==max folds to CONST
* fix test
2024-07-30 22:14:22 -04:00
George Hotz
e6879035a0
work to make GEMV fast ( #5824 )
...
* work to make GEMV fast
* half8 cast
* align struct
* fix amd
* float8 is a later problem
2024-07-30 17:41:40 -07:00
chenyu
02f0be03f2
tests on UOp div negative number and arange opts ( #5825 )
2024-07-30 20:06:57 -04:00
George Hotz
693990a346
swap src[2] and src[3] in load [run_process_replay] ( #5821 )
...
* swap src[2] and src[3] in load [run_process_replay]
* cleanups + bugfix
* fix ptx
2024-07-30 14:04:13 -07:00
George Hotz
17a2f74412
new style load/store folder ( #5784 )
...
* remove old index reorder
* new style folder
* works better
* dedup
* one failure
* this is fine now...
* expander_rewrite
* images broken, but all else should work
* cleanups
* make tests work with old
* fix images
* cleanups + bugfix
* minor fixes
* fix gated store folding
* flip gate_creator and expander
* fix gated store
* remove unneeded rules
* lines getting close
* line count good
2024-07-30 13:17:20 -07:00
qazal
03d866b84f
UOps.IF with rewrite rules ( #5812 )
...
* expand merge
* merge barriers
* gate_folder
* test_linearizer_failures
* this can be here
* bring the new repr back
* gate_folder2
* gate_creator is better
* gate_folder
* dedup conditions
* early gate folding
* dedup barrier
* fold noop conditions
* all consts can go away
* free lines
2024-07-30 20:50:56 +03:00
chenyu
defd89e8e0
unify negative shape creation to raise ValueError ( #5817 )
...
[run_process_replay]
2024-07-30 13:42:59 -04:00
P4ssenger
6742a4789a
Add check for negative dimension in view ( #5790 )
...
* add check for negative dimension in view
* add negative dim tests
* move check to tensor level
* fix error message
* move check to view create
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-30 13:26:27 -04:00
Francis Lata
ce61be16f1
clean up how preprocessed folder is defined ( #5813 )
2024-07-30 12:35:26 -04:00
qazal
5e827e51d2
add llama3 BEAM=2 failures to test_linearizer_failures ( #5553 )
...
* skips
* opts.device
* benchmarks
* add to test_linearizer_failures
* remove hardcoded ones
* linter
* skip cpu
2024-07-30 00:37:32 +03:00
samm393
573e0f9a48
remove float division from idiv in python_alu ( #5777 )
...
* removes float division from idiv in python_alu
* add test
* cleaner logic
* pass clang unsigned literals correctly
* suffix ULL instead of U
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-29 12:14:12 -04:00
samm393
2c94316bd2
ull literal support and test ( #5789 )
...
* ull literal support and test
* missing .numpy()
2024-07-29 11:50:49 -04:00
nimlgen
ab3839a80a
cleanup nv/cuda compilers ( #5767 )
...
* cleanup nv/cuda compilers
* destroy prog
* small test
* fix test
* nv ptx rewrite key
* jitlink free
* ptx is part of cuda
2024-07-29 13:50:03 +03:00
chenyu
e7a14f398e
more uop_symbolic tests for divmod pairs ( #5785 )
2024-07-28 21:27:06 -04:00
George Hotz
76d191ab94
move consts to end of add ( #5783 )
...
* move consts to end of add
* better
* fix infinite loop
2024-07-28 17:38:57 -07:00
chenyu
71a64d8252
UOps.MUL bound when one is negative ( #5781 )
...
* UOps.MUL bound when one is negative
also one more distribute_mul rule
* don't always expand
2024-07-28 19:02:47 -04:00
qazal
b775db6b60
high-level benchmark timing diff ( #5776 )
...
* high level timings
benchmark times
fix defs
* use the name map
* skip last task
2024-07-28 23:42:57 +03:00
chenyu
600a39771d
fix Tensor.arange if (stop-start) and step have different signs ( #5775 )
2024-07-28 14:34:10 -04:00
David González Martínez
d0fd84e617
feat: allow passing gradient to .backward() to compute vjp ( #5771 )
...
* feat: allow passing gradient to .backward() to compute vjp
* fix
* refactor
* fix trailing whitespace
2024-07-28 11:13:18 -07:00
qazal
e0e7293b0a
make process replay unique in retries [run_process_replay] ( #5773 )
2024-07-28 20:44:15 +03:00
qazal
95dda8dadf
more unmatching vectorize/gep asserts [run_process_replay] ( #5760 )
...
* merge vectorize/gep rules [run_process_replay]
* assert dtypes
* src=
* float2=(float4.x,float4.y)
2024-07-28 15:08:54 +08:00
chenyu
bfbd7c5461
more generic UOp mul mod folding ( #5765 )
2024-07-27 20:20:35 -04:00
chenyu
80c6475757
update test_uop_symbolic to test UOp min and max ( #5764 )
...
covers #5750 , #5748 , #5741
2024-07-27 19:53:21 -04:00
nimlgen
ed1d784077
test profiler timer sync across devs ( #5751 )
...
* test profiler timer sync across devs
* more correct
* typo
2024-07-27 16:47:37 +03:00
qazal
3e49d86c01
process replay diffs 3 things now ( #5731 )
...
* github api infra
* process replay is 3 parts now
* parse benchmarks
* add gh_token
* complete diff
* move process replay tests
* last successful run
* add tempdir
* skip master
2024-07-27 12:52:20 +03:00
qazal
57b4a8e98d
assert process replay asserts ( #5737 )
...
* assert process replay asserts
* one ci job is fine
* test: Revert "separate process replay main loop (#5734 )"
This reverts commit 94d578396f
.
* mac sed needs that
* Revert "test: Revert "separate process replay main loop (#5734 )""
This reverts commit e4ad7684d5472a64841a66b43bc1db7c9bbbf9e8.
* disable process replay capture
* save time
* amd is tiny
* send to /dev/null
2024-07-27 12:07:50 +03:00
George Hotz
f8972ace38
test flops (and allow wide ALU in UOps) [run_process_replay] ( #5749 )
...
* flops test in external_test_speed_theoretical.py
* test speed theo
* min SZMAX
* allow wide ALU for things that support it
* needed for mypy
2024-07-26 21:07:28 -07:00
George Hotz
2fde2d2914
hotfix: external_test_speed_theoretical works on 24GB
2024-07-26 18:41:52 -07:00
George Hotz
829262a5ee
add external_test_speed_theoretical
2024-07-26 17:45:22 -07:00
kormann
a5ede535ef
NOp field name [run_process_replay] ( #5742 )
...
* rm def name
* add field name
2024-07-26 18:45:59 -04:00
George Hotz
c50e374bb6
multiple locals + get_kernel_modifier + fix valid ( #5739 )
...
* multiple locals + get_kernel_modifier + fix valid
* fix test pattern matcher
2024-07-26 15:10:10 -07:00
chenyu
dc7483ee6f
UOp simple div folding ( #5740 )
...
made UOp.divides return the Optional[quotient] and used it for simple div folding
2024-07-26 17:14:32 -04:00
chenyu
671259417f
reuse UOp `__repr__` for NOp ( #5738 )
2024-07-26 16:59:55 -04:00
kormann
b0c1dba299
named UOp class "NOP" [run_process_replay] ( #5728 )
...
* NOP
* fix const + simplify compile
* rm VAR for NOOP
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-07-26 13:25:53 -07:00
George Hotz
4df46eac67
clean up tensor cores [run_process_replay] ( #5736 )
...
* clean up tensor cores [run_process_replay]
* remove tuple(wmma_sz), self.opts.device
* remove tls, leave DEVICE
2024-07-26 13:21:23 -07:00
qazal
94d578396f
separate process replay main loop ( #5734 )
...
* separate process replay main loop
* [run_process_replay]
* add kernel_changed
* test with [run_process_replay]
* revert temp [run_process_replay]
2024-07-26 21:43:08 +03:00
chenyu
a4e9ebc68a
update test_uop_symbolic ( #5733 )
...
enabled more passed tests
2024-07-26 13:46:09 -04:00
chenyu
2cc55a3095
UOp simple mul add div fold ( #5726 )
2024-07-25 22:00:30 -04:00
chenyu
5521b6d437
UOp simple mul-add-lt fold ( #5721 )
2024-07-25 20:49:38 -04:00
qazal
1b53207b4f
revert isolated dags scheduling ( #5724 )
2024-07-25 19:45:12 -04:00
chenyu
845b0d1c9d
UOp more generic div folding ( #5722 )
...
old: `x // c` can fold if `0 <= x.vmin <= x.vmax < c`
new: `x // c` can fold if `0 < c and x.vmin // c == x.vmax // c`
2024-07-25 17:49:14 -04:00
chenyu
a82815262c
more test_pattern_matcher fixups ( #5714 )
2024-07-25 14:12:21 -04:00
chenyu
05e02ddfb3
fixup test_pattern_matcher ( #5712 )
2024-07-25 13:48:52 -04:00
qazal
9ceb3a3d1f
beautiful_mnist -4.3% kernels ( #5709 )
...
* add is_complete
* partially delete forced_realized
* p2
* start
* refactor to can_group
* remove steps
* _get_inputs is nicer
* fix the cache
* cache is dict now
* rename to group
2024-07-25 20:30:49 +03:00
kormann
1e2eac755d
Fix repr upat ( #5705 )
...
* test
* fix
* x fix
* simpler
* rm extra space
2024-07-25 12:05:48 -04:00
qazal
1c992de257
hotfix: compare_schedule defaults to false ( #5707 )
2024-07-25 17:08:28 +03:00