qazal
66dfd5e7bf
faster codegen process replay ( #4858 )
...
* faster codegen process replay
* use self.copy
* regenerate
* delete copy
* test a real error [run_process_replay]
* revert the error change
2024-06-07 16:20:57 +03:00
chenyu
dd5378378b
cleanup kernel simplify_merge_adjacent ( #4852 )
...
cleanup kernel simplify_merge_adjacent
2024-06-06 12:04:54 -04:00
nimlgen
47bfd7c2b7
fix sync of offset buffers in graphs ( #4850 )
...
* correctly sync offset buffers
* test
* style
* run less
* just use base
2024-06-06 16:09:45 +03:00
qazal
eeb5a7af39
refactor `linearize` to render_block, P1 ( #4839 )
...
* refactor to render_block
* move rendering the reduce to its own thing
* add todo and cleanups [run_process_replay]
* inplace update of idxs [run_process_replay]
2024-06-06 15:31:43 +03:00
George Hotz
b932ce0f1d
[run_process_replay] style: clean up UPat
2024-06-06 08:54:24 +02:00
chenyu
b42f49b506
minor cleanup of view _merge_dims ( #4849 )
2024-06-05 23:20:26 -04:00
nimlgen
1649c21ead
nv fix round of allocation sizes ( #4828 )
...
* fix round of allocation sizes
* comment on prefetch
* use huge pages
2024-06-06 00:21:56 +03:00
nimlgen
09bfb8c10a
nv sync program copies to other execution ( #4845 )
2024-06-05 23:34:33 +03:00
chenyu
99e7a1d5e9
support symbolic reshape with non-contiguous ( #4844 )
...
* support symbolic reshape with non-contiguous
pre-requisite for symbolic arange (make symbolic ones that can be folded).
* test cases
* typo
* shorter
2024-06-05 16:01:19 -04:00
chenyu
a352b6d9ce
symbolic Tensor.var ( #4843 )
...
taken from #4446 and add more tests
2024-06-05 12:55:54 -04:00
Nik
085c0bbf6b
add mlperf train subset of openimages ( #4841 )
2024-06-05 10:10:11 -04:00
Timmy
887643cf34
Multireduce atomic local load/store test ( #4786 )
...
* atomic load/store test
* tests for nested & unrolled
* check barriers
* linters
* cleaning up diff
* fix assert in _temp_create_multireduce_ast changes
* cleaning up the check for redundant barriers
* minor cleanups for the assert
* always seed randn, helps with debuggability
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-06-05 14:41:19 +03:00
George Hotz
3954f102aa
style: make __init__ first in Tensor class
2024-06-05 12:51:41 +02:00
Szymon Ożóg
273945df67
Regression tests for bitshift ( #4829 )
...
* Regression tests for bitshift
* Add test for bitshift not triggered
* Enable tests
2024-06-05 11:42:34 +02:00
Alec Chen
5ac30c29d8
Construct UOps patterns using UPat ( #4821 )
...
* Allow UPat pattern definitions
* Convert pattern matcher tests to UPat constructions
* Convert constant_folder patterns to upat constructions
* Convert assembly patterns to upat constructions
* [run_process_replay] Drop UPat.from_dict
2024-06-05 10:29:37 +02:00
Szymon Ożóg
e47277d18a
Disable for PTX as well ( #4838 )
...
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2024-06-05 10:37:59 +03:00
Francis Lam
890e7c12bb
test/external/verify_kernel: add support for single pickled kernel ( #4836 )
2024-06-04 18:59:21 -04:00
Elias Wahl
e576aca044
Disable dropout ( #4837 )
2024-06-04 18:57:26 -04:00
Elias Wahl
bb248a0dd1
Optional half matmul ( #4835 )
...
* half linear
* move weight cast back
* oops
* matmul dtype var
* todo comment
2024-06-04 17:53:41 -04:00
Elias Wahl
04e237328b
Refactor to class style ( #4804 )
2024-06-04 14:08:31 -07:00
nimlgen
1b8bed4a26
nv check cmdq overrun ( #4824 )
...
* nv check cmdq overrun
* fix assert
2024-06-04 23:22:58 +03:00
David Hou
cddce0e168
don't cast before view on shape changing bitcast ( #4833 )
...
* don't cast before view on shape changing bitcast
* make sure cast before view triggers
2024-06-04 16:04:52 -04:00
Alec Chen
0c3a996e64
Nest ifs for dtype and uop in pattern matcher ( #4834 )
2024-06-04 15:51:28 -04:00
Alec Chen
4909a0d16f
Fix arg set in pattern matcher ( #4830 )
2024-06-04 15:10:09 -04:00
Alec Chen
c96026ac65
Add arg set regression test for pattern matcher ( #4827 )
...
* Add arg set regression test for pattern matcher
* real regression
---------
Co-authored-by: qazalin <qazal.software@gmail.com>
2024-06-04 13:35:09 -04:00
chenyu
a70e8a80d7
test_ops test cmp with special floats ( #4826 )
...
prepare to fix nan, it did not work with ge and le before either
2024-06-04 12:10:21 -04:00
Szymon Ożóg
b6895dabaa
Remove ssa label ( #4823 )
...
* remove ssa label
* linting
2024-06-04 16:51:05 +02:00
George Hotz
052c928d06
hotfix: touchups from presentation
2024-06-04 16:31:03 +02:00
chenyu
1e02b4cae1
default skip all exception in beam ( #4822 )
...
added a flag `BEAM_STRICT_MODE` to catch compile error or other exceptions on demand
2024-06-03 18:21:36 -04:00
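The behaviour described in #4822 above (skip every exception during beam search by default, surface it only when `BEAM_STRICT_MODE` is set) can be illustrated with a minimal sketch. This is not the code from that commit: `beam_strict_mode`, `try_candidates`, and `compile_and_time` are hypothetical names introduced here, and reading the flag straight from `os.environ` is an assumption.

```python
import os

def beam_strict_mode() -> bool:
    # Assumption: the flag is an environment variable; unset or "0" means off.
    return os.environ.get("BEAM_STRICT_MODE", "0") not in ("", "0")

def try_candidates(candidates, compile_and_time):
    """Time each candidate, skipping the ones that fail unless strict mode is on.

    `compile_and_time` is a hypothetical callable that compiles and times one
    candidate and may raise on a compile error or any other failure.
    """
    results = []
    for cand in candidates:
        try:
            results.append((cand, compile_and_time(cand)))
        except Exception:
            if beam_strict_mode():
                raise  # strict mode: let the error reach the caller
            continue   # default: drop the failing candidate and keep searching
    return results
```

With the flag unset, a failing candidate is silently dropped from the results; with `BEAM_STRICT_MODE=1`, the first failure propagates so it can be debugged.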
chenyu
3afc914617
CMPEQ -> CMPNE and make it safe to pad ( #4818 )
...
* CMPNE
* new dataset
2024-06-03 18:02:15 -04:00
qazal
79c7d402ee
improve augmented assign error message ( #4813 )
2024-06-03 16:57:22 -04:00
Szymon Ożóg
bb7b031c5c
Bitshift ( #4728 )
...
* WIP
* Cleanup
* Cleanup
* Fix variable, refactor to use set
* right shift should be signed/unsigned
* Test for bitshifts
* Allow a neg
2024-06-03 21:16:01 +02:00
nimlgen
e78a9bf3f2
support view in nv/amd ( #4812 )
...
* support view in nv/amd
* fix amd
* fix
* run test on nv/amd
2024-06-03 22:11:52 +03:00
chenyu
45083ccb43
canonicalize 0 in shape in View.create ( #4815 )
...
set strides to 0, offset to 0, mask to None, and contiguous to True with size 0 view.
2024-06-03 13:37:37 -04:00
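The canonicalization rule stated in #4815 above (a view with a 0 in its shape gets strides 0, offset 0, mask `None`, and contiguous `True`) can be sketched as follows. This is a simplified stand-in, not the actual `View.create`: `SimpleView`, `row_major_strides`, and `create_view` are hypothetical names, and the contiguity check for non-empty views is an assumption made only to keep the example complete.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class SimpleView:
    # Hypothetical, simplified stand-in for a view; not the real class.
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]
    offset: int = 0
    mask: Optional[Tuple[Tuple[int, int], ...]] = None
    contiguous: bool = True

def row_major_strides(shape):
    # Default strides for a dense row-major layout.
    strides, acc = [], 1
    for dim in reversed(shape):
        strides.append(acc)
        acc *= dim
    return tuple(reversed(strides))

def create_view(shape, strides=None, offset=0, mask=None):
    shape = tuple(shape)
    if 0 in shape:
        # Size-0 view: no element is ever addressed, so every such view is
        # collapsed to one canonical form (the rule from the commit message).
        return SimpleView(shape, (0,) * len(shape), 0, None, True)
    default = row_major_strides(shape)
    strides = default if strides is None else tuple(strides)
    # Assumption: contiguous means dense row-major with no offset and no mask.
    contiguous = offset == 0 and mask is None and strides == default
    return SimpleView(shape, strides, offset, mask, contiguous)
```

One consequence of canonicalizing is that two empty views with the same shape compare equal no matter which strides, offset, or mask they were requested with.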
Szymon Ożóg
d064bf6d8c
b2 is useless ( #4814 )
2024-06-03 18:29:53 +02:00
nimlgen
65f0071c4b
amd compute queue bind api ( #4732 )
...
* amd hcq bind api
* revert copy queue
* revert
2024-06-03 18:36:56 +03:00
chenyu
3cc6ae0d85
layernorm backward is independent of its mean ( #4806 )
2024-06-03 09:49:59 -04:00
George Hotz
2dae657415
improve readability ( #4809 )
2024-06-03 14:57:57 +02:00
George Hotz
eecfdd2f6e
hotfix: fix dataset reading for new llm.c
2024-06-03 14:10:05 +02:00
qazal
6e0c16dfb0
cleanup render_reduceop ( #4807 )
...
* update acc key
* refactor return type
* remove return type
* run all reduces
* set acc key [run_process_replay]
* local_idxs are copied in render_reduceop [run_process_replay]
2024-06-03 14:39:02 +03:00
George Hotz
dd84f7d35e
touchup: show process name in multiprocess assert
2024-06-03 13:09:40 +02:00
qazal
0db9674dea
skip process replay on master ( #4808 )
2024-06-03 12:29:28 +03:00
qazal
f64fa51a64
process replay for test/* ( #4799 )
...
* add input to unit tests [run_process_replay]
* add setup [run_process_replay]
* run tests [run_process_replay]
* add cuda and amd [run_process_replay]
* run everything but BEAM=2 [run_process_replay]
* skip export_model [run_process_replay]
* fix amd CI
* add concurrency back
2024-06-03 12:01:58 +03:00
nimlgen
e8b5f2040d
nv faster signal on dma queue ( #4789 )
2024-06-02 21:47:24 +03:00
Francis Lata
707099487a
Multiprocessing UNet3D dataloader ( #4801 )
...
* testing dataloader
* matching dataloader implementation for unet3d
* remove comments
* clean up dataloader
* add cookie and cleanup
* use shm_path when creating SharedMemory
* add support for testing resnet and unet3d dataloaders
* update dataset test to return preprocessed data directory in prep for dataloader testing
* pass preprocessed dataset directory properly
* update loader function for dataloader
* add shuffling on indices
* update shm name
* more cleanup for unet3d dataloader
* remove changes to tests
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-02 11:30:47 -04:00
Timmy
ca32921f84
Multireduce PADTO Test ( #4785 )
...
* padto test
* expanded multireduce padto tests
* cuda doesn't run on ci
* moving padto_where_multireduce test to SUM so that we can check the reduce axis
* cleaning up tests some more
* add wanna_outputs
* refactor test_padto_sum_multireduce
* fix max and refactor where
* fix axis
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-06-02 13:46:53 +03:00
qazal
231ed2c656
compute aliased buffer idxs pre reduce ( #4788 )
2024-06-01 16:46:52 -04:00
nimlgen
1b18ebb133
minor cleanups ( #4802 )
2024-06-01 20:11:43 +03:00
chenyu
1ffa5ec492
unit test ShapeTracker.consecutive ( #4800 )
2024-06-01 10:10:51 -04:00
nimlgen
7384ee08a0
amd cleanup sdma ( #4796 )
...
* amd cleanup sdma
* faster enqueue for sdma
* typo
* remove commented lines
* fix overrun check
* flushhdp better command
2024-06-01 17:06:44 +03:00