4685 Commits

Author SHA1 Message Date
qazal 66dfd5e7bf
faster codegen process replay (#4858)
* faster codegen process replay

* use self.copy

* regenerate

* delete copy

* test a real error [run_process_replay]

* revert the error change
2024-06-07 16:20:57 +03:00
chenyu dd5378378b
cleanup kernel simplify_merge_adjacent (#4852)
cleanup kernel simplify_merge_adjacent
2024-06-06 12:04:54 -04:00
nimlgen 47bfd7c2b7
fix sync of offset buffers in graphs (#4850)
* correctly sync offset buffers

* test

* style

* run less

* just use base
2024-06-06 16:09:45 +03:00
qazal eeb5a7af39
refactor `linearize` to render_block, P1 (#4839)
* refactor to render_block

* move rendering the reduce to its own thing

* add todo and cleanups [run_process_replay]

* inplace update of idxs [run_process_replay]
2024-06-06 15:31:43 +03:00
George Hotz b932ce0f1d
[run_process_replay] style: clean up UPat 2024-06-06 08:54:24 +02:00
chenyu b42f49b506
minor cleanup of view _merge_dims (#4849) 2024-06-05 23:20:26 -04:00
nimlgen 1649c21ead
nv fix round of allocation sizes (#4828)
* fix round of allocation sizes

* comment on prefetch

* use huge pages
2024-06-06 00:21:56 +03:00
nimlgen 09bfb8c10a
nv sync program copies to other execution (#4845) 2024-06-05 23:34:33 +03:00
chenyu 99e7a1d5e9
support symbolic reshape with non-contiguous (#4844)
* support symbolic reshape with non-contiguous

prerequisite for symbolic arange (make symbolic ones that can be folded).

* test cases

* typo

* shorter
2024-06-05 16:01:19 -04:00
chenyu a352b6d9ce
symbolic Tensor.var (#4843)
taken from #4446, with more tests added
2024-06-05 12:55:54 -04:00
Nik 085c0bbf6b
add mlperf train subset of openimages (#4841) 2024-06-05 10:10:11 -04:00
Timmy 887643cf34
Multireduce atomic local load/store test (#4786)
* atomic load/store test

* tests for nested & unrolled

* check barriers

* linters

* cleaning up diff

* fix assert in _temp_create_multireduce_ast changes

* cleaning up the check for redundant barriers

* minor cleanups for the assert

* always seed randn, helps with debuggability

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-06-05 14:41:19 +03:00
George Hotz 3954f102aa
style: make __init__ first in Tensor class 2024-06-05 12:51:41 +02:00
Szymon Ożóg 273945df67
Regression tests for bitshift (#4829)
* Regression tests for bitshift

* Add test for bitshift not triggered

* Enable tests
2024-06-05 11:42:34 +02:00
Alec Chen 5ac30c29d8
Construct UOps patterns using UPat (#4821)
* Allow UPat pattern definitions

* Convert pattern matcher tests to UPat constructions

* Convert constant_folder patterns to upat constructions

* Convert assembly patterns to upat constructions

* [run_process_replay] Drop UPat.from_dict
2024-06-05 10:29:37 +02:00
Szymon Ożóg e47277d18a
Disable for PTX as well (#4838)
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2024-06-05 10:37:59 +03:00
Francis Lam 890e7c12bb
test/external/verify_kernel: add support for single pickled kernel (#4836) 2024-06-04 18:59:21 -04:00
Elias Wahl e576aca044
Disable dropout (#4837) 2024-06-04 18:57:26 -04:00
Elias Wahl bb248a0dd1
Optional half matmul (#4835)
* half linear

* move weight cast back

* oops

* matmul dtype var

* todo comment
2024-06-04 17:53:41 -04:00
Elias Wahl 04e237328b
Refactor to class style (#4804) 2024-06-04 14:08:31 -07:00
nimlgen 1b8bed4a26
nv check cmdq overrun (#4824)
* nv check cmdq overrun

* fix assert
2024-06-04 23:22:58 +03:00
David Hou cddce0e168
don't cast before view on shape changing bitcast (#4833)
* don't cast before view on shape changing bitcast

* make sure cast before view triggers
2024-06-04 16:04:52 -04:00
Alec Chen 0c3a996e64
Nest ifs for dtype and uop in pattern matcher (#4834) 2024-06-04 15:51:28 -04:00
Alec Chen 4909a0d16f
Fix arg set in pattern matcher (#4830) 2024-06-04 15:10:09 -04:00
Alec Chen c96026ac65
Add arg set regression test for pattern matcher (#4827)
* Add arg set regression test for pattern matcher

* real regression

---------

Co-authored-by: qazalin <qazal.software@gmail.com>
2024-06-04 13:35:09 -04:00
chenyu a70e8a80d7
test_ops test cmp with special floats (#4826)
prepare to fix nan; it did not work with ge and le before either
2024-06-04 12:10:21 -04:00
Szymon Ożóg b6895dabaa
Remove ssa label (#4823)
* remove ssa label

* linting
2024-06-04 16:51:05 +02:00
George Hotz 052c928d06
hotfix: touchups from presentation 2024-06-04 16:31:03 +02:00
chenyu 1e02b4cae1
default skip all exception in beam (#4822)
added a flag `BEAM_STRICT_MODE` to catch compile errors or other exceptions on demand
2024-06-03 18:21:36 -04:00
chenyu 3afc914617
CMPEQ -> CMPNE and make it safe to pad (#4818)
* CMPNE

* new dataset
2024-06-03 18:02:15 -04:00
qazal 79c7d402ee
improve augmented assign error message (#4813) 2024-06-03 16:57:22 -04:00
Szymon Ożóg bb7b031c5c
Bitshift (#4728)
* WIP

* Cleanup

* Cleanup

* Fix variable, refactor to use set

* right shift should be signed/unsigned

* Test for bitshifts

* Allow a neg
2024-06-03 21:16:01 +02:00
nimlgen e78a9bf3f2
support view in nv/amd (#4812)
* support view in nv/amd

* fix amd

* fix

* run test on nv/amd
2024-06-03 22:11:52 +03:00
chenyu 45083ccb43
canonicalize 0 in shape in View.create (#4815)
for a size-0 view, set strides to 0, offset to 0, mask to None, and contiguous to True.
2024-06-03 13:37:37 -04:00
Szymon Ożóg d064bf6d8c
b2 is useless (#4814) 2024-06-03 18:29:53 +02:00
nimlgen 65f0071c4b
amd compute queue bind api (#4732)
* amd hcq bind api

* revert copy queue

* revert
2024-06-03 18:36:56 +03:00
chenyu 3cc6ae0d85
layernorm backward is independent of its mean (#4806) 2024-06-03 09:49:59 -04:00
George Hotz 2dae657415
improve readability (#4809) 2024-06-03 14:57:57 +02:00
George Hotz eecfdd2f6e
hotfix: fix dataset reading for new llm.c 2024-06-03 14:10:05 +02:00
qazal 6e0c16dfb0
cleanup render_reduceop (#4807)
* update acc key

* refactor return type

* remove return type

* run all reduces

* set acc key [run_process_replay]

* local_idxs are copied in render_reduceop [run_process_replay]
2024-06-03 14:39:02 +03:00
George Hotz dd84f7d35e
touchup: show process name in multiprocess assert 2024-06-03 13:09:40 +02:00
qazal 0db9674dea
skip process replay on master (#4808) 2024-06-03 12:29:28 +03:00
qazal f64fa51a64
process replay for test/* (#4799)
* add input to unit tests [run_process_replay]

* add setup [run_process_replay]

* run tests [run_process_replay]

* add cuda and amd [run_process_replay]

* run everything but BEAM=2 [run_process_replay]

* skip export_model [run_process_replay]

* fix amd CI

* add concurrency back
2024-06-03 12:01:58 +03:00
nimlgen e8b5f2040d
nv faster signal on dma queue (#4789) 2024-06-02 21:47:24 +03:00
Francis Lata 707099487a
Multiprocessing UNet3D dataloader (#4801)
* testing dataloader

* matching dataloader implementation for unet3d

* remove comments

* clean up dataloader

* add cookie and cleanup

* use shm_path when creating SharedMemory

* add support for testing resnet and unet3d dataloaders

* update dataset test to return preprocessed data directory in prep for dataloader testing

* pass preprocessed dataset directory properly

* update loader function for dataloader

* add shuffling on indices

* update shm name

* more cleanup for unet3d dataloader

* remove changes to tests

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-02 11:30:47 -04:00
Timmy ca32921f84
Multireduce PADTO Test (#4785)
* padto test

* expanded multireduce padto tests

* cuda doesn't run on CI

* moving padto_where_multireduce test to SUM so that we can check the reduce axis

* cleaning up tests some more

* add wanna_outputs

* refactor test_padto_sum_multireduce

* fix max and refactor where

* fix axis

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-06-02 13:46:53 +03:00
qazal 231ed2c656
compute aliased buffer idxs pre reduce (#4788) 2024-06-01 16:46:52 -04:00
nimlgen 1b18ebb133
minor cleanups (#4802) 2024-06-01 20:11:43 +03:00
chenyu 1ffa5ec492
unit test ShapeTracker.consecutive (#4800) 2024-06-01 10:10:51 -04:00
nimlgen 7384ee08a0
amd cleanup sdma (#4796)
* amd cleanup sdma

* faster enqueue for sdma

* typo

* remove commented lines

* fix overrun check

* flushhdp better command
2024-06-01 17:06:44 +03:00