Commit Graph

98 Commits

Author SHA1 Message Date
chenyu 0ec732b494
test lin fail 47 for UOP_IS_SYMBOLIC (#5853)
failed arange example with UOP_IS_SYMBOLIC
2024-07-31 23:09:22 -04:00
qazal 8174c438a3
pad test_failure_45 (#5846) 2024-07-31 23:08:48 +03:00
George Hotz 8672a9db3f
add test to validate lazyops dims (#5845) 2024-07-31 12:59:38 -07:00
chenyu 4fe5b95568
fix UOp ALU bound (#5844)
* fix UOp ALU bound

root cause of resnet bug, the ALU bound is only correct for scalar, not vectorized

* it can be nan...
2024-07-31 15:19:31 -04:00
qazal bcbd925001
hcopts failing test for fused arange kernel (#5815)
* add failure_43

* n 45
2024-07-31 09:02:44 +03:00
qazal ed556c260e
UOps.IF rules more tests (#5831)
* init tests

* split tests

* assert multiple gates simplicity
2024-07-31 00:11:02 -04:00
qazal 03d866b84f
UOps.IF with rewrite rules (#5812)
* expand merge

* merge barriers

* gate_folder

* test_linearizer_failures

* this can be here

* bring the new repr back

* gate_folder2

* gate_creator is better

* gate_folder

* dedup conditions

* early gate folding

* dedup barrier

* fold noop conditions

* all consts can go away

* free lines
2024-07-30 20:50:56 +03:00
qazal 5e827e51d2
add llama3 BEAM=2 failures to test_linearizer_failures (#5553)
* skips

* opts.device

* benchmarks

* add to test_linearizer_failures

* remove hardcoded ones

* linter

* skip cpu
2024-07-30 00:37:32 +03:00
nimlgen 08f47d7dc3
more info on failure 41 (#5704) 2024-07-25 12:14:28 +03:00
nimlgen 69d4f474d8
amd resnet pf (#5703) 2024-07-25 11:21:22 +03:00
chenyu 3060e0be4f
add vmin vmax of SPECIAL (#5670)
* add vmin vmax of SPECIAL

folded stuff like (-1 < gidx0)

* flaky
2024-07-23 22:55:54 -04:00
chenyu 01fe00e055
skip test_failure_39 in CI (#5660)
took more than 2 minutes in ci metal, it's basically the same as test_failure_37 but 20X bigger
2024-07-23 14:47:05 -04:00
George Hotz 386fb5e7f8
folding without UNMUL (#5628)
* folding without UNMUL

* fix failures, index_collapse

* import ReduceOps

* test_arange_4096 isn't folding
2024-07-21 20:14:44 -07:00
George Hotz fa7e734b49
MetaOps.KERNEL (#5543) 2024-07-17 19:41:23 -07:00
Francis Lam c4eb30a04c
test/test_linearizer_failures: add a new beautiful_mnist one (#5531)
* test/test_linearizer_failures: add a new beautiful_mnist one

this one is from a DEPTH=2 fuzz_linearizer search

* add GPU to test_failure_40

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-17 16:27:04 -04:00
George Hotz 1a68854766
PatternMatcher add (#5532)
* PatternMatcher add [run_process_replay]

* f4 dynamic

* test_failure_36 is fixed

* fix PTX
2024-07-17 12:44:42 -07:00
George Hotz 158221b36b
expand tests from uop_expander [run_process_replay] (#5524)
* expand tests from uop_expander

* more changes from the branch
2024-07-17 09:22:36 -07:00
George Hotz 42c25cc961
fix fixup_ast (#5523)
* fix fixup_ast

* these lin failures are fixed
2024-07-17 08:52:21 -07:00
Francis Lam 2d53abb04a
test/external/fuzz_linearizer: fix for new AST changes (#5519)
* test/external/fuzz_linearizer: fix for new AST changes

also add beautiful_mnist failures

* add CLANG and LLVM to test_failure_35 failed_platforms

* fix test_linearizer_failure names
2024-07-17 00:08:07 -04:00
chenyu 07ff4b7d24
test_failure_33 ast that has UOps.UNMUL after linearize (#5504)
* test_failure_33 ast that has UOps.UNMUL after linearize

* smaller
2024-07-15 22:54:23 -04:00
George Hotz 03c2dc8bd7
lowerer is kernel [run_process_replay] (#5437) 2024-07-12 18:50:55 -07:00
George Hotz 870dc8c350
s/Linearizer/Lowerer [run_process_replay] (#5428) 2024-07-12 15:54:07 -07:00
qazal c4fdb9c725
second iteration on verify_lazyop (#5140) 2024-06-25 09:44:32 +03:00
qazal 18e70deec3
verify_lazyop (#5124)
* start verify_lazyop

* bfs order

* assert

* assert shapetrackers 2

* refactor

* more iteration

* skips

* that ast was wrong too
2024-06-24 13:45:35 -07:00
Francis Lam 8d33998e0d
[run_process_replay] linearizer: fix get_grouping_dims to respect global/local max (#4855)
* linearizer: fix get_grouping_dims to respect global/local max

* fix lidx variable index offset and unrestrict clang/llvm global len

* test reverse variable indexing when reverse_dims is true

* change the collapse axis to be the right most if reversed
2024-06-18 16:51:27 +03:00
Junjun Dong c8cd6e725c
Remove BinaryOps.SUB. Replace SUB by ADD and NEG in all tests. Regenerate dataset (#4977)
* feat: remove BinaryOps.SUB

* remove SUB in test_early_end_local

* regenerate dataset. remove SUB in test_linearizer_*

* reenable overflow tests

* simplify tensor.sub function by returning a+(-b)

* remove whitespaces

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-18 09:06:13 -04:00
Jhenner Tigreros dc9e9e4363
Convert BinaryOps.DIV to UnaryOps.RECIP and BinaryOps.IDIV (#4887)
* Create UnaryOps.RECIP and BinaryOps.IDIV and changing uses of BinaryOps.DIV

* Delete unused import

* Add cstyle renderer

* Fix formatting text

* Fix test error due to bad implementation of renderer

* Add PTX support

* Add RECIP to LLVMIR

* Remove BinaryOps.DIV from symbolic test

* Change some test and fix C floor division

* Change references to DIV for the RECIP or IDIV

* Add mimic idiv for symbolic test

* Restore floor

* Mimic idiv

* cast to int

* Fix some test and renderer

* Remove DIV for render nodes

* Resolve issue with div

* Add TestRenderer

* Fix test

* fix error

* Fix PAD test

* Fix div implementation

* Remove DIV

* Add upcast to rshift, due to use of MUL and RECIP on DIV

* Fix linter

* Remove complete BinaryOps.DIV

* Fix lint

* Fix some test

* Revert mul modification

* Fix tests

* Fix CLANG for uops

* Revert IDIV function

* Minor fix

* modify pattern matching rule to support nan

* Fix UNSAFE_PADS_OPS to add UnaryOps.RECIP

* Remove const folding for IDIV and fix PTX

* Complete remove IDIV from extra

* Remove test_div from TestFloatUOps due to test on recip

* Fix linearizer

* fix

* Fix test_22

* Fix llvm

* Apply trunc function for llvmlit

* use floor instead of trunc

* Use correct type

* Generate new fuzz db

* Fix rshift, do not cast to float to support idiv

* Return upcast=false to rshift

* Add to unsafepad BinaryOps.IDIV

* Remove RECIP override for CUDA

* add atol / rtol for the test

* Remove cast to int on IDIV

* Regenerate sops

* delete sops.gz

* regenerate

* regenerate

* regenerate

* Reduce margins

* pass atol and rtol as parametersg for _test_metrics

* regenerated dataset

* Regenerate

* Remove duplicated

* Revert changes on extra

* Remove changes extra and NOQA for test

* Remove E501

* Remove and change line

* Remove E501

* Fix atan2

* Revert import and E501

* Remove E501

* Add hrcp to halp ops

* Remove 1 of hrcp

* Remove last DIV and add type check on uops for IDIV

* Fix new tests

* Fix tests and custom function

* Regenerate dataset

* Regenerate dataset

* Revert dataset

* Change generate dataset script

* Remove line

* Change IDIV, type checker validate if x,y and z are int

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-06-14 02:43:46 -07:00
chenyu a21ea165bc
skip linearizer test_failure_22 on llvm (#4937)
getting flaky recently
2024-06-12 16:03:38 -04:00
Timmy 720c700a8a
Multireduce-Kernels: Linearizer Changes and Tests (#4259)
* basic tests

* cleanup

* pylint

* ruff

* use define acc as a proxy for rendered reductions

* use define acc as a proxy for rendered reductions

* recursive reduceop rendering via ast_parse

* linters + cleanup

* fixing late buf loading

* plus linters

* removing extra line

* linters

* does this break ci?

* added tests and if add end change

* typo in add_ends

* linters

* removing comments

* allow endifs to be inserted before the end of the graph

* find add ENDIF before next BARRIER

* removing tests with manual ENDIF + linters

* specifically the next barrier aftr the store of the local result

* Revert "specifically the next barrier aftr the store of the local result"

This reverts commit b288a5c3cec4114480cdb835a8d0ad01aac49519.

* keeping up to date

* linters + merge changes

* cleaning up old bad decisions

* linters and opts

* mrged linearizer tests

* fixing merge issues

* removing the big ugly uop test (functionality tested end-to-end by test_linearizer additions

* small diff fixes

* updating linearizer to work without uops.add( ... cachable)

* linters

* comment in multireduce tests

* skipping tests without locals

* full tests

* linters

* load_cache[key] fix for multiple accs

* linters

* assert only one reduceop

* fix loop_scope test to actually cause an issue

* self.load_cache[key] key for DEFINE_ACC changed to use a string to make sure each acc is unique

* updated tests

* fixing merge

* removing debug prints

* complete merge fix

* linters

* diff cleanup

* adding tests in

* give each reduce it's own local buffer

* gpu=1 changes

* store and load locals with upcasting

* modifying test?

* make multireduce_netsted_local_upcast test match single reduce shapes

* removing todo

* cleaning up the diff

* unroll test

* unroll and upcast tests

* fix gpu

* seq and self.load_cache[key] cleaning

* linters

* padto works

* merge fixes

* fixes

* add skips for amd

* linters + seq

* cleaning & more tests

* softmax tests

* linters

* [run_process_replay]

* add new tests back

This reverts commit 19dec22e0178bca711719cee3e79f327c9e69c12.

* more hardcoded -1s

* fix ptx

* Fix name for loop in ptx

* cleaning up the diff

* cleaning up the uops diff

* nv ci is too slow

---------

Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: Szymon Ożóg <58388001+SzymonOzog@users.noreply.github.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-06-12 13:29:43 -04:00
chenyu 0f21aa0416
example kernel that triggers Memory access fault for resnet on red (#4678) 2024-05-21 18:59:36 -04:00
George Hotz 07b350a8f4
new uops is an actual graph (#4560)
* new uops is an actual graph

* it's way slower

* simpler

* fix define acc

* render_loop unique

* ops test pass

* add pattern matcher back, there's bugs

* rewrite

* use priority queue

* recursive children

* fix tests

* fix tests with SINK

* fix abstractions

* fix assembly

* simpler

* link define_acc

* fix DEFINE_ACC placement

* type verify

* full cmp

* fix cmp

* ACCESS_ACC

* insert DEFINE_ACC

* fix PHI

* recursive rewrite

* fix many tests

* sum collapse

* more patterns

* correct change

* fold arange

* fix that lin test

* space

* big folding rule works

* close

* has more maxes, meh

* cached node replace

* set changed

* simplest folding yet

* works

* works

* DIV

* all tests pass

* del

* fuzz linearizer fails

* sum_collapse

* test depth 2 cf

* fix lin test 14

* fix clang depth

* disable that

* failure 14 is fixed

* fix ptx

* failure 27 is fixed

* fix llama

* run_cnt

* Revert "Optimize PTX gated loads index calculation (#4304)"

This reverts commit d97d5a7689.

* fix uops loop

* fix ptx bugs

* add barrier

* print

* mem_type in ptx direct

* bypass tests that fail in CI but pass locally

* ptx remove ptr_ar

* more ptx passing

* fix ptx tests

* assert compile support

* remove  model inference benchmark from red
2024-05-17 18:00:18 -07:00
uuuvn 639ea5b0f2
Metal linearizer failure 22 is flaky not just on CI (#4617)
* METAL doesn't fail anymore, not just on CI

* oops
2024-05-16 11:31:23 -04:00
nimlgen eb9689336e
nv mockgpu (#4600)
* mockgpu nv

* works

* comment that out

* fix merge

* setup gpuocelot

* install packages

* not run all of them

* passes

* fix ci

* almost

* should pass

* linter

* linter 2

* try this?

* ugn, not supported

* ci

* remove ticket from description

* better descs
2024-05-15 23:46:08 +03:00
Ahmed Harmouche 662bca8134
Split UnaryOps.CAST into CAST and BITCAST (#4487)
* Separate cast and bitcast

* Fix lint

* No more arg[0]

* Revert "No more arg[0]"

This reverts commit dee6911335513f092fe2cbb9684e8a9d26aad964.

* CAST/BITCAST arg is the dtype only, no more tuple

* No image bitcast, regenerate dataset

* Small fixes
2024-05-15 11:43:31 -04:00
George Hotz ff64bcab69
move graph/search to engine (#4596) 2024-05-14 23:12:59 -07:00
chenyu afe020710d
disable PADTO on upcasted axis (#4444)
fixed test_failure_31. PADTO upcasted is at best a no-op, and might fail at edge cases.
2024-05-05 21:52:03 -04:00
Francis Lam c8595a9655
update sops.gz, fix tests and add new linearizer test (#4437)
* update sops.gz, fix tests and add new linearizer test

* remove METAL CI skip for test_failure_22

* re-add skip to METAL CI to test_failure_22
2024-05-05 17:31:25 -04:00
chenyu 3f3af0fb85
test_linearizer_failures 29 passes now (#4215)
TC + PADTO fixed
2024-04-18 19:49:23 -04:00
chenyu 1fa0351acb
fix DEFINE_ACC invalid_value to have same type as localtype (#3980) 2024-03-28 19:21:17 -04:00
Patrick Tsai e27129a798
Fix linearizer failure 26 test (#3906)
* Adjust adds between WHERE and PHI

* Not much better

* undo recursive change

* hm

* iterate over where, not factored op

* oo

* consts only for loop

* UNdo var name change

* update

---------

Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
2024-03-24 16:34:13 -04:00
Francis Lam 0145366323
wmma: fix the AMD TC threads to split the first 16 threads (#3904)
previously it was incorrectly aliasing 16 into the size 8 upcast
on the store alias.  now it splits it properly into 8 and the
remaining 2 into the correct local stride
2024-03-23 21:17:42 -04:00
chenyu a2b2597fc2
replace dtype.name str with render_dtype (#3903)
fixed some bf16 cast issue since it does not have `.name`.
also more robust if there are lang specific type override
2024-03-23 19:25:48 -04:00
chenyu 30fa03243e
reuse fuzz_linearizer.compare_linearizer in test_linearizer_failures (#3861) 2024-03-21 14:12:27 -04:00
chenyu 33dd99acf4
remove helper_add_store from test_linearizer_failures (#3860) 2024-03-21 12:53:31 -04:00
Francis Lam 131bbb6563
test_linearizer_failure: add failure 27 from a gpt2 kernel (#3825)
* test_linearizer_failure: add failure 27 from a gpt2 kernel

found during a full fuzz test of applied_opts combos to a
depth of 4 on the gpt2 kernels w/o GROUPTOP.

added additional examples to failure 26 that don't have GROUPTOP

* add other platform failure
2024-03-19 16:29:50 -04:00
Francis Lam 9851e2c3b9
test_linearizer_failure: add failure 26 from a gpt2 kernel (#3821)
found during a full fuzz test of all applied_opts combos to a
depth of 3 on the gpt2 kernels
2024-03-19 13:19:54 -04:00
chenyu ac866eaf5a
disable simplify_phi_loops (#3812)
* disble simplify_phi_loops

this breaks BEAM search GPT2.

* skip that
2024-03-18 19:25:26 -04:00
Francis Lam a7afd2f6bf
test_linearizer_failures: add failing kernel from GPT2 CUDA (#3808)
* test_linearizer_failures: add failing kernel from GPT2 CUDA

* test_linearizer_failure: remove "HIP" from failed_platforms
2024-03-18 17:16:40 -04:00
qazal e3e89c244b
multioutput uoping infra (#3706)
* linearize multioutput

* add vars to copy
2024-03-15 21:56:59 -07:00
chenyu a2d3cf64a5
move is_dtype_supported to test.helpers (#3762)
* move is_dtype_supported to test.helpers

updated all places that check if float16 is supports

* fix tests
2024-03-15 14:33:26 -04:00