* expand merge
* merge barriers
* gate_folder
* test_linearizer_failures
* this can be here
* bring the new repr back
* gate_folder2
* gate_creator is better
* gate_folder
* dedup conditions
* early gate folding
* dedup barrier
* fold noop conditions
* all consts can go away
* free lines
* test/test_linearizer_failures: add a new beautiful_mnist one
this one is from a DEPTH=2 fuzz_linearizer search
* add GPU to test_failure_40
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* test/external/fuzz_linearizer: fix for new AST changes
also add beautiful_mnist failures
* add CLANG and LLVM to test_failure_35 failed_platforms
* fix test_linearizer_failure names
* linearizer: fix get_grouping_dims to respect global/local max
* fix lidx variable index offset and unrestrict clang/llvm global len
* test reverse variable indexing when reverse_dims is true
* change the collapse axis to be the right most if reversed
* Create UnaryOps.RECIP and BinaryOps.IDIV and change uses of BinaryOps.DIV
* Delete unused import
* Add cstyle renderer
* Fix formatting text
* Fix test error due to bad implementation of renderer
* Add PTX support
* Add RECIP to LLVMIR
* Remove BinaryOps.DIV from symbolic test
* Change some tests and fix C floor division
* Change references to DIV to RECIP or IDIV
* Add mimic idiv for symbolic test
* Restore floor
* Mimic idiv
* cast to int
* Fix some test and renderer
* Remove DIV for render nodes
* Resolve issue with div
* Add TestRenderer
* Fix test
* fix error
* Fix PAD test
* Fix div implementation
* Remove DIV
* Add upcast to rshift, due to use of MUL and RECIP on DIV
* Fix linter
* Completely remove BinaryOps.DIV
* Fix lint
* Fix some test
* Revert mul modification
* Fix tests
* Fix CLANG for uops
* Revert IDIV function
* Minor fix
* modify pattern matching rule to support nan
* Fix UNSAFE_PADS_OPS to add UnaryOps.RECIP
* Remove const folding for IDIV and fix PTX
* Completely remove IDIV from extra
* Remove test_div from TestFloatUOps due to test on recip
* Fix linearizer
* fix
* Fix test_22
* Fix llvm
* Apply trunc function for llvmlit
* use floor instead of trunc
* Use correct type
* Generate new fuzz db
* Fix rshift, do not cast to float to support idiv
* Return upcast=false to rshift
* Add to unsafepad BinaryOps.IDIV
* Remove RECIP override for CUDA
* add atol / rtol for the test
* Remove cast to int on IDIV
* Regenerate sops
* delete sops.gz
* regenerate
* regenerate
* regenerate
* Reduce margins
* pass atol and rtol as parameters for _test_metrics
* regenerated dataset
* Regenerate
* Remove duplicated
* Revert changes on extra
* Remove changes extra and NOQA for test
* Remove E501
* Remove and change line
* Remove E501
* Fix atan2
* Revert import and E501
* Remove E501
* Add hrcp to half ops
* Remove 1 of hrcp
* Remove last DIV and add type check on uops for IDIV
* Fix new tests
* Fix tests and custom function
* Regenerate dataset
* Regenerate dataset
* Revert dataset
* Change generate dataset script
* Remove line
* Change IDIV: type checker validates that x, y and z are int
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
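Several entries above ("fix C floor division", "use floor instead of trunc") depend on the difference between flooring and truncating integer division. A minimal Python sketch of that difference follows. `idiv_floor` and `idiv_trunc` are hypothetical helper names used only for illustration, not the functions from this change.

```python
def idiv_floor(a: int, b: int) -> int:
    """Integer division rounding toward negative infinity (Python's `//`)."""
    return a // b


def idiv_trunc(a: int, b: int) -> int:
    """Integer division rounding toward zero, as C's `/` does on ints.

    Hypothetical helper for illustration; assumes b != 0.
    """
    q = abs(a) // abs(b)
    # The quotient is negative only when the operands have opposite signs.
    return q if (a < 0) == (b < 0) else -q


if __name__ == "__main__":
    # The two conventions agree when the signs match and differ otherwise.
    for a, b in [(7, 2), (-7, 2), (7, -2), (-7, -2)]:
        print(a, b, idiv_floor(a, b), idiv_trunc(a, b))
```

The two helpers only disagree when the operands have opposite signs and the division is inexact, which is where a renderer that maps one convention onto the other can produce mismatched results.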
* basic tests
* cleanup
* pylint
* ruff
* use define acc as a proxy for rendered reductions
* use define acc as a proxy for rendered reductions
* recursive reduceop rendering via ast_parse
* linters + cleanup
* fixing late buf loading
* plus linters
* removing extra line
* linters
* does this break ci?
* added tests and if add end change
* typo in add_ends
* linters
* removing comments
* allow endifs to be inserted before the end of the graph
* find add ENDIF before next BARRIER
* removing tests with manual ENDIF + linters
* specifically the next barrier after the store of the local result
* Revert "specifically the next barrier after the store of the local result"
This reverts commit b288a5c3cec4114480cdb835a8d0ad01aac49519.
* keeping up to date
* linters + merge changes
* cleaning up old bad decisions
* linters and opts
* merged linearizer tests
* fixing merge issues
* removing the big ugly uop test (functionality tested end-to-end by test_linearizer additions)
* small diff fixes
* updating linearizer to work without uops.add( ... cachable)
* linters
* comment in multireduce tests
* skipping tests without locals
* full tests
* linters
* load_cache[key] fix for multiple accs
* linters
* assert only one reduceop
* fix loop_scope test to actually cause an issue
* self.load_cache[key] key for DEFINE_ACC changed to use a string to make sure each acc is unique
* updated tests
* fixing merge
* removing debug prints
* complete merge fix
* linters
* diff cleanup
* adding tests in
* give each reduce its own local buffer
* gpu=1 changes
* store and load locals with upcasting
* modifying test?
* make multireduce_netsted_local_upcast test match single reduce shapes
* removing todo
* cleaning up the diff
* unroll test
* unroll and upcast tests
* fix gpu
* seq and self.load_cache[key] cleaning
* linters
* padto works
* merge fixes
* fixes
* add skips for amd
* linters + seq
* cleaning & more tests
* softmax tests
* linters
* [run_process_replay]
* add new tests back
This reverts commit 19dec22e0178bca711719cee3e79f327c9e69c12.
* more hardcoded -1s
* fix ptx
* Fix name for loop in ptx
* cleaning up the diff
* cleaning up the uops diff
* nv ci is too slow
---------
Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: Szymon Ożóg <58388001+SzymonOzog@users.noreply.github.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
* mockgpu nv
* works
* comment that out
* fix merge
* setup gpuocelot
* install packages
* not run all of them
* passes
* fix ci
* almost
* should pass
* linter
* linter 2
* try this?
* ugh, not supported
* ci
* remove ticket from description
* better descs
* Separate cast and bitcast
* Fix lint
* No more arg[0]
* Revert "No more arg[0]"
This reverts commit dee6911335513f092fe2cbb9684e8a9d26aad964.
* CAST/BITCAST arg is the dtype only, no more tuple
* No image bitcast, regenerate dataset
* Small fixes
* Adjust adds between WHERE and PHI
* Not much better
* undo recursive change
* hm
* iterate over where, not factored op
* oo
* consts only for loop
* Undo var name change
* update
---------
Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
previously it was incorrectly aliasing 16 into the size 8 upcast
on the store alias. now it splits it properly into 8 and puts the
remaining 2 into the correct local stride
* test_linearizer_failure: add failure 27 from a gpt2 kernel
found during a full fuzz test of applied_opts combos to a
depth of 4 on the gpt2 kernels w/o GROUPTOP.
added additional examples to failure 26 that don't have GROUPTOP
* add other platform failure