Commit Graph

878 Commits

Author SHA1 Message Date
Ahmed Harmouche 2114dc13d1
Allow multi-input model export (#1995)
* Allow multi-input model export

* Add model export unit test

* Fix efficientnet compilation

* Only run model export test on JIT supported devices

* Skip export model test if not EXPORT_SUPPORTED_DEVICE
2023-10-07 04:13:34 -07:00
George Hotz ffa33d743a
good changes from openpilot_compile2 (#2000)
* good changes from openpilot_compile2

* float32 image type was wrong

* cleaner way to write that + a test
2023-10-06 13:33:24 -07:00
chenyu 05be57f57f
Fix llama with empty prompt (#1997)
* fix llama with one token prompt

* llama is all_jitted
2023-10-06 06:48:07 -07:00
George Hotz fa9945dac0 remove stale tests 2023-10-06 02:14:56 -07:00
George Hotz 21a2c5df73
fix up contiguous (#1978) 2023-10-05 07:22:05 -07:00
chenyu c99fa58dd2
simplify gpt2 example (#1973)
* simplify gpt2 example

* kernel_jitted_count and jit tests

* Revert "kernel_jitted_count and jit tests"

This reverts commit 31a3c26dd061dbcf6c43c295a265813ccb35b9e9.

* all_jitted test in test_real_world
2023-10-05 07:09:29 -07:00
George Hotz 2d0c1037b1
Fix up latest openpilot model (#1976)
* fix gemv triggering for gemm

* fixup_openpilot

* external test issues
2023-10-05 05:24:28 -07:00
George Hotz 3d5127038c
don't create linearizer if we are in the method cache (#1969)
* don't create linearizer if we are in the method cache

* remove unchecked properties

* that key isn't used

* fix default type is sticky
2023-10-04 12:42:58 -07:00
George Hotz de5d603ec1
corealize + remove realize from lazybuffer (#1968)
* corealize + remove realize from lazybuffer

* fix multigpu

* fix graph
2023-10-04 10:59:31 -07:00
George Hotz d449b3bef1
think about removing realize from lazybuffer (#1965)
* remove realize from lazybuffer

* okay fine, back that off

* fix tests maybe

* fix test
2023-10-04 07:18:58 -07:00
nimlgen 2ea1dd3e87
no process() in Linearizer (#1966)
* no process() in Linearizer

* more process() clean up
2023-10-04 07:18:42 -07:00
Ahmed Harmouche fb4d830a2a
Fix cast error in render_load in wgsl (#1956)
* Fix cast error in wgsl

* Use render_cast instead of introducing new method

* Make it shorter

* Add back webgpu tests: efficientnet and dtypes
2023-10-04 02:29:14 -07:00
George Hotz 6a79d4044a
unrealized consts everywhere (#1963)
* unrealized consts everywhere

* don't import device from lazy

* Device isn't in Lazy

* same issue

* disable jit random
2023-10-04 01:48:10 -07:00
nimlgen f04c1a63ae
Rand works in jit (#1960)
* rand works in jit

* better jitted rand creation

* Update realize.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-10-03 12:55:25 -07:00
George Hotz f64d5b3ba8
move to realize.py (#1961)
* move to realize.py

* run_schedule moved
2023-10-03 07:25:40 -07:00
nimlgen e1f2c2cc19
fix jitted dist (#1955) 2023-10-02 11:45:13 -04:00
George Hotz d48a90859c
use the opts from the default device (#1954) 2023-10-02 03:13:46 -07:00
David Hou d4671cd8e3
use schedule in more places in linearizer tests (#1946)
* pass current linearizer opts to Linearizer in TestFloat4

* use schedule instead of exec_ast hook
2023-10-02 02:22:56 -07:00
David Hou 8e9db88474
expand after expr_idxs in Linearizer.global_load (#1818)
* small changes

* expand in terms of substitute, directly expand g_idxs g_valid

* delete expand_ops

* don't compare using hash

* any instead of in

thanks gijskoning

Co-authored-by: Gijs Koning <gijs-koning@live.nl>

* support tc

* testing code

* no more create_rednode

* maxsize none in view/node

* oops

* undo

* typing

* oops

* oops

* lmao

* lmao

* add expand multi test

* Node.iter_idxs

* type

* type

* delete checks!

* clean up a little?

* expand_idx in symbolic

* un-golf

* play around with types >.>

* test_substitute and also remove an incorrect test?

* get rid of range

* Update symbolic.py

* split out view cache change

* split out flat components change

* reduce diff

* reduce diff

* add some float4 tests

* fix

---------

Co-authored-by: Gijs Koning <gijs-koning@live.nl>
2023-09-29 10:33:34 -07:00
nimlgen 692bec7b6f
simplify CacheCollector (#1944)
* rewrite cc

* fix

* fix tests

* fix all tests

* is it better

* better with shape

* cleaner

* linter fix

* no ;

* better comment

* better comments

* no thneed changes
2023-09-29 10:13:04 -07:00
George Hotz a677a1e2cd winograd test prints op count 2023-09-29 05:41:29 -07:00
George Hotz 81cb120b0f
winograd speed test (#1942) 2023-09-29 04:40:35 -07:00
George Hotz d52df788d3
remove RawConst and add test (#1939) 2023-09-29 01:21:51 -07:00
George Hotz 22b8576887
more lazy cleanup (#1938)
* small lazy cleanups

* a few more

* cleanups

* no more realizing in the scheduler test

* a few more minor things

* that was just wrong

* fix graph. the graph test was completely useless

* make graph usable

* fix op graph
2023-09-29 00:53:29 -07:00
nimlgen 2a49f7e456
fix transfer to mapped buffers (#1923) 2023-09-29 00:50:24 -07:00
Francis Lam f445e056ed
wmma: add test and tensor core shape (#1925) 2023-09-28 18:04:28 -07:00
Yixiang Gao 094d3d71be
with Tensor.train() (#1935)
* add with.train

* remove the rest TODOs

* fix pyflake

* fix pyflake error

* fix mypy
2023-09-28 18:02:31 -07:00
wozeparrot 70671d9625
fix test_collectives (#1934)
* fix: fix test_collectives.py

* feat: reenable test_collectives
2023-09-28 11:02:22 -07:00
George Hotz adab724caa
schedule2, keep the tests working with small changes (#1932)
* lazy cleanups

* ast functions take in LazyOps

* op instead of self.op

* _base for mops

* fix contiguous

* start schedule

* test_schedule

* fix openpilot

* more tests

* bugfix and test skip

* work

* make sure things get freed

* fix zerosized tensors

* fix failing test

* fix ceil and friends

* fix openpilot

* disable training

* disable test collectives
2023-09-28 09:14:43 -07:00
George Hotz c907efbf4a
reorder a few things (#1915)
* reorder a few things

* huh, that has to be there

* move apply shapetracker

* BufferOps

* only for type checking
2023-09-25 10:17:21 +08:00
George Hotz 6d9065ed1c
Minor cleanups (#1911)
* cleanups

* remove that simplify
2023-09-24 21:32:50 +08:00
George Hotz 20059dc55b
Make ShapeTracker Immutable (#1909)
* ugh

* ops test pass

* fix shapetracker tests

* sym shapetracker

* shapetracker is a tuple of views now

* from_shape

* fix has variable shape

* key isn't needed

* post init assert
2023-09-24 21:09:03 +08:00
George Hotz 7ff7aacdb4
LazyOp out of Linearizer (#1908)
* loadop buffer on cpu

* works for GPU

* sort of working

* has bugs

* gpu tests pass

* fix some tests

* fix tensor cores

* fix test linearizer

* fix symbolic

* fix has_variable_shape

* non symbolic size

* disable weird test

* simple cache fix

* fix custom function

* fix kopt

* cleanups

* a bit broken on the assign

* contig check

* only buffer

* need that order

* idx

* dedup buffers

* hmm, bugfix

* fix tensor cores

* opts device
2023-09-24 14:30:53 +08:00
George Hotz 97dc813329
Revert "All LazyOps in the Linearizer (#1905)" (#1907)
This reverts commit a5820390db.
2023-09-24 11:51:22 +08:00
George Hotz a5820390db
All LazyOps in the Linearizer (#1905)
* loadop buffer on cpu

* works for GPU

* sort of working

* has bugs

* gpu tests pass

* fix some tests

* fix tensor cores

* fix test linearizer

* fix symbolic

* fix has_variable_shape

* non symbolic size

* disable weird test

* simple cache fix

* fix custom function

* fix kopt

* cleanups

* a bit broken on the assign

* contig check

* only buffer

* need that order

* idx
2023-09-24 11:50:00 +08:00
Szymon Ożóg 58296c079d
Make Triton work again (#1547)
* Move ops_triton to runtime and remove errors from deprecated code

* Remove deprecated AST Kernel

* Remove deprecated buffer

* Add TritonProgram

* Triton Buffer

* Use RawCUDABuffer

* triton_compile

* Added new parameter

* pass _buf to program

* remove deprecated include

* Added triton tests

* Deprecated includes removed

* remove double print

* Disable float4 support

* Disable float4 support

* variable load fix

* Track local size

* Add pycuda to triton dependencies

* Merge test.yml

* install cuda packages for testing

* merge double package install

* remove emulated from triton tests

* upscale local index to power of 2 and add masking

* cuda envs

* Add TernaryOps

* ConstOp loading

* proper function name

* remove deprecated variables

* get global program from name

* const ops match local shape

* Enable test_nn

* remove deprecated import

* fix linter error

* Add wait logic

* Add local size override

* accumulate local shapes instead of using max shape

* Merge triton tests into global tests

* fix envs in testing

* Old testing routine

* split file into renderer and program

* remove print and starting whitespace

* pretty ptx print on debug 5

* linter errors

* ignore triton saturation tests

* ignore test example

* remove pytorch cpu extra index

* Add triton to existing testing routine

* use triton tests

* disable cuda backend in triton tests

* use cudacpu in tests

* print used device

* Print device default

* Remove print

* ensure we are running triton backend

* update variable signatures

* update dtypes for load

* infinity render fixed

* limit global size

* negative infinity now properly rendered

* split chain with parentheses for and node

* Add option to disable shared memory, disable for triton

* missing import

* Properly index and mask conditional load

* use mask only if not loading a block pointer

* nan support

* fix symbolic tests to include chain split

* proper masking for stores

* Implemented bool dtype

* Add mod

* fix loads for variables with valid range

* merge triton with cuda runtime

* merge from master

* run triton tests with cuda

* Correct target when running from triton

* conftest with triton compiler config

* use triton nightly

* verbose tests for triton

* capture stdout

* fix function depth when exiting multiple loops

* add render valid function for readability

* fix mask for local loops

* add _arg_int32 datatype

* fix dims for conditional loads

* enable non float stores

* correct variable dtypes

* fix type for arg_int32

* remove junk

* Added get max function for range based var.max

* remove deprecated code

* Fix triton ptxas path

* Fix testing for CI

* clamp local size by max local size instead of always running max

* Disable matmul test in triton cpu

* rerun tests

* Disable broken test in triton cpu

* whitespace removed

* rerun tests again

* Disable TestSymbolicOps for triton

* update to new uops

* linter fix

* ignore test/extra

* linting fix

* Update tinygrad/renderer/triton.py

Co-authored-by: Gijs Koning <gijs-koning@live.nl>

* remove deprecated line

* quotes type fix

* linter

* Remove unnecessary lines

* UnaryOps.NEG

* dont define constants

* Linting fix

* Disable tests that are broken in ocelot

* remove trailing whitespace

* reduce line count

* linting fix

* update to new uast

* New looping style

* Update to new uast

* make AST runner work with triton

* linting fix

* set renderer var for testing

* disable local for ocelot

* reenable all tests for ocelot

* Pass shared to cuda

* Don't group if the backend doesn't support shared mem

* use working gpuocelot branch

* enable all tests

* enable local for ocelot

* cleanup

* Update test.yml

* update cache key

* reenable test symbolic and extra

* Update test.yml

* Revert "Update test.yml" (rerun tests)

This reverts commit 98c0630ee5da4379e5c6b2437a5145fe87058c35.

* Revert "fix symbolic tests to include chain split"

This reverts commit 22a9a4c9cd14d23735e6540c8d90ee005ac4ea17.

* Revert "split chain with parentheses for and node"

This reverts commit 7499a7004ef4db785d0cd05cf292fdeff65ca90d.

* use global size from linearizer

* rename newvar to dtype to match other renderers

* join program start lines

* simplify code that adds axis to local dims

* assign r[u] in ssa

* We no longer need to replace target in src

* we no longer need to cast indices to int by hand

* Update triton.py(rerun tests)

* Update triton.py(rerun tests)

* Update triton.py(rerun tests)

---------

Co-authored-by: Gijs Koning <gijs-koning@live.nl>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-09-23 14:17:12 +08:00
George Hotz 73a6ed7862
Apply ShapeTracker in interpreted backends (#1846)
* applying st

* tests pass

* minor cleanups

* torch too

* hack

* contiguous

* move mops

* contig in BN

* tests should pass

* make torch fast

* make zeros and ones contig by default

* no contig there

* fix padding with expanding

* might fix tests

* still doesn't fix bug, but should be there

* Revert "still doesn't fix bug, but should be there"

This reverts commit 8ea92f3e070c8936f7ec3d3f56247225fcaa6320.

* minor cleanups
2023-09-23 10:05:13 +08:00
Umut Zengin 3987280daf
Fix VALIDHACKS for Images and make it default (#1832)
* valid hacks

* valid hacks

* valid hacks

* new method

* new method

* handtune

* is gate load breaking?

* lint

ruff

less junk

new approach?

maybe this?

* Make it more clear

* Make it more clear

* Will deal with the linter later

* hack for linter

* subs the idx but dont touch the valid

* Updated the mod rules

* lint hack

* I believe bug fix lets see

* Mod Node left

* revert

* Maybe this wont break?

* revert

* implemented "handtuned garbage"

* revert and use VALIDHACKS

* Lets see the CI

* still broken?

* currently its jungle

* maybe this jungle ?

* This works for everything somehow

* Added test for symbolic

* lint

* final touch

* This still works

* lint

* midway clean

* less garbage

* lint

* final form

* Slow but working way

* lint and other stuff

* lint

* mypy

* Make sure CI test Openpilot valid checks

* test if CI break

* Convert back

* refactor

* refactor

* Managed to reduce openpilot time from 30 secs to 5 secs

* Refactor

* Substitute a node with variable

* flake8

* Comment and refactor

* More comprehensive mod

* refactor

* bug fix

* More shave off

* remove not sure part
2023-09-23 07:34:43 +08:00
Gijs Koning 767bb35903
Enable symbolic ops tests for LLVM (#1898)
* Enable symbolic tests for HIP and LLVM

* Only llvm
2023-09-23 07:30:26 +08:00
George Hotz 78576915de
Add needed contiguous to DiskBuffer. SHM support on OSX (#1891)
* add some contiguous

* remove second contig

* Revert "remove second contig"

This reverts commit fc164f7dca1ad75b1e466e4e45a05eca58b7e0e0.

* shm on osx

* can repro bug

* don't contig zeros and ones
2023-09-22 09:16:42 +08:00
qazal d0e752003d
fixes (#1893) 2023-09-22 07:20:27 +08:00
chenyu a5090f0ee9
remove NumNode.int() (#1876) 2023-09-21 10:29:16 +08:00
nimlgen 9450e41f70
no import when Python is shutting down (#1875) 2023-09-20 12:47:02 -04:00
chenyu 1b46de1a3e
fix type of helpers.prod, add test cases (#1859) 2023-09-14 05:16:55 +08:00
chenyu e67306ba04
symbolic shape type with TypeGuard (#1852) 2023-09-13 05:27:22 +08:00
chenyu 3ec301c2d7
apply view.py patch (#1844) 2023-09-10 17:32:15 -07:00
Yixiang Gao a32951a001
add test_tensor_copy (#1840)
* add test_tensor_copy

* fix whitespace

* add value check
2023-09-10 16:01:58 -07:00
George Hotz 47e602f717
view: do not trade complexity for speed (#1839)
* view: do not trade complexity for speed

* staticmethods

* view create
2023-09-10 11:29:53 -07:00
David Hou e74a6ca7e4
expand in terms of substitute (#1827) 2023-09-09 14:43:00 -07:00
nimlgen 31fca43706
kopt works with local+grouped reduce and tests (#1824) 2023-09-09 13:22:09 -07:00