2521 Commits

Author SHA1 Message Date
Antoine Adam c6d5e471d0
Do not import typing_extensions at runtime (#1927)
https://github.com/tinygrad/tinygrad/pull/1852 introduced typing_extensions as a runtime requirement, but setup.py lists the package only as a linting requirement. As a result, running `python -c 'from tinygrad.tensor import Tensor'` after `pip install -e .` on Python 3.11 fails.

This does not seem to happen before 3.11, but only because typing_extensions was a downstream dependency of pyopencl. In any case, this commit makes it clear that typing_extensions is only needed for linting, as written in setup.py.
2023-09-28 01:57:28 -07:00
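
The fix described in the typing_extensions commit above can be illustrated with the usual `TYPE_CHECKING` guard. This is a minimal sketch, not the commit's actual diff: the helper name `all_int` and its signature are chosen here for illustration, and the string return annotation is one assumed way to keep the `TypeGuard` name from being evaluated at runtime.

```python
from typing import TYPE_CHECKING, Any, Tuple

if TYPE_CHECKING:
    # Seen only by static type checkers (mypy, pyright). This branch is
    # never executed, so typing_extensions need not be installed at runtime.
    from typing_extensions import TypeGuard


# The return annotation is a string, so Python never evaluates it and the
# undefined TypeGuard name causes no NameError when the module is imported.
def all_int(t: Tuple[Any, ...]) -> "TypeGuard[Tuple[int, ...]]":
    """Illustrative helper (hypothetical name): True if every element is an int."""
    return all(isinstance(s, int) for s in t)
```

With this layout, importing the module works on an interpreter where typing_extensions is absent, while a type checker still narrows `t` to `Tuple[int, ...]` after a successful `all_int(t)` check.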
nimlgen 164f8a1923
fix hipgraph exec (#1929) 2023-09-27 12:54:21 -04:00
Sean D'Souza 9c6bb7ff13
fix: add sentencepiece to testing dependencies (#1919) 2023-09-25 11:22:01 -04:00
George Hotz c907efbf4a
reorder a few things (#1915)
* reorder a few things

* huh, that has to be there

* move apply shapetracker

* BufferOps

* only for type checking
2023-09-25 10:17:21 +08:00
chenyu 25a767cd5d
Remove LtNode.__mul__ and AndNode.__mul__ (#1913) 2023-09-25 07:03:59 +08:00
chenyu eaa8d343d8
Remove str type from map_buffers (#1912) 2023-09-25 07:03:22 +08:00
Dat D. Nguyen ae9529e678
chore: remove redundant noise in stable diffusion example (#1910) 2023-09-24 21:33:45 +08:00
George Hotz 6d9065ed1c
Minor cleanups (#1911)
* cleanups

* remove that simplify
2023-09-24 21:32:50 +08:00
George Hotz 20059dc55b
Make ShapeTracker Immutable (#1909)
* ugh

* ops test pass

* fix shapetracker tests

* sym shapetracker

* shapetracker is a tuple of views now

* from_shape

* fix has variable shape

* key isn't needed

* post init assert
2023-09-24 21:09:03 +08:00
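
The bullets of the "Make ShapeTracker Immutable" commit above (a tuple of views, `from_shape`, a post-init assert) suggest a design along the following lines. This is a simplified sketch under stated assumptions, not tinygrad's implementation: the `View` fields, the `strides_for_shape` helper, and the `permute` method shown here are stand-ins written for illustration.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Tuple


@dataclass(frozen=True)
class View:
    """Simplified stand-in for a view: a shape plus row-major strides."""
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]


def strides_for_shape(shape: Tuple[int, ...]) -> Tuple[int, ...]:
    # Row-major (C-order) strides, counted in elements: last axis has stride 1.
    strides, acc = [], 1
    for dim in reversed(shape):
        strides.append(acc)
        acc *= dim
    return tuple(reversed(strides))


@dataclass(frozen=True)
class ShapeTracker:
    views: Tuple[View, ...]

    def __post_init__(self) -> None:
        # Reject lists or other mutable containers so the object stays immutable.
        assert isinstance(self.views, tuple) and all(isinstance(v, View) for v in self.views)

    @staticmethod
    def from_shape(shape: Tuple[int, ...]) -> "ShapeTracker":
        shape = tuple(shape)
        return ShapeTracker((View(shape, strides_for_shape(shape)),))

    @property
    def shape(self) -> Tuple[int, ...]:
        return self.views[-1].shape

    def permute(self, axes: Tuple[int, ...]) -> "ShapeTracker":
        # Movement ops return a new tracker instead of mutating this one.
        v = self.views[-1]
        new_view = View(tuple(v.shape[a] for a in axes), tuple(v.strides[a] for a in axes))
        return ShapeTracker(self.views[:-1] + (new_view,))


st = ShapeTracker.from_shape((2, 3))
transposed = st.permute((1, 0))   # st itself is left unchanged
try:
    st.views = ()                 # assignment on a frozen dataclass is rejected
except FrozenInstanceError:
    pass
```

Because frozen dataclasses compare and hash by value, two trackers built from the same views can be used interchangeably as dictionary keys, which may be what the "key isn't needed" bullet refers to (an assumption, since the commit message does not elaborate).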
nimlgen 45f02393f0
HipGraph support (#1880)
* init hip graph

* optimize args update

* cache symbolic in jit

* remove NOSTAT

* init BasicBatchExecutor

* symbolic infer cache per jit instance

* basicbatchexec is default for compiled

* batch_exec is taken from ASTRunner

* no infer cache

* batched execution of hip graph

* add comment about hip graph batches

* readable hip graph
2023-09-24 20:14:36 +08:00
George Hotz 7ff7aacdb4
LazyOp out of Linearizer (#1908)
* loadop buffer on cpu

* works for GPU

* sort of working

* has bugs

* gpu tests pass

* fix some tests

* fix tensor cores

* fix test linearizer

* fix symbolic

* fix has_variable_shape

* non symbolic size

* disable weird test

* simple cache fix

* fix custom function

* fix kopt

* cleanups

* a bit broken on the assign

* contig check

* only buffer

* need that order

* idx

* dedup buffers

* hmm, bugfix

* fix tensor cores

* opts device
2023-09-24 14:30:53 +08:00
qazal 2201b46bce
Refactor Conv2d/ConvTranspose2d into a single parent class (#1906)
* refactor Conv2d/ConvTranspose2d

* raise in __call__ for the parent class

* use ABC

* drop ABC it's just syntactic sugar

* use conv2d as base for the transposed version
2023-09-24 14:23:41 +08:00
George Hotz 97dc813329
Revert "All LazyOps in the Linearizer (#1905)" (#1907)
This reverts commit a5820390db.
2023-09-24 11:51:22 +08:00
George Hotz a5820390db
All LazyOps in the Linearizer (#1905)
* loadop buffer on cpu

* works for GPU

* sort of working

* has bugs

* gpu tests pass

* fix some tests

* fix tensor cores

* fix test linearizer

* fix symbolic

* fix has_variable_shape

* non symbolic size

* disable weird test

* simple cache fix

* fix custom function

* fix kopt

* cleanups

* a bit broken on the assign

* contig check

* only buffer

* need that order

* idx
2023-09-24 11:50:00 +08:00
George Hotz 0f373b8b47
cache more uops (#1904)
* cache more uops

* fix cacheable
2023-09-23 16:50:13 +08:00
George Hotz 1e15fdaee7 disable flaky triton test 2023-09-23 14:59:36 +08:00
George Hotz 0571dd7627
move all int (#1903) 2023-09-23 14:43:45 +08:00
nimlgen 41aea3ad36
require C-contiguous array for hip._copyin (#1902) 2023-09-23 14:36:59 +08:00
Szymon Ożóg 58296c079d
Make Triton work again (#1547)
* Move ops_triton to runtime and remove errors from deprecated code

* Remove deprecated AST Kernel

* Remove deprecated buffer

* Add TritonProgram

* Triton Buffer

* Use RawCUDABuffer

* triton_compile

* Added new parameter

* pass _buf to program

* remove deprecated include

* Added triton tests

* Deprecated includes removed

* remove double print

* Disable float4 support

* Disable float4 support

* variable load fix

* Track local size

* Add pycuda to triton dependencies

* Merge test.yml

* install cuda packages for testing

* merge double package install

* remove emulated from triton tests

* upscale local index to power of 2 and add masking

* cuda envs

* Add TernaryOps

* ConstOp loading

* proper function name

* remove deprecated variables

* get global program from name

* const ops match local shape

* Enable test_nn

* remove deprecated import

* fix linter error

* Add wait logic

* Add local size override

* accumulate local shapes instead of using max shape

* Merge triton tests into global tests

* fix envs in testing

* Old testing routine

* split file into renderer and program

* remove print and starting whitespace

* pretty ptx print on debug 5

* linter errors

* ignore triton saturation tests

* ignore test example

* remove pytorch cpu extra index

* Add triton to existing testing routine

* use triton tests

* disable cuda backend in triton tests

* use cudacpu in tests

* print used device

* Print device default

* Remove print

* ensure we are running triton backend

* update variable signatures

* update dtypes for load

* infinity render fixed

* limit global size

* negative infinity now properly rendered

* split chain with parentheses for and node

* Add option to disable shared memory, disable for triton

* missing import

* Properly index and mask conditional load

* use mask only if not loading a block pointer

* nan support

* fix symbolic tests to include chain split

* proper masking for stores

* Implemented bool dtype

* Add mod

* fix loads for variables with valid range

* merge triton with cuda runtime

* merge from master

* run triton tests with cuda

* Correct target when running from triton

* conftest with triton compiler config

* use triton nightly

* verbose tests for triton

* capture stdout

* fix function depth when exiting multiple loops

* add render valid function for readability

* fix mask for local loops

* add _arg_int32 datatype

* fix dims for conditional loads

* enable non float stores

* correct variable dtypes

* fix type for arg_int32

* remove junk

* Added get max function for range based var.max

* remove deprecated code

* Fix triton ptxas path

* Fix testing for CI

* clamp local size by max local size instead of always running max

* Disable matmul test in triton cpu

* rerun tests

* Disable broken test in triton cpu

* whitespace removed

* rerun tests again

* Disable TestSymbolicOps for triton

* update to new uops

* linter fix

* ignore test/extra

* linting fix

* Update tinygrad/renderer/triton.py

Co-authored-by: Gijs Koning <gijs-koning@live.nl>

* remove deprecated line

* quotes type fix

* linter

* Remove unnecessary lines

* UnaryOps.NEG

* dont define constants

* Linting fix

* Disable tests that are broken in ocelot

* remove trailing whitespace

* reduce line count

* linting fix

* update to new uast

* New looping style

* Update to new uast

* make AST runner work with triton

* linting fix

* set renderer var for testing

* disable local for ocelot

* reenable all tests for ocelot

* Pass shared to cuda

* Don't group if the backend doesn't support shared mem

* use working gpuocelot branch

* enable all tests

* enable local for ocelot

* cleanup

* Update test.yml

* update cache key

* reenable test symbolic and extra

* Update test.yml

* Revert "Update test.yml" (rerun tests)

This reverts commit 98c0630ee5da4379e5c6b2437a5145fe87058c35.

* Revert "fix symbolic tests to include chain split"

This reverts commit 22a9a4c9cd14d23735e6540c8d90ee005ac4ea17.

* Revert "split chain with parentheses for and node"

This reverts commit 7499a7004ef4db785d0cd05cf292fdeff65ca90d.

* use global size from linearizer

* rename newvar to dtype to match other renderers

* join program start lines

* simplify code that adds axis to local dims

* assign r[u] in ssa

* We no longer need to replace target in src

* we no longer need to cast indices to int by hand

* Update triton.py(rerun tests)

* Update triton.py(rerun tests)

* Update triton.py(rerun tests)

---------

Co-authored-by: Gijs Koning <gijs-koning@live.nl>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-09-23 14:17:12 +08:00
George Hotz 6fb8b3bb60
move symbolic functions to shapetracker (#1901) 2023-09-23 11:45:08 +08:00
George Hotz 9cf13bd055
rename reduce_op (#1900)
* rename reduce_op

* more design v2
2023-09-23 11:27:36 +08:00
George Hotz 73a6ed7862
Apply ShapeTracker in interpreted backends (#1846)
* applying st

* tests pass

* minor cleanups

* torch too

* hack

* contiguous

* move mops

* contig in BN

* tests should pass

* make torch fast

* make zeros and ones contig by default

* no contig there

* fix padding with expanding

* might fix tests

* still doesn't fix bug, but should be there

* Revert "still doesn't fix bug, but should be there"

This reverts commit 8ea92f3e070c8936f7ec3d3f56247225fcaa6320.

* minor cleanups
2023-09-23 10:05:13 +08:00
Umut Zengin 3987280daf
Fix VALIDHACKS for Images and make it default (#1832)
* valid hacks

* valid hacks

* valid hacks

* new method

* new method

* handtune

* is gate load breaking?

* lint

ruff

less junk

new approach?

maybe this?

* Make it more clear

* Make it more clear

* Will deal with the linter later

* hack for linter

* subs the idx but dont touch the valid

* Updated the mod rules

* lint hack

* I believe bug fix lets see

* Mod Node left

* revert

* Maybe this wont break?

* revert

* implemented "handtuned garbage"

* revert and use VALIDHACKS

* Lets see the CI

* still broken?

* currently its jungle

* maybe this jungle ?

* This works for everything somehow

* Added test for symbolic

* lint

* final touch

* This still works

* lint

* midway clean

* less garbage

* lint

* final form

* Slow but working way

* lint and other stuff

* lint

* mypy

* Make sure CI test Openpilot valid checks

* test if CI break

* Convert back

* refactor

* refactor

* Managed to reduce openpilot time from 30 secs to 5 secs

* Refactor

* Substitute a node with variable

* flake8

* Comment and refactor

* More comprehensive mod

* refactor

* bug fix

* More shave off

* remove not sure part
2023-09-23 07:34:43 +08:00
Gijs Koning 767bb35903
Enable symbolic ops tests for LLVM (#1898)
* Enable symbolic tests for HIP and LLVM

* Only llvm
2023-09-23 07:30:26 +08:00
Gijs Koning b8ff20ffe4
Gpt2 (#1896)
* small helps

* got something working

* faster?

* faster yes

* cleanup

* cleanup

* cleanup

* Fix non jit

* Fix fp16 and some cleanup

* Fix fp16 and some cleanup

* cleanup

* similar to master

* cleanup
2023-09-22 20:14:47 +08:00
chenyu b89ee1ac83
lazy type annotation and cleanups (#1897) 2023-09-22 14:20:23 +08:00
George Hotz 78576915de
Add needed contiguous to DiskBuffer. SHM support on OSX (#1891)
* add some contiguous

* remove second contig

* Revert "remove second contig"

This reverts commit fc164f7dca1ad75b1e466e4e45a05eca58b7e0e0.

* shm on osx

* can repro bug

* don't contig zeros and ones
2023-09-22 09:16:42 +08:00
qazal d0e752003d
fixes (#1893) 2023-09-22 07:20:27 +08:00
wozeparrot 009a99a0b1
feat: way cleaner hip wrapper (#1895) 2023-09-22 07:20:03 +08:00
Yixiang Gao cb5d6576cb
cifar step time 65ms while staying above 94% (#1888)
* change reduceop heuristics

* add model ema and jit hack

* add ema eval

* have to create a duplicate eval function for jit

* remove manual seed

* 94% achievable with normal eval

* ema is outputting the same results as normal

* fix ema bug

* ema achieves 94% with fix seed

* multigpu tested

* constant fold decay, fix jit, adjust message for multigpu

* pull SpeedyResNet out of train_cifar()
2023-09-21 11:19:32 +08:00
kormann 864746d6aa
polish print_tree (#1868)
* fix

* isinstance
2023-09-21 11:13:10 +08:00
chenyu a5090f0ee9
remove NumNode.int() (#1876) 2023-09-21 10:29:16 +08:00
Gijs Koning 9eb6310686
Fix gpt optimization (#1885)
* fix for gpt

* the actual fix

* Remove change in symbolic

* small comment
2023-09-21 10:28:18 +08:00
Szymon Ożóg bd3444797b
make ssa assign r[u] (#1887) 2023-09-21 10:20:20 +08:00
nimlgen 9450e41f70
no import when Python is shutting down (#1875) 2023-09-20 12:47:02 -04:00
Yixiang Gao 84ab47a90a
add branch up-to-date check (#1879) 2023-09-20 12:41:51 -04:00
nimlgen 504bb6d0ea
support symbolic jit in HIP (#1877) 2023-09-20 01:44:26 -04:00
chenyu cd66c9e249
no numnode in shape (#1871) 2023-09-17 07:49:45 +08:00
Yixiang Gao 18ec5a9e09
add comment bot to CI (#1873) 2023-09-16 12:22:06 -04:00
Yixiang Gao a27f6c7d62
add diff mode to sz.py (#1872) 2023-09-16 00:43:47 -04:00
nimlgen 4c31dfafb3
add seed to gpt-2 (#1869) 2023-09-15 17:34:14 -04:00
wozeparrot c870764940
Revert "add line changes diff bot to CI (#1863)" (#1870) 2023-09-15 16:56:42 -04:00
Yixiang Gao 789c84a7a3
add line changes diff bot to CI (#1863) 2023-09-15 16:29:58 -04:00
chenyu 29ac8293d7
run gpt2 in CI (#1866) 2023-09-15 04:37:02 +08:00
chenyu 1b46de1a3e
fix type of helpers.prod, add test cases (#1859) 2023-09-14 05:16:55 +08:00
chenyu e67306ba04
symbolic shape type with TypeGuard (#1852) 2023-09-13 05:27:22 +08:00
Roelof van Dijk c91b44f7bf
refactor: move size to view (#1848)
* refactor: move size to view

* fix: pylint

---------

Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-09-11 07:16:04 -07:00
chenyu 9e9ea20784
Fix view, CI cpu test with python 3.8 (#1845) 2023-09-10 22:37:58 -04:00
chenyu 3ec301c2d7
apply view.py patch (#1844) 2023-09-10 17:32:15 -07:00
Yixiang Gao a32951a001
add test_tensor_copy (#1840)
* add  test_tensor_copy

* fix whitespace

* add value check
2023-09-10 16:01:58 -07:00