Commit Graph

14 Commits

Author SHA1 Message Date
Szymon Ożóg 58296c079d
Make Triton work again (#1547)
* Move ops_triton to runtime and remove errors from deprecated code

* Remove deprecated AST Kernel

* Remove deprecated buffer

* Add TritonProgram

* Triton Buffer

* Use RawCUDABuffer

* triton_compile

* Added new parameter

* pass _buf to program

* remove deprecated include

* Added triton tests

* Deprecated includes removed

* remove double print

* Disable float4 support

* Disable float4 support

* variable load fix

* Track local size

* Add pycuda to triton dependencies

* Merge test.yml

* install cuda packages for testing

* merge double package install

* remove emulated from triton tests

* upscale local index to power of 2 and add masking

* cuda envs

* Add TernaryOps

* ConstOp loading

* proper function name

* remove deprecated variables

* get global program from name

* const ops match local shape

* Enable test_nn

* remove deprecated import

* fix linter error

* Add wait logic

* Add local size override

* accumulate local shapes instead of using max shape

* Merge triton tests into global tests

* fix envs in testing

* Old testing routine

* split file into renderer and program

* remove print and starting whitespace

* pretty ptx print on debug 5

* linter errors

* ignore triton saturation tests

* ignore test example

* remove pytorch cpu extra index

* Add triton to existing testing routine

* use triton tests

* disable cuda backend in triton tests

* use cudacpu in tests

* print used device

* Print device default

* Remove print

* ensure we are running triton backend

* update variable signatures

* update dtypes for load

* infinity render fixed

* limit global size

* negative infinity now properly rendered

* split chain with parentheses for and node

* Add option to disable shared memory, disable for triton

* missing import

* Properly index and mask conditional load

* use mask only if not loading a block pointer

* nan support

* fix symbolic tests to include chain split

* proper masking for stores

* Implemented bool dtype

* Add mod

* fix loads for variables with valid range

* merge triton with cuda runtime

* merge from master

* run triton tests with cuda

* Correct target when running from triton

* conftest with triton compiler config

* use triton nightly

* verbose tests for triton

* capture stdout

* fix function depth when exiting multiple loops

* add render valid function for readabilty

* fix mask for local loops

* add _arg_int32 datatype

* fix dims for conditional loads

* enable non float stores

* correct variable dtypes

* fix type for arg_int32

* remove junk

* Added get max function for range based var.max

* remove deprecated code

* Fix triton ptxas path

* Fix testing for CI

* clamp local size by max local size instead of always running max

* Disable matmul test in triton cpu

* rerun tests

* Disable broken test in triton cpu

* whitespace removed

* rerun tests again

* Disable TestSymbolicOps for triton

* update to new uops

* linter fix

* ignore test/extra

* linting fix

* Update tinygrad/renderer/triton.py

Co-authored-by: Gijs Koning <gijs-koning@live.nl>

* remove deprecated line

* quotes type fix

* linter

* Remove unnecesary lines

* UnaryOps.NEG

* dont define constants

* Linting fix

* Disable tests that are broken in ocelot

* remove trailing whitespace

* reduce line count

* linting fix

* update to new uast

* New looping style

* Update to new uast

* make AST runner work with triton

* linting fix

* set renderer var for testing

* disable local for ocelot

* reenable all tests for ocelot

* Pass shared to cuda

* Don't group if the backend doesn't support shared mem

* use working gpuocelot branch

* enable all tests

* enable local for ocelot

* cleanup

* Update test.yml

* update cache key

* reenable test symbolic and extra

* Update test.yml

* Revert "Update test.yml" (rerun tests)

This reverts commit 98c0630ee5da4379e5c6b2437a5145fe87058c35.

* Revert "fix symbolic tests to include chain split"

This reverts commit 22a9a4c9cd14d23735e6540c8d90ee005ac4ea17.

* Revert "split chain with parentheses for and node"

This reverts commit 7499a7004ef4db785d0cd05cf292fdeff65ca90d.

* use global size from linearizer

* rename newvar to dtype to match other renderers

* join program start lines

* simplify code that adds axis to local dims

* assign r[u] in ssa

* We no longer need to replace target in src

* we no longer need to cast indices to int by hand

* Update triton.py(rerun tests)

* Update triton.py(rerun tests)

* Update triton.py(rerun tests)

---------

Co-authored-by: Gijs Koning <gijs-koning@live.nl>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-09-23 14:17:12 +08:00
George Hotz 63c46e0287
Parens and gls (#1768)
* more paren stripping

* remove global and local size from renderers

* complex strip parens

* extra helpers + minor webgpu fix

* fix test uops

* one more parens test
2023-09-04 16:09:01 -07:00
George Hotz c6d5d45a2b
Remove MemOp (#1750)
* start removing memop

* locals

* support both stores

* might be correct

* remove parens on shape ish

* fix metal ops

* render load and render store

* fix image

* maybe fix asm

* fix test uops

* revert asm

* remove memop itself
2023-09-04 09:58:33 -07:00
George Hotz e17b1af160
UnaryOps.NEG (#1749) 2023-09-03 12:44:26 -07:00
George Hotz 91258aa67f
render const (#1736)
* render const

* remove constop

* fix llvm and webgpu

* disable consts in llvm again

* assembly special

* fix const rendering

* fix arm64

* imms are int

* fix ptx

* fix arm64
2023-09-01 19:01:43 -07:00
George Hotz cd844ec4b2
remove Token class (#1723)
* no fusion

* no float4 grouping

* mulacc fusion is fine. remove uop_alu

* fully remove get_grouped_maybe_float4

* removed that test

* that's not float4 anymore

* disable failing arm64

* metal ops pass tokenless

* fix wmma

* update test_uops with new style

* fix gep

* fix float4 store

* fix float4 store more

* cuda tests pass

* disable broadcast pow

* fix ptx

* reenable arm64

* bring cse back

* don't cache the acc

* fix ptx bug
2023-09-01 12:53:07 -07:00
nimlgen c93e63b8b5
make TestNonFloatUOps.test_mul_bool pass on all platforms (#1557) 2023-08-16 11:34:09 -07:00
madt2709 962972ee68
Fix uops int32 for llvm (#1554)
* fix-uops-int32-llvm

* fix tests

* Ignore mypy error
2023-08-15 23:22:32 -07:00
George Hotz 0b5930d406
more uops testing, who isn't passing right now... (#1522)
* more uops

* llvm refactor

* update test uops

* rest of the nodes

* ors and ands
2023-08-15 09:07:26 -07:00
Steven Anderson 93a36c3659
Arm (#1421)
* testing new memops

* better debugging

* testing padded conv

* branching with load

* refactoring a bit

* first try

* fixing bugs

* fixing some

* eq

* eq2

* do not use x's

* working

* fixing imm

* getting things working

* refactor

* pow not working

* working except one

* refactor: one store mem

* refactor: global load

* refactor: imm

* refactor: cleaning

* fixing big offsets

* refactor with ci

* try ci

* typo

* another typo

* ubuntu default

* forgot git

* do i need git?

* missing packages

* adding python-dev

* with cache?

* buildx action

* buildx name issue?

* maybe now?

* python3

* newline warning

* maybe now

* i actually need this

* ci should work now

* improved caching

* fixing cache

* maybe now it will cache

* this

* testing cache

* trying again

* load

* missing platform

* caching gha

* testing cache

* full testing

* typo

* now?

* why

* adding checkout back

* bad formatting

* fixing convention issues

* supporting python

* adding CI flag

* testing all

* better comments

* adding debugging

* takes 12x longer

* does it output progress now?

* ignore models for speed

* fixing merge

* excluding conv_transpose2d

* only 2 test cuz is to slow

* another approach

* let's see

* faster duh

* my bad

* T_T

* typo

* sup

* with output?

* comment test

* comment test

* comment test

* :?

* no comment

* with cache

* back to normal

* testing that ci works

* back to passing

* trying again

* does it create another entry

* does it create another entry?

* build local

* hey

* Revert "excluding conv_transpose2d"

This reverts commit cc7348de03033e032f47d69caff174e2f1a7bfea.

* does it cache if done before?

* does it cache?

* done

* adding test ops

* bad formatting

* no need for this

* working static mem

* sum 1d

* add ndim

* better reg import

* fix stack

* back to np

* working except for softmax

* 5 failing

* no pogress

* remove keystone

* remove keystone

* testops passing

* cleanups

* more cleanup

* typo

* ci

* ci2

* cond import

* ci3

* ci4

* ci4

* ci5

* ci5

* ci6

* aligment

* test all

* correct test

* err read_unmapped

* passing test

* ignore for speed

* ignore for speed

* ci7

* cleanup

* remove docker

* fixing merge

* fixing bugs

* add skipload for const ops

* comments

* First merge to master: Renderer

* fix emulation

* passing all tests arm64

* cleaning

* fix handcoded binary

* cleaning

* fix errs

* fix runtime arg binary

* clean git diff

* fix and clean

* fixing metal test

* cleaning

* fix metal test

* ci ~8 min

* fix pylint and clang

* cache the files in ops_clang

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-08-14 19:29:30 -07:00
George Hotz 38fe84d92b
cleanup mlops (#1521)
* cleanup mlops

* that line belongs there
2023-08-10 19:53:28 -07:00
George Hotz d24f936501
just cmplt (#1493)
* just cmplt

* fix maximum

* don't save, there's no backward

* ugh, no slot either

* eq is a scam
2023-08-08 13:58:10 -07:00
George Hotz d67e248d9b
simple bitcast 2 (#1445)
* simple bitcast 2

* bc 2

* empty

* Revert "empty"

This reverts commit d8ee083655b67947afb1e577020b4395d001832c.
2023-08-06 00:30:50 -07:00
George Hotz 7b8d06c9f1
test uops (#1444)
* test uops

* tests should pass

* improve uops

* precision
2023-08-05 12:35:56 -07:00