Commit Graph

2341 Commits

Author SHA1 Message Date
CaltropHungerton 38fb1e14a2
Intel XMX Tensor Core Support (#5622)
* fixed xmx demo

* i think i'm invoking the DPAS but it's slow

* compiler build arg to stop register spilling, indicated where to fix flop counter

* don't mind this

* do NOT mind me

* do not mind me

* do not view

* i will add bf16 later

* in process of figuring out tc fields

* we figured out the fields!!!

* added check for cl device vendor, added separate IntelRenderer

* remove tc thread_local_aliases

* cleaning debris before draft pr

* edits for linter

* deduping and checking device extensions

* i will find more line reductions in other places

* before merge upstream

* double grf size in compiler to fix register spilling (bandaid), device checking changes

* tc python emulation

* fixed emulation

* tests for emulated intel tensor core

* TC=0, 1 working on upstream, fixed perf

* test

* debris

* check for specialized cl device when we canonicalize device

* bf16 support, tc=3 test added

* address tests

* revert half2 loads on intel tc, cleanup

* linter

* fold_expanded revert

* lint, whitespace fix

* cuda bf16 (only one with bf16) is skipped in test tensor cores, so i will skip for intel bf16 too

* make line shorter, no need for noqa E501

* removed device intel

* fix python emulation

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-08-16 09:19:21 -07:00
George Hotz 553ae9ebc0
bilinear interp uint8 fails (#6103)
* new test for e2e compile failures

* fix bug

* bilinear interp uint8 fails

* better tests
2024-08-15 19:34:39 -07:00
George Hotz c850e03758
new test for e2e compile failures (#6101)
* new test for e2e compile failures

* fix bug
2024-08-15 18:56:22 -07:00
chenyu 9ef82e1f2b
UOp pattern DEFINE_VAR with min==max is also CONST (#6095)
* UOp pattern DEFINE_VAR with min==max is also CONST

* fix tests
2024-08-15 12:09:44 -04:00
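
The idea behind the DEFINE_VAR commit above can be sketched in plain Python: a variable whose lower and upper bounds coincide can only take one value, so it can be treated as a constant. The `Var` and `Const` classes and the `fold_var` helper below are hypothetical stand-ins for illustration only. They are not tinygrad's actual UOp or pattern-matcher API.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Var:
    """Hypothetical stand-in for a bounded symbolic variable (not tinygrad's DEFINE_VAR)."""
    name: str
    vmin: int
    vmax: int

@dataclass(frozen=True)
class Const:
    """Hypothetical stand-in for a constant node."""
    value: int

def fold_var(v: Var) -> Union[Var, Const]:
    # If the bounds coincide, the variable has exactly one possible value,
    # so replace it with a constant; otherwise leave it unchanged.
    return Const(v.vmin) if v.vmin == v.vmax else v
```

For example, `fold_var(Var("i", 3, 3))` yields `Const(3)`, while `fold_var(Var("i", 0, 3))` returns the variable untouched.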
qazal 4d38fec8c1
rename lazyops to parents [run_process_replay] (#6091) 2024-08-15 17:27:32 +03:00
chenyu 5accfe26a0
rewrite bool ADD to OR and MUL to AND (#6084)
* rewrite bool ADD to OR and MUL to AND

fixed running `tinyphysics.onnx`, which contains a getitem from a boolean tensor.

can only repro through BEAM_COMPARE, which i think is a different bug in test_linearizer_failure

* fold those, and fix tests

* only for bool

* move dtypes.bool
2024-08-15 10:11:57 -04:00
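
A small truth-table check shows why the bool rewrite in the commit above is sound. This sketch assumes that ADD and MUL on booleans produce a boolean result (so True + True stays True), which is an assumption about the dtype semantics and not something stated in the commit. The helper names are hypothetical.

```python
from itertools import product

def bool_add(a: bool, b: bool) -> bool:
    # ADD on bools, assuming the integer sum is cast back to bool
    return bool(int(a) + int(b))

def bool_mul(a: bool, b: bool) -> bool:
    # MUL on bools, assuming the integer product is cast back to bool
    return bool(int(a) * int(b))

def check_bool_rewrites() -> bool:
    # Compare against OR and AND over all four input combinations
    return all(
        bool_add(a, b) == (a or b) and bool_mul(a, b) == (a and b)
        for a, b in product([False, True], repeat=2)
    )
```

Under that assumption, `check_bool_rewrites()` is true for every input pair, so ADD can be replaced by OR and MUL by AND without changing results.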
chenyu df03dca6e3
move % inside UOp mod_folding and remove deprecated tests (#6085)
[run_process_replay]
2024-08-14 23:25:10 -04:00
qazal 2bf7b56485
minor test fixups from the AST is UOp diff (#6081)
* add assert_equiv_uops cache

* dont expect lowering and schedule errors
2024-08-14 23:58:04 +03:00
George Hotz 64563abc90
add LSTMCell to nn (#6080)
* add LSTMCell to nn

* lstmcell works with no input on first

* fix no bias 0

* simpler
2024-08-14 12:08:42 -07:00
chenyu 6b3112d525
fix qcom process_replay for kernel diff (#6079)
* debug why qcom process_replay does not run

skipping the wrong exception?

* um-hum

* get_step_times was parsed incorrectly

* cleanup
2024-08-14 15:05:49 -04:00
chenyu 2fe9d62451
increase test_recursive_add time from 1s to 2s (#6078)
flaky https://github.com/chenyuxyz/tinygrad/actions/runs/10392144818/job/28776666700
2024-08-14 13:52:02 -04:00
samm393 2dc586ffe5
Shape change bitcast for more dtypes (#6047)
* bitcast & tests

* use to_dtype

* put disk tensor tests back

* tests

* bitmask

* no bitmask

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-08-14 10:03:34 -07:00
qazal 83a2543c74
spec for in order LOAD/STORE indexing (#6073)
* test_unaligns_idxs

* spec for in order LOAD/STORE indexing

* test UOps.SPECIAL

* check for supports_float4
2024-08-14 19:18:00 +03:00
chenyu 5048f9a4d5
test linearizer failure 49 (#6074)
with UOP_IS_SYMBOLIC=1, on METAL it breaks store fusion and has A+B and B+A as two different UOps
2024-08-14 11:29:10 -04:00
qazal 30035df5a4
add metal process replay back (#6068)
test this new one
2024-08-14 12:29:56 +03:00
chenyu 1782e4f64d
use div folding to do lt folding (#6065) 2024-08-13 16:59:05 -04:00
chenyu e3af273fa1
touchup cl_errors (#6058)
* touchup cl_errors

* update test
2024-08-13 13:06:59 -04:00
qazal 9145ad52ff
revert UOps eq, this needs to be isolated in realize.py (#6063)
This reverts commit dccca7f227.
2024-08-13 18:02:34 +03:00
Tobias Fischer 6e3eb50fd1
added fix and reg tests (#6060) 2024-08-12 21:00:48 -04:00
qazal dccca7f227
test: uop and lazyop have the same compare (#6053)
* test: uop and lazyop have the same compare

* typings

* self.assert_equiv_uops -> assertEqual

* hash dtype

* test nop too

* TestPatternMatcher never used this compare anyway

* nop eq and ne tests
2024-08-13 00:33:19 +03:00
chenyu 3f2d24a6ec
test_failure_48 for wrong truncation in idx on NV (#6055)
also added `RAWAST` to print pre-modified AST in DEBUG=3
2024-08-12 16:17:42 -04:00
chenyu 6ed9711898
UOps pattern (x%c)+(x//c)*c = x (#6051)
pretty cool that this is very easy to write now
2024-08-12 14:58:48 -04:00
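
The identity named in the commit above, (x%c)+(x//c)*c = x, can be checked numerically. The sketch below uses Python's floor-division semantics for `//` and `%`; whether the UOp rewrite applies under the same conditions (for example, signs of `x` and `c`) is not stated in the commit, so treat this only as an illustration of the arithmetic. The function names are hypothetical.

```python
def recombine(x: int, c: int) -> int:
    # Remainder plus quotient times divisor reconstructs the original value
    return (x % c) + (x // c) * c

def identity_holds(xs, cs) -> bool:
    # Check the identity for every pair, skipping division by zero
    return all(recombine(x, c) == x for x in xs for c in cs if c != 0)
```

For instance, `identity_holds(range(-50, 51), range(-7, 8))` evaluates the identity over a small grid of positive and negative values.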
ignaciosica 777d6b3349
Fix compile error for max with inline const (#5840) 2024-08-12 23:40:39 +08:00
ignaciosica 164ca5632e
split tensor core tests (#6041) 2024-08-12 09:42:02 -04:00
chenyu 7ce716b3a0
bigint -> pyint [run_process_replay] (#6040)
it's a python int. priority should be higher than bool, but we are not using it in type promo now.
2024-08-12 09:12:23 -04:00
Timmy a00994b423
Lowerer Multireduce Uopgraph (#6007)
* uopgraph changes

* fixing for non-reducing ranges

* multireduce tests

* linters

* linters

* removing comments

* removing arg[1]

* linters

* prettier

* linters

* more linters

* use any instead of intersection
2024-08-12 15:16:07 +03:00
qazal 7d1f118731
use assertIs in test_schedule (#6035)
* use self.assertIs in test_schedule

* test_lazybuffer
2024-08-11 19:19:18 +03:00
qazal b918e3c255
cache assert_equiv_uops (#6033) 2024-08-11 12:17:05 +03:00
George Hotz 1b3443902c
don't use tgmath with clang (#6029)
* don't use tgmath with clang

* fix tests

* nostdlib for clang

* needs ffreestanding on OSX
2024-08-10 13:58:19 -07:00
chenyu 5820940d98
more relax rtol for test_arange_fuse_grouped_children (#6027)
one more https://github.com/chenyuxyz/tinygrad/actions/runs/10334072657/job/28607120462
2024-08-10 16:10:03 -04:00
chenyu 10374a2741
relax rtol for test_arange_fuse_grouped_children (#6026)
flaky https://github.com/tinygrad/tinygrad/actions/runs/10333939631/job/28606831006?pr=6023
2024-08-10 15:49:11 -04:00
George Hotz cf7d3c1eb8
fix tests locally on metal (#6025)
* remove contiguous child, it was breaking tests locally

* hmm, it's still needed

* include NOOPT in method cache key
2024-08-10 12:36:22 -07:00
chenyu e6c7c3e499
update pylint path to check indent/space for all (#6022)
also fixed many errors. it was not checking nested dirs. exclude autogen for now.

can we use ruff for this?
2024-08-10 14:41:09 -04:00
George Hotz cfb04c67d1
run unit tests separate from others (and only once) (#6020)
* run unit tests separate from others

* ignore unit tests elsewhere
2024-08-10 11:17:56 -07:00
uuuvn ee3b015407
ELF loader strtab fix and tests (#6011)
* ELF loader strtab fix and tests

* ruff

* typos

* only one test
2024-08-10 10:13:16 -07:00
Jun Zhang 54e176fb4f
Ignore non-computational backends when overwriting the default (#5770) 2024-08-10 09:23:29 -07:00
qazal 3ef2788c4f
hotfix: run the entire test_conv_bw schedule (#6014) 2024-08-10 17:55:41 +03:00
qazal 0e62076cf5
more process replay cleanups (#6013)
* more process replay cleanups

* comma benchmark missing
2024-08-10 17:29:10 +03:00
chenyu 63a8bc29d4
addition divisor in UOp div_folding (#6002)
in addition to trying the gcd of all terms, also try the least common divisor of all MULs
2024-08-09 20:09:05 -04:00
chenyu 5961faa4be
minor change to UOp div_fold (#6004)
remove an unnecessary gcd and swap the quo rem order, minimize diff for divisor pr
2024-08-09 17:09:59 -04:00
qazal 7373b05ee8
assert conv bw reduceops merge [compare_schedule] (#6001)
* assert conv bw reduceops merge [compare_schedule]

* diff with ref_commit_hash
2024-08-09 19:29:56 +03:00
qazal b67d521a07
assert test_conv_bw correctness (#6000)
* assert test_conv_bw correctness

* reorder half

* metal and clang still red
2024-08-09 18:30:36 +03:00
qazal a833f1a735
scheduler process replay with [compare_schedule] (#5997) 2024-08-09 16:58:22 +03:00
qazal 24c7c41ce0
diff LazyBuffer schedules in process replay (#5996)
* start diff printing

* this should be 2

* add to process_replay.py

* enable schedule capture

* arange diff is process replay
2024-08-09 14:16:43 +03:00
chenyu 1f1eb46af6
more failed simplified UOp div test case (#5992)
this speculative div was handled by "divisor" in symbolic.
2024-08-08 18:39:25 -04:00
chenyu c3e1ae2535
add failed simplified UOp div test case (#5990)
more cases!
2024-08-08 17:37:48 -04:00
nimlgen 38d5eecc68
hcq profiler support args (#5989)
* hcq profiler support args

* bytes -> _bytes

* fix

* add test

* mypy

* not f strings

* precision
2024-08-09 00:18:36 +03:00
qazal 45b1761175
smaller test_llama_embedding + assert correctness (#5986)
* smaller test_llama_embedding in CI

* test correctness
2024-08-08 22:11:29 +03:00
Timmy 8c99bdab08
More Multireduce Tests (#5968)
* multireduce tests

* linters

* more linters

* more linters

* seeing how it works with parallel
2024-08-08 22:04:08 +03:00
gswangg df44a4e861
Make vectorization of CONST explicit (#5322)
* remove test_const_vectorize_fold

* remove const folding UPat for VECTORIZE

* refactor cstyle render_const

* remove calls to dtype.scalar() in render_const

* add assert

* add vectorized const to UOp.const

* add UPat GEP-VECTORIZE-CONST -> CONST

* render_vectorize for DEFINE_ACC in cstyle

* add back missing render_cast in render_const

* generate vectorized consts as UOps for DEFINE_ACC

* update asserts for DEFINE_ACC with VECTORIZE src

* add UPats for PHI with VECTORIZE src

* use prev rendered vectorize in DEFINE_ACC render

* update DEFINE_ACC in python runtime

* update vectorized DEFINE_ACC in PTXRenderer

* rebase DEFINE_ACC changes on lowerer

* verbose rewrite of bad UPats

* simplify UOps.CONST implementation in ops_python

* update sum_collapse UPats for DEFINE_ACC-VECTORIZE

* revert linearizer to TOT

* fix DEFINE_ACC implementation in ops_python

* simplify DEFINE_ACC in cstyle

* Fix linter error

* support VECTORIZE in fold gated load/store UPat

* support VECTORIZE in other fold gated load UPats

* rewrite VECTORIZE in UPat for no input DEFINE_ACC

* simplify DEFINE_ACC render in cstyle

* make VECTORIZE rules more concise

* add more vectorize fold tests

* inline VECTORIZE-CONSTs in cstyle render

* revert VECTORIZE/GEP rule refactor

* revert cstyle render_const refactor

* inline VECTORIZE-CONSTs in cstyle render

* implicitly vectorized const rendering -> explicit

* WMMA VECTORIZE CONST process replay hacks

* VECTORIZE CONST NAN process_replay hacks

* more VECTORIZE CONST NAN hacks

* cleanup process_replay hacks

* isnan() -> not isfinite() cstyle VECTORIZE CONST

* tweak isnan and isfinite checks VECTORIZE CONST

* tweak for positive vs negative infinity VECTORIZE CONST

* add assert to PTX CONST render

* process_replay VECTORIZE CONST render parity for PTX STORE

* vmin/vmax for VECTORIZE'd CONST

* update WMMA folding rules

* add tests for WMMA VECTORIZE fold

* hack for cstyle half4 CONST zero process_replay parity

* revert PTX backend changes

* add back minimal DEFINE_ACC PTX change

* remove cstyle process_replay hacks

* remove dead code in PTX CONST render

* cleanup vmin/vmax logic for VECTORIZE'd CONSTs

* update vectorize fold tests to use DEFINE_VAR

* fix long line formatting in test

* remove unwanted merge artifact

* more vmin/vmax cleanup

* remove unnecessary asserts

* yet more vmin/vmax cleanup

* get rid of explicit VECTORIZE CONST logic in _min_max

* reuse CONST instead of creating a new one

* remove unneeded cast

* handle DType correctly in sconst

* improve readability of tests

* save a line

* save another line

* tuplize pats in src

* remove GEP-VECTORIZE pats

* add vec +0 fold

* HACK: fold only vec8 +0

* remove vectorized ALU fold hack

---------

Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-08-08 20:59:05 +03:00