tinygrad

Commit Graph

Author	SHA1	Message	Date
CaltropHungerton	38fb1e14a2	Intel XMX Tensor Core Support (#5622 ) * fixed xmx demo * i think i'm invoking the DPAS but it's slow * compiler build arg to stop register spilling, indicated where to fix flop counter * don't mind this * do NOT mind me * do not mind me * do not view * i will add bf16 later * in process of figuring out tc fields * we figured out the fields!!! * added check for cl device vendor, added seperate IntelRenderer * remove tc thread_local_aliases * cleaning debris before draft pr * edits for linter * deduping and checking device extensions * i will find more line reductions in other places * before merge upstream * double grf size in compiler to fix register spilling (bandaid), device checking changes * tc python emulation * fixed emulation * tests for emulated intel tensor core * TC=0, 1 working on upstream, fixed perf * test * debris * check for specialized cl device when we canonicalize device * bf16 support, tc=3 test added * address tests * revert half2 loads on intel tc, cleanup * linter * fold_expanded revert * lint, whitespace fix * cuda bf16 (only one with bf16) is skipped in test tensor cores, so i will skip for intel bf16 too * make line shorter, no need for noqa E501 * removed device intel * fix python emulation --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-08-16 09:19:21 -07:00
George Hotz	553ae9ebc0	bilinear interp uint8 fails (#6103 ) * new test for e2e compile failures * fix bug * bilinear interp uint8 fails * better tests	2024-08-15 19:34:39 -07:00
George Hotz	c850e03758	new test for e2e compile failures (#6101 ) * new test for e2e compile failures * fix bug	2024-08-15 18:56:22 -07:00
chenyu	9ef82e1f2b	UOp pattern DEFINE_VAR with min==max is also CONST (#6095 ) * UOp pattern DEFINE_VAR with min==max is also CONST * fix tests	2024-08-15 12:09:44 -04:00
qazal	4d38fec8c1	rename lazyops to parents [run_process_replay] (#6091 )	2024-08-15 17:27:32 +03:00
chenyu	5accfe26a0	rewrite bool ADD to OR and MUL to AND (#6084 ) * rewrite bool ADD to OR and MUL to AND fixed running `tinyphysics.onnx`, which contains a getitem from a boolean tensor. only can repro through BEAM_COMPARE, which i think is a different bug in test_linearizer_failure * fold those, and fix tests * only for bool * move dtypes.bool	2024-08-15 10:11:57 -04:00
chenyu	df03dca6e3	move % inside UOp mod_folding and remove deprecated tests (#6085 ) [run_process_replay]	2024-08-14 23:25:10 -04:00
qazal	2bf7b56485	minor test fixups from the AST is UOp diff (#6081 ) * add assert_equiv_uops cache * dont expect lowering and schedule errors	2024-08-14 23:58:04 +03:00
George Hotz	64563abc90	add LSTMCell to nn (#6080 ) * add LSTMCell to nn * lstmcell works with no input on first * fix no bias 0 * simpler	2024-08-14 12:08:42 -07:00
chenyu	6b3112d525	fix qcom process_replay for kernel diff (#6079 ) * debug why qcom process_replay does not run skipping the wrong exception? * um-hum * get_step_times was parsed incorrectly * cleanup	2024-08-14 15:05:49 -04:00
chenyu	2fe9d62451	increase test_recursive_add time from 1s to 2s (#6078 ) flaky https://github.com/chenyuxyz/tinygrad/actions/runs/10392144818/job/28776666700	2024-08-14 13:52:02 -04:00
samm393	2dc586ffe5	Shape change bitcast for more dtypes (#6047 ) * bitcast & tests * use to_dtype * put disk tensor tests back * tests * bitmask * no bitmask --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-08-14 10:03:34 -07:00
qazal	83a2543c74	spec for in order LOAD/STORE indexing (#6073 ) * test_unaligns_idxs * spec for in order LOAD/STORE indexing * test UOps.SPECIAL * check for supports_float4	2024-08-14 19:18:00 +03:00
chenyu	5048f9a4d5	test linearizer failure 49 (#6074 ) with UOP_IS_SYMBOLIC=1, on METAL it breaks store fusion and have A+B and B+A being two different UOp	2024-08-14 11:29:10 -04:00
qazal	30035df5a4	add metal process replay back (#6068 ) test this new one	2024-08-14 12:29:56 +03:00
chenyu	1782e4f64d	use div folding to do lt folding (#6065 )	2024-08-13 16:59:05 -04:00
chenyu	e3af273fa1	touchup cl_errors (#6058 ) * touchup cl_errors * update test	2024-08-13 13:06:59 -04:00
qazal	9145ad52ff	revert UOps eq, this needs to be isolated in realize.py (#6063 ) This reverts commit `dccca7f227`.	2024-08-13 18:02:34 +03:00
Tobias Fischer	6e3eb50fd1	added fix and reg tests (#6060 )	2024-08-12 21:00:48 -04:00
qazal	dccca7f227	test: uop and lazyop have the same compare (#6053 ) * test: uop and lazyop have the same compare * typings * self.assert_equiv_uops -> assertEqual * hash dtype * test nop too * TestPatternMatcher never used this compare anyway * nop eq and ne tests	2024-08-13 00:33:19 +03:00
chenyu	3f2d24a6ec	test_failure_48 for wrong truncation in idx on NV (#6055 ) also added `RAWAST` to print pre-modified AST in DEBUG=3	2024-08-12 16:17:42 -04:00
chenyu	6ed9711898	UOps pattern (x%c)+(x//c)*c = x (#6051 ) pretty cool that this is very easy to write now	2024-08-12 14:58:48 -04:00
ignaciosica	777d6b3349	Fix compile error for max with inline const (#5840 )	2024-08-12 23:40:39 +08:00
ignaciosica	164ca5632e	split tensor core tests (#6041 )	2024-08-12 09:42:02 -04:00
chenyu	7ce716b3a0	bigint -> pyint [run_process_replay] (#6040 ) it's a python int. priority should be higher than bool, but we are not using it in type promo now.	2024-08-12 09:12:23 -04:00
Timmy	a00994b423	Lowerer Multireduce Uopgraph (#6007 ) * uopgraph changes * fixing for non-reducing ranges * multireduce tests * linters * linters * removing comments * removing arg[1] * linters * prettier * linters * more linters * use any instead of intersection	2024-08-12 15:16:07 +03:00
qazal	7d1f118731	use assertIs in test_schedule (#6035 ) * use self.assertIs in test_schedule * test_lazybuffer	2024-08-11 19:19:18 +03:00
qazal	b918e3c255	cache assert_equiv_uops (#6033 )	2024-08-11 12:17:05 +03:00
George Hotz	1b3443902c	don't use tgmath with clang (#6029 ) * don't use tgmath with clang * fix tests * nostdlib for clang * needs ffreestanding on OSX	2024-08-10 13:58:19 -07:00
chenyu	5820940d98	more relax rtol for test_arange_fuse_grouped_children (#6027 ) one more https://github.com/chenyuxyz/tinygrad/actions/runs/10334072657/job/28607120462	2024-08-10 16:10:03 -04:00
chenyu	10374a2741	relax rtol for test_arange_fuse_grouped_children (#6026 ) flaky https://github.com/tinygrad/tinygrad/actions/runs/10333939631/job/28606831006?pr=6023	2024-08-10 15:49:11 -04:00
George Hotz	cf7d3c1eb8	fix tests locally on metal (#6025 ) * remove contiguous child, it was breaking tests locally * hmm, it's still needed * include NOOPT in method cache key	2024-08-10 12:36:22 -07:00
chenyu	e6c7c3e499	update pylint path to check indent/space for all (#6022 ) also fixed many errors. it was not checking nested dirs. exclude autogen for now. can we use ruff for this?	2024-08-10 14:41:09 -04:00
George Hotz	cfb04c67d1	run unit tests separate from others (and only once) (#6020 ) * run unit tests separate from others * ignore unit tests elsewhere	2024-08-10 11:17:56 -07:00
uuuvn	ee3b015407	ELF loader strtab fix and tests (#6011 ) * ELF loader strtab fix and tests * ruff * typos * only one test	2024-08-10 10:13:16 -07:00
Jun Zhang	54e176fb4f	Ignore non-computational backends when overwriting the default (#5770 )	2024-08-10 09:23:29 -07:00
qazal	3ef2788c4f	hotfix: run the entire test_conv_bw schedule (#6014 )	2024-08-10 17:55:41 +03:00
qazal	0e62076cf5	more process replay cleanups (#6013 ) * more process replay cleanups * comma benchmark missing	2024-08-10 17:29:10 +03:00
chenyu	63a8bc29d4	addition divisor in UOp div_folding (#6002 ) in addition to try gcd of all terms, also try least common divisor of all MULs	2024-08-09 20:09:05 -04:00
chenyu	5961faa4be	minor change to UOp div_fold (#6004 ) remove an unnecessary gcd and swap the quo rem order, minimize diff for divisor pr	2024-08-09 17:09:59 -04:00
qazal	7373b05ee8	assert conv bw reduceops merge [compare_schedule] (#6001 ) * assert conv bw reduceops merge [compare_schedule] * diff with ref_commit_hash	2024-08-09 19:29:56 +03:00
qazal	b67d521a07	assert test_conv_bw correctness (#6000 ) * assert test_conv_bw correctness * reorder half * metal and clang still red	2024-08-09 18:30:36 +03:00
qazal	a833f1a735	scheduler process replay with [compare_schedule] (#5997 )	2024-08-09 16:58:22 +03:00
qazal	24c7c41ce0	diff LazyBuffer schedules in process replay (#5996 ) * start diff printing * this should be 2 * add to process_replay.py * enable schedule capture * arange diff is process replay	2024-08-09 14:16:43 +03:00
chenyu	1f1eb46af6	more failed simplified UOp div test case (#5992 ) this speculative div was handled by "divisor" in symbolic.	2024-08-08 18:39:25 -04:00
chenyu	c3e1ae2535	add failed simplified UOp div test case (#5990 ) more cases!	2024-08-08 17:37:48 -04:00
nimlgen	38d5eecc68	hcq profiler support args (#5989 ) * hcq profiler support args * bytes -> _bytes * fix * add test * mypy * not f strings * percison	2024-08-09 00:18:36 +03:00
qazal	45b1761175	smaller test_llama_embedding + assert correctness (#5986 ) * smaller test_llama_embedding in CI * test correctness	2024-08-08 22:11:29 +03:00
Timmy	8c99bdab08	More Multireduce Tests (#5968 ) * multireduce tests * linters * more linters * more linters * seeing how it works with parallel	2024-08-08 22:04:08 +03:00
gswangg	df44a4e861	Make vectorization of CONST explicit (#5322 ) * remove test_const_vectorize_fold * remove const folding UPat for VECTORIZE * refactor cstyle render_const * remove calls to dtype.scalar() in render_const * add assert * add vectorized const to UOp.const * add UPat GEP-VECTORIZE-CONST -> CONST * render_vectorize for DEFINE_ACC in cstyle * add back missing render_cast in render_const * generate vectorized consts as UOps for DEFINE_ACC * update asserts for DEFINE_ACC with VECTORIZE src * add UPats for PHI with VECTORIZE src * use prev rendered vectorize in DEFINE_ACC render * update DEFINE_ACC in python runtime * update vectorized DEFINE_ACC in PTXRenderer * rebase DEFINE_ACC changes on lowerer * verbose rewrite of bad UPats * simplify UOps.CONST implementation in ops_python * update sum_collapse UPats for DEFINE_ACC-VECTORIZE * revert linearizer to TOT * fix DEFINE_ACC implementation in ops_python * simplify DEFINE_ACC in cstyle * Fix linter error * support VECTORIZE in fold gated load/store UPat * support VECTORIZE in other fold gated load UPats * rewrite VECTORIZE in UPat for no input DEFINE_ACC * simplify DEFINE_ACC render in cstyle * make VECTORIZE rules more concise * add more vectorize fold tests * inline VECTORIZE-CONSTs in cstyle render * revert VECTORIZE/GEP rule refactor * revert cstyle render_const refactor * inline VECTORIZE-CONSTs in cstyle render * implicitly vectorized const rendering -> explicit * WMMA VECTORIZE CONST process replay hacks * VECTORIZE CONST NAN process_replay hacks * more VECTORIZE CONST NAN hacks * cleanup process_replay hacks * isnan() -> not isfinite() cstyle VECTORIZE CONST * tweak isnan and isfinite checks VECTORIZE CONST * tweak for positive vs negative infinity VECTORIZE CONST * add assert to PTX CONST render * process_replay VECTORIZE CONST render parity for PTX STORE * vmin/vmax for VECTORIZE'd CONST * update WMMA folding rules * add tests for WMMA VECTORIZE fold * hack for cstyle half4 CONST zero 
process_replay parity * revert PTX backend changes * add back minimal DEFINE_ACC PTX change * remove cstyle process_replay hacks * remove dead code in PTX CONST render * cleanup vmin/vmax logic for VECTORIZE'd CONSTs * update vectorize fold tests to use DEFINE_VAR * fix long line formatting in test * remove unwanted merge artifact * more vmin/vmax cleanup * remove unnecessary asserts * yet more vmin/vmax cleanup * get rid of explicit VECTORIZE CONST logic in _min_max * reuse CONST instead of creating a new one * remove unneeded cast * handle DType correctly in sconst * improve readability of tests * save a line * save another line * tuplize pats in src * remove GEP-VECTORIZE pats * add vec +0 fold * HACK: fold only vec8 +0 * remove vectorized ALU fold hack --------- Co-authored-by: qazal <qazal.software@gmail.com> Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2024-08-08 20:59:05 +03:00
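
Two of the rewrites named in the log above rest on simple identities: commit 6ed9711898 folds `(x%c)+(x//c)*c` to `x`, and commit 5accfe26a0 rewrites bool ADD to OR and bool MUL to AND. The following is a minimal plain-Python sketch that checks those identities by brute force. It does not use tinygrad's UOp or pattern-matcher API, and every function name in it is hypothetical. It assumes Python's floor-division semantics for `//` and `%`, and it assumes the result of a bool ADD or MUL is cast back to bool.

```python
from itertools import product


def divmod_identity_holds(x: int, c: int) -> bool:
    """Check (x % c) + (x // c) * c == x for a nonzero divisor c."""
    return (x % c) + (x // c) * c == x


def bool_add_is_or(a: bool, b: bool) -> bool:
    """ADD on bools, cast back to bool, matches logical OR (assumption)."""
    return bool(a + b) == (a or b)


def bool_mul_is_and(a: bool, b: bool) -> bool:
    """MUL on bools, cast back to bool, matches logical AND (assumption)."""
    return bool(a * b) == (a and b)


def check_all(limit: int = 50) -> bool:
    """Brute-force both identities over a small range of inputs."""
    values = range(-limit, limit + 1)
    # Skip c == 0, where division is undefined.
    ints_ok = all(divmod_identity_holds(x, c)
                  for x, c in product(values, values) if c != 0)
    bools_ok = all(bool_add_is_or(a, b) and bool_mul_is_and(a, b)
                   for a, b in product([False, True], repeat=2))
    return ints_ok and bools_ok


if __name__ == "__main__":
    print(check_all())
```

A brute-force check like this only illustrates why the rewrites are safe under the stated assumptions; a backend whose integer division truncates toward zero still satisfies the first identity as long as its `%` is defined consistently with its `//`.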

2341 Commits