tinygrad

Commit Graph

Author	SHA1	Message	Date
qazal	9295bc0189	viz more work [run_process_replay] (#6568 ) * infra * found it * real work * bring those back * cleanup test_viz * comment that out	2024-09-17 19:27:09 +08:00
George Hotz	904f6a63fa	Revert "Revert "cleanup process_replay/* namings [run_process_replay] (#6429 )…" (#6442 ) This reverts commit `eda177da84`.	2024-09-10 07:00:16 +08:00
George Hotz	dbd4536167	Revert "add UOps.VALID (#6387 )" (#6441 ) This reverts commit `8186e4e7d6`.	2024-09-09 21:33:00 +08:00
George Hotz	eda177da84	Revert "cleanup process_replay/* namings [run_process_replay] (#6429 )" (#6437 ) This reverts commit `f4e83b30b4`.	2024-09-09 18:52:36 +08:00
qazal	f4e83b30b4	cleanup process_replay/* namings [run_process_replay] (#6429 )	2024-09-09 16:59:04 +08:00
George Hotz	8186e4e7d6	add UOps.VALID (#6387 ) * uops valid * broke full_shape * fixup that st (hardcoded asts still red) * fixup DEFINE_VAR debug more debug * start moving stuff to ast_const * move test_linearizer * move test_linearizer_failures to ast_const * fixup test_schedule * small diff change * regenerate dataset * fixup test_multitensor * regen dataset try 2 --------- Co-authored-by: qazal <qazal.software@gmail.com>	2024-09-09 16:58:43 +08:00
qazal	c5bae55ec8	new generate_dataset.sh (#6423 ) * new generate_dataset.sh * keep those there * test: rm expected failures * rename to extract	2024-09-09 15:13:07 +08:00
gswangg	94a72d44d2	update CI tests in extra with UOp AST (#6290 )	2024-08-28 22:26:50 +03:00
chenyu	e745e16441	remove UnaryOps.NEG (#6238 ) * Remove UnaryOps.NEG generated new dataset with ``` time JIT=2 PYTHONPATH=. ./extra/optimization/generate_dataset.sh gzip /tmp/sops mv /tmp/sops.gz extra/datasets/ ``` * fix that	2024-08-22 14:21:39 -04:00
George Hotz	74ee9febec	remove iter from uopgraph (#6110 ) * remove iter from uopgraph * linearize returns uops * fix tests * linearize in linearize * tests fix * touchup * test failures	2024-08-16 15:58:29 -07:00
qazal	c23d44c779	AST is UOp (#6030 ) * most of the work from the uops2 branch * schedule * realize * kernel * lowerer * search * green * merge uops with ops * Revert "merge uops with ops" This reverts commit 1408a59f12c97e3466679884266b247cf9df46bc. * fix benchmark * remove extra dedup	2024-08-16 22:09:00 +03:00
chenyu	e6c7c3e499	update pylint path to check indent/space for all (#6022 ) also fixed many errors. it was not checking nested dirs. exclude autogen for now. can we use ruff for this?	2024-08-10 14:41:09 -04:00
George Hotz	fa7e734b49	MetaOps.KERNEL (#5543 )	2024-07-17 19:41:23 -07:00
Francis Lam	2d53abb04a	test/external/fuzz_linearizer: fix for new AST changes (#5519 ) * test/external/fuzz_linearizer: fix for new AST changes also add beautiful_mnist failures * add CLANG and LLVM to test_failure_35 failed_platforms * fix test_linearizer_failure names	2024-07-17 00:08:07 -04:00
chenyu	28972418c4	s/get_linearizer/get_kernel [run_process_replay] (#5467 )	2024-07-13 20:32:22 -04:00
George Hotz	03c2dc8bd7	lowerer is kernel [run_process_replay] (#5437 )	2024-07-12 18:50:55 -07:00
George Hotz	870dc8c350	s/Linearizer/Lowerer [run_process_replay] (#5428 )	2024-07-12 15:54:07 -07:00
George Hotz	6707c778d0	scheduleitem is not Tuple [run_process_replay] (#5425 ) * scheduleitem is not Tuple [run_process_replay] * fix tests * fix op + fuzzers * fix mop test	2024-07-12 15:13:19 -07:00
George Hotz	94599c0637	fixup ast in kernel to be MetaOps.SINK [run_process_replay] (#5424 ) * fixup ast in kernel to be MetaOps.SINK [run_process_replay] * fix tests * fix more tests	2024-07-12 14:01:03 -07:00
George Hotz	c13da83f12	tests from lowerer branch (#5339 ) * tests from lowerer branch * Update test_image_dtype.py * Update test_image_dtype.py * Update test_image_dtype.py	2024-07-08 21:23:19 -07:00
chenyu	8080298739	s/tinytqdm/tqdm (#5103 ) except in unit test where tqdm is imported	2024-06-22 14:18:26 -04:00
chenyu	a3ed4176c8	use tinytqdm in active tests and examples (#5038 ) * use tinytqdm in active tests and examples stress test this before 0.9.1 * no set_description	2024-06-18 16:01:19 -04:00
Jhenner Tigreros	dc9e9e4363	Convert BinaryOps.DIV to UnaryOps.RECIP and BinaryOps.IDIV (#4887 ) * Create UnaryOps.RECIP and BinaryOps.IDIV and changing uses of BinaryOps.DIV * Delete unused import * Add cstyle renderer * Fix formatting text * Fix test error due to bad implementation of renderer * Add PTX support * Add RECIP to LLVMIR * Remove BinaryOps.DIV from symbolic test * Change some test and fix C floor division * Change references to DIV for the RECIP or IDIV * Add mimic idiv for symbolic test * Restore floor * Mimic idiv * cast to int * Fix some test and renderer * Remove DIV for render nodes * Resolve issue with div * Add TestRenderer * Fix test * fix error * Fix PAD test * Fix div implementation * Remove DIV * Add upcast to rshift, due to use of MUL and RECIP on DIV * Fix linter * Remove complete BinaryOps.DIV * Fix lint * Fix some test * Revert mul modification * Fix tests * Fix CLANG for uops * Revert IDIV function * Minor fix * modify pattern matching rule to support nan * Fix UNSAFE_PADS_OPS to add UnaryOps.RECIP * Remove const folding for IDIV and fix PTX * Complete remove IDIV from extra * Remove test_div from TestFloatUOps due to test on recip * Fix linearizer * fix * Fix test_22 * Fix llvm * Apply trunc function for llvmlit * use floor instead of trunc * Use correct type * Generate new fuzz db * Fix rshift, do not cast to float to support idiv * Return upcast=false to rshift * Add to unsafepad BinaryOps.IDIV * Remove RECIP override for CUDA * add atol / rtol for the test * Remove cast to int on IDIV * Regenerate sops * delete sops.gz * regenerate * regenerate * regenerate * Reduce margins * pass atol and rtol as parametersg for _test_metrics * regenerated dataset * Regenerate * Remove duplicated * Revert changes on extra * Remove changes extra and NOQA for test * Remove E501 * Remove and change line * Remove E501 * Fix atan2 * Revert import and E501 * Remove E501 * Add hrcp to halp ops * Remove 1 of hrcp * Remove last DIV and add type check on uops for IDIV * Fix new tests * Fix tests and custom function * Regenerate dataset * Regenerate dataset * Revert dataset * Change generate dataset script * Remove line * Change IDIV, type checker validate if x,y and z are int --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-06-14 02:43:46 -07:00
George Hotz	ff64bcab69	move graph/search to engine (#4596 )	2024-05-14 23:12:59 -07:00
George Hotz	fd02ab1e8b	move disassemblers and openpilot (#4592 ) * move disassemblers and openpilot * delete junk * put that in pre-commit * fixup readme	2024-05-14 19:30:02 -07:00
George Hotz	347a3acb37	add renderer class (#4524 ) * add renderer class * tests pass * fix pylint * fix tensor cores	2024-05-10 21:40:02 -07:00
Francis Lam	c8595a9655	update sops.gz, fix tests and add new linearizer test (#4437 ) * update sops.gz, fix tests and add new linearizer test * remove METAL CI skip for test_failure_22 * re-add skip to METAL CI to test_failure_22	2024-05-05 17:31:25 -04:00
Francis Lam	46850a0269	search: add a BEAM_COMPARE env to optionally not compare to hc/tc (#4107 ) * search: add a BEAM_COMPARE env to optionally not compare to hc/tc setting BEAM_COMPARE=0 will prevent additional memory allocation needed to do the timing tests assuming the BEAM result is in the diskcache. * change to optionally use Buffer.allocate	2024-04-08 18:54:01 -04:00
chenyu	b47f6cebb2	LinearizerOptions -> CompilerOptions (#3978 )	2024-03-28 17:50:23 -04:00
Francis Lam	5587594a00	fuzz_linearizer: add --ast and --file params to read kernels (#3877 ) also fix up ast_str_to_str to support the new tuple of LazyOps	2024-03-22 14:27:40 -04:00
Francis Lam	6d5dec2fef	log optimized kernels and a script to compare with non-optimized ones (#3829 ) * search: add BEAM_VERIFY option to validate search results refactor fuzz_linearizer comparison to allow it to be used in for BEAM_VERIFY in device.py * search: fix to verify the beam_search result and not the fastest * search: fix typing and clean up * device: remove imports from test and add LOGKERN options LOGKERN output can be used with test/external/verify_kernel.py to validate correctness * fix example in verify_kernel.py * cleanup fixes * fix to use f-strings	2024-03-20 19:22:08 -04:00
chenyu	47b9cc2dfe	use float32 for rand buffer in test_beam_search and test in metal (#3831 )	2024-03-19 23:22:58 -04:00
wozeparrot	a0ab755317	threefry again (#3785 ) * feat: initial xor * feat: initial threefly * feat: remove custom random * fix: really need to install precommit * feat: lmao forgot that this is rotate not a shift * clean: put that there * feat: numpy xor * feat: quick test for xor * feat: llvm xor * feat: slightly working xor in torch * feat: rand works in jit * clean: save a line * feat: match jax * feat: maybe test against jax * feat: requires_grad * fix: fix test_symbolic_ops * feat: lower alpha * feat: just pad * fix: maybe fix training tests? * fix: fix some llvm stuff * feat: cursed realize on the way out * feat: testing jax * fix: why is the jax install process not simple * fix: maybe passing test * fix: symbolic workarounds * clean: still need that precommit * fix: aaaa * fix: more test fixes * fix: quick fix for wgsl * feat: need to set requires_grad on the final tensor * feat: one more tensor * feat: don't take forever * feat: seeing y ci is brok * feat: can't allocate 64GiB lmao * fix: fix this * feat: hope this doesn't break smth before i go to bed * feat: don't destroy ram * feat: int * feat: remove jax * feat: properish workaround? * feat: skip slow webgpu tests * feat: no longer fails * feat: use dtypes * feat: real number * fix: torch * fix: don't test against reference for torch * feat: to device * feat: fix advanced indexing * feat: correct casting * feat: even rng_counter * feat: match master * feat: this was actually bad * fix: maybe? * feat: store * feat: remove realizes * feat: somehow this is important * feat: somehow this is also important * feat: save a line * fix: don't need that anymore * feat: restore this * fix: linter * feat: remove realizes * fix: realized is in base now * fix: add back cast * fix: bump deadline * fix: bump deadline * fix: bump deadline * fix: bump deadline * fix: bump deadline * fix: :( * fix: :( * fix: not being dumb * feat: try changing less tests * feat: shouldn't have to change that * feat: contiguous bumps it by one * fix: hmm * fix: numpy memory moment * fix: cl_khr_fp16 * fix: torch has different tensor count * fix: missing contiguous * hmm: hmm * fix: some fixes * fix: typing * feat: dont do that * feat: typing fixes * feat: why is this realize required? * feat: ngl kinda odd typing * feat: oh * feat: remove realizes * feat: why is this realize required? * fix: hacky patch for cudacpu * fix: without this realize pytest crashes????? * fix: shorter line * fix: cudacpu fixes * fix: cudacpu fixes * feat: real buffer * feat: don't search when searching lmao * fix: can't use contiguous things * fix: no more 100GB arrays * fix: revert * fix: skip 7 and 10 * feat: working ish beam * feat: minimize changes * feat: seed 0 stable diffusion example changed * fix: different on ci * fix: no beam * feat: make threefry optional * fix: check value * fix: unused import * feat: threefry default * fix: 5d * feat: allow non upcast div * fix: 5d better * fix: 5d better * fix: save all dtype * feat: proper error * feat: lazyop key * fix: check float * feat: try removing this realize now * feat: disable threefry for uops hip tensor cores * feat: don't need that * feat: only check upcast * fix: disable threefry for some metal tests * feat: disable for metal tensor uops as well * feat: disable for most uops * fix: disable threefry for new uops tests * feat: multitensor * fix: typing * feat: threefry default off * feat: skip threefry half rand * feat: restore old * fix: bad git * clean: ruff * feat: bfloat16 fix * fix: :\| * feat: restore old --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-03-18 16:47:07 -04:00
George Hotz	311cf2b7d3	Revert "threefry_2x32 (#2601 )" (#3784 ) This reverts commit `db3de54bc4`.	2024-03-17 10:27:20 -07:00
wozeparrot	db3de54bc4	threefry_2x32 (#2601 ) * feat: initial xor * feat: initial threefly * feat: remove custom random * fix: really need to install precommit * feat: lmao forgot that this is rotate not a shift * clean: put that there * feat: numpy xor * feat: quick test for xor * feat: llvm xor * feat: slightly working xor in torch * feat: rand works in jit * clean: save a line * feat: match jax * feat: maybe test against jax * feat: requires_grad * fix: fix test_symbolic_ops * feat: lower alpha * feat: just pad * fix: maybe fix training tests? * fix: fix some llvm stuff * feat: cursed realize on the way out * feat: testing jax * fix: why is the jax install process not simple * fix: maybe passing test * fix: symbolic workarounds * clean: still need that precommit * fix: aaaa * fix: more test fixes * fix: quick fix for wgsl * feat: need to set requires_grad on the final tensor * feat: one more tensor * feat: don't take forever * feat: seeing y ci is brok * feat: can't allocate 64GiB lmao * fix: fix this * feat: hope this doesn't break smth before i go to bed * feat: don't destroy ram * feat: int * feat: remove jax * feat: properish workaround? * feat: skip slow webgpu tests * feat: no longer fails * feat: use dtypes * feat: real number * fix: torch * fix: don't test against reference for torch * feat: to device * feat: fix advanced indexing * feat: correct casting * feat: even rng_counter * feat: match master * feat: this was actually bad * fix: maybe? * feat: store * feat: remove realizes * feat: somehow this is important * feat: somehow this is also important * feat: save a line * fix: don't need that anymore * feat: restore this * fix: linter * feat: remove realizes * fix: realized is in base now * fix: add back cast * fix: bump deadline * fix: bump deadline * fix: bump deadline * fix: bump deadline * fix: bump deadline * fix: :( * fix: :( * fix: not being dumb * feat: try changing less tests * feat: shouldn't have to change that * feat: contiguous bumps it by one * fix: hmm * fix: numpy memory moment * fix: cl_khr_fp16 * fix: torch has different tensor count * fix: missing contiguous * hmm: hmm * fix: some fixes * fix: typing * feat: dont do that * feat: typing fixes * feat: why is this realize required? * feat: ngl kinda odd typing * feat: oh * feat: remove realizes * feat: why is this realize required? * fix: hacky patch for cudacpu * fix: without this realize pytest crashes????? * fix: shorter line * fix: cudacpu fixes * fix: cudacpu fixes * feat: real buffer * feat: don't search when searching lmao * fix: can't use contiguous things * fix: no more 100GB arrays * fix: revert * fix: skip 7 and 10 * feat: working ish beam * feat: minimize changes * feat: seed 0 stable diffusion example changed * fix: different on ci * fix: no beam * feat: make threefry optional * fix: check value * fix: unused import * feat: threefry default * fix: 5d * feat: allow non upcast div * fix: 5d better * fix: 5d better * fix: save all dtype * feat: proper error * feat: lazyop key * fix: check float * feat: try removing this realize now * feat: disable threefry for uops hip tensor cores * feat: don't need that * feat: only check upcast * fix: disable threefry for some metal tests * feat: disable for metal tensor uops as well * feat: disable for most uops * fix: disable threefry for new uops tests * feat: multitensor * fix: typing * feat: threefry default off * feat: skip threefry half rand * feat: restore old * fix: bad git * clean: ruff * feat: bfloat16 fix * fix: :\| --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-03-17 10:19:33 -07:00
qazal	e3e89c244b	multioutput uoping infra (#3706 ) * linearize multioutput * add vars to copy	2024-03-15 21:56:59 -07:00
qazal	aec4c4f01b	linearizer ast as a tuple of lazyops (#3689 ) * multi store op linearizer * currently we do only one output per kernel * named opts	2024-03-11 15:39:04 -07:00
Francis Lam	3219a527d6	search: add a tool that beam searches one or more kernels (#3685 )	2024-03-11 16:02:17 -04:00
George Hotz	ac02e7347d	ptx timing vs cuda timing (#3659 )	2024-03-08 10:17:49 -08:00
chenyu	a66ffec6d3	update kernel dataset to exclude the disktensor ones (#3651 ) disk tensor load contains big offset and is not meant to be run by gpu. repro steps ``` time ./extra/optimization/generate_dataset.sh gzip /tmp/sops mv /tmp/sops.gz extra/datasets/ ```	2024-03-07 17:35:19 -05:00
chenyu	35d998efa8	disable flaky test_conv_beam in CI (#3553 ) might fail due to CL_OUT_OF_RESOURCES	2024-02-29 22:59:41 -05:00
wozeparrot	da32c37346	use hash as key for beam (#3516 ) * feat: use hash as key for beam * feat: bump db version	2024-02-28 10:19:01 -08:00
George Hotz	c003be7309	Revert "track size in shapetracker" (#3043 ) * Revert "track size in shapetracker (#3026)" This reverts commit `a8ba1ac08f`. * st.size	2024-01-08 13:13:39 -08:00
George Hotz	a8ba1ac08f	track size in shapetracker (#3026 ) * track size in shapetracker * shapetracker adapter * size is an int * create Buffer with st.size * only compare the views for the jit * fix webgpu	2024-01-05 20:15:53 -08:00
chenyu	b1d9e54ea3	regenerate kernel ast dataset (#2968 ) added back the log ast function and removed hacks that work around the old dataset	2024-01-01 20:26:17 -05:00
George Hotz	a280cfe169	move dtypes to dtype.py (#2964 ) * move dtypes to dtype.py * fix urllib	2024-01-01 14:58:48 -08:00
chenyu	820f2e054e	fix PADTO optimization (#2935 ) the correct condition is that PADTO cannot be applied to reduce axis, not Reduce.MAX in ops. even for Reduce.SUM it's possible that the reduce axis had a div before, and the padded 0 became inf then sum over it is incorrect.	2023-12-25 22:52:49 -05:00
qazal	12996d3a7d	green linearizer asserts for ops (#2800 ) * these asserts should pass * fix that assert * ALU dtypes * acc dtype for group_for_reduce * cast image ALUs to the base dtype * remove all casts from linearizer * fix argmax * fix multinomial * fix __getitem__ * Revert "fix __getitem__" This reverts commit 62ad719bfa5a2e1fcbfa931360f54897f8977602. * fix MemBuffer outputs being wrong when there is an arange + ALU with a different dtype eg. fancy slicing (int, float), bert embeddings (int, long) this should be fixed in lazy instead of having to break the kernel * cleanup argmax fix * fix matmul in ints cast in the end * fix llama * skip wrong hardcoded asts in the worlds dataset * fix llama p2 * cleanup missing parts of the diff --------- Co-authored-by: George Hotz <geohot@gmail.com>	2023-12-25 10:41:54 -05:00
chenyu	765f8b05e5	TernaryOps.WHERE has vin[0] as bool and BinaryOps.CMPLT always outputs bool (#2782 ) * vin[0] to where is always bool * due to better hack * update test * fix test_uops	2023-12-15 14:51:51 -05:00
chenyu	7fec966b5e	bye bye NOOP (#2534 ) * bye bye NOOP * SIN * NEG	2023-11-30 23:10:35 -08:00

1 2

74 Commits