Commit Graph

2309 Commits

Author SHA1 Message Date
chenyu 2e087ca8e4
UOp bound for div negative number (#5808) 2024-07-31 02:10:23 -04:00
qazal bcbd925001
hcopts failing test for fused arange kernel (#5815)
* add failure_43

* n 45
2024-07-31 09:02:44 +03:00
qazal ed556c260e
UOps.IF rules more tests (#5831)
* init tests

* split tests

* assert multiple gates simplicity
2024-07-31 00:11:02 -04:00
David Hou 492a696d14
allow specify splits in shard, handle multiple different splits in MLB.e (#5599)
* allow specify splits in shard, handle multiple different splits in MLB.e

* line width

* linter

* don't use Device in docstring

* specify size of shards instead of boundaries

* adjust docstring for specify size of shards instead of boundaries

* don't allow splits on symbolic axis?

* just allow sint in splits_to_bounds

* add message for assert

* bounds instead of splits to save lines

* fix types

* reduce diff

* fix

* tuple

* golf :(

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-07-30 19:33:04 -07:00
chenyu c3da458bc3
UOp if min==max folds to CONST (#5828)
* UOp if min==max folds to CONST

* fix test
2024-07-30 22:14:22 -04:00
George Hotz e6879035a0
work to make GEMV fast (#5824)
* work to make GEMV fast

* half8 cast

* align struct

* fix amd

* float8 is a later problem
2024-07-30 17:41:40 -07:00
chenyu 02f0be03f2
tests on UOp div negative number and arange opts (#5825) 2024-07-30 20:06:57 -04:00
George Hotz 693990a346
swap src[2] and src[3] in load [run_process_replay] (#5821)
* swap src[2] and src[3] in load [run_process_replay]

* cleanups + bugfix

* fix ptx
2024-07-30 14:04:13 -07:00
George Hotz 17a2f74412
new style load/store folder (#5784)
* remove old index reorder

* new style folder

* works better

* dedup

* one failure

* this is fine now...

* expander_rewrite

* images broken, but all else should work

* cleanups

* make tests work with old

* fix images

* cleanups + bugfix

* minor fixes

* fix gated store folding

* flip gate_creator and expander

* fix gated store

* remove unneeded rules

* lines getting close

* line count good
2024-07-30 13:17:20 -07:00
qazal 03d866b84f
UOps.IF with rewrite rules (#5812)
* expand merge

* merge barriers

* gate_folder

* test_linearizer_failures

* this can be here

* bring the new repr back

* gate_folder2

* gate_creator is better

* gate_folder

* dedup conditions

* early gate folding

* dedup barrier

* fold noop conditions

* all consts can go away

* free lines
2024-07-30 20:50:56 +03:00
chenyu defd89e8e0
unify negative shape creation to raise ValueError (#5817)
[run_process_replay]
2024-07-30 13:42:59 -04:00
P4ssenger 6742a4789a
Add check for negative dimension in view (#5790)
* add check for negative dimension in view

* add negative dim tests

* move check to tensor level

* fix error message

* move check to view create

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-30 13:26:27 -04:00
Francis Lata ce61be16f1
clean up how preprocessed folder is defined (#5813) 2024-07-30 12:35:26 -04:00
qazal 5e827e51d2
add llama3 BEAM=2 failures to test_linearizer_failures (#5553)
* skips

* opts.device

* benchmarks

* add to test_linearizer_failures

* remove hardcoded ones

* linter

* skip cpu
2024-07-30 00:37:32 +03:00
samm393 573e0f9a48
remove float division from idiv in python_alu (#5777)
* removes float division from idiv in python_alu

* add test

* cleaner logic

* pass clang unsigned literals correctly

* suffix ULL instead of U

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-29 12:14:12 -04:00
samm393 2c94316bd2
ull literal support and test (#5789)
* ull literal support and test

* missing .numpy()
2024-07-29 11:50:49 -04:00
nimlgen ab3839a80a
cleanup nv/cuda compilers (#5767)
* cleanup nv/cuda compilers

* destroy prog

* small test

* fix test

* nv ptx rewrite key

* jitlink free

* ptx is part of cuda
2024-07-29 13:50:03 +03:00
chenyu e7a14f398e
more uop_symbolic tests for divmod pairs (#5785) 2024-07-28 21:27:06 -04:00
George Hotz 76d191ab94
move consts to end of add (#5783)
* move consts to end of add

* better

* fix infinite loop
2024-07-28 17:38:57 -07:00
chenyu 71a64d8252
UOps.MUL bound when one is negative (#5781)
* UOps.MUL bound when one is negative

also one more distribute_mul rule

* don't always expand
2024-07-28 19:02:47 -04:00
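The MUL bound case above can be sketched with plain interval arithmetic (a minimal sketch on Python ints; `mul_bounds` is a hypothetical helper, not tinygrad's actual UOp API):

```python
from itertools import product

# bounds of x*y given bounds (vmin, vmax) for each operand: when an operand
# can be negative, the min/max must be taken over all four endpoint products,
# not just vmin*vmin and vmax*vmax
def mul_bounds(x: tuple, y: tuple) -> tuple:
    ps = [a * b for a, b in product(x, y)]
    return (min(ps), max(ps))

assert mul_bounds((2, 3), (4, 5)) == (8, 15)
assert mul_bounds((-3, 2), (4, 5)) == (-15, 10)  # a negative lower bound flips the min
```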
qazal b775db6b60
high-level benchmark timing diff (#5776)
* high level timings

benchmark times

fix defs

* use the name map

* skip last task
2024-07-28 23:42:57 +03:00
chenyu 600a39771d
fix Tensor.arange if (stop-start) and step have different signs (#5775) 2024-07-28 14:34:10 -04:00
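The sign fix above amounts to clamping the computed element count at zero (a hedged sketch of the numpy-style length formula; `arange_len` is a hypothetical helper, not tinygrad's code):

```python
import math

def arange_len(start, stop, step):
    # number of elements a numpy-style arange produces; clamps to 0 when
    # (stop - start) and step have different signs
    return max(math.ceil((stop - start) / step), 0)

assert arange_len(0, 10, 2) == 5
assert arange_len(10, 0, 2) == 0   # different signs -> empty
assert arange_len(10, 0, -3) == 4  # 10, 7, 4, 1
```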
David González Martínez d0fd84e617
feat: allow passing gradient to .backward() to compute vjp (#5771)
* feat: allow passing gradient to .backward() to compute vjp

* fix

* refactor

* fix trailing whitespace
2024-07-28 11:13:18 -07:00
qazal e0e7293b0a
make process replay unique in retries [run_process_replay] (#5773) 2024-07-28 20:44:15 +03:00
qazal 95dda8dadf
more unmatching vectorize/gep asserts [run_process_replay] (#5760)
* merge vectorize/gep rules [run_process_replay]

* assert dtypes

* src=

* float2=(float4.x,float4.y)
2024-07-28 15:08:54 +08:00
chenyu bfbd7c5461
more generic UOp mul mod folding (#5765) 2024-07-27 20:20:35 -04:00
chenyu 80c6475757
update test_uop_symbolic to test UOp min and max (#5764)
covers #5750, #5748, #5741
2024-07-27 19:53:21 -04:00
nimlgen ed1d784077
test profiler timer sync across devs (#5751)
* test profiler timer sync across devs

* more correct

* typo
2024-07-27 16:47:37 +03:00
qazal 3e49d86c01
process replay diffs 3 things now (#5731)
* github api infra

* process replay is 3 parts now

* parse benchmarks

* add gh_token

* complete diff

* move process replay tests

* last successful run

* add tempdir

* skip master
2024-07-27 12:52:20 +03:00
qazal 57b4a8e98d
assert process replay asserts (#5737)
* assert process replay asserts

* one ci job is fine

* test: Revert "separate process replay main loop (#5734)"

This reverts commit 94d578396f.

* mac sed needs that

* Revert "test: Revert "separate process replay main loop (#5734)""

This reverts commit e4ad7684d5472a64841a66b43bc1db7c9bbbf9e8.

* disable process replay capture

* save time

* amd is tiny

* send to /dev/null
2024-07-27 12:07:50 +03:00
George Hotz f8972ace38
test flops (and allow wide ALU in UOps) [run_process_replay] (#5749)
* flops test in external_test_speed_theoretical.py

* test speed theo

* min SZMAX

* allow wide ALU for things that support it

* needed for mypy
2024-07-26 21:07:28 -07:00
George Hotz 2fde2d2914 hotfix: external_test_speed_theoretical works on 24GB 2024-07-26 18:41:52 -07:00
George Hotz 829262a5ee add external_test_speed_theoretical 2024-07-26 17:45:22 -07:00
kormann a5ede535ef
NOp field name [run_process_replay] (#5742)
* rm def name

* add field name
2024-07-26 18:45:59 -04:00
George Hotz c50e374bb6
multiple locals + get_kernel_modifier + fix valid (#5739)
* multiple locals + get_kernel_modifier + fix valid

* fix test pattern matcher
2024-07-26 15:10:10 -07:00
chenyu dc7483ee6f
UOp simple div folding (#5740)
made UOp.divides return the Optional[quotient] and used it for simple div folding
2024-07-26 17:14:32 -04:00
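The `Optional[quotient]` idea reads like this on plain ints (a minimal sketch; tinygrad's `UOp.divides` operates on UOps, not Python ints):

```python
from typing import Optional

def divides(x: int, c: int) -> Optional[int]:
    """Return the quotient when c exactly divides x, else None."""
    return x // c if c != 0 and x % c == 0 else None

assert divides(12, 4) == 3
assert divides(12, 5) is None  # no exact quotient, so no fold
```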
chenyu 671259417f
reuse UOp `__repr__` for NOp (#5738) 2024-07-26 16:59:55 -04:00
kormann b0c1dba299
named UOp class "NOP" [run_process_replay] (#5728)
* NOP

* fix const + simplify compile

* rm VAR for NOOP

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-07-26 13:25:53 -07:00
George Hotz 4df46eac67
clean up tensor cores [run_process_replay] (#5736)
* clean up tensor cores [run_process_replay]

* remove tuple(wmma_sz), self.opts.device

* remove tls, leave DEVICE
2024-07-26 13:21:23 -07:00
qazal 94d578396f
separate process replay main loop (#5734)
* separate process replay main loop

* [run_process_replay]

* add kernel_changed

* test with [run_process_replay]

* revert temp [run_process_replay]
2024-07-26 21:43:08 +03:00
chenyu a4e9ebc68a
update test_uop_symbolic (#5733)
enabled more passed tests
2024-07-26 13:46:09 -04:00
chenyu 2cc55a3095
UOp simple mul add div fold (#5726) 2024-07-25 22:00:30 -04:00
chenyu 5521b6d437
UOp simple mul-add-lt fold (#5721) 2024-07-25 20:49:38 -04:00
qazal 1b53207b4f
revert isolated dags scheduling (#5724) 2024-07-25 19:45:12 -04:00
chenyu 845b0d1c9d
UOp more generic div folding (#5722)
old: `x // c` can fold if `0 <= x.vmin <= x.vmax < c`
new: `x // c` can fold if `0 < c and x.vmin // c == x.vmax // c`
2024-07-25 17:49:14 -04:00
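The old and new rules above can be sketched as a standalone check (a minimal sketch assuming integer bounds; `can_fold_div` is a hypothetical helper, not tinygrad's API):

```python
def can_fold_div(vmin: int, vmax: int, c: int) -> bool:
    # old rule: 0 <= vmin <= vmax < c, i.e. the quotient is always 0
    # new rule: the quotient is the same constant across the whole range,
    # so x // c can be replaced by vmin // c
    return c > 0 and vmin // c == vmax // c

assert can_fold_div(0, 3, 4)      # 0 <= x < 4  =>  x // 4 == 0 (old rule)
assert can_fold_div(5, 7, 4)      # 5..7 all have quotient 1; old rule missed this
assert not can_fold_div(3, 4, 4)  # quotients 0 and 1 differ, cannot fold
```

The old rule is the special case where the shared quotient is 0.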
chenyu a82815262c
more test_pattern_matcher fixups (#5714) 2024-07-25 14:12:21 -04:00
chenyu 05e02ddfb3
fixup test_pattern_matcher (#5712) 2024-07-25 13:48:52 -04:00
qazal 9ceb3a3d1f
beautiful_mnist -4.3% kernels (#5709)
* add is_complete

* partially delete forced_realized

* p2

* start

* refactor to can_group

* remove steps

* _get_inputs is nicer

* fix the cache

* cache is dict now

* rename to group
2024-07-25 20:30:49 +03:00
kormann 1e2eac755d
Fix repr upat (#5705)
* test

* fix

* x fix

* simpler

* rm extra space
2024-07-25 12:05:48 -04:00
qazal 1c992de257
hotfix: compare_schedule defaults to false (#5707) 2024-07-25 17:08:28 +03:00
qazal 489cda827a
more scheduler process replay tooling (#5706)
* more scheduler process replay tooling

* refactor to compare_schedule
2024-07-25 15:47:18 +03:00
qazal 4e070a2c89
start work on indexing fusion (#5590)
* start base

* the views add up

base reduceop st:
ShapeTracker(views=(View(shape=(60000, 1), strides=(1, 0), offset=0, mask=None, contiguous=True),))

top st:

ShapeTracker(views=(View(shape=(512, 6000, 1, 28, 28, 10), strides=(0, 1, 0, 0, 0, 6000), offset=0, mask=None, contiguous=False), View(shape=(512, 6000, 1, 28, 28, 10), strides=(47040000, 784, 0, 28, 1, 4704000), offset=0, mask=None, contiguous=False)))

merged buf.st+st:
ShapeTracker(views=(View(shape=(512, 6000, 1, 28, 28, 10), strides=(0, 1, 0, 0, 0, 6000), offset=0, mask=None, contiguous=False), View(shape=(512, 6000, 1, 28, 28, 10), strides=(47040000, 784, 0, 28, 1, 4704000), offset=0, mask=None, contiguous=False)))

* p1

* some cleanups

* more cleanups

* one kernel

* more

* late fuse arange

* less lines

* more work

* fix st strides 1

* update test_schedule, start argmax

* test_tiny_argmax

* add FUSE_ARANGE

* more cleanup

* add utils

* reduce merging

* fix axis and fold if needed

* more fusion

* need to figure this out

* now fixing all of these

* todos+save a line

* ready for p1
2024-07-25 13:23:38 +03:00
nimlgen 08f47d7dc3
more info on failure 41 (#5704) 2024-07-25 12:14:28 +03:00
nimlgen 69d4f474d8
amd resnet pf (#5703) 2024-07-25 11:21:22 +03:00
chenyu 46e1151c02
UOp more generic mul -> mod folding (#5698) 2024-07-24 21:41:25 -04:00
chenyu 66a9c372af
UOp mod reduction (#5697) 2024-07-24 20:36:00 -04:00
chenyu 8648fb2636
UOp vmin/vmax on ADD (#5689) 2024-07-24 19:09:42 -04:00
chenyu 85710e86cb
UOps div folding (#5690)
#5689, with just div folding and new test cases
2024-07-24 14:21:44 -04:00
chenyu a7a77dfd83
UOp mul lt fold (#5677) 2024-07-24 02:49:25 -04:00
chenyu 4e85761d40
UOp mod folding (#5668) 2024-07-24 00:10:47 -04:00
George Hotz 053550c3f3
remove MERGE opt, cleanup wmma upcast (#5669)
* remove MERGE opt, cleanup wmma upcast

* upcast first

* fix broken vectorize folding rule
2024-07-23 20:43:42 -07:00
chenyu 3060e0be4f
add vmin vmax of SPECIAL (#5670)
* add vmin vmax of SPECIAL

folded stuff like (-1 < gidx0)

* flaky
2024-07-23 22:55:54 -04:00
George Hotz fa14f7b4fd
switch contract arg to match expand arg [run_process_replay] (#5667)
* switch contract arg to match expand arg [run_process_replay]

* support multiaxis contract too, it's easy

* cancel contract/expand
2024-07-23 18:08:33 -07:00
George Hotz a85493bdbe multiaxis contract test 2024-07-23 15:09:15 -07:00
George Hotz e3f00ac77d
Fix cuda tc emu test (#5663)
* fix acc folding for NV tensor cores

* fix correctness of reduce_before_expand

* fix test emulated CUDA tensor cores

* test_gemm_fp16 on some devices
2024-07-23 15:04:25 -07:00
chenyu 16c27ae400
update UOp.SPECIAL arg spec [run_process_replay] (#5661)
* update UOp.SPECIAL arg spec [run_process_replay]

from `(0, "gid0", 4)` to just `("gid0", 4)`. closer to a Variable

* fix ptx
2024-07-23 16:58:12 -04:00
chenyu 01fe00e055
skip test_failure_39 in CI (#5660)
took more than 2 minutes in ci metal, it's basically the same as test_failure_37 but 20X bigger
2024-07-23 14:47:05 -04:00
chenyu 199b3bf02b
simple UOp lt/ge folding (#5657)
works if lhs is a DEFINE_VAR.
folds trivial x < -math.inf now, need to change SPECIAL to use DEFINE_VAR to fold more
2024-07-23 14:11:05 -04:00
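Bounds-based lt folding can be sketched like this (a hedged sketch; `fold_lt` is a hypothetical helper and the real rule works on UOp vmin/vmax):

```python
import math

def fold_lt(vmin: float, vmax: float, c: float):
    """Fold x < c to a constant when the bounds [vmin, vmax] of x decide it."""
    if vmax < c: return True    # every possible x is below c
    if vmin >= c: return False  # no possible x is below c
    return None                 # undecidable from bounds alone

# nothing is below -inf, so the trivial x < -math.inf always folds to False
assert fold_lt(0, 10, -math.inf) is False
assert fold_lt(0, 10, 100) is True
assert fold_lt(0, 10, 5) is None
```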
qazal b0fc5a4c6f
start scheduler process replay (#5656) 2024-07-23 20:02:51 +03:00
chenyu e210c87b4a
uop mod-mod simplification (#5650) 2024-07-23 12:33:55 -04:00
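One instance of a mod-mod rewrite (a hedged sketch; the actual rule lives in tinygrad's pattern matcher): for nonnegative `x`, `(x % a) % b` reduces to `x % b` whenever `b` divides `a`:

```python
import random

# for nonnegative x: if x = q*a + r with 0 <= r < a and a % b == 0,
# then (x % a) % b == r % b == (q*a + r) % b == x % b
def can_fold_mod_mod(a: int, b: int) -> bool:
    return a % b == 0

random.seed(0)
for _ in range(1000):
    x, b = random.randrange(10**6), random.randrange(1, 100)
    a = b * random.randrange(1, 100)  # construct a so that b divides a
    assert (x % a) % b == x % b
```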
nimlgen 1384f08cd4
hcq profile tests (#5654)
* profile tests

* fixes

* remove linter
2024-07-23 18:40:33 +03:00
qazal 5f394fc9c6
more work toward non-blocking process replay (#5653)
* non-blocking process replay

* more actionable

* test it

* revert the test

* %s/logging.warn/logging.warning
2024-07-23 14:26:31 +03:00
qazal 7cb67e6fb2
merge gated stores spec (#5652)
* test_unmerged_ifs should merge ifs

* test_tiny_gate_store

* test_merge_ifs_alt

* assert assert asserts
2024-07-23 18:53:27 +08:00
George Hotz 7c4b177e3a
add tests for uops stats (#5649)
* add tests for uops stats

* no locals skip is fine

* eh
2024-07-22 21:57:03 -07:00
chenyu 4f83da626e
uop symbolic simple mul mod (#5648) 2024-07-22 23:17:41 -04:00
chenyu f2d2afdaa4
dumb linearizer example that max is not simplified (#5644)
* dumb linearizer example that max is not simplified

this might just get fixed once basic mod simplification is done

* need local
2024-07-22 18:37:26 -04:00
chenyu 24505199fb
UOp.const(x.dtype, y) -> x.const(y) [run_process_replay] (#5642) 2024-07-22 17:09:40 -04:00
chenyu 97b116bb1d
UOp mul div simplification (#5637)
* UOp mul div simplification

* != 0 is fine
2024-07-22 16:14:12 -04:00
nimlgen 26fc4610a0
amd more accurate cache management (#5631)
* amd more accurate cache management

* fix amd

* add memory_barrier + copies tests

* transfer test as well

* linter
2024-07-22 19:07:01 +03:00
Vyacheslav Pachkov edc58e6b6e
hcq: remove duplicate allocation of kernel args by abstracting (#5633) 2024-07-22 18:29:41 +03:00
George Hotz dc21e63bd2
test: put conv in one reduce (#4441)
* test: put conv in one reduce

* put reduce at the end

* more expand

* generic, and that expand was breaking things

* ratio

* don't undo the expand

* arg 1

* strides

* warning, for resnet

* warning removed

* disable cast

* handle cast

* op

* err, that's right

* fixup

* fix that

* a test to play with

* add double_reduces

* working up to final reshape

* fold the last reshape

* moved to schedule

* fix axis

* ci, need to bring arange back

* FUSE_CONV_BW maybe

* valid in 3.9

* test_expand_reduce_is_folded_on_different_axes

* add FUSE_CONV_BW=1

* test_fold_batchnorm_backward

* test_sgd_4convs_fuse

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-07-22 12:16:13 +03:00
George Hotz 386fb5e7f8
folding without UNMUL (#5628)
* folding without UNMUL

* fix failures, index_collapse

* import ReduceOps

* test_arange_4096 isn't folding
2024-07-21 20:14:44 -07:00
George Hotz 7f5282b2f5
tests if the linearizer is generating dumb code (#5611)
* tests if the linearizer is generating dumb code

* push consts to the end

* sort adds

* sorted add and mul

* this better

* simple expand/contract

* no math contract/expand
2024-07-20 20:36:32 -07:00
George Hotz b399ccd6ef
BEAM bugfix, kernels dedup now (#5617)
* BEAM bugfix, kernels dedup now

* getenv is default
2024-07-20 19:43:50 -07:00
chenyu 92e7e65712
one more test case for symbolic mod mul (#5615) 2024-07-20 17:23:06 -04:00
qazal 3ab5fe4e1b
test argmax multireduce failure (#5609) 2024-07-20 21:33:03 +08:00
chenyu b991097d41
move UPat and PatternMatcher from uopgraph.py to uops.py (#5597)
* move UPat and PatternMatcher from uopgraph.py to uops.py

towards instant UOps rewrite on UOp.alu

[run_process_replay]

* fix imports
2024-07-19 19:28:24 -04:00
George Hotz 2e617ca59e
lowerer img index (#5592) 2024-07-19 14:22:02 -07:00
nimlgen b1782e3fef
hcq refactor signal into class (#5575)
* hcq refactor signal into class

* fix amd

* amd do not use amd_signal_t

* cleanup

* signal setter

* fix linter

* docs

* more docs + types

* fix types
2024-07-19 23:23:05 +03:00
George Hotz d0ab20a5e5
careful memory counting (with tests to specify behavior) (#5587) 2024-07-19 11:37:34 -07:00
chenyu 37dd233650
always reverse global dim (#5586)
* always reverse global dim

* one more test
2024-07-19 13:58:05 -04:00
George Hotz 10be05aae5
push contract through cast to fix test_float2_acc (try 2) (#5585)
* push contract through cast to fix test_float2_acc (try 2)

* contract push only on floats
2024-07-19 10:34:43 -07:00
George Hotz 51892c8fac
Revert "push contract through cast to fix test_float2_acc (#5581)" (#5583)
This reverts commit ddda9420be.
2024-07-19 09:44:30 -07:00
George Hotz ddda9420be
push contract through cast to fix test_float2_acc (#5581)
* push contract through cast to fix test_float2_acc

* no_vectorized_alu applies to cast too
2024-07-19 09:30:26 -07:00
chenyu 3f590c3b31
some limit_dims to limit global merging (#5489)
only supports merging dims in a way that does not surpass limit, no splitting yet
2024-07-19 12:17:46 -04:00
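Merge-only dim limiting as described above can be sketched greedily (a hypothetical sketch under the stated "merge, never split" constraint; not tinygrad's actual `limit_dims`):

```python
def limit_dims(dims: tuple, limit: int) -> tuple:
    """Greedily merge adjacent dims (right to left) while the merged
    size stays within limit; dims are never split."""
    out = [dims[-1]]
    for d in reversed(dims[:-1]):
        if d * out[0] <= limit: out[0] *= d
        else: out.insert(0, d)
    return tuple(out)

assert limit_dims((2, 3, 4), 12) == (2, 12)
assert limit_dims((2, 3, 4), 100) == (24,)
assert limit_dims((7, 5), 3) == (7, 5)  # nothing merges; a dim over the limit stays
```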
George Hotz 0ad87021e2
move acc to end (#5568)
* move acc to end

* confirmed pictures are the same

* relax that

* Update test_ops.py
2024-07-19 03:06:52 -07:00
George Hotz 2de82b8a5d
remove get_lazyop_info (#5570)
* don't use get_lazyop_info more

* keep that min

* no ptx for that test
2024-07-19 03:05:33 -07:00
nimlgen 9d7edc9269
hcq rename HCQCompat -> HCQ (#5577) 2024-07-19 11:34:17 +03:00
chenyu 2b2f8ad18c
failed example of float2 acc no longer applies (#5573)
* failed example of float2 acc no longer applies

* # noqa: E501
2024-07-19 02:40:04 -04:00
qazal e7a057c20f
retire replay_schedule (#5563) 2024-07-18 23:07:02 +03:00
qazal 50aba32ea8
hotfix: don't assert process replay in master. (#5562)
This is because https://github.com/tinygrad/tinygrad/actions/runs/9996754763/job/27631802686 ran exactly when master changed state, causing the diff to assert.
if [run_process_replay] is green pre-merge it's ok.
2024-07-18 22:05:00 +03:00
George Hotz 223d9283ee
fix float4 acc by moving contracts (#5559) 2024-07-18 11:30:16 -07:00
chenyu f5af98c450
failed test case that DEFINE_ACC no longer uses float4 (#5555)
* failed test case that DEFINE_ACC no longer uses float4

* line
2024-07-18 10:55:59 -07:00
George Hotz 923e0fe0b8
fix half4 folding (#5556) 2024-07-18 10:47:39 -07:00
chenyu 12e6771209
failed test case for unrolled half4 (#5552) 2024-07-18 13:05:52 -04:00
George Hotz d1a7279605
indexing fold with casted bool (#5551)
* cast bool is where

* universal transform is wrong
2024-07-18 10:02:29 -07:00
kormann 2c4add6844
pretty print lazy op per default (#5505)
* pretty lop

* min diff

* walrus

* fix

* min diff

* simplify

* pretty helper function

* ws

* pretty uop upat

* tests

* stricter tests

* test passes

* ws

* stronger upat test

* delete print_tree

* min diff

* stricter exp test

* fix merge

* stronger uops eval test

* +readable and deep upat test

* +readable and deep upat test

* sort inv fix

* fix

* revert allowed_len
2024-07-18 09:34:08 -07:00
qazal 0ad1672d5f
fuse indexing (LazyOp creation) (#5506)
* bring FUSE_AS_ONE_KERNEL back

* operands need reshape?

* fused but arange didnt fold

* something deeply wrong

* yay, fused

* derive broadcasts

* s/input/reduce_input

* _fixup_ones proved a point

* this is what it takes

* down to 3 required reshapes:

1. output_shape
2. the second reduce merge dims
3. remove dims for above reshape

* start real reshapes

* resolve shape in the edges pre lazyop

* outputs are the same shape

* rewrite1: just the reduce

* more correct

* fuse_as_one_kernel

* closer

* this passes

* dont rerun info

* dont need these

* not needed
2024-07-18 14:09:17 +03:00
George Hotz fa7e734b49
MetaOps.KERNEL (#5543) 2024-07-17 19:41:23 -07:00
George Hotz d3b098299d
add failing regression test for image (#5540)
* add failing regression test for image

* tg type

* simpler test

* don't realize image to image casts caused issue

* simple pad
2024-07-17 17:27:18 -07:00
qazal 61ee02e93d
start multireduce lowerer work (var/std) (#5537)
* multireduce no-opts works

* passed test_var_multireduce

* cleanup

* double reduce

* extra check for range_group

* more checking for range_groups

* cleaning up debug prints

* cleanup diff

* linters

* revert kernel changes

* these are uops toposort

---------

Co-authored-by: timmy <timmy0x@proton.me>
2024-07-17 23:43:46 +03:00
Francis Lam c4eb30a04c
test/test_linearizer_failures: add a new beautiful_mnist one (#5531)
* test/test_linearizer_failures: add a new beautiful_mnist one

this one is from a DEPTH=2 fuzz_linearizer search

* add GPU to test_failure_40

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-17 16:27:04 -04:00
qazal 0259d76183
use Context only in replaying Kernel [run_process_replay] (#5535) 2024-07-18 03:46:14 +08:00
George Hotz 1a68854766
PatternMatcher add (#5532)
* PatternMatcher add [run_process_replay]

* f4 dynamic

* test_failure_36 is fixed

* fix PTX
2024-07-17 12:44:42 -07:00
qazal a7706e05f9
option to [skip_process_replay] (#5533) 2024-07-17 22:30:46 +03:00
George Hotz 1242b302fa
expand UOps with rewrite rules (#5501)
* expand UOps with rewrite rules [run_process_replay]

* progress

* much closer

* close, way less bugs

* bunch of expander tests

* fix contract

* ops tests pass

* fix barrier

* mostly passing

* bitcast in expanded ops

* support more expand merges

* all tests pass maybe

* fix empty EXPAND

* fix LIN fuzzing

* add ALL_SAME assert

* all same

* all same work

* raise CompileError

* pass fuzz linearizer

* revert whitespace

* fix nv tensor core test

* fix mypy

* bug fix

* fuzzer passes

* put tests back

* expand arg to idx
2024-07-17 10:17:50 -07:00
George Hotz 158221b36b
expand tests from uop_expander [run_process_replay] (#5524)
* expand tests from uop_expander

* more changes from the branch
2024-07-17 09:22:36 -07:00
George Hotz 42c25cc961
fix fixup_ast (#5523)
* fix fixup_ast

* these lin failures are fixed
2024-07-17 08:52:21 -07:00
nimlgen dcd462860f
elf loader (#5508)
* elf loader

* cleanup

* cleaner

* cleaner

* fixes

* revert this

* fix div 0

* fix nv

* amd fix

* fix mockgpu

* amd better?

* restore relocs for <12.4

* linter

* this is fixed now

* revert this

* process cdefines as function

* cleaner

* align

* save lines

* revert this change
2024-07-17 17:09:34 +03:00
Francis Lam 2d53abb04a
test/external/fuzz_linearizer: fix for new AST changes (#5519)
* test/external/fuzz_linearizer: fix for new AST changes

also add beautiful_mnist failures

* add CLANG and LLVM to test_failure_35 failed_platforms

* fix test_linearizer_failure names
2024-07-17 00:08:07 -04:00
chenyu 6e405b0a2b
add 0d tensor to trunc/floor/ceil/round tests (#5512)
existing trunc test passes backward but its backward is incorrect in general. added tests that would fail
2024-07-16 16:48:25 -04:00
Tobias Fischer 87a2ef2bc2
Add Interpolate Function (#5482)
* add interpolate function

* fixed linter issue

* reduced sizes in test

---------

Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2024-07-16 09:44:01 -07:00
qazal 173064c69c
(re)start multireduce in codegen/* (#5391)
* test_var_multireduce

* run verify_lazyop

* test_var_multireduce

* assert lazyop

* add test_indexing_multireduce

* arange fuses (crude)

* note: extra reshape

* start readble

* test_arange_simple

* test_arange_expanded

* test_indexing_multireduce

* cleanups

* skip ptx

* skip nv and amd ci

* skip arange expanded too

* GPU=1 is slow too in CI
2024-07-16 14:20:48 +03:00
chenyu 07ff4b7d24
test_failure_33 ast that has UOps.UNMUL after linearize (#5504)
* test_failure_33 ast that has UOps.UNMUL after linearize

* smaller
2024-07-15 22:54:23 -04:00
chenyu 63990705b5
test kernel opts case for 4 local and 4 groups (#5499)
make sure local grouped dim is correct
2024-07-15 20:09:38 -04:00
Edward Wang 9a7d5a148e
move colorize_float to helpers.py (#5490)
* add colorize_float to helpers.py

* update references
2024-07-15 11:29:03 -07:00
qazal ac08f0eb00
reshape rawbufs in test_linearizer (#5492)
* reshape rawbufs in test_linearizer

* fix helper_linearizer_ast
2024-07-15 19:14:38 +03:00
qazal ae4cb7994e
run process replay with DEBUG=0 (#5491)
* process replay with DEBUG=0

* graceful shutdown

* use and
2024-07-15 16:30:57 +03:00
Tobias Fischer e219103677
Add Pad to Pooling (#5488) 2024-07-14 21:50:20 -07:00
Tobias Fischer 5849130cbb
gather negative dim fix (#5486) 2024-07-14 20:20:53 -04:00
qazal 3c378efcb6
process replay docs improvements (#5481)
* minor cleanups

* docs and logs

* shorter

* comma

* s/print/logging.info [run_process_replay]

* use logging.warn

* process name is noise

* revert lowerer change [run_process_replay]
2024-07-15 00:09:28 +03:00
chenyu 613a1dbeed
render lidx starting with 0 (#5478)
* render lidx starting with 0

changed from
```
  int gidx0 = gid.x; /* 4096 */
  int lidx4 = lid.x; /* 8 */
  int gidx1 = gid.y; /* 7 */
  int lidx5 = lid.y; /* 8 */
  int gidx2 = gid.z; /* 7 */
  int lidx6 = lid.z; /* 2 */
```
to
```
  int gidx0 = gid.x; /* 4096 */
  int lidx0 = lid.x; /* 8 */
  int gidx1 = gid.y; /* 7 */
  int lidx1 = lid.y; /* 8 */
  int gidx2 = gid.z; /* 7 */
  int lidx2 = lid.z; /* 2 */
```

the existing one started from pre-limited global dims which skip number if there are more than 3 global dims

* don't need start_dim

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-07-14 16:34:04 -04:00
qazal 671779f280
limit process replay diff to ~20% of kernels (#5480)

* add changed

* env var

* more early exit

* simpler?

* Revert "Merge branch 'lidx0' into process_replay_limit"

This reverts commit cbadcfa5e9b0489a2a9c1e0b6682db9f4f554ab8, reversing
changes made to fc9bf37ee70392a4170036698ced14621783b625.

* minor cleanup

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-14 23:10:08 +03:00
chenyu f8a47608cc
test dtype.min and dtype.max (#5479)
compared with np.iinfo for integer dtype
2024-07-14 15:31:37 -04:00
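The integer check can be sketched directly against `np.iinfo` (a minimal sketch using NumPy only; tinygrad's own `dtypes.min`/`dtypes.max` are not used here):

```python
import numpy as np

# two's-complement bounds for an n-bit integer type
def int_bounds(bits: int, signed: bool) -> tuple:
    if signed: return (-(1 << (bits - 1)), (1 << (bits - 1)) - 1)
    return (0, (1 << bits) - 1)

for np_dtype, bits, signed in [(np.int8, 8, True), (np.uint8, 8, False), (np.int32, 32, True)]:
    info = np.iinfo(np_dtype)
    assert int_bounds(bits, signed) == (info.min, info.max)
```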
George Hotz a9f5a764dc
make BatchNorm work for 2D and 3D (#5477)
* make BatchNorm work for 2D and 3D

* beautiful mnist shouldn't use BatchNorm2d
2024-07-14 11:39:58 -07:00
chenyu e41ab66653
use is to compare types (#5476)
new rule in latest ruff
2024-07-14 14:26:41 -04:00
nimlgen 61822d1a14
nv fix timeline signal rollover on copy queue (#5473)
* hotfix: nv rollover to 32bits

* test both queues
2024-07-14 16:06:12 +03:00
nimlgen 8835d6c49a
cleanup nv/amd program (#5449)
* cleanup nv/amd program

* fix amd

* a bit cleaner

* ugh, typo

* linter

* fix nv

* tiny thing
2024-07-14 14:08:35 +03:00
qazal 0b3a34e3b1
vectorize folding [run_process_replay] (#5470)
* test_gep_vec_fold

* remove that

* fix process replay

* lint
2024-07-14 09:41:48 +03:00
chenyu 28972418c4
s/get_linearizer/get_kernel [run_process_replay] (#5467) 2024-07-13 20:32:22 -04:00
Francis Lata 0345577032
UNet3D dataloader shared memory fix (#5465)
* create separate SharedMemory between inputs and labels

* update path check for shared mem

* clean up unit test for dataset
2024-07-13 20:26:00 -04:00
George Hotz 942c58be90
BEAM_COMPARE=2 validates the correctness of BEAM kernels (#5458)
* beam compare 2

* found issue maybe

* correct, not fail

* full rand

* less numpy

* extra simplify doesn't fix it

* reorder

* no numpy

* check in reverse

* test new tensor behavior

* better error msg
2024-07-13 13:53:43 -07:00
qazal 487ceff825
hotfix: ASSERT_PROCESS_REPLAY sometimes doesn't exist (#5456) 2024-07-13 21:15:40 +03:00
qazal 40ec9410f9
simpler process replay (#5452)
* remove check_process_replay

* that can go to the top

* add assert back

* [run_process_replay]

* checkout code [run_process_replay]

* temp [run_process_replay]

* revert temp [run_process_replay]

* ahh this is why [run_process_replay]

* revert temp [run_process_replay]
2024-07-13 19:55:06 +03:00
qazal 23b907efbb
restore process replay runs by their id (#5453) 2024-07-13 19:32:34 +03:00
George Hotz e638b0084f
smaller multitensor resnet test (#5450)
* minor improvments to matcher speed [run_process_replay]

* oh, put that back

* make fake images smaller for resnet test
2024-07-13 07:31:28 -07:00
qazal bb1a9ebf78
run process replay in parallel (#5443) 2024-07-13 11:29:36 +03:00
chenyu 3ebf569f04
relax fuzz transcend math threshold a bit (#5442)
* relax fuzz transcend math threshold a bit

* fuzz more

* fuzz 50k
2024-07-13 03:31:21 -04:00
chenyu e398734890
fuzz test transcend math (#5383)
* fuzz test transcend math

found something wrong with float64 sin reduction

```
from tinygrad import Tensor, dtypes
import numpy as np
print(Tensor([39800.0], dtype=dtypes.float64).sin().numpy())
print(Tensor([39800.0], dtype=dtypes.float32).sin().numpy())
print(Tensor([39800.0], dtype=dtypes.float16).sin().numpy())
print(np.sin(np.array([39800.0], dtype=np.float64)))
print(np.sin(np.array([39800.0], dtype=np.float32)))
print(np.sin(np.array([39800.0], dtype=np.float16)))
```

```
CLANG=1 python test.py
[0.92785633]
[0.7428573]
[-0.7705]
[0.74285722]
[0.7428572]
[-0.7705]
```

* fix test

* abs

* skip
2024-07-13 01:54:52 -04:00
hikettei 3a7262d923
[Patch] Fixed an invalid value of fp64 xlog(DBL_MIN) (#5441)
* [Patch] Removed weird NaN Handling in xlog2 resulting in different output around 1e-203

* Patch: compare the value of xlog(x) using y, allowing x <= 1e-200

* mypy

* fuzzer tests for log2

* fix tests: use approximate dbl_min, fp64 fails at nv

* update: gradually increment the scale (if y is not inf)
2024-07-13 01:11:53 -04:00