George Hotz
c98ca23cb9
test pickle variable (#5150)
* test pickle variable
* fix process replay
2024-06-25 19:49:21 -07:00
David Hou
8fcc41582f
make buffer view optional with a flag (#5120)
2024-06-25 19:13:20 -07:00
George Hotz
63ba2d05d1
uops dfs cleanup (#5147)
* uops dfs cleanup
* Update uops.py
2024-06-25 18:51:42 -07:00
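A uop graph is typically linearized with a post-order DFS so every node is emitted after its sources. A minimal sketch of that idea (an illustration of the general technique, not tinygrad's actual uops.py code):

```python
def toposort(root, srcs):
    # post-order DFS: emit each node after all of its sources,
    # visiting shared nodes only once
    visited, order = set(), []
    def dfs(node):
        if node in visited: return
        visited.add(node)
        for s in srcs(node): dfs(s)
        order.append(node)
    dfs(root)
    return order
```

For a diamond-shaped graph, the shared source appears once and first, and the sink appears last.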
George Hotz
6841ea3baf
don't allow duplicate variables (#5148)
2024-06-25 18:47:29 -07:00
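The shape of such a guard, sketched under the assumption that variables are deduplicated by name (not the actual kernel code):

```python
def assert_no_duplicates(var_names):
    # reject a variable list that names the same variable twice
    seen = set()
    for name in var_names:
        if name in seen:
            raise RuntimeError(f"duplicate variable {name!r}")
        seen.add(name)
    return var_names
```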
George Hotz
cc7fafcd8b
sink folding rule [run_process_replay] (#5145)
2024-06-25 18:34:44 -07:00
Jhenner Tigreros
fa78755f19
Add new patterns to unfold division (#5139)
* Add new patterns to unfold division
* Create regression test and fix pattern
2024-06-25 18:07:47 -07:00
qazal
c4fdb9c725
second iteration on verify_lazyop (#5140)
2024-06-25 09:44:32 +03:00
chenyu
dade7677cf
validate llama3 output only with model "LLaMA-3/8B-SF-DPO" (#5138)
2024-06-24 20:58:25 -04:00
qazal
981afb114f
safely fold NEG in lazy.py (#5135)
* safe
* add test
2024-06-24 19:40:37 -04:00
chenyu
7948b05738
fix uneven shard with shrink and pad args on sharded axis (#5131)
it's incorrect to assume the first (len(device)-1) shards all have the same size, e.g. sharding size 2 across 4 devices gives (1, 1, 0, 0)
2024-06-24 16:55:50 -04:00
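The example above falls out of ceil-division chunking: with ceil(2/4) = 1 per chunk, a size-2 tensor on 4 devices shards to (1, 1, 0, 0), so the trailing shards can be smaller or empty. A hypothetical helper illustrating this (not tinygrad's multi.py code):

```python
import math

def shard_sizes(total: int, ndev: int) -> tuple:
    # split `total` elements across `ndev` devices, giving each shard
    # ceil(total/ndev) elements until the total runs out
    chunk = math.ceil(total / ndev)
    sizes, left = [], total
    for _ in range(ndev):
        sizes.append(min(chunk, left))
        left -= sizes[-1]
    return tuple(sizes)
```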
qazal
18e70deec3
verify_lazyop (#5124)
* start verify_lazyop
* bfs order
* assert
* assert shapetrackers 2
* refactor
* more iteration
* skips
* that ast was wrong too
2024-06-24 13:45:35 -07:00
qazal
fe707bc968
hotfix: don't use is for comparing dtype (#5128)
2024-06-24 14:12:34 -04:00
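The hotfix's point is a general Python pitfall: `is` compares object identity while `==` compares value, so two equal dtype records can still fail an `is` check. A sketch with a hypothetical stand-in class (not tinygrad's actual DType):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MyDType:
    # hypothetical stand-in for a dtype record; not tinygrad's real class
    itemsize: int
    name: str

a = MyDType(4, "float")
b = MyDType(4, "float")
# equal by value, but two distinct objects: an `is` check would wrongly fail
assert a == b
assert a is not b
```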
Jhenner Tigreros
dfa562dbc1
DEFINE_ACC takes UOps.CONST in vin instead of arg (#4975)
* Change DEFINE_ACC to receive UOps.CONST in vin
* Use localtype instead of acc dtype
* Fix idp
* Fix copy list
* Fix warp
* Fix error
* Fix merge
* Fix testing
* Fix merge
* Use deepcopy
* Change to copy of inp
* Fix lint
* Move const to first place
* Fix issue upat
* Fix upat patterns
* Change to list, to test permutations
* Add condition
* Change pm
* Revert change pm
* Remove unused rule
* Fix
* Change of float4 DEFINE_ACC values
* Cast on PM to correct dtype
* Improve assert message
* Move IFs
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-06-24 09:25:33 -07:00
nimlgen
d84beaa6dd
tiny profiler cleanups (#5126)
2024-06-24 17:02:31 +03:00
chenyu
4a7d403777
cleanup test_multitensor (#5118)
renamed d_zero, d0, d1, d2, ... to d0, d1, d2, d3 and reused some multi device tuples
2024-06-23 20:54:22 -04:00
chenyu
c0ba5e0dfb
multi copy_to_device return the copy on same device if possible (#5117)
previously it always returned the copy from the first device
2024-06-23 20:25:56 -04:00
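The behavior change can be pictured as follows, assuming shards are (device, buffer) pairs; the names here are hypothetical, not tinygrad's multi-tensor API:

```python
def pick_copy_source(shards, target_device):
    # prefer the shard that already lives on the target device,
    # falling back to the first shard (the old behavior)
    for device, buf in shards:
        if device == target_device:
            return buf
    return shards[0][1]
```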
Francis Lam
b563cd52ed
linearizer: change globals to merge into left axis/gridDims.x first (#5033)
* linearizer: change order of collapse to be left-most
also fixes Variable max size to be correct and adds docs for the off parameter
* fix multiple global dim oversizes
* add passing variable test and reorganize tests
* use assert RuntimeError for failing test
2024-06-23 18:53:15 -04:00
nimlgen
69f116a7e1
nv/amd profiler (#4718)
* nv/amd profiler
* fix
* fix
* profile copies
* profile logger
* fixes
* more fixes
* less lines and fixes
* fixes
* some linter
* back sync, no related change
* fix gpu2cpu time def
* simpler
* linter
* linter
* docs
* add add_event api
2024-06-23 17:10:12 +03:00
qazal
64a3b7931e
simplify render_ops ctx [run_process_replay] (#5116)
* new ctx
* delete DEFINE_VAR
* lt isn't static
2024-06-23 16:56:32 +03:00
qazal
28bf8d86d8
test_linearizer with multi output ASTs (#5115)
* ast is tuple
* run test_phi_simplification
* update reason
* more tc
* beam
* a few more
* use test_opt directly
2024-06-23 15:41:24 +03:00
chenyu
ee0c6dfc15
build Tensor._tri with movements only (#5110)
* build Tensor._tri with movements only
doesn't need arange, saved a kernel in attention mask
* simpler, more tests
2024-06-23 00:07:36 -04:00
chenyu
20fabd8a5b
update Tensor.triu and Tensor.tril (#5109)
renamed the arg to `diagonal` to match the torch API, and added documentation and examples
2024-06-22 21:59:50 -04:00
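With the torch-style `diagonal` argument, triu keeps the elements on and above the `diagonal`-th diagonal, i.e. those where `col - row >= diagonal`. A plain-Python sketch of that mask (illustrative only, not the tensor implementation):

```python
def triu_mask(rows, cols, diagonal=0):
    # 1 where the element is kept by triu, 0 where it is zeroed out
    return [[1 if c - r >= diagonal else 0 for c in range(cols)]
            for r in range(rows)]
```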
chenyu
8f6ae84e4a
minor cleanup of conv_transpose2d (#5108)
* minor cleanup of conv_transpose2d
* that
2024-06-22 21:31:47 -04:00
chenyu
33211f356b
fix desc in tqdm (#5107)
per the doc `https://tqdm.github.io/docs/tqdm/`, the user does not need to put `: ` in desc; the `: ` after desc is automatically removed when desc is empty.
updated test cases and added a test for set_description
2024-06-22 19:00:38 -04:00
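The documented behavior being matched: tqdm appends `: ` to a non-empty desc itself and drops it when desc is empty, so callers pass a bare prefix. A sketch of that formatting rule (not the tinytqdm source):

```python
def format_desc(desc: str) -> str:
    # tqdm-style rule: append ": " only when desc is non-empty
    return f"{desc}: " if desc else ""
```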
chenyu
055e616302
cleanup mnist data load in beautiful_mnist (#5106)
2024-06-22 18:31:51 -04:00
chenyu
5516b790ad
hotfix append colon space to tqdm set_description (#5105)
2024-06-22 18:09:14 -04:00
chenyu
e356807696
tinytqdm.set_description and tinytrange (#5101)
2024-06-22 14:45:06 -04:00
chenyu
8080298739
s/tinytqdm/tqdm (#5103)
except in the unit test where tqdm is imported
2024-06-22 14:18:26 -04:00
George Hotz
9f875123b6
small changes from lowerer. [run_process_replay] [no_assert] (#5102)
2024-06-22 11:09:35 -07:00
chenyu
e468601226
update llama attention casting (#5096)
* update llama attention casting
updated scaled_dot_product_attention middle cast and removed hard-coded half in llama attention.
* fix that
2024-06-22 10:57:17 -04:00
chenyu
ca021229e4
fix attention to always return in the same dtype as input (#5100)
mid cast to default_float does not work as intended when default is float32 and qkv is in half
2024-06-22 10:34:57 -04:00
nimlgen
2dcef5a0d7
hcq spec (#5081)
* hcq spec
* small change
* not used import
* fixes
* fix
* signals into base class
* more into base class
* remove imports
* fix wrap timeline
* raise when not implemented
* simpler
2024-06-22 15:32:12 +03:00
chenyu
8bd6cb9511
update llama model RMSNorm casting (#5095)
following the original implementation, cast back to the input dtype before multiplying by the weight; slightly faster
https://github.com/meta-llama/llama/blob/main/llama/model.py
2024-06-21 23:02:04 -04:00
chenyu
0c857ae2d6
some onnx_ops cleanups (#5094)
2024-06-21 22:01:32 -04:00
kormann
f4a041af16
Simplify graph_dedup [run_process_replay] (#5084)
* reset master
* remove double default
2024-06-21 22:12:30 +03:00
chenyu
00593d6095
clean the long lines in avg_pool2d and max_pool2d (#5091)
2024-06-21 14:46:56 -04:00
chenyu
a971dc6218
argmax(axis=None) is argmax.flatten().argmax(0) (#5090)
removed the alternative code path
2024-06-21 14:17:10 -04:00
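The identity behind the cleanup: argmax over all elements of any shape equals the argmax of the flattened data along axis 0. A list-based illustration in plain Python (not the tensor implementation):

```python
def argmax_flat(rows):
    # argmax over all elements == argmax of the flattened sequence
    flat = [v for row in rows for v in row]
    return max(range(len(flat)), key=flat.__getitem__)
```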
chenyu
166a2b19b5
fix reduce axis of 0d tensors (#5089)
`x.sum(())` is fine, and `x.sum((1,))` should throw IndexError
2024-06-21 13:51:40 -04:00
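The fixed rule: every requested axis must lie in `[-ndim, ndim)`, which for a 0-d tensor is the empty range, so `x.sum(())` passes and `x.sum((1,))` raises IndexError. A sketch of that validation (illustrative, not the tensor.py code):

```python
def validate_reduce_axes(axes, ndim):
    # each axis must be a valid index into the shape: -ndim <= ax < ndim
    for ax in axes:
        if not -ndim <= ax < ndim:
            raise IndexError(f"axis {ax} out of range for {ndim}-d tensor")
    return tuple(ax % ndim for ax in axes)
```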
chenyu
3ff048b68c
type annotate reduce axis in tensor.py (#5088)
2024-06-21 13:06:10 -04:00
chenyu
36b4a492a1
explicitly check getitem indices can have at most one ellipsis (#5087)
* explicitly check getitem indices can have at most one ellipsis
previous error with multiple `...`:
```
if index_type not in [None, int, slice, Tensor]: raise IndexError(f"{index_type=} not supported")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: index_type=<class 'ellipsis'> not supported
```
this pr:
```
if len(ellipsis_idx) > 1: raise IndexError("an index can only have a single ellipsis ('...')")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: an index can only have a single ellipsis ('...')
```
* oh we have that already
* test that
* test these
2024-06-21 12:33:18 -04:00
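The check itself is a one-pass scan over the index tuple. A sketch matching the error message shown above, with the surrounding getitem machinery omitted:

```python
def find_ellipsis(indices):
    # at most one `...` is allowed in a getitem index tuple
    ellipsis_idx = [i for i, v in enumerate(indices) if v is Ellipsis]
    if len(ellipsis_idx) > 1:
        raise IndexError("an index can only have a single ellipsis ('...')")
    return ellipsis_idx
```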
nimlgen
f1e758bacb
graph fuzzer (#5082)
* graph fuzzer
* more options
* mypy
* no underscores for funcs
2024-06-21 18:47:23 +03:00
qazal
5717a54b28
don't use Tensor.empty in kernel opts tests (#5086)
2024-06-21 18:41:03 +03:00
qazal
8aa786232d
docs for running process replay locally (#5083)
2024-06-21 09:55:08 -04:00
nimlgen
fb1bf48cfe
io_uring for copies from disk (#5035)
* exp uring
* fixes and old version
* nv
* cleaner
* cmp vs aio
* fix
* no lib
* fix nv
* linter
* disk_speed_test now runs default
* fixes
* uring -> io_uring
* linter happy
* get_temp_buf comment added
* tiny nits
* put wait back
* test runs everywhere
* remove consts
* remove mmap consts
* do not require iouring to run test, they are generic
2024-06-21 11:36:51 +03:00
George Hotz
b69afc67d8
tinybox docs typo
2024-06-20 17:58:40 -07:00
George Hotz
6bc5e5f41c
start tinybox docs
2024-06-20 17:04:45 -07:00
chenyu
f6d6760f71
don't cast tuple to list before creating Tensor (#5071)
the Tensor constructor now supports creating from a tuple
2024-06-20 13:32:56 -04:00
qazal
97f1347dd9
fix check_process_replay for special characters (#5072)
* 'test' [run_process_replay] [no_assert]
* test with ( ) { } '' " "
* remove the log [run_process_replay] '' () { } '{
* helpful echos [run_process_replay] [no_assert] () ''
* test [run_process_replay] [no_assert]
* test2 [run_process_replay] [no_assert]
* test3 [run_process_replay] [no_assert]
* it's also correct this way [run_process_replay] [no_assert]
* remove extras [run_process_replay]
2024-06-20 20:23:29 +03:00
George Hotz
6f6b3b10c9
import from uops, not linearizer (#5064)
2024-06-20 08:08:44 -07:00
chenyu
50700171ef
minor cleanup to reshape arg handling (#5070)
moved None handling to be with argfix, and only resolve -1 when a -1 is present
2024-06-20 10:27:27 -04:00
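The -1 resolution being gated: only when a -1 appears does the product of the known dims need computing. A minimal sketch with hypothetical names (not the tensor.py code):

```python
from math import prod

def resolve_shape(shape, numel):
    # replace a single -1 with the size implied by the remaining dims
    if -1 in shape:
        known = prod(d for d in shape if d != -1)
        shape = tuple(numel // known if d == -1 else d for d in shape)
    return tuple(shape)
```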