Commit Graph

6331 Commits

Author SHA1 Message Date
chenyu fbaab30fe3
add timing to fuzz_linearizer (#7056)
and applied a smaller FUZZ_MAX_SIZE; this is getting quite slow in CI
2024-10-14 11:57:41 -04:00
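For context on the timing added here, a minimal sketch of wrapping a fuzz case in a wall-clock timer (hypothetical helper, not the actual fuzz_linearizer code):
```
import time

# Hedged sketch (hypothetical helper, not the actual fuzz_linearizer code):
# time each fuzzed case so the slow ones are visible in CI output.
def timed(fn, *args, **kwargs):
    st = time.perf_counter()
    out = fn(*args, **kwargs)
    return out, time.perf_counter() - st

result, dt = timed(sum, range(1_000_000))
print(f"took {dt*1000:.2f} ms")
```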
chenyu 0d2462cbdf
use more resolve in View merge add [pr] (#7055) 2024-10-14 11:31:13 -04:00
qazal 8428244c30
gates are always bool [pr] (#7054) 2024-10-14 17:55:08 +03:00
qazal 7a28d50320
small st_fixup changes [pr] (#7053) 2024-10-14 16:53:10 +03:00
qazal 0ef186d4be
scheduler internal api cleanups [pr] (#7052)
* delete external_benchmark_ast.py [pr]

* cleanup 2

* random
2024-10-14 15:56:10 +03:00
qazal bc95b7e422
actually use UOps.CONTIGUOUS (#7049) 2024-10-14 15:11:23 +03:00
George Hotz f85c9ba00a
rewrite max to use cmplt + where (#7037) 2024-10-14 20:00:51 +08:00
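For context, "cmplt + where" refers to lowering an elementwise max into a compare followed by a select; a minimal numpy sketch of the idea (not tinygrad's actual rewrite rule):
```
import numpy as np

# Hedged sketch (numpy stand-in, not tinygrad's rewrite rule): max expressed
# as a comparison (cmplt) feeding a select (where).
def maximum(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    lt = a < b                 # cmplt: True where a is the smaller operand
    return np.where(lt, b, a)  # where(lt, b, a): pick the larger operand

a = np.array([1.0, 4.0, 2.0])
b = np.array([3.0, 0.0, 2.0])
assert (maximum(a, b) == np.maximum(a, b)).all()
```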
qazal 88ce6ec69a
ASSIGN is always (target, val) (#7048) 2024-10-14 14:47:52 +03:00
qazal 0f71bc10cd
small changes from the lazy_pm branch [pr] (#7047) 2024-10-14 12:21:21 +03:00
qazal 3e795f2e52
verify_ast changes from lazy_pm [pr] (#7045) 2024-10-14 12:08:18 +03:00
George Hotz b20b22a738 hotfix: add test_tiny, because many times it's what you want 2024-10-14 16:32:33 +08:00
George Hotz c4db927c7b
touchup lowerer [pr] (#7043) 2024-10-14 16:13:28 +08:00
Louis Novy 2ac5aec66b
Fix exponential complexity in _is_padding_okay [pr] (#7008)
* preliminary test

* missed Optional

* don't check for cache during recursion

* match style from st_fixup... may be marginally faster?

* pathological test case: strongly connected DAG

* move to test_schedule as this isn't really a fusion

* oops this shouldn't be edited

* Revert "oops this shouldn't be edited"

This reverts commit 487cb027dc5120542755446d1595ec7b76c207e8.

* Revert "move to test_schedule as this isn't really a fusion"

This reverts commit 48d8c550ce84453e6fc0306e1c6c448fe1286f79.

* move to test_schedule as this isn't really a fusion

* ok no more merge error funny business
2024-10-14 02:34:47 +03:00
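The bullets above describe caching during the recursion; a hedged sketch of that general memoization pattern (hypothetical Node type, not the actual _is_padding_okay code):
```
from dataclasses import dataclass

# Hedged sketch (hypothetical Node type, not the actual _is_padding_okay code):
# memoize the recursive DAG walk so each node is visited once.  Without the
# cache, a node reachable through k different paths is re-checked k times,
# which blows up exponentially on a DAG with heavy sharing.
@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can key the cache
class Node:
    pad_safe: bool
    srcs: tuple = ()

def is_padding_okay(node: Node, cache: dict) -> bool:
    if node in cache: return cache[node]
    ok = node.pad_safe and all(is_padding_okay(s, cache) for s in node.srcs)
    cache[node] = ok
    return ok

# diamond-shaped DAG: without memoization the shared source would be walked twice
a = Node(True); b = Node(True, (a,)); c = Node(True, (a,)); d = Node(True, (b, c))
assert is_padding_okay(d, {})
```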
chenyu bd8ecf7fd6
remove NumNode (#7035) 2024-10-13 16:42:19 -04:00
chenyu c4c806a210
generate new kernel dataset (#7034)
* generate new kernel dataset

pre req to remove NumNode
```
extra/optimization/generate_dataset.sh
gzip -k /tmp/sops
mv /tmp/sops.gz extra/datasets/
```

* fix var range in fuzz_linearizer
2024-10-13 16:19:41 -04:00
chenyu 1a27417262
remove arbitrary multiplication case (#7033)
adds the wrongly simplified kernel from #7019 to test_linearizer_failures
2024-10-13 15:06:05 -04:00
chenyu 13575f080a
remove bitcast backward in function.py (#7031)
bitcast has no backward pass (it is not differentiable)
2024-10-13 10:08:27 -04:00
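For context on why there is no backward: a bitcast reinterprets raw bits as another dtype, so its output jumps discretely and has no meaningful gradient; a small numpy illustration (not tinygrad code):
```
import numpy as np

# Hedged illustration (numpy, not tinygrad's function.py): view() reinterprets
# the bits of a float32 as an int32.  The output is a discrete bit pattern,
# not a differentiable function of the input, so there is nothing to backprop.
print(np.float32(1.0).view(np.int32))        # 1065353216, the IEEE-754 bits of 1.0f
print(np.float32(1.0000001).view(np.int32))  # 1065353217, one ULP away
```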
Harsh Natuskar ace834ef7b
docs update (#7027) 2024-10-13 19:39:06 +08:00
qazal 13846930cd
hotfix: extract_dataset.py (#7029) 2024-10-13 11:18:23 +03:00
nimlgen 942a17109a
qcom use QCOMBuffer for all allocated buffers (#7023)
* qcom use QCOMBuffer for all allocated buffers

* checks
2024-10-12 23:44:36 +03:00
chenyu 04d9b46d51
derivative of softmax is independent of max (#7009)
* derivative of softmax is independent of max

* update test
2024-10-12 15:59:23 -04:00
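The claim follows from the shift invariance of softmax: subtracting the (detached) max used for numerical stability changes neither the function nor its Jacobian.
```
\[
\mathrm{softmax}(x)_i
  = \frac{e^{x_i - c}}{\sum_j e^{x_j - c}}
  = \frac{e^{-c}\, e^{x_i}}{e^{-c} \sum_j e^{x_j}}
  = \frac{e^{x_i}}{\sum_j e^{x_j}}
  \quad\text{for any constant } c \text{ (e.g. } c = \max_k x_k\text{),}
\]
\[
\frac{\partial\, \mathrm{softmax}(x)_i}{\partial x_j}
  = \mathrm{softmax}(x)_i \left( \delta_{ij} - \mathrm{softmax}(x)_j \right),
\]
```
so the max term contributes nothing to the derivative.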
chenyu cae1c41755
test case of softmax backward kernel count (#7022) 2024-10-12 15:46:32 -04:00
George Hotz 5ce224ceb3
handle arbitrary multiplication case (#7019)
* handle arbitrary multiplication case

* remove count restriction
2024-10-12 23:16:27 +08:00
chenyu 23faeacb23
remove outdated comments (#7018) 2024-10-12 10:51:07 -04:00
George Hotz 85a45164fb
remove pyint [pr] (#7016)
* remove pyint

* bump time on tp [pr]

* dont truncate in const fold

* remove dead code

* Revert "dont truncate in const fold"

This reverts commit 29c81db0f7880848b001c2728aa555a1ef17e7d3.

* remove define_var
2024-10-12 22:36:24 +08:00
George Hotz 38d45dfba5 hotfix: no rng in test/external/external_benchmark_schedule.py 2024-10-12 22:03:04 +08:00
chenyu ed1ed9e4ff
bert use BS=72 (#7015)
memory 131 -> 138
green tflops 201 -> 209
red tflops 160 -> 169
2024-10-12 09:41:56 -04:00
George Hotz cba4b9a058
clean up ops file [pr] (#7013) 2024-10-12 19:53:52 +08:00
qazal 746a1f8c86
prep uoping diff for big graph [pr] (#7014) 2024-10-12 14:09:32 +03:00
ignaciosica 334f499e6a
consistent render of recip in cuda with CStyleLanguage (#6980) 2024-10-12 18:56:47 +08:00
George Hotz a71bb09ec3
remove symbolic file [pr] (#7012) 2024-10-12 18:44:44 +08:00
George Hotz 16271189ea hotfix: don't spend lines on a (broken) favicon 2024-10-12 18:21:10 +08:00
George Hotz b737ee5bac
move to_indexed_uops to uops (#7011)
* move to_indexed_uops to uops

* UOp.range
2024-10-12 18:20:57 +08:00
George Hotz 5ae2de9845
UOp.variable (#7010)
* UOp.variable [pr]

* fix tests

* clean

* improve name rendering

* last bug
2024-10-12 18:20:44 +08:00
Bhavya Gada f79e05cac0
add types in all nn/init.py classes (#7002)
* add types in batchnorm class

* fix lint error in batchnorm types

* add types to conv1d function

* add types to convtranspose1d func and conv2d, convtranspose2d classes

* add types to all remaining classes

* change conv1d padding type to also accept str

* less is more; only keep non-obvious types

* mkdocs need types
2024-10-12 14:42:14 +08:00
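A hedged sketch of the annotation style described above (hypothetical layer, not tinygrad's nn code), where padding accepts either an int or a string such as "same" and mkdocs reads the annotations for the API reference:
```
from typing import Union

# Hedged sketch (hypothetical layer, not tinygrad's nn code): only the
# non-obvious parameters carry annotations, e.g. padding as int or str.
class Conv1dLike:
    def __init__(self, in_channels: int, out_channels: int, kernel_size: int,
                 padding: Union[int, str] = 0, bias: bool = True):
        self.in_channels, self.out_channels = in_channels, out_channels
        self.kernel_size, self.padding, self.bias = kernel_size, padding, bias
```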
ignaciosica 2bb6b95e9f
refactor _make_hip_code_for_op into pm rules (#7001) 2024-10-12 12:46:22 +08:00
George Hotz 5c9f76e274 hotfix: openpilot compile3 compare to i==1 2024-10-12 09:44:24 +08:00
chenyu 36056e0760
update mlperf systems and copy 4.1 to 5.0 (#7004) 2024-10-11 16:20:34 -04:00
Markiian Novosad 8831c691e2
Add slice parameter type checking to disallow Tensor usage for slices (#6967)
* add support for single el tensors for slices

* rm trailing spaces

* cleanup long lines

* remove tensor in slice support, add comprehensive err msg

* cleanup getitem, add slice type check

* Edit err message
2024-10-11 16:20:21 -04:00
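A hedged sketch of the kind of check described (hypothetical helper, not tinygrad's __getitem__): reject unsupported objects inside slices with a descriptive error instead of failing deep in the indexing code.
```
# Hedged sketch (hypothetical helper, not tinygrad's __getitem__): validate the
# components of a slice and raise a clear TypeError for unsupported types.
def check_slice(s: slice) -> None:
    for name, v in (("start", s.start), ("stop", s.stop), ("step", s.step)):
        if v is not None and not isinstance(v, int):
            raise TypeError(f"slice {name} must be an int or None, got {type(v).__name__}")

check_slice(slice(0, 4, 2))               # fine
try:
    check_slice(slice(0, [1], None))      # a list is not a valid bound
except TypeError as e:
    print(e)
```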
Francis Lam b0dd407cdd
ops_cuda: add optional dynamic smem parameter (#6956)
* ops_cuda: add optional dynamic smem parameter

This is required to enable more than 48 KB of shared memory usage on a per-kernel basis.

* move setting max dynamic smem size to init
2024-10-11 21:51:06 +03:00
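For context on the 48 KB limit: with the CUDA driver API, a kernel must opt in to larger dynamic shared memory before launch by raising a function attribute; a hedged ctypes sketch (assumes an already-loaded libcuda and a valid CUfunction handle, not tinygrad's ops_cuda code):
```
import ctypes

# from cuda.h: CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES == 8
CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES = 8

def allow_large_dynamic_smem(libcuda: ctypes.CDLL, hfunc, smem_bytes: int) -> None:
    # Kernels default to a 48 KiB cap on dynamic shared memory; raising this
    # attribute lets a single kernel opt in to more.  The same byte count is
    # then passed as the sharedMemBytes argument of cuLaunchKernel.
    ret = libcuda.cuFuncSetAttribute(hfunc, CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES, smem_bytes)
    if ret != 0: raise RuntimeError(f"cuFuncSetAttribute failed: CUresult {ret}")
```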
chenyu 0e42662f2a
log seed at the right place for bert (#7000) 2024-10-11 10:39:40 -04:00
nimlgen 5496a36536
update red mlperf bert readme (#6969) 2024-10-11 13:08:06 +03:00
nimlgen feb0bcb58b
qcom bench bind to perf cluster (#6996) 2024-10-11 12:21:52 +03:00
qazal 7451812bbf
delete AST_REWRITE ctx var (#6995) 2024-10-11 11:33:16 +03:00
qazal 7988547df2
start changes from big graph (#6993)
* start changes from big graph [pr]

* space

* still capture ctx
2024-10-11 11:13:46 +03:00
George Hotz e7a0ffe46a
break out linearization [pr] (#6994) 2024-10-11 15:27:33 +08:00
George Hotz f319530191
don't track simplify [pr] (#6992) 2024-10-11 15:03:03 +08:00
George Hotz e441794c4b
remove custom op support, we waste time maintaining this (#6991)
* remove custom op support; we waste time maintaining this

* customop is over
2024-10-11 14:31:09 +08:00
George Hotz c08521e823
minor cleanups from toonygrad (#6990) 2024-10-11 14:19:10 +08:00
George Hotz f50d0e0ee0
cloud device [pr] (#6964)
* first try at cloud device [pr]

* real separation

* we're free

* clang works

* unhappy with timeout

* better timeouts and free

* unrelated

* use http verbs + add test

* lines + better test

* fix DELETE

* shorter cloud

* split key

* fix sending renderer

* PTXRenderer serialization

* add sessions

* http.client

* minor timeout bump

* fix keep-alive

* inc server timeout

* real fix timeout

* that one too
2024-10-11 12:24:06 +08:00
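A hedged sketch of the client-side pattern the bullets describe (hypothetical host, port, and paths; not tinygrad's actual cloud device protocol): one persistent http.client connection per session, an explicit timeout, and plain HTTP verbs for the different operations.
```
import http.client
from typing import Optional

# Hedged sketch (hypothetical endpoints, not tinygrad's cloud device code):
# a reused connection gives keep-alive, and each operation maps to an HTTP verb.
class CloudClient:
    def __init__(self, host: str = "localhost", port: int = 6667, timeout: float = 60.0):
        self.conn = http.client.HTTPConnection(host, port, timeout=timeout)

    def send(self, verb: str, path: str, body: Optional[bytes] = None) -> bytes:
        self.conn.request(verb, path, body=body)
        resp = self.conn.getresponse()
        data = resp.read()  # drain the response so the connection can be reused
        if resp.status != 200: raise RuntimeError(f"{verb} {path} failed: {resp.status}")
        return data
```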