tinygrad

Commit Graph

Author	SHA1	Message	Date
hikettei	0f0c3934b1	refactor: improved the consistency of the frexp in transcendental (#7060) * clarify the intetntion of bias * Improved the consistency of m2 * int16 --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-10-15 10:18:38 -04:00
chenyu	d12c87dc8e	use ubuntu-22.04 in CI (#7068) ubuntu-latest points to 24.04 now, maybe it's this?	2024-10-15 09:44:59 -04:00
nimlgen	586ff4c910	nv record uvm mappings (#7059) * nv record uvm mappings * linteeer * smth * ooops	2024-10-15 00:12:49 +03:00
chenyu	2008bac6bf	use validhack logic to rewrite buffer idx (#6740) * use validhack logic to rewrite buffer idx saved a whopping one mod in the conv backward kernel... * cleanup more	2024-10-14 16:47:31 -04:00
qazal	968a79b56c	lint viz with eslint (#6988) * lint viz * green * move config * space * meh, laterg	2024-10-14 22:40:56 +03:00
chenyu	a99e42cf2f	clean up test_uop_symbolic.py (#7058) enable more tests and remove dead tests	2024-10-14 15:35:58 -04:00
nimlgen	8094340221	nv print info about faults (#7057) * nv print info about faults * unrelated changes * nv_gpu.GT200_DEBUGGER in mockgpu * regen with ocrrect version * spacing	2024-10-14 21:49:38 +03:00
chenyu	fbaab30fe3	add timing to fuzz_linearizer (#7056) and applied smaller FUZZ_MAX_SIZE. this is getting quite slow in CI	2024-10-14 11:57:41 -04:00
chenyu	0d2462cbdf	use more resolve in View merge add [pr] (#7055)	2024-10-14 11:31:13 -04:00
qazal	8428244c30	gates are always bool [pr] (#7054)	2024-10-14 17:55:08 +03:00
qazal	7a28d50320	small st_fixup changes [pr] (#7053)	2024-10-14 16:53:10 +03:00
qazal	0ef186d4be	scheduler internal api cleanups [pr] (#7052) * delete external_benchmark_ast.py [pr] * cleanup 2 * random	2024-10-14 15:56:10 +03:00
qazal	bc95b7e422	actually use UOps.CONTIGUOUS (#7049)	2024-10-14 15:11:23 +03:00
George Hotz	f85c9ba00a	rewrite max to use cmplt + where (#7037)	2024-10-14 20:00:51 +08:00
qazal	88ce6ec69a	ASSIGN is always (target, val) (#7048)	2024-10-14 14:47:52 +03:00
qazal	0f71bc10cd	small changes from the lazy_pm branch [pr] (#7047)	2024-10-14 12:21:21 +03:00
qazal	3e795f2e52	verify_ast changes from lazy_pm [pr] (#7045)	2024-10-14 12:08:18 +03:00
George Hotz	b20b22a738	hotfix: add test_tiny, because many times it's what you want	2024-10-14 16:32:33 +08:00
George Hotz	c4db927c7b	touchup lowerer [pr] (#7043)	2024-10-14 16:13:28 +08:00
Louis Novy	2ac5aec66b	Fix exponential complexity in _is_padding_okay [pr] (#7008) * preliminary test * missed Optional * don't check for cache during recursion * match style from st_fixup... may be marginally faster? * pathological test case: strongly connected DAG * move to test_schedule as this isn't really a fusion * oops this shouldn't be edited * Revert "oops this shouldn't be edited" This reverts commit 487cb027dc5120542755446d1595ec7b76c207e8. * Revert "move to test_schedule as this isn't really a fusion" This reverts commit 48d8c550ce84453e6fc0306e1c6c448fe1286f79. * move to test_schedule as this isn't really a fusion * ok no more merge error funny business	2024-10-14 02:34:47 +03:00
chenyu	bd8ecf7fd6	remove NumNode (#7035)	2024-10-13 16:42:19 -04:00
chenyu	c4c806a210	generate new kernel dataset (#7034) * generate new kernel dataset pre req to remove NumNode ``` extra/optimization/generate_dataset.sh gzip -k /tmp/sops mv /tmp/sops.gz extra/datasets/ ``` * fix var range in fuzz_linearizer	2024-10-13 16:19:41 -04:00
chenyu	1a27417262	remove arbitrary multiplication case (#7033) adds the wrongly simplified kernel in test_linearizer_failures #7019	2024-10-13 15:06:05 -04:00
chenyu	13575f080a	remove bitcast backward in function.py (#7031) bitcast cannot backward	2024-10-13 10:08:27 -04:00
Harsh Natuskar	ace834ef7b	=docs update (#7027)	2024-10-13 19:39:06 +08:00
qazal	13846930cd	hotfix: extract_dataset.py (#7029)	2024-10-13 11:18:23 +03:00
nimlgen	942a17109a	qcom use QCOMBuffer for all allocated buffers (#7023) * qcom use QCOMBuffer for all allocated buffers * checks	2024-10-12 23:44:36 +03:00
chenyu	04d9b46d51	derivative of softmax is indepedent of max (#7009) * derivative of softmax is indepedent of max * update test	2024-10-12 15:59:23 -04:00
chenyu	cae1c41755	test case of softmax backward kernel count (#7022)	2024-10-12 15:46:32 -04:00
George Hotz	5ce224ceb3	handle arbitrary multiplication case (#7019) * handle arbitrary multiplication case * remove count restriction	2024-10-12 23:16:27 +08:00
chenyu	23faeacb23	remove outdated comments (#7018)	2024-10-12 10:51:07 -04:00
George Hotz	85a45164fb	remove pyint [pr] (#7016) * remove pyint * bump time on tp [pr] * dont truncate in const fold * remove dead code * Revert "dont truncate in const fold" This reverts commit 29c81db0f7880848b001c2728aa555a1ef17e7d3. * remove define_var	2024-10-12 22:36:24 +08:00
George Hotz	38d45dfba5	hotfix: no rng in test/external/external_benchmark_schedule.py	2024-10-12 22:03:04 +08:00
chenyu	ed1ed9e4ff	bert use BS=72 (#7015) memory 131 -> 138 green tflops 201 -> 209 red tflops 160 -> 169	2024-10-12 09:41:56 -04:00
George Hotz	cba4b9a058	clean up ops file [pr] (#7013)	2024-10-12 19:53:52 +08:00
qazal	746a1f8c86	prep uoping diff for big graph [pr] (#7014)	2024-10-12 14:09:32 +03:00
ignaciosica	334f499e6a	consistent render of recip in cuda with CStyleLanguage (#6980)	2024-10-12 18:56:47 +08:00
George Hotz	a71bb09ec3	remove symbolic file [pr] (#7012)	2024-10-12 18:44:44 +08:00
George Hotz	16271189ea	hotfix: don't spend lines on a (broken) favicon	2024-10-12 18:21:10 +08:00
George Hotz	b737ee5bac	move to_indexed_uops to uops (#7011) * move to_indexed_uops to uops * UOp.range	2024-10-12 18:20:57 +08:00
George Hotz	5ae2de9845	UOp.variable (#7010) * UOp.variable [pr] * fix tests * clean * improve name rendering * last bug	2024-10-12 18:20:44 +08:00
Bhavya Gada	f79e05cac0	add types in all nn/init.py classes (#7002) * add types in batchnorm class * fix lint error in batchnorm types * add types to conv1d function * add types to convtranspose1d func and conv2d, convtranspose2d classes * add types to all remaining classes * change conv1d padding type to also accept str * less is more; only keep non-obvious types * mkdocs need types	2024-10-12 14:42:14 +08:00
ignaciosica	2bb6b95e9f	refactor _make_hip_code_for_op into pm rules (#7001)	2024-10-12 12:46:22 +08:00
George Hotz	5c9f76e274	hotfix: openpilot compile3 compare to i==1	2024-10-12 09:44:24 +08:00
chenyu	36056e0760	update mlperf systems and copy 4.1 to 5.0 (#7004)	2024-10-11 16:20:34 -04:00
Markiian Novosad	8831c691e2	Add slice parameter type checking to disallow Tensor usage for slices (#6967) * add support for single el tensors for slices * rm trailing spaces * cleanup long lines * remove tensor in slice support, add comprehensive err msg * cleanup getitem, add slice type check * Edit err message	2024-10-11 16:20:21 -04:00
Francis Lam	b0dd407cdd	ops_cuda: add optional dynamic smem parameter (#6956) * ops_cuda: add optional dynamic smem parameter This is required to enable larger than 48kb shared memory usage on a per-kernel basis. * move setting max dynamic smem size to init	2024-10-11 21:51:06 +03:00
chenyu	0e42662f2a	log seed at the right place for bert (#7000)	2024-10-11 10:39:40 -04:00
nimlgen	5496a36536	update red mlperf bert readme (#6969)	2024-10-11 13:08:06 +03:00
nimlgen	feb0bcb58b	qcom bench bind to perf cluster (#6996)	2024-10-11 12:21:52 +03:00

6388 Commits (All Branches)
