tinygrad

Commit Graph

Author	SHA1	Message	Date
George Hotz	cd534dee11	cstyle changes that don't pass process replay (#6734) * cstyle changes that don't pass process replay * add constant folder back there * cleanups * const * fix some tests * bfloat16 too * complete set of types * that cast shouldn't be needed * that was a questionable test	2024-09-25 17:33:34 +08:00
George Hotz	232edcfd4f	cast bool for type verify [run_process_replay] (#6742)	2024-09-25 17:12:16 +08:00
George Hotz	cb22ef379a	truncate consts early (#6741) * truncate consts early * ptx still fails * Update dtype.py	2024-09-25 16:49:51 +08:00
nimlgen	e31552e2e0	qcom reinit queue on exec (#6728) * qcom setup on exec as gpu=1 * linter * gpulike * offsets	2024-09-25 16:08:50 +08:00
George Hotz	882339f729	remove parens from neg (#6738)	2024-09-25 15:38:20 +08:00
qazal	5ad2f95d01	process replay diff stats (#6736) * process replay diff stats * fix tuples	2024-09-25 15:19:56 +08:00
nimlgen	56979aa3ed	qcom ioctl log levels (#6735)	2024-09-25 14:59:27 +08:00
chenyu	66af8bb54c	use UOp.replace and UOp.define_var in validhack (#6730) easier to see the diff in replacement [run_process_replay]	2024-09-25 02:51:34 -04:00
chenyu	ff25bfb1b0	conv backward tests in test_simplify_valid_idx (#6727) the backward idx is pretty ugly now	2024-09-25 02:51:07 -04:00
qazal	6c69fec1ef	viz more info for rewrite location (#6729)	2024-09-25 14:49:40 +08:00
George Hotz	39f78619ff	cstyle replay [run_process_replay] (#6731) * real minimum cstyle change * make it match * bring back DEFINE_GLOBAL store marking writable * bump line count to 9800 * closer * precompute don't render * cast/bitcast too * smem_align * vectorize * more pr match * remove that test * less PR diff * cstyle changes that [run_process_replay]	2024-09-25 14:26:05 +08:00
nimlgen	e1caa24a92	qcom fix binded queue might be overwritten (#6712)	2024-09-25 12:45:23 +08:00
George Hotz	dd575da7ee	real minimum cstyle change (#6709) * real minimum cstyle change * make it match * bring back DEFINE_GLOBAL store marking writable * bump line count to 9800 * closer * precompute don't render * cast/bitcast too * smem_align * vectorize * more pr match * remove that test * less PR diff	2024-09-25 12:40:46 +08:00
chenyu	e6a1b5aa8f	more test_simplify_valid_idx cleanup (#6726) moved UOps.VECTORIZE of idx into the helper	2024-09-24 23:47:42 -04:00
chenyu	14524eeddc	test_image_valid.py -> test_simplify_valid_idx.py (#6724) restructure the tests, will use the same file for non-image tests	2024-09-24 23:32:27 -04:00
qazal	e0d8685c99	test_masked_upcast_wino check device buf_max (#6723)	2024-09-25 11:26:53 +08:00
George Hotz	f45d178a55	hotfix: support JIT_BATCH_SIZE=0, make that the default	2024-09-25 10:36:04 +08:00
George Hotz	52e7f1c108	add new model CI	2024-09-25 10:23:06 +08:00
ttomsa	76bd4c7d5f	advanced setitem (#6262) * advanced setitem draft * add setitem tests * fix for tests * small change * handle repeated indices with test * fix v broadcasting to mask * clean up a bit * open more tests * clean up, fixes issue with scalar tensor index * fix * fix index_put_ and linter * add type annotation * done * remove non contiguous hack * woops linter * name fix * add back type notation * more type notation * final * linter * check lazydata not shared * no numpy * no numpy * rename * index benchmark * linter * no cloning time * rm benchmark * new function * rm contiguous and cast early --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com> Co-authored-by: chenyu <chenyu@fastmail.com>	2024-09-24 22:14:59 -04:00
qazal	3bf25aae78	start work on global buffer count limit [run_process_replay] (#6722) * add a bufs_max option * simple spec	2024-09-25 09:51:56 +08:00
George Hotz	b0ffe2452b	bump line count to 9800	2024-09-25 09:15:30 +08:00
chenyu	5c240c34aa	split validhack into simplify idx and drop valids (#6719) * split validhack into simplify idx and drop valids will be using the simplify idx for non-image buffer [run_process_replay] * shorter	2024-09-24 09:40:27 -04:00
qazal	cefc3e9382	make all schedules immutable [run_process_replay] (#6718) * compute inputs and outputs in LBScheduleItem [run_process_replay] * simpler metadata, delete __hash__ * no dynamic field * test_diff_schedule	2024-09-24 21:08:16 +08:00
qazal	29330014ab	give FUZZ_SCHEDULE views a base (#6717) * memoryview to bytes * give FUZZ_SCHEDULE views a base	2024-09-24 19:20:37 +08:00
nimlgen	f0019ad29c	bump ci test timeout for test_speed_exec_time (#6715) * bump ci test timeout for test_speed_exec_time * more	2024-09-24 18:44:09 +08:00
qazal	1c03fb69c9	viz dedup assert groupby ctx [run_process_replay] (#6714)	2024-09-24 18:17:21 +08:00
chenyu	8d75326cb5	do not fold var with min==max (#6713) not really used, want it to keep as a var for valid simplification [run_process_replay]	2024-09-24 06:16:34 -04:00
chenyu	9e51879019	fix idx setup in image_valid test_openpilot_conv3 (#6710) * fix idx setup in image_valid test_openpilot_conv3 * corrected output and sad	2024-09-24 05:49:04 -04:00
qazal	ae3f3fec38	refactor DEFINE_GLOBAL inputs to list [run_process_replay] (#6711)	2024-09-24 17:43:24 +08:00
wozeparrot	f932116e05	feat: small things from default_threefry (#6708)	2024-09-24 17:00:47 +08:00
chenyu	f2700ac58a	construct a candidate set to attempt valid idx rewrite (#6706) preparation for the brute force attempt for some valids	2024-09-24 04:12:21 -04:00
wozeparrot	2be0b26a1f	rand only supports single device (#6682)	2024-09-24 16:07:44 +08:00
nimlgen	75b7627db7	qcom do not recreate memoryviews on updates (#6701)	2024-09-24 15:36:22 +08:00
chenyu	a6078c099f	simpler idx rewrite structure in simplify_valid_image_load (#6704) express valid into things to check when rewriting idx. it's the same for single clause or a simplex [run_process_replay]	2024-09-24 03:35:39 -04:00
nimlgen	d3ed50c769	fix typo in 'Too many resources requested for launch' (#6705)	2024-09-24 15:33:01 +08:00
wozeparrot	ef7a74bfa0	feat: use /raid/downloads on tinybox (#6702)	2024-09-24 15:26:31 +08:00
nimlgen	ca66b11e07	qcom fix disasm (#6703)	2024-09-24 15:23:43 +08:00
nimlgen	a473bf4ba9	do not always update float dims (#6699) * do not always update float dims * linter * isinsatcen	2024-09-24 14:40:45 +08:00
qazal	048483ee0b	viz fold const nodes and UOp/float4 syntax highlight (#6695) * fold const nodes * show rewrite count * hotfix: cpp * more syntax highlight * custom language definitions * only cpp * small fixups for UPat * extend python * cleanups * rewrites helper * better message	2024-09-24 14:36:59 +08:00
chenyu	4bb1694f49	more tests about bounds of UOp divs (#6700)	2024-09-24 00:41:43 -04:00
chenyu	79aef64d70	update tests in test_image_valid (#6698)	2024-09-24 00:04:21 -04:00
Anurag Lamsal	568757e087	fix model_eval.py in the mlperf folder searching for bert vocab in the wrong directory (#6649)	2024-09-24 11:20:44 +08:00
chenyu	4a2fa0b627	clean up apply OptOps.PADTO [run_process_replay] (#6694)	2024-09-23 23:13:50 -04:00
chenyu	f703180356	hotfix missed cast in cstyle code_for_workitem (#6693) `NOLOCALS=1 python -c "from tinygrad import Tensor; Tensor.randn((5, 5)).realize()"` works on green box with this fix #6687	2024-09-23 22:18:18 -04:00
samm393	19c11792fd	Flux.1 (#6334) * initial commit * whitespace * get rid of torch import * indentation * less hardcoding * add flux.1-dev * jit * no double * t5 tidy up * validation image * reuse sdxl autoencoder * typing changes * empty lines * remove unneeded comments --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-09-24 10:08:04 +08:00
chenyu	31b9c74c77	tiny import cleanup and fix typo (#6692)	2024-09-23 21:48:23 -04:00
qazal	02c0c09fb9	VIZ syntax highlighting and new colors (#6686) * VIZ syntax highlighting * more work	2024-09-24 09:41:07 +08:00
ignaciosica	0ffbd75af8	Refactor TC [run_process_replay] (#6456) * unify _apply_tc_opt * refactor tc pt2 * hotfix: remove blank line * refactor upcast_axes * simplify check before using tensor_cores * rename upcast_axes * fix amx and remove counting hack * AMX cleanup * hotfix: bug * skip hand-coded TC opts if AMX to also skip if emulating * hotfix: AMX bug * hotfix: AMX tests * minor format change * hotfix: minor var name change * hotfix: minor refactor * hotfix: hand-coded tc bug * hotfix: simple change * fix comment * hotfix: refactor attempt to local N * hotfix: AMD TC spacing * refactor tensor core options in kernel.py to include opt order * hotfix: add comments to TensorCore dataclass * hotfix: improve comment on TC dataclas * hotfix: refactor opt_seq loop * hotfix: add comments in hand-coded TC opts * hotfix: upcast_axes comment * hotfix: remove unroll from opt_seq * hotfix: bug + remove unroll from opt_seq * hotfix: rename opt_seq into opts_seq --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-09-24 09:05:29 +08:00
George Hotz	b9e6d42a1f	Revert "gated native math in OpenCL (#6683)" (#6691) This reverts commit `2fe3eeed17`.	2024-09-24 08:48:10 +08:00
Harald Schäfer	382938ab41	Add command to show default backend in README (#6688) * Update README.md * Update README.md * Update README.md	2024-09-24 08:42:18 +08:00


6154 Commits