tinygrad

Commit Graph

Author	SHA1	Message	Date
George Hotz	1bf4aef0f5	fix image dtype cmp (#2089 ) * fix image dtype cmp * print that with debug 3	2023-10-16 17:52:38 -07:00
George Hotz	e4846771b2	Revert "limit metal buffers and revert the 207 fix (try 2) (#2088 )" This reverts commit `5e24dc5a95`.	2023-10-16 17:50:11 -07:00
George Hotz	d0aaf7d83b	Revert "Revert "Revert "openpilot kernel fix from 209 to 207 (#2006 )" (#2065 )"" This reverts commit f22a7cf6561fd3843b7e0c1d77a72a39a127bcd8.	2023-10-16 17:47:00 -07:00
George Hotz	5e24dc5a95	limit metal buffers and revert the 207 fix (try 2) (#2088 ) * limit metal buffers * look at the base, not the srcs * Revert "Revert "openpilot kernel fix from 209 to 207 (#2006)" (#2065)" This reverts commit `924ecc4d6a`. * add a test for that	2023-10-16 14:52:16 -07:00
George Hotz	e8fcd2f3db	Revert "limit metal buffers and revert the 207 fix (#2087 )" This reverts commit `2fb10f6a19`.	2023-10-16 14:32:22 -07:00
George Hotz	2fb10f6a19	limit metal buffers and revert the 207 fix (#2087 ) * limit metal buffers * Revert "Revert "openpilot kernel fix from 209 to 207 (#2006)" (#2065)" This reverts commit `924ecc4d6a`.	2023-10-16 14:26:32 -07:00
George Hotz	a7b18ac325	try beam search on device (#2085 ) * try beam search on device * fix beam with nolocals * ops too --------- Co-authored-by: Comma Device <device@comma.ai>	2023-10-16 12:52:42 -07:00
George Hotz	c36d306606	KOPT is over, BEAM is upstream (#2071 ) * create cache for q learning * make linter happy * global beam * where it belongs * bugfix * ditch the kopt, use the beam * faster lin and DEBUG=2 okay * remove kopt, move search to features	2023-10-16 09:46:03 -07:00
nimlgen	e4660b024f	mute hip warnings (#2081 )	2023-10-16 07:09:10 -07:00
George Hotz	5472a14544	openpilot compile2 (#1977 ) * start compile2 * tweak * why are there two more kernels? * minor cleanups * don't break onnx tests * add __metadata__ support to safetensors * no early realize in onnx * cleanups * bugfix * clean up image type, add optimize * opt to match old * try that * opt work * run compile2 * optimizer * prt more * prerealize * imp * NOLOCALS works * no locals means no locals * support fractional globals * all locals welcome * int that * cleanups * show gemv regression * clean up diff * use idx for the cond * nolocals --------- Co-authored-by: Comma Device <device@comma.ai>	2023-10-15 20:39:46 -07:00
George Hotz	566660675c	bugfix for cuda warning (#2078 )	2023-10-15 18:35:35 -07:00
Ahmed Harmouche	0d3410d93f	Stable diffusion: Make guidance modifiable (#2077 )	2023-10-15 14:36:43 -07:00
Umut Zengin	776605f2fc	O(1) VALIDHACKS (#2072 ) * first refactoring * O(1) validhacks * O(1) validhacks * Some cleaning * mypy * flake8 * Trim trim * flake8 * clean * less chaotic * less chaotic * flake8 * Symbolic, SumNode include mulnode for gcd * fix tests * smal optim * revert * clean * clean * flake8 * small fix * Add symbolic test	2023-10-15 11:26:41 -07:00
George Hotz	30933d5bd0	if support (#2076 ) * if support * bugfix * fix wgsl if * more correct wgsl fix	2023-10-15 07:17:37 -07:00
nimlgen	cb9309bee6	remove temp files (#2075 )	2023-10-15 06:45:36 -07:00
George Hotz	49bcfec383	0s in the action space (#2070 ) * 0s in the action space * simpler * skip duplicate actions	2023-10-14 11:22:48 -07:00
George Hotz	4124cf1df5	cleanup tensor cores, expose exclude local upcast (#2064 ) * expose exclude_local_upcast * convert apply tensor cores to ops * update comment * put LOCAL back to what it was, BEAM is better than way	2023-10-14 09:21:03 -07:00
David Hou	6e4a12ab68	HIPProgram __del__ (#2058 ) * HIP: only load modules when run, add __del__ * only add del	2023-10-13 17:15:46 -07:00
mmmkkaaayy	91168a28c4	whisper: make file transcription work, add basic CI test (#2042 )	2023-10-13 17:13:35 -07:00
George Hotz	924ecc4d6a	Revert "openpilot kernel fix from 209 to 207 (#2006 )" (#2065 ) This reverts commit `63869c62fc`.	2023-10-13 12:01:55 -07:00
Amrit Sahu	63869c62fc	openpilot kernel fix from 209 to 207 (#2006 ) * Fix openpilot kernel from 209 to 206 1. Use push_movement_ops conditions in _movement_op. Don't push PAD or check if the ops are safe to be pushed with PAD 2. Don't push if all the op.buffers are realized * change ALLOWED_KERNEL_COUNT to 206 for openpilot * don't push through sourceless buffers * change the tests to adjust kernel counts for new behaviour * restore pushing of movement ops through childless buffer * don't push EXPAND, causes OOM * allow push of intermediate movement ops * adding new test behaviour * modifying external_test_opt for new behaviour * restore old tests * Reenable push of EXPAND and introduce new tests I was wrong intially thinking EXPAND can cause OOM and hence I had disabled it. Since it is 0 stride and doesn't allocate memory its cool * Don't push EXPAND above LoadOps LB. This is causing OOM * Push should be decided on movement root of bufs To check if ast.op.buffers is sourceless/ realized go the the movement root and then decide if pushing should be done or not * refactor for readability * use .base instead * don't push expand, bad memory/compute consumption * restrict push of reshape, seeing improvement * push reshape if unary without further check * disable PAD solves convnext kernel count increase * reenable test_cache_binaryop_transpose * small nit	2023-10-13 11:59:15 -07:00
George Hotz	90c777d815	remove apply_auto_opt (#2063 )	2023-10-13 07:44:14 -07:00
nimlgen	bd42fa0b73	kernel cache (#2035 ) * init compiled cache * clang not compile to stdout * use kwrags in compile * remove some useless lines * slimmer * fix * tabs * retry * remove decorator * no race in hip * smaller hip * unused import * unused pathlib * path to str * add test * fix linter * less lines? * decorator is back * update tests * no hip version * better comments * a bit better test * linter * work wo decorator * linter happy * simpler return type * more tests * better comment * readable * readable * readable * compile returns bytes * no ununsed imports * readable	2023-10-13 06:32:01 -07:00
George Hotz	6f1810af2d	with unroll, the action space goes from 161 -> 127 (#2060 ) * with unroll, the action space goes from 161 -> 127 * more reliable instrumentation * beam search is so op * beam bugfix	2023-10-12 20:52:23 -07:00
Umut Zengin	6b7ac5c431	ModNode __mod__ rule (#2039 ) * Implement mod rule * mypy * feat: New test added	2023-10-12 11:30:10 -07:00
Yixiang Gao	3187962476	CIFAR HALF mode (#2041 ) * load weights in fp16 * add dtype option in nn * fix test * no need for dtype in nn * add option to load weights in FP16, but NaN * change loss scaler * cast to float32 for norm layer * add a todo for the forward pass padding * fix transform	2023-10-12 10:19:51 -07:00
George Hotz	c5edb3c374	train value net, improve API, add BCE (#2047 ) * api cleanups, BCE losses * valuenet * fixup examples * learning okay * add valuenet runner * net improvements * net improvements * 40% win rate	2023-10-12 07:56:38 -07:00
George Hotz	0ba629c7b9	add world dataset (#2045 )	2023-10-11 15:54:30 -07:00
George Hotz	0c3b6f13a8	Latest opt (#2044 ) * split out actions * rl algorithm	2023-10-11 15:46:14 -07:00
geohotstan	8d6cecb25c	Torch eq fix (#1562 ) * init * Revert "init" This reverts commit 682bf2073a8b4eca111596c67cf6ebd79f59e585. * kids dont do drugs * one way to fix * resolve merge conflict * no more or * clean up	2023-10-11 12:57:11 -07:00
George Hotz	41bfeb2c1e	start work on auto opt (#2034 ) * start work on auto opt * lin failure * not beating hcopt * greedy * timing is fast * codegen.search * greedy search in handcode_opt * track running gflops * clean up those files * no failure	2023-10-11 12:54:53 -07:00
chenyu	1c980517c5	s/var_vals_from_ast/vars_from_ast (#2038 )	2023-10-10 20:21:55 -07:00
Francis Lam	81c7d750db	test: fix test_linearizer.test_tensor_core test (#2036 ) must use apply_tensor_core instead of hand_coded_optimizations	2023-10-10 14:48:28 -07:00
chenyu	e2b83f1b42	Variable.bind newer (#2017 ) * Variable.bind attempt 2 * ShapeTracker.unbind * fix llama * fix types * test case * View.vars cleanup * include mask in symbolic source * mask can be sint * st.unbind in bufferops * assert ast contain free Variable only * cleanup * conservative unbinding reduce op arg * move reduceop unbind * fix llama JIT arg behavior	2023-10-10 10:03:01 -07:00
qazal	71d93ffd79	Refactor GPU and Metal langauges in their own separate renderers (#2033 ) * Refactor GPU and Metal langauges in their own separate renderers * remove CStyleLanguage imports * move renderers too	2023-10-10 07:46:41 -07:00
George Hotz	f139060103	Rewrite hand coded opt with action space (#2030 ) * tests passing * hand coded opt with new abstractions * simpler opts * split out tensor cores	2023-10-10 07:38:38 -07:00
Ahmed Harmouche	e27fedfc7b	Fix stable diffusion output error on WebGPU (#2032 ) * Fix stable diffusion on WebGPU * Remove hack, numpy cast only on webgpu * No-copy numpy cast	2023-10-10 06:40:51 -07:00
qazal	e40f141203	Refactor and add more unit tests for disktensors (#2022 ) * testing with the test_ops pattern * add assign test * flake8 complaining about single line fn * slice 2d and minor cleanup * make assign_slice a one-liner * we dont need to repeat the same lambda twice, default tinygrad_fxn to be np_fxn * back assign fn for np array * implement __setitem__ in tensor.py * dont re-slice the ret tesnsor * one liner assign * drop the permute test	2023-10-09 18:46:29 -07:00
chenyu	45f0891a8f	use "<" instead of "<=" in codegen for loop (#2027 )	2023-10-09 17:26:36 -07:00
chenyu	25555c836f	llama default to JIT only if device supports JIT (#2028 )	2023-10-09 17:26:02 -07:00
George Hotz	16ca8410f8	op logger + replay (#2021 ) * logops * fix dtype printing * needs inf * ops dataset * minor improvements * 12k kernels * opt can compile * graph flops	2023-10-08 15:10:18 -07:00
calledit	46f354b49f	Fix comment to describe code (#2023 )	2023-10-08 14:28:14 -07:00
qazal	0e2e041faf	CI for using tinygrad as an external pkg (#2019 ) * create workflow * unify with test.yml	2023-10-08 10:50:48 -07:00
George Hotz	8db92bd060	fix tvm gemm example	2023-10-08 05:57:41 -07:00
mmmkkaaayy	af6e2f31ca	whisper: cast model output token to int32 (#2013 ) Co-authored-by: mmmkkaaayy <mmmkkaaayy@users.noreply.github.com>	2023-10-08 05:56:22 -07:00
Luca Sciarpa	e93e240a6c	adapting test/external/external_osx_profiling.py to the new code base (#2002 ) * adapting external osx profiling * fixing dtype * fixing buffer size	2023-10-08 05:55:00 -07:00
wozeparrot	c4e8ea73bd	feat: add tinygrad.features to setup.py (#2016 )	2023-10-07 21:55:50 -07:00
Francis Lam	dece9958f8	wmma: clean up to make WMMA arg order consistent (#2014 ) also add cache defeat to extra/gemm/simple_matmul.py	2023-10-07 17:45:40 -07:00
George Hotz	cea4cbfc7a	move image+kopt to features (#2015 ) * move image+kopt to features * fix tests * debug prints (unrelated)	2023-10-07 15:41:08 -07:00
George Hotz	44ed94ef5c	use the device abstraction in handcode_resnet50_opt	2023-10-07 13:22:20 -07:00

... 4 5 6 7 8 ...

2882 Commits All Branches Search

2882 Commits

All Branches