tinygrad

Author	SHA1	Message	Date
George Hotz	5e6265be6e	metal timing, fix speed test	2023-02-17 12:31:54 -08:00
George Hotz	121bd03cbd	metal globalcounters	2023-02-17 12:02:54 -08:00
George Hotz	20a03d5017	woah, don't sync torch if it's not torch	2023-02-12 07:48:56 -08:00
George Hotz	de71c13934	test speed v torch uses jit	2023-02-12 07:43:17 -08:00
George Hotz	b9f02671d3	oops, broke torch speed test	2023-02-10 16:13:53 -06:00
Jacky Lee	5c51ae8dbf	Show where tinygrad is faster in speed test vs torch (#549) * show where tinygrad is faster * don't change text color	2023-02-10 14:01:07 -06:00
Jacky Lee	799b3f185a	Refactor getenv into helpers (#508) * Refactor getenv into helpers * Remove unused os * Fix default value * Fix more defaults for CI * Fix bracket * Revert changes to openpilot/compile.py * Use getenv from helpers when possible	2023-01-31 15:09:09 -08:00
George Hotz	2db272c7f7	Kernel Optimizer (#489) * kernel optimizer * 10x faster, but wrong. not good deal * move test -> extra * print x speedup * clcache * fix clcache + DEBUG * GFLOPS estimate * i==3	2023-01-29 17:15:00 -08:00
George Hotz	ebdec2b72f	fix optimizer	2023-01-29 00:23:06 -08:00
George Hotz	44e96c58b4	touch up pytorch speed tests	2023-01-25 18:11:26 -08:00
calledit	a0af1045bf	Some new tests (#440) * Make test run * Added new tests: sub pow constant_sub * Fix indentation * Added one to many lines * Fix indentation * Update test_cl_tiler.py * Delete test_cl_tiler.py	2023-01-25 15:40:19 -08:00
George Hotz	6d7658db12	delete opencl <celebration>	2023-01-24 14:18:35 -08:00
George Hotz	6fe9edf30f	torch cuda is very fast	2023-01-23 16:24:46 -08:00
George Hotz	a6de94b444	test partial sum	2023-01-22 21:28:40 -08:00
George Hotz	4885fce56e	shapetracker from newgpu (#456) * shapetracker from newgpu * touchup ops * test * testst * thneed deletes unused inputs * test * bugfix	2023-01-09 12:40:01 -08:00
cloud11665	4fb97b8de0	don't fail when termcolor is not installed (#436)	2022-11-14 16:45:06 -08:00
George Hotz	5e07d4669d	the speedy chonker is going to replace the old chonker (#432) * bringing back reshape and permute * done with E701 * 4x4 works in generic way * max and sum not vectorizing... * special case single float * support comparing to MPS * improve matmul speed, consider generic principles * GlobalCounter * fix op tracking * faster * comment that out for now * err, it needs that * fix minor issues * fix global_mem	2022-11-11 18:34:24 -08:00
George Hotz	b8c94a67c9	Simple chonker (#431) * chonker will make llvm fast * work * better speed tests, we will make them fast * with the cache add is the same speed * relu and neg are fast * fix sum speed * maximum maxnum? * hack for gemm opt * gemm very slow * zeros like * test_permute * shapetracker returns self * fix shapetracker factorization * err, int strides * permutes are faster now in tinygrad than pytorch * support -1 in expand * gemm unrolled * improve final test case * WIP GEMM * why isn't GEMM fast? * revert cache dim * ffp contract works on clang, not llvm? * ignore llvm ir * this makes fma work at least, but no faster * USE_4x4 * 63 GFLOPS * 87 GFLOPS * that wasn't matmul, 44 GFLOPS now * 82 GFLOPS permuted * this permute too * a little speed for the convs * 45 GFLOPS * speed tests pass again * clean up prints * fix FMA WHAT A WASTE OF TIME * colors * moar fair * GPU * useless on chonker * cleanups * improve factorized shapetracker * better threshold * label conv * work * ops test pass again * hot load the index * run the last view, no need to create * ZeroView needs a repr for the key to work * fix segfault on out of bounds * one more test * start amx, and llvm.initialize_native_asmparser * amx works * nice AMX class * nicer AMX class * refactor get_idxs * amx working * is slower... * useless flip * cache * SZ_X * AMX_SZ_X/Y work alone * Contiguous mlop * test gemm packed * PREPARE in packed * use_amx factor * prefetch isn't faster * loop * same 3ms * 2.24 ms * allow double on store in TG * amx reduce is the same speed as non amx reduce * include memory bandwidth * clean up shapetracker * flip returns stride * prepare for upstream * Update ops_llvm.py (#426) * permutes are yellow and green now * faster conv * llvm cleanups * Show optimised IR under debug 4 (#428) * ASTKernel class * Make tinygrad work with older python version (#427) * Make tinygrad work with older python version * Use partialmethod instead of partial * smiple chonker is chonking * remove junk from test speed vs torch * fix linker and types * AMX is only here now * add LLVM tests, it's a valid backend now * oops, run llvm test * contiguous_op * fix loadops compare * dedup reduceops Co-authored-by: calledit <1573053+calledit@users.noreply.github.com>	2022-11-10 23:17:09 -08:00
George Hotz	9781b4c3af	rename test functions to helper_	2022-11-07 21:27:56 -08:00
George Hotz	9884be2ad5	ugh, that too	2022-11-07 21:21:35 -08:00
George Hotz	537a9eb414	fix termcolor import	2022-11-07 21:19:08 -08:00
George Hotz	2cc1d970c6	updates from the chonker branch	2022-11-07 21:12:08 -08:00
George Hotz	544cb0a069	oops, remove while(1)	2022-10-29 14:05:13 -07:00
George Hotz	fdb43fe553	gemm is 1.7 TFLOPS on a single M1 core	2022-10-29 13:42:33 -07:00
George Hotz	f885ceb695	test speed w/o bias	2022-10-28 11:22:15 -07:00
George Hotz	793edf8900	touchup	2022-10-10 16:13:34 -07:00
George Hotz	d54a45b50d	measure speed vs torch	2022-10-10 16:06:00 -07:00

77 Commits