tinygrad

Commit Graph

Author	SHA1	Message	Date
George Hotz	aea55eb196	found failing upcast	2023-01-30 16:12:56 -08:00
George Hotz	b67f997864	tests pass w/o float4	2023-01-30 15:40:49 -08:00
George Hotz	c6f570a2e6	improve progress bar	2023-01-30 14:50:28 -08:00
George Hotz	7118602c97	goat progress bar	2023-01-30 14:37:26 -08:00
George Hotz	cccfea4b25	factor out KOPT code	2023-01-30 13:13:55 -08:00
George Hotz	de2c419fd4	make_pair and first attempt at hlb_cifar10	2023-01-30 11:07:23 -08:00
AllentDan	7b6b1f32b1	[Fix] fix typo: test_mnist -> datasets (#492 ) * test_mnist -> datasets * fix mnist_gan	2023-01-29 21:30:47 -08:00
George Hotz	2db272c7f7	Kernel Optimizer (#489 ) * kernel optimizer * 10x faster, but wrong. not good deal * move test -> extra * print x speedup * clcache * fix clcache + DEBUG * GFLOPS estimate * i==3	2023-01-29 17:15:00 -08:00
George Hotz	bb0cdc2442	111.51x speedup for reduce	2023-01-29 03:06:00 -08:00
George Hotz	45c0aa6e2d	search with SHIFT, REDUCE	2023-01-29 02:42:20 -08:00
George Hotz	87879cf4b6	improve search more	2023-01-29 02:08:57 -08:00
George Hotz	f6bbd43cb8	improve search	2023-01-29 01:33:47 -08:00
George Hotz	ebdec2b72f	fix optimizer	2023-01-29 00:23:06 -08:00
George Hotz	a9cabce791	oops, broke mem estimates	2023-01-28 20:21:31 -08:00
George Hotz	a500e79bd1	don't OPTWG on OS X, it's way slower	2023-01-28 20:02:33 -08:00
George Hotz	b0df4d99a0	os x profiling: this ratio is exact i believe	2023-01-28 19:02:51 -08:00
George Hotz	ae810eb558	minor cleanups	2023-01-28 08:59:15 -08:00
George Hotz	6d5e1a8029	GEMM kernel search	2023-01-27 10:08:57 -08:00
Comma Device	f08e740957	factor out hand coded opt	2023-01-26 14:54:06 -06:00
George Hotz	5e8a36a18b	real op kernel	2023-01-26 09:51:32 -08:00
George Hotz	e0600f537a	op kernel in kernel search	2023-01-26 09:47:01 -08:00
George Hotz	aafc29484a	cleanups	2023-01-25 12:37:10 -08:00
George Hotz	919e943867	decent search	2023-01-25 12:20:53 -08:00
George Hotz	7f3da91f8b	kernel_search	2023-01-25 12:05:09 -08:00
George Hotz	e37424424f	first little attempt at search	2023-01-25 11:49:29 -08:00
Comma Device	9e2af0a972	too far with the OPTWG	2023-01-24 13:14:59 -06:00
Comma Device	3590848b93	a little more local workgroup options	2023-01-24 12:50:27 -06:00
Comma Device	4b74752c42	fix hotspots by improving the workgroup optimizer	2023-01-24 12:46:28 -06:00
George Hotz	fd760a390a	fix incremental time	2023-01-24 10:19:04 -08:00
George Hotz	a949de873b	reduce 2.0 (#469 ) * reduce 2.0 * works * hacks * DEBUG=3 for shapes * fix types * 0s weren't being folded * cleaner * last_reduce is no longer needed * comments and cleanup	2023-01-23 15:11:13 -08:00
George Hotz	f1196984e6	harmless to intertwine the math and the stores	2023-01-21 09:31:56 -08:00
George Hotz	708215d06b	Typing (#468 ) * we typing * types look good in theory * most tests pass * gpu tests pass * TEST_AST * delete comments * i must have written that bug so many times * bugfix * don't merge the small ones * add f to constants * commits from reduce * don't GCD the mod nodes * broken and a hack IMAGE=3 * group for reduce * fix linter + mypy * move out test ast * insource TENSOR_TYPE_TO_NP_TYPE * does this fix it? * move imports out	2023-01-21 09:09:22 -08:00
George Hotz	0881d504c1	move shapetracker (#466 ) * move shapetracker * shapetracker test * move ast * move a few things * fix print kernel * fix test * symbolic fixups	2023-01-19 09:56:31 -08:00
George Hotz	9245f4650a	indexer changes for master	2023-01-18 18:02:02 -08:00
George Hotz	49c6e6d472	Latest attempt to add image (#462 ) * add image * load + store + boring stuff: * image tests pass * thneed print GFLOPS * op conv test * more debugging * hack for multiview image * shapetracker creates less views * disable image tests * working better * ugh, lkey not key * print in DEBUG, and allow views * works * simple padding conv2d * use index for image * that was bad code * debug print * fix types * less lines * save lines	2023-01-12 17:36:30 -08:00
George Hotz	281b0db773	three from image	2023-01-12 12:26:58 -08:00
George Hotz	9ff6c532eb	Prereqs for IMAGE=1 (#461 ) * contig * move ast, debug prog * add Token * cleanup reduce * exec_ast	2023-01-11 20:18:42 -08:00
George Hotz	fff1f046b0	Simple version of the new GPU backend (#458 ) * newgpu * more to delete * hmm, tests pass with constant folding * fix lint/type * fix constant folding * comment and rerun tests * lazy touchups * fix graph_batchnorm test * smaller transformer to fix OOM * Revert "smaller transformer to fix OOM" This reverts commit a44ef8edc275a4b3c78ee711ba188e220b7a879f. * no func cache * introspect * touchups * CLASTKernel * ugh, it was lru_cache * codegen * spacing * old gpu still in opencl * typing fix	2023-01-10 19:16:02 -08:00
George Hotz	fad7cba590	move batchnorm to Tensor	2023-01-09 18:00:16 -08:00
George Hotz	4885fce56e	shapetracker from newgpu (#456 ) * shapetracker from newgpu * touchup ops * test * testst * thneed deletes unused inputs * test * bugfix	2023-01-09 12:40:01 -08:00
George Hotz	b8c94a67c9	Simple chonker (#431 ) * chonker will make llvm fast * work * better speed tests, we will make them fast * with the cache add is the same speed * relu and neg are fast * fix sum speed * maximum maxnum? * hack for gemm opt * gemm very slow * zeros like * test_permute * shapetracker returns self * fix shapetracker factorization * err, int strides * permutes are faster now in tinygrad than pytorch * support -1 in expand * gemm unrolled * improve final test case * WIP GEMM * why isn't GEMM fast? * revert cache dim * ffp contract works on clang, not llvm? * ignore llvm ir * this makes fma work at least, but no faster * USE_4x4 * 63 GFLOPS * 87 GFLOPS * that wasn't matmul, 44 GFLOPS now * 82 GFLOPS permuted * this permute too * a little speed for the convs * 45 GFLOPS * speed tests pass again * clean up prints * fix FMA WHAT A WASTE OF TIME * colors * moar fair * GPU * useless on chonker * cleanups * improve factorized shapetracker * better threshold * label conv * work * ops test pass again * hot load the index * run the last view, no need to create * ZeroView needs a repr for the key to work * fix segfault on out of bounds * one more test * start amx, and llvm.initialize_native_asmparser * amx works * nice AMX class * nicer AMX class * refactor get_idxs * amx working * is slower... * useless flip * cache * SZ_X * AMX_SZ_X/Y work alone * Contiguous mlop * test gemm packed * PREPARE in packed * use_amx factor * prefetch isn't faster * loop * same 3ms * 2.24 ms * allow double on store in TG * amx reduce is the same speed as non amx reduce * include memory bandwidth * clean up shapetracker * flip returns stride * prepare for upstream * Update ops_llvm.py (#426) * permutes are yellow and green now * faster conv * llvm cleanups * Show optimised IR under debug 4 (#428) * ASTKernel class * Make tinygrad work with older python version (#427) * Make tinygrad work with older python version * Use partialmethod instead of partial * smiple chonker is chonking * remove junk from test speed vs torch * fix linker and types * AMX is only here now * add LLVM tests, it's a valid backend now * oops, run llvm test * contiguous_op * fix loadops compare * dedup reduceops Co-authored-by: calledit <1573053+calledit@users.noreply.github.com>	2022-11-10 23:17:09 -08:00
George Hotz	2cc1d970c6	updates from the chonker branch	2022-11-07 21:12:08 -08:00
George Hotz	d878065ece	Gemm (#416 ) * gemm * off by factor of 5 * 50 GFLOPS * works * 91 gflops * working at 50G * works * iy * 150 GFLOPS * 150 GFLOPS * N=2048 is still fast * threading soon * multithread * pinning * throttling is sad * Align matrices to cacheline width (#361) Co-authored-by: cloud <Cloud11665@gmail.com>	2022-11-06 10:07:28 -08:00
George Hotz	6a8fb53304	move ops.py into lazy.py (#402 ) * move ops.py into lazy.py * fix graph and linter * ugh, didn't add	2022-10-25 13:58:03 -07:00
George Hotz	8e22d5ee67	replace networkx with defaultdict	2022-10-20 19:36:43 -07:00
George Hotz	63f9c55156	really dumb bug	2022-10-20 17:07:47 -07:00
George Hotz	1bec4651b3	fix nonstatic weights	2022-10-20 17:04:14 -07:00
George Hotz	bb288e6938	safe_numpy and warning for broken matmul	2022-10-20 15:40:22 -07:00
George Hotz	50c95c7d9a	add assert to catch issue in attention	2022-10-20 15:13:00 -07:00
George Hotz	26c78ccf7d	remove useless buffer	2022-10-20 14:07:28 -07:00
George Hotz	a18c1f3178	zero out the inputs	2022-10-20 13:46:52 -07:00
George Hotz	ace8db29f8	ReduceSum	2022-10-20 12:48:14 -07:00
George Hotz	c400ee0beb	refactoring thneed (#400 ) * refactoring thneed * continue * minor update * looks like it's working * big refactor * confirm thneed got the right output * code is there but it's broken * works now * always OPTWG, input -> dat * fix type issue	2022-10-20 12:35:59 -07:00
YassineYousfi	ae0f9b17df	openpilot: new models and onnx ops (#401 ) * ngrl stuff * fngrl * fix typo in compile script * workflow dispatch * new models in tests * dont need to up this threshold Co-authored-by: HaraldSchafer <harald.the.engineer@gmail.com>	2022-10-20 11:49:19 -07:00
George Hotz	ff11c4316b	move get_parameters to optim.py	2022-09-25 13:16:58 -04:00
Jacky Lee	2c01a66265	Reshape dataset from fetch_mnist (#390 )	2022-09-24 21:16:29 -04:00
George Hotz	271446e3eb	set requires_grad to None (#387 ) * set requires_grad to None * some things need gradients * hmm, why was get_parameters filtering	2022-09-21 11:16:02 -04:00
YassineYousfi	2f0f91ba3d	support float16 onnx weights (#384 )	2022-09-15 09:12:18 -04:00
YassineYousfi	1a7bdc51f8	support more onnx ops (#376 ) * broadcast from right to left * add another broadcasted add test * more onnx ops * use float32 range in clip	2022-09-07 15:15:24 -07:00
George Hotz	0516359af8	fix stupid OPENCL=1 OOM	2022-09-06 14:29:23 -07:00
George Hotz	4dadd95e3c	fix tests hopefully, more stable diffusion	2022-09-03 10:38:31 -07:00
George Hotz	c01a8c5c2d	stable diffusion start	2022-09-03 10:08:42 -07:00
George Hotz	a3fc64a585	fix batchnorm folding in openpilot compile	2022-08-31 13:04:49 -07:00
George Hotz	dc7af8c3ac	thneed run float32	2022-08-28 11:03:35 -07:00
George Hotz	b132de677d	tinygrad.nn (#367 ) * tinygrad.nn * flake8 * working on pylint * more pylint * more pylint * pylint passes * networkx * mypy can't infer that type * junk	2022-08-18 07:41:00 -07:00
George Hotz	f76d41812b	prune graph	2022-07-17 15:38:43 -07:00
George Hotz	eda6f071b2	default opt level 2	2022-07-17 14:54:40 -07:00
George Hotz	73b0471b25	join expands	2022-07-17 13:42:05 -07:00
George Hotz	d04b274cd2	noop removal can replace with reshape	2022-07-16 08:32:42 -07:00
George Hotz	2720ef49ca	extra and test and tuple	2022-07-07 10:01:33 -07:00
George Hotz	81b73f97a3	Optiimzation (#355 ) * constant folding into kernels * that opt worth it? * fix mypy * ast one kernel * save 2 lines in conv kernel * debug print kernel count * cl debugging * early realize inputs * refactor Device	2022-07-04 08:58:57 -07:00
George Hotz	7276f8d6bf	improve constant folding, detach before moving tensor	2022-07-02 15:29:40 -07:00
George Hotz	8cf1aed0f4	don't track_running_stats, parameters must require_grad	2022-07-02 14:38:45 -07:00
George Hotz	49c954b389	comments	2022-06-26 17:20:25 -07:00
George Hotz	83d50e2687	move to extra.onnx	2022-06-21 19:43:44 -07:00
George Hotz	9b27ba650b	load new torch files	2022-06-07 10:06:48 -07:00
George Hotz	233c71a7ba	support requires_grad	2022-06-06 07:47:31 -07:00
George Hotz	d8d19ed468	wikimedia wasn't returning 200	2022-01-15 19:09:29 -08:00
George Hotz	e28cdfb0cf	clean up resnet	2021-11-30 16:14:54 -05:00
George Hotz	58ed46963e	fix broadcastdot	2021-11-29 18:54:57 -05:00
George Hotz	dca076dbf1	remove dumb nn ops	2021-11-29 18:05:31 -05:00
George Hotz	30eb3afbe1	add bias term to transformer	2021-11-29 12:45:27 -05:00
George Hotz	e2a8961a18	less lines, fix bug	2021-11-17 12:52:17 -08:00
George Hotz	ba28761894	move yolo into examples/yolo	2021-10-30 19:46:00 -07:00
George Hotz	63f50cff45	move back again	2021-10-30 16:13:29 -07:00
Evan Mays	285621aeda	Cherry backprop for conv2d (#281 ) * quick math: 0 + x = x. * gradient w.r.t. x using cherry for conv * gradient w.r.t. w for conv on cherry but doing vector dot products * small optimization * [cherry] optimize conv backpass for large channel count * get rid of numpy einsum	2021-10-30 16:12:19 -07:00
George Hotz	3d646272d6	move back	2021-10-30 16:12:12 -07:00
George Hotz	ac8afd24fa	refactor accel	2021-10-30 16:10:59 -07:00
Guglielmo Camporese	2b7589db64	Added ResNet-{18, 34, 50, 101, 152} (#271 ) * added resnets * fix minor * fix minor * resnet in models * added resnet test * added resnet train test * added linear, conv2d nn tests * fix minor in extra/training * resnet in models * fix minor * fix tolerance for linear in nn test * fix eval, this causes cpu and gpu UT failing * revert transformer test * fix minor for CPU test * improved model get_params for sequential layer * fix minor for params counting * commented broken ops tests * improved train for resnet	2021-06-21 09:37:24 -07:00
George Hotz	89798d2f43	some flags	2021-06-19 11:46:31 -07:00
George Hotz	d81eae8288	debug cherry crash	2021-06-19 11:41:20 -07:00
George Hotz	d3f169b267	move good models to models, add a training step test	2021-06-19 11:24:15 -07:00
George Hotz	b48d4bad2e	clean up print spam	2021-06-19 10:31:04 -07:00
George Hotz	027535d0b5	microcoded matmul	2021-06-17 21:03:08 -07:00
George Hotz	026e2ae6a7	three registers and a zero command	2021-06-17 17:09:18 -07:00
George Hotz	2e71ae33f6	max op works	2021-06-17 17:01:21 -07:00
George Hotz	9e12c1bbba	cherry binop	2021-06-17 16:50:40 -07:00
George Hotz	fcdabea880	training mnist with cherry ops	2021-06-17 16:45:35 -07:00
George Hotz	2affd226b3	speed up sum	2021-06-17 16:38:34 -07:00
George Hotz	e8eb7d1b7e	max op	2021-06-17 16:20:56 -07:00
George Hotz	c1d469d440	sum op	2021-06-17 16:19:35 -07:00
George Hotz	b1000d866e	readme, plus reduce ops	2021-06-16 11:21:06 -07:00
George Hotz	ff3fdc58e5	risk -> cherry	2021-06-16 09:59:48 -07:00
George Hotz	2f91c012eb	build note	2021-06-15 22:41:41 -07:00
George Hotz	4850d6eb43	update todo	2021-06-15 10:22:39 -07:00
George Hotz	4e1edb3692	have tinygrad log the loads	2021-06-14 18:35:14 -07:00
George Hotz	93f2e9769d	little note	2021-06-14 15:49:41 -07:00
George Hotz	a89d12d735	wow, way faster	2021-06-10 17:11:39 -07:00
George Hotz	10b1306525	binops	2021-06-10 16:52:37 -07:00
George Hotz	4535d39baa	comments and pow	2021-06-10 09:03:40 -07:00
George Hotz	2075fdeb4f	FPGA Based Accelerator for Tinygrad (#258 ) * ops_risk * risk sim * guessing is for winners * minor * better * matmal with risk * conv doesn't work * closer * conv2d works * ops_risk * opt2 works * opt1 may not be possible * opt1 is a mulacc * arty * attosoc example building on mac * minor * riscv assembler * gucci gang * we got C code * not a scam * hello * make risk mergeable into master * unop support	2021-06-07 17:45:09 -07:00
Josh Smith	ad756f6112	minor optimizations & cleaning (#257 ) * use isinstance, some optimizations & whitespace removal * revert whitespace changes * revert more whitespace * some more cleanup * revert fstring (not a fan of the {{}}) * fix typo * fix typo	2021-06-02 09:57:15 -07:00
George Hotz	b80cacb416	fix GPU efficientnet example	2021-05-26 17:29:35 -07:00
20kdc	2653d33292	vgg7 (image upscaling) implementation - not the best, but it works (#255 ) * vgg7 implementation - not the best, but it works * VGG7 implementation: Spread nansbane to deter NaNs, maybe improved training experience * VGG7 implementation: Fix training, for real this time Results actually attempt to approximate the input * VGG7 implementation: Sample probability management	2021-05-12 23:48:51 -07:00
George Hotz	ac229ea750	remove print	2021-01-02 12:53:30 -08:00
George Hotz	895d142503	start trying to load yolo v5	2021-01-02 12:51:55 -08:00
Marcel Bischoff	42b4761025	transformer >99.98% test accuracy in ~30s (#230 ) * transformer * BS might divide len(Y_test) * outoput when accuracy is high * more readeable * fixed loss in serious_mnist for new API	2021-01-02 07:45:09 -08:00
Liam	ebd72ff437	Test split (#231 ) * Split tests Split tests into "Test CPU" and "Test GPU". Add test flag "TEST_DEVICES" which is a comma separated list of devices: CPU,GPU,ANE * Run tests based on provided TEST_DEVICES flag By default will run all "CPU,GPU,ANE" * fix bad quote * Revert changes and use GPU=1 This is done through setting the default Tensor Device to Device.CPU of GPU=1 is set. Run GPU tests: GPU=1 pytest -s -v	2021-01-01 09:19:03 -05:00
George Hotz	f9170505b3	if you like your transformers twice as slow, use the GPU	2020-12-29 17:14:23 -05:00
George Hotz	3f8e137b6f	extra/transformer	2020-12-29 14:14:00 -05:00
Marcel Bischoff	dc8fa7999c	Transpose on GPU (#221 ) * 2serious * load/save * fixing GPU * added DEBUG * needs BatchNorm or doesn't learn anything * old file not needed * added conv biases * added extra/training.py and checkpoint * assert in test only * save * padding * num_classes * checkpoint * checkpoints for padding * training was broken * merge * rotation augmentation * more aug * needs testing * streamline augment, augment is fast thus bicubic * tidying up * transformer eval * axis=-1 * transpose * test for permutation using torch.movedims * another test * line	2020-12-29 10:40:11 -05:00
George Hotz	bcb3ceeca3	set training in functions	2020-12-28 22:45:46 -05:00
Marcel Bischoff	ffff98db78	Evaluation in Transformers (#218 ) * 2serious * load/save * fixing GPU * added DEBUG * needs BatchNorm or doesn't learn anything * old file not needed * added conv biases * added extra/training.py and checkpoint * assert in test only * save * padding * num_classes * checkpoint * checkpoints for padding * training was broken * merge * rotation augmentation * more aug * needs testing * streamline augment, augment is fast thus bicubic * tidying up * transformer eval	2020-12-28 09:24:51 -05:00
George Hotz	d864e1c71a	transformer is training	2020-12-27 18:46:32 -05:00
George Hotz	a361ef6861	fixup training loop	2020-12-27 18:35:56 -05:00
Nicklas Boman	06f359baa3	issue-193 - Move torch loader out of efficientnet code (#213 )	2020-12-22 00:19:16 -05:00
iainwo	56d44637f3	fixed pylint, formatted python files iwth cblack on localhost (#204 ) * fixed pylint, formatted python files iwth cblack on localhost * Revert "fixed pylint, formatted python files iwth cblack on localhost" This reverts commit 07e2b88466fa53399ad78d962ffb2ad55bc45344. * dedented 4-spaces added linter Co-authored-by: Iain Wong <iainwong@outlook.com>	2020-12-17 14:37:31 -08:00
Liam	bcf1518309	All devices are equal! (#196 ) * Update all devices to be tested ANE, CPU and OCL all now support all tests. However tests are not currently passing on GPU and I cannot test on CPU. Failing GPU test are not an issue caused by this update. Tests have not been passing due to a missing "six" required installation. OpenCL Tests have not been run since commit: `1a1c63a08b` devices have 3 types and are handle by a new DeviceTypes enum. (The goal is to revert to Tensor.<type>, but this current setup allows for keyword argument defaults: `device=DeviceType.CPU`) All references to Tensor.GPU/CPU/ANE as been converted to the corresponding `DeviceTypes` enum. Refactor of the conversion code to allow for any device to any device conversion. * Add six dependency in requirements.txt * Resolve failure to run tests Move six into gpu required installs. Remove six from standard installation. * Remove repeated data conversion * Refactor method names Also reduce code with .to and .to_ * Dynamic device handlers * Refactor DeviceTypes -> Device * Add mem copy profiling back * test_backward_pass_diamond_model passing * Resolve Sum issue on GPU * Revert batchnorm2d tests * Update README with upadated API * ANE testing with * Last minute line gains	2020-12-15 23:44:08 -08:00
Marcel Bischoff	da72a0eed4	Big MNIST model with PIL augmentation and load/save (#160 ) * 2serious * load/save * fixing GPU * added DEBUG * needs BatchNorm or doesn't learn anything * old file not needed * added conv biases * added extra/training.py and checkpoint * assert in test only * save * padding * num_classes * checkpoint * checkpoints for padding * training was broken * merge * rotation augmentation * more aug * needs testing * streamline augment, augment is fast thus bicubic * tidying up	2020-12-13 20:45:55 -08:00
George Hotz	07ece2105e	actually move it	2020-12-12 15:26:58 -08:00
George Hotz	1d10559d1d	tinygrad.utils -> extra.utils	2020-12-12 15:26:07 -08:00
George Hotz	00312b8ad1	batchnorm work	2020-12-06 14:40:07 -08:00
George Hotz	da514c2918	fix enet init	2020-12-06 13:52:07 -08:00
George Hotz	521098cc2f	se optional, track time better	2020-12-06 12:29:42 -08:00
George Hotz	609d11e699	trainer works with CIFAR	2020-12-06 12:20:14 -08:00
George Hotz	03994e0011	load torch files without torch	2020-11-21 13:43:53 -08:00
George Hotz	2ffb8de1ea	move efficientnet to extra	2020-11-16 08:08:07 -08:00
George Hotz	13d34373d1	move gradcheck to extra, clean up unbroadcast	2020-11-16 08:03:31 -08:00

488 Commits