tinygrad

Commit Graph

Author	SHA1	Message	Date
George Hotz	c71a8ef222	remove unused Div op	2020-12-06 13:02:12 -08:00
George Hotz	20f95de408	less lines	2020-12-06 12:56:03 -08:00
George Hotz	629efb391f	lose a few lines	2020-12-06 12:42:41 -08:00
George Hotz	521098cc2f	se optional, track time better	2020-12-06 12:29:42 -08:00
George Hotz	609d11e699	trainer works with CIFAR	2020-12-06 12:20:14 -08:00
George Hotz	80a9c777ba	requires grad, optim in train enet	2020-12-06 11:10:30 -08:00
George Hotz	c66c27d22e	get parameters	2020-12-06 10:45:04 -08:00
George Hotz	51daaa43d4	fix memory leaks, add gc test	2020-12-06 10:34:40 -08:00
George Hotz	1717daa859	reshape doesn't copy anymore	2020-12-06 09:51:09 -08:00
George Hotz	62ee47fef8	add GPUBuffer class	2020-12-06 09:45:13 -08:00
George Hotz	17659f7dd7	gpu speedup, tests work on M1	2020-12-06 09:05:49 -08:00
George Hotz	b8deb36e56	train BS=16 for 32 steps	2020-12-04 10:00:32 -08:00
George Hotz	ad1b225722	oops, i broke fill	2020-12-04 09:53:38 -08:00
George Hotz	cb79c9838e	make the GPU 25% faster by not recreating kernels	2020-12-04 09:51:00 -08:00
George Hotz	df81bf5985	it's the default, but make it explicit	2020-12-04 09:43:41 -08:00
George Hotz	888689b57b	proprotip	2020-12-04 09:24:46 -08:00
George Hotz	2862b42bac	install from github	2020-12-04 09:06:25 -08:00
George Hotz	1290e01e2c	all ops supported on GPU now	2020-12-03 10:43:11 -08:00
George Hotz	621a93b777	ane in readme	2020-12-03 10:40:31 -08:00
George Hotz	1dcaecacc4	Support for Apple Neural Engine (#130 ) * ane query is success * cite and build instructions * low level access, need to disable AMFI * coreml_ane works * coreml fun * more work * compiled example * progress * compiler works * model flow * TODOs in the readme * put some real weights in * we are learning objc * much progress i think * signed model still doesn't work * working example * there are float16 * clean up: part 1 * h11ane header, more cleanup * cleanup DeviceController creation * remove the stupid sleep * notes * start a hwx parser * no tabs * compare stuff * hmm, why don't inputs work * cache doesn't seem to fix it * hmm, the issue was the compiler * fix the compiler, guess i didn't put in weights * logging for compiler * uselessness in plist * remove hwx before compile, weights are converted to float16 * better compare * better compare * last line in comparE * opcodes from compiler * notes	2020-12-03 10:32:26 -08:00
baplou	c83cebccda	Made the readme more consistent (#136 )	2020-11-28 08:20:02 -06:00
Marcel Bischoff	541330c42a	Update README.md (#133 ) should we put `ipython3` otherwise the path doesn't work or we have to add the env, not sure what is nicer	2020-11-25 07:53:54 -08:00
Mufeed VH	0bbf66627c	Define `ProfileOp` class once (#131 ) * define `ProfileOp` class once * clean `ProfileOp` class * removed `else: pass`	2020-11-24 19:39:13 -08:00
George Hotz	03994e0011	load torch files without torch	2020-11-21 13:43:53 -08:00
Marcel Bischoff	26899869a2	Update tensor.py (#128 ) Otherwise `.cpu()` is broken if default is GPU	2020-11-21 09:16:03 -08:00
adamritter	f190ca446d	Detach (#123 ) * Detach * Torch.detach reuses the buffer in the * Fix test * wakey wakey GitHub Actions Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>	2020-11-19 19:03:42 -08:00
Colin Manko	8383ff40ad	fix pyopencl (#125 )	2020-11-19 19:03:04 -08:00
adamritter	5797e63d9b	Train efficientnet should respect NUM environment variable (#122 ) Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>	2020-11-16 20:02:31 -08:00
dustcollector12	ee99d016e9	tensor implementation for rmsprop and adam (#121 ) * tensor implementation for rmsprop and adam * test_mnist.py extended to cover sgd, rmsprop and adam on cpu and gpu * number of steps reduced for adam from 1000 to 200	2020-11-16 15:07:49 -08:00
George Hotz	17bf90dbe4	unbroadcasting works on the GPU	2020-11-16 09:16:55 -08:00
George Hotz	17eab716b6	unbroadcast GPU template	2020-11-16 08:16:36 -08:00
George Hotz	2ffb8de1ea	move efficientnet to extra	2020-11-16 08:08:07 -08:00
George Hotz	13d34373d1	move gradcheck to extra, clean up unbroadcast	2020-11-16 08:03:31 -08:00
George Hotz	ed4c35e2e9	channels on the inside	2020-11-15 21:19:59 -08:00
adamritter	fb1df81c7d	Fix train_efficientnet (#120 ) Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>	2020-11-15 20:50:31 -08:00
George Hotz	1207fe4c7d	cleanup LogSoftmax	2020-11-15 20:49:57 -08:00
George Hotz	d1441de3a6	minor cleanups	2020-11-15 20:39:19 -08:00
George Hotz	37a210f868	touchups and lines	2020-11-15 20:26:52 -08:00
adamritter	5ea3d76dfb	Topological sort, zero_grads (#119 ) * Topological sort, zero_grads * Bug fix, add test * Add zero_grads * Put deepwalk function in backward * Move zero_grad to optim * Fix gradcheck hack Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>	2020-11-15 20:25:29 -08:00
George Hotz	a35425189d	binop fast path for no broadcast	2020-11-15 19:12:14 -08:00
Marcel Bischoff	c7b7f8ccc8	Backwards ops supporting broadcasting (#118 ) * streamlined numerical_jacobian * Got rid of the g loop in Conv2D.forward * ereased stupid line * nothing * no loops in Conv2D forward * Conv2D backprop improved * stupid things in examples * alternative to einsum * Conv2D backward einsum alternative * tidying up * tidied up * no ravel * got rid of print * Update efficientnet.py * Update efficientnet.py * Update efficientnet.py * only tensordot * 255.0 * whitespace * aspect ratio error in efficientnet * noprint * efficient net wrong strides * broadcasting for backward ops * Update ops.py * Update ops.py - was wrong * broadcast test for backward enabled * function adBC + not summing over already 1 axis * spacing Co-authored-by: Marcel Bischoff <marcel@Marcels-iMac.local>	2020-11-15 15:21:10 -08:00
adamritter	55d93017e4	Simplify more (#117 ) Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>	2020-11-14 06:15:31 -08:00
dustcollector12	28474949b8	refactoring of forward in reshape (#115 ) * refactoring of forward in reshape * test case for reshape added	2020-11-13 13:20:43 -08:00
dustcollector12	6f033ea30a	enable local images for efficientnet.py (#116 )	2020-11-13 07:00:12 -08:00
pb1729	420af82888	General broadcasting of binary operations (#114 ) * allow for general broadcasting of binary operations. can handle any situation where corresponding dimensions between the tensors match, or at least one of them is of size 1. if a tensor has fewer dimensions than the other, then its size is padded with 1s until they match have the same number. also refactored buffer_zeros() by creating a function buff() that makes a buffer from a numpy array * remove extra tabs Co-authored-by: phillip <phillip_bement@reedbement.com>	2020-11-12 22:27:48 -08:00
damianzim	2b1286eef6	Don't wrap np.int32 in a function, use an alias (#113 )	2020-11-12 19:32:19 -08:00
adamritter	08aa60d9d0	broadcasting 1s at the start, 1 kernel/4 divs version (#110 ) * Pad2d backward pass on GPU * Faster Pad2D GPU backward pass (no zeroing needed) * Fix out of bounds error * Don't save prg * Let compiler optimize division by 1 * More generic broadcasting (1s at the start) * Bug fix * Add comment * Try to fix flaky test with other method * Add mixed broadcast support * 1kernel * Separate broadcast tests Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>	2020-11-12 13:33:35 -08:00
NeuralLink	f773ef3996	⚡ tanh non first class op (#111 ) * ⚡ tanh non first class op * tanh test with 1e-6 tol Co-authored-by: Kartik Sharma <kartik.sharma@claimgenius.com>	2020-11-12 13:32:50 -08:00
Ryan Neph	608bdd4872	adds broadcasting test cases (#106 ) refs: #80, #90, #104, #105	2020-11-12 07:08:28 -08:00
adamritter	f1d21afe88	Somewhat more generic broadcasting (#105 ) * Somewhat more generic broadcasting * Add TODO * Set Torch to deterministic in test Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>	2020-11-11 20:33:00 -08:00


484 Commits, All Branches
