tinygrad

Commit Graph

Author	SHA1	Message	Date
George Hotz	46bbbcf7f0	model touchups	2021-11-30 11:13:34 -05:00
George Hotz	bd21304e3c	linear takes in weight and bias	2021-11-30 00:38:47 -05:00
George Hotz	de938c2d9d	vit is now tested	2021-11-30 00:23:06 -05:00
George Hotz	58ed46963e	fix broadcastdot	2021-11-29 18:54:57 -05:00
George Hotz	dca076dbf1	remove dumb nn ops	2021-11-29 18:05:31 -05:00
George Hotz	f909ab194f	gelu with broken test	2021-11-29 15:00:50 -05:00
George Hotz	c752033283	fix GPU OOM in test	2021-11-29 13:05:59 -05:00
George Hotz	99b6051467	add ff_dim to transformer	2021-11-29 12:40:52 -05:00
George Hotz	29dee59368	cat: forward only not required	2021-11-29 00:14:56 -05:00
George Hotz	3cdc77f526	add cat support	2021-11-28 23:21:49 -05:00
George Hotz	ce3d198bb7	less lines and fix default device	2021-11-27 11:18:49 -05:00
George Hotz	7ae14179d3	refactor ops	2021-11-27 11:12:23 -05:00
George Hotz	c162e748f5	fix float64 warning on training	2021-10-30 20:07:31 -07:00
George Hotz	b0f14b4af8	move datasets into datasets	2021-10-30 19:55:50 -07:00
George Hotz	7472a7ebe2	not forcing 3.9 for a stupid type	2021-10-30 16:52:40 -07:00
George Hotz	fc6597a6d9	only resnet18, it's too slow otherwise	2021-10-30 16:48:39 -07:00
Evan Mays	285621aeda	Cherry backprop for conv2d (#281 ) * quick math: 0 + x = x. * gradient w.r.t. x using cherry for conv * gradient w.r.t. w for conv on cherry but doing vector dot products * small optimization * [cherry] optimize conv backpass for large channel count * get rid of numpy einsum	2021-10-30 16:12:19 -07:00
Sebastian Kreft	8113eec4cf	feat: add efficientnet test (#285 ) Simple test using the Chicken example from https://upload.wikimedia.org/wikipedia/commons/4/41/Chicken.jpg and the image preprocessing from example/efficientnet.py Note that EfficientNet loads the weights from the internet so running the tests may be slow the first time. We could speed up the tests by caching the /tmp folder. Fixes #234	2021-10-30 15:53:51 -07:00
Guglielmo Camporese	2b7589db64	Added ResNet-{18, 34, 50, 101, 152} (#271 ) * added resnets * fix minor * fix minor * resnet in models * added resnet test * added resnet train test * added linear, conv2d nn tests * fix minor in extra/training * resnet in models * fix minor * fix tolerance for linear in nn test * fix eval, this causes cpu and gpu UT failing * revert transformer test * fix minor for CPU test * improved model get_params for sequential layer * fix minor for params counting * commented broken ops tests * improved train for resnet	2021-06-21 09:37:24 -07:00
George Hotz	89798d2f43	some flags	2021-06-19 11:46:31 -07:00
George Hotz	d3f169b267	move good models to models, add a training step test	2021-06-19 11:24:15 -07:00
Jacky Lee	3a91d5434f	Add dropout test (#265 ) * Add dropout test * Remove condition where training is false * Skip dropout test when on GPU * Revert changes to tensor.py and fix test case * Revert change on whitespace * Convert Tensor to cpu for testing * Fix whitespace in tensor.py	2021-06-19 08:49:13 -07:00
George Hotz	2affd226b3	speed up sum	2021-06-17 16:38:34 -07:00
George Hotz	c1d469d440	sum op	2021-06-17 16:19:35 -07:00
George Hotz	2075fdeb4f	FPGA Based Accelerator for Tinygrad (#258 ) * ops_risk * risk sim * guessing is for winners * minor * better * matmal with risk * conv doesn't work * closer * conv2d works * ops_risk * opt2 works * opt1 may not be possible * opt1 is a mulacc * arty * attosoc example building on mac * minor * riscv assembler * gucci gang * we got C code * not a scam * hello * make risk mergeable into master * unop support	2021-06-07 17:45:09 -07:00
Skosh	81bf933a91	Improved __getitem__ (#254 ) * Some progress on yolov3 * Removed some debugging comments… Also, the forward pass eats all RAM for some reason * forward pass almost runs * forward pass runs almost * forward pass runs, now we gotta load the weights * loading weights works * fetches config and weights * everything kind of works, postprocessing of output still needs to be implemented, temp_process_results kind of works, but its kind of terrible, and not how things should be done * some changes * fixed some bugs in the forward pass and load_weights function, now outputs more correct values, however some values are still loaded incorrectly * Something is wrong with the forward pass, Conv2d tests added * forward pass almost outputs correct values, gotta fix one more thign * yolo works * some final changes * reverting changes * removed dataloader * fixed some indentation * comment out failing test, somehow it fails CI even though it passes on my computer… * fixed wrong probabilities * added webcam option to YOLO, now just need to add bounding boxes and speed it up * some progress towards adding bounding boxes * trying to speed up yolo layer on GPU, still faster on CPU but with 30GB ram usage * Faster inference times, bounding boxes added correctly, webcam works, but is slow, and there is a memory leak when running on CPU... Also added tinygrads output on the classic dog image * removed some debugging print statements * updated result image * something weird is going on, mean op on GPU tensor randomly faults, copying a tensor from GPU->CPU takes 10+ seconds… * Improved __getitem__ * Updated * Updated __getitem__ * Linebreaks * Maybe this works? * Added MNIST locally, tests run now	2021-05-05 22:15:22 -07:00
Skosh	78aa147b39	[WIP] YOLO working on tinygrad! (#245 ) * Some progress on yolov3 * Removed some debugging comments… Also, the forward pass eats all RAM for some reason * forward pass almost runs * forward pass runs almost * forward pass runs, now we gotta load the weights * loading weights works * fetches config and weights * everything kind of works, postprocessing of output still needs to be implemented, temp_process_results kind of works, but its kind of terrible, and not how things should be done * some changes * fixed some bugs in the forward pass and load_weights function, now outputs more correct values, however some values are still loaded incorrectly * Something is wrong with the forward pass, Conv2d tests added * forward pass almost outputs correct values, gotta fix one more thign * yolo works * some final changes * reverting changes * removed dataloader * fixed some indentation * comment out failing test, somehow it fails CI even though it passes on my computer… * fixed wrong probabilities * added webcam option to YOLO, now just need to add bounding boxes and speed it up * some progress towards adding bounding boxes * trying to speed up yolo layer on GPU, still faster on CPU but with 30GB ram usage * Faster inference times, bounding boxes added correctly, webcam works, but is slow, and there is a memory leak when running on CPU... Also added tinygrads output on the classic dog image * removed some debugging print statements * updated result image * something weird is going on, mean op on GPU tensor randomly faults, copying a tensor from GPU->CPU takes 10+ seconds…	2021-04-25 18:06:52 -07:00
George Hotz	62e3a8558c	fix tolerance maybe	2021-01-05 07:45:47 -08:00
George Hotz	8a38e0d207	only mish failed	2021-01-03 09:47:11 -08:00
George Hotz	1a4487965a	remove negative from things w/o negative	2021-01-03 09:43:34 -08:00
George Hotz	0702e0c763	nah, no sign, it's not what you want. use relu	2021-01-03 09:30:33 -08:00
George Hotz	c2eeb6950b	add support for sign. technically relu can be second class now	2021-01-03 08:29:57 -08:00
NeuralLink	0825cf7f79	⚡ Added softplus and mish non stable (#220 ) * ⚡ Added softplus and mish CPU * 🔨 refactor * 🔨 second class softplus and mish * 🔨 test fix * no need of device in testing	2021-01-03 08:08:41 -08:00
Liam	ebd72ff437	Test split (#231 ) * Split tests Split tests into "Test CPU" and "Test GPU". Add test flag "TEST_DEVICES" which is a comma separated list of devices: CPU,GPU,ANE * Run tests based on provided TEST_DEVICES flag By default will run all "CPU,GPU,ANE" * fix bad quote * Revert changes and use GPU=1 This is done through setting the default Tensor Device to Device.CPU of GPU=1 is set. Run GPU tests: GPU=1 pytest -s -v	2021-01-01 09:19:03 -05:00
George Hotz	4291002881	reorder GPU ops	2020-12-31 09:46:39 -05:00
Marcel Bischoff	e2f833f58f	max to behave on ties like torch (#229 ) * checkpoint * fixing pow * undo pow * backward max on GPU and CPU rewrite * indentation * changing seed for curiosity * max replaced equality * undo seed * rebase * fixed tests * merge error	2020-12-30 18:52:50 -05:00
George Hotz	fcfe3dae01	write slice for CPU	2020-12-30 10:32:53 -05:00
George Hotz	f9170505b3	if you like your transformers twice as slow, use the GPU	2020-12-29 17:14:23 -05:00
George Hotz	6a6a82e999	support multidot on GPU	2020-12-29 16:56:30 -05:00
George Hotz	27208d729b	add GPU max thanks to marcelbischoff	2020-12-29 16:44:14 -05:00
George Hotz	02655c07d5	break maxpool2d on GPU	2020-12-29 13:05:57 -05:00
George Hotz	061e37de39	touchups	2020-12-29 12:41:21 -05:00
George Hotz	a2e6562330	fix max op, less lines	2020-12-29 10:47:04 -05:00
Marcel Bischoff	dc8fa7999c	Transpose on GPU (#221 ) * 2serious * load/save * fixing GPU * added DEBUG * needs BatchNorm or doesn't learn anything * old file not needed * added conv biases * added extra/training.py and checkpoint * assert in test only * save * padding * num_classes * checkpoint * checkpoints for padding * training was broken * merge * rotation augmentation * more aug * needs testing * streamline augment, augment is fast thus bicubic * tidying up * transformer eval * axis=-1 * transpose * test for permutation using torch.movedims * another test * line	2020-12-29 10:40:11 -05:00
George Hotz	36579f66bf	max op	2020-12-28 23:54:52 -05:00
George Hotz	fafece9db7	avgpool2d is a second class op	2020-12-28 10:41:59 -05:00
George Hotz	593233b668	log and exp are first class ops	2020-12-28 10:00:30 -05:00
George Hotz	a361ef6861	fixup training loop	2020-12-27 18:35:56 -05:00
George Hotz	f15bec6dbc	make multidot work on CPU	2020-12-27 17:25:37 -05:00
George Hotz	131e04c90c	cpu only decorator	2020-12-27 17:18:55 -05:00
George Hotz	2f1b2c0a3b	add transpose, start on transformer	2020-12-27 16:59:12 -05:00
iainwo	56d44637f3	fixed pylint, formatted python files iwth cblack on localhost (#204 ) * fixed pylint, formatted python files iwth cblack on localhost * Revert "fixed pylint, formatted python files iwth cblack on localhost" This reverts commit 07e2b88466fa53399ad78d962ffb2ad55bc45344. * dedented 4-spaces added linter Co-authored-by: Iain Wong <iainwong@outlook.com>	2020-12-17 14:37:31 -08:00
Liam	bcf1518309	All devices are equal! (#196 ) * Update all devices to be tested ANE, CPU and OCL all now support all tests. However tests are not currently passing on GPU and I cannot test on CPU. Failing GPU test are not an issue caused by this update. Tests have not been passing due to a missing "six" required installation. OpenCL Tests have not been run since commit: `1a1c63a08b` devices have 3 types and are handle by a new DeviceTypes enum. (The goal is to revert to Tensor.<type>, but this current setup allows for keyword argument defaults: `device=DeviceType.CPU`) All references to Tensor.GPU/CPU/ANE as been converted to the corresponding `DeviceTypes` enum. Refactor of the conversion code to allow for any device to any device conversion. * Add six dependency in requirements.txt * Resolve failure to run tests Move six into gpu required installs. Remove six from standard installation. * Remove repeated data conversion * Refactor method names Also reduce code with .to and .to_ * Dynamic device handlers * Refactor DeviceTypes -> Device * Add mem copy profiling back * test_backward_pass_diamond_model passing * Resolve Sum issue on GPU * Revert batchnorm2d tests * Update README with upadated API * ANE testing with * Last minute line gains	2020-12-15 23:44:08 -08:00
Marcel Bischoff	da72a0eed4	Big MNIST model with PIL augmentation and load/save (#160 ) * 2serious * load/save * fixing GPU * added DEBUG * needs BatchNorm or doesn't learn anything * old file not needed * added conv biases * added extra/training.py and checkpoint * assert in test only * save * padding * num_classes * checkpoint * checkpoints for padding * training was broken * merge * rotation augmentation * more aug * needs testing * streamline augment, augment is fast thus bicubic * tidying up	2020-12-13 20:45:55 -08:00
George Hotz	1d10559d1d	tinygrad.utils -> extra.utils	2020-12-12 15:26:07 -08:00
James Roberts	8e8cbc74b3	Minor clean up (#184 ) * Removes unused imports * Minor clean up	2020-12-11 14:25:29 -08:00
Daulet	c7e95ddb21	Add diamond model test (#181 ) * add backward pass test for diamond model * fix train_efficientnet example	2020-12-11 09:21:36 -08:00
Marcel Bischoff	5d46df638a	abs as non-first class operation using relu (#171 ) * abs (non-first class) * whitespace	2020-12-09 12:20:34 -08:00
George Hotz	ffb96b2d0b	batchnorm by marcelbischoff	2020-12-09 03:23:04 -08:00
NeuralLink	00e376f36c	leaky relu as geohot suggested (#167 )	2020-12-09 02:58:35 -08:00
George Hotz	c225e62dd2	touchups	2020-12-09 02:52:28 -08:00
Liam	89d0ff6989	Consistent testing (#137 ) * Consistent GPU classes Convert the existing GPU classes into one standard format. Remove duplicated functions in `test_mnist` and create a TestMNISTGPU class. This reduces line count and ensures consistency. Use `@unittest.skipUnless(GPU, "Requires GPU")` instead of `if GPU:` to skip GPU testing. This will ensure that skipped tests are displayed accordingly in the pytest output. * Optim Testing now supports GPU * Tensor testing now supports GPU jacobian and gradcheck auto skipped until GPU float64 support added. * GPU support for custom constructor methods * Remove GPU flag from Model constructors It was requested that the `gpu` kwarg be removed from the model constructor. GPU conversion is now handled in the train function. This also required the conversion of Optimizer parameters as they are constructed prior to execution of the `train` function and are dependant on the model GPU state. * Fix typo: float32->float64 * Clean `get_parameters` utility Just a quick refactor w/ the new support for optimizers. * Remove GPU kwarg from TinyNet Remove `gpu` kwarg from tiny net to match test_mnist `train` function.	2020-12-09 02:25:27 -08:00
Daulet	24d688c184	win more lines for core library (#158 ) ...and sacrifice test speed	2020-12-08 14:18:45 -08:00
George Hotz	4e1a0de392	fix rsub	2020-12-08 10:05:21 -08:00
George Hotz	c4540f1b8c	Support scalars by kartik4949	2020-12-08 09:52:07 -08:00
George Hotz	97fd9c1237	zero_grad there to match readme	2020-12-07 23:12:18 -08:00
George Hotz	b355cd2571	Mean axis (doesn't work) (#154 ) * mean axis * fixed	2020-12-07 22:58:34 -08:00
Marcel Bischoff	58ccebd7cd	Sum with axis (#153 ) * sum with axis and tests * broken * works again * clean up * Update test_ops.py	2020-12-07 21:49:18 -08:00
George Hotz	3b982f2f7a	get_parameters	2020-12-06 13:47:28 -08:00
George Hotz	102e6356e9	replace layer_init_uniform with .uniform	2020-12-06 13:44:31 -08:00
George Hotz	51daaa43d4	fix memory leaks, add gc test	2020-12-06 10:34:40 -08:00
George Hotz	17659f7dd7	gpu speedup, tests work on M1	2020-12-06 09:05:49 -08:00
adamritter	f190ca446d	Detach (#123 ) * Detach * Torch.detach reuses the buffer in the * Fix test * wakey wakey GitHub Actions Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>	2020-11-19 19:03:42 -08:00
dustcollector12	ee99d016e9	tensor implementation for rmsprop and adam (#121 ) * tensor implementation for rmsprop and adam * test_mnist.py extended to cover sgd, rmsprop and adam on cpu and gpu * number of steps reduced for adam from 1000 to 200	2020-11-16 15:07:49 -08:00
George Hotz	17bf90dbe4	unbroadcasting works on the GPU	2020-11-16 09:16:55 -08:00
George Hotz	17eab716b6	unbroadcast GPU template	2020-11-16 08:16:36 -08:00
George Hotz	13d34373d1	move gradcheck to extra, clean up unbroadcast	2020-11-16 08:03:31 -08:00
adamritter	5ea3d76dfb	Topological sort, zero_grads (#119 ) * Topological sort, zero_grads * Bug fix, add test * Add zero_grads * Put deepwalk function in backward * Move zero_grad to optim * Fix gradcheck hack Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>	2020-11-15 20:25:29 -08:00
Marcel Bischoff	c7b7f8ccc8	Backwards ops supporting broadcasting (#118 ) * streamlined numerical_jacobian * Got rid of the g loop in Conv2D.forward * ereased stupid line * nothing * no loops in Conv2D forward * Conv2D backprop improved * stupid things in examples * alternative to einsum * Conv2D backward einsum alternative * tidying up * tidied up * no ravel * got rid of print * Update efficientnet.py * Update efficientnet.py * Update efficientnet.py * only tensordot * 255.0 * whitespace * aspect ratio error in efficientnet * noprint * efficient net wrong strides * broadcasting for backward ops * Update ops.py * Update ops.py - was wrong * broadcast test for backward enabled * function adBC + not summing over already 1 axis * spacing Co-authored-by: Marcel Bischoff <marcel@Marcels-iMac.local>	2020-11-15 15:21:10 -08:00
dustcollector12	28474949b8	refactoring of forward in reshape (#115 ) * refactoring of forward in reshape * test case for reshape added	2020-11-13 13:20:43 -08:00
pb1729	420af82888	General broadcasting of binary operations (#114 ) * allow for general broadcasting of binary operations. can handle any situation where corresponding dimensions between the tensors match, or at least one of them is of size 1. if a tensor has fewer dimensions than the other, then its size is padded with 1s until they match have the same number. also refactored buffer_zeros() by creating a function buff() that makes a buffer from a numpy array * remove extra tabs Co-authored-by: phillip <phillip_bement@reedbement.com>	2020-11-12 22:27:48 -08:00
adamritter	08aa60d9d0	broadcasting 1s at the start, 1 kernel/4 divs version (#110 ) * Pad2d backward pass on GPU * Faster Pad2D GPU backward pass (no zeroing needed) * Fix out of bounds error * Don't save prg * Let compiler optimize division by 1 * More generic broadcasting (1s at the start) * Bug fix * Add comment * Try to fix flaky test with other method * Add mixed broadcast support * 1kernel * Separate broadcast tests Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>	2020-11-12 13:33:35 -08:00
NeuralLink	f773ef3996	⚡ tanh non first class op (#111 ) * ⚡ tanh non first class op * tanh test with 1e-6 tol Co-authored-by: Kartik Sharma <kartik.sharma@claimgenius.com>	2020-11-12 13:32:50 -08:00
Ryan Neph	608bdd4872	adds broadcasting test cases (#106 ) refs: #80, #90, #104, #105	2020-11-12 07:08:28 -08:00
adamritter	f1d21afe88	Somewhat more generic broadcasting (#105 ) * Somewhat more generic broadcasting * Add TODO * Set Torch to deterministic in test Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>	2020-11-11 20:33:00 -08:00
Ryan Neph	8827a536e0	GPU MaxPool2D.backward(); TinyConvNet train passes (#103 ) * no trailing whitespace * GPU MaxPool2D.backward(); TinyConvNet train passes! * Fix GPU avgpool.forward() init_val Doesn’t change result but is simpler. * Fix MaxPool GPU init_val Tests only cover random non-negative inputs. This fixes issues if negative inputs are fed to GPU MaxPool2D. Test update to follow.	2020-11-11 07:58:43 -08:00
George Hotz	d1284fa817	stride tests and i32	2020-11-10 16:10:14 -08:00
Marcel Bischoff	7bb803c5e0	Conv2D backward on GPU (#93 ) * to make it work locally * definitely not working * Conv2D GPU passes some of the tests * Conv2D GPU passes more of the tests * passes some tests and mnist * removed unecessary code * Conv2D Backpass works * wrong test_ops.py * white space + test backward * ereased useless code * removed default argument * long lines	2020-11-10 16:07:33 -08:00
George Hotz	52ee913c98	move the mnist loader out of tinygrad proper	2020-11-10 15:37:39 -08:00
George Hotz	58e703d099	fix tests	2020-11-10 09:49:19 -08:00
George Hotz	866b759d3b	match torch api for pad2d	2020-11-09 17:48:56 -08:00
Ryan Neph	16d564a53c	finish unsupporting strided pool, add global avg pool test (#92 )	2020-11-09 17:31:22 -08:00
George Hotz	870b84a893	test pad2d backward on GPU	2020-11-09 15:50:43 -08:00
George Hotz	e46d122f65	not supporting stride	2020-11-09 15:06:58 -08:00
Ryan Neph	c21c2a0b62	revert b0c0c5d: Strided Pool funcs (#74 ) (#87 ) Strided CPU Pooling was introduced but assumes small kernel size (<=(10,10)), but efficientnet.py feeds kernel_size=(112,112). This causes a huge array buffer allocation in stack_for_pool() that hangs inference for a long time or until system OOM. Revert CPU Pooling for now, and re-introduce #74 later with a new global-average-pooling op that can be used instead of avgpool2d with large kernel size for efficientnet inference. Co-authored-by: Ryan Neph <ryanneph@google.com>	2020-11-09 14:58:18 -08:00
Ryan Neph	7e515308a5	label op subtests by params (#83 )	2020-11-09 06:25:06 -08:00
Ryan Neph	5bedf566d1	tests should use rtol unless special case (#82 )	2020-11-08 17:25:11 -08:00
Ryan Neph	04b9312a34	Fix GPU Pooling bug at boundary + better Pooling test coverage (#81 ) * fixed Pooling bug * Clarify Pooling tests	2020-11-08 17:25:01 -08:00
Ryan Neph	b0c0c5d0d6	strided Pool funcs (#74 ) * Pool2D GPU forward supports stride kernel_size from ctx instead of saved_tensors * Pool2D CPU forward supports stride update ctx.stride properly	2020-11-08 11:45:55 -08:00
ziofil	db3eccc16b	implemented backward for Pad2D & test (#73 )	2020-11-07 21:58:42 -08:00
Ryan Neph	5265f6c578	add AvgPool2D backward pass on GPU (#68 )	2020-11-07 12:27:29 -08:00
George Hotz	30442a086a	some broadcasting, pool test is fail	2020-11-07 11:29:42 -08:00
George Hotz	94d44c97bf	add pad2d on GPU	2020-11-07 10:46:36 -08:00
George Hotz	fbff6ab2e5	fix strided convs, GPU env var for enet	2020-11-07 10:26:37 -08:00
George Hotz	ec03eb44bd	tinygrad does forward pass convs on GPU	2020-11-07 10:15:56 -08:00
George Hotz	bc7758cc5b	getting convs to work on gpu	2020-11-07 09:17:57 -08:00
George Hotz	3302286e68	yayay test_sgd_gpu passes	2020-11-07 08:48:17 -08:00
George Hotz	38e112cccd	logsoftmax test	2020-11-07 07:26:53 -08:00
Rene Delgado	cd54697fd8	fix gpu sum forward (#61 ) * ignore venv * add sum test * fix sum forward	2020-11-05 21:59:16 -08:00
NeuralLink	cc605da36d	Stable Sigmoid op (#59 ) * 🔨 Added stable sigmoid * ✅ added sigmoid test * 🔧 suppressed overflow warning * 🔧 clean up	2020-11-05 21:57:50 -08:00
George Hotz	f178d23ff3	gpu relu is good	2020-11-02 08:25:32 -08:00
George Hotz	231c1134bd	cute trick for GPU test	2020-11-02 08:17:17 -08:00
George Hotz	5201a8e89f	matmul on GPU	2020-11-01 08:54:20 -08:00
George Hotz	41e7d59aed	test dot	2020-11-01 07:51:35 -08:00
George Hotz	1f544d6ece	test mnist on GPU	2020-11-01 07:46:17 -08:00
George Hotz	9ac1ad40d6	Add GPU Support! (do not merge yet) (#41 ) * copy tensors to and from gpu * add on GPU * adding works * we stick shapes in * works on cpu and gpu * test changes, not passing yet * something else * op tests pass * add, mean, and sum have working forward/backward * mul ops test * no gpu support, no problem * test pass, clean up later * gpu cleanup * cleanup test ops, don't let div fail * revert more * aimpler dispatcher * clean up grad * GPU and * grad is a Tensor now * gate test on GPU * cleanups * late loading gpu * GPU as input option * last cleanups	2020-11-01 07:00:49 -08:00
George Hotz	2c7e75d733	group conv: forward pass works (#34 ) * forward pass works * got the backward pass * okay, it's now a coho	2020-10-30 09:19:20 -07:00
George Hotz	339a35b081	div needs help	2020-10-30 08:32:16 -07:00
George Hotz	c14473f87d	unit test for batchnorm2d	2020-10-30 08:19:58 -07:00
George Hotz	5e7e359706	fix tests	2020-10-29 08:19:07 -07:00
George Hotz	9ae3e9daf3	shape has to be a kwarg now, idk why this didn't break before	2020-10-29 08:13:05 -07:00
George Hotz	f84f6c1edd	write sqrt and div using pow	2020-10-29 07:57:25 -07:00
Göktuğ Karakaşlı	4b163ee270	efficient version of adam (#20 ) * counteracted bias initialization * test new adam * add optimizer tests * rename helper function names to fix the test * remove redundant import	2020-10-27 15:54:40 -07:00
George Hotz	f9788eba14	parameters, and start on efficientnet	2020-10-27 08:53:35 -07:00
George Hotz	1654008c1f	conv stride support	2020-10-26 08:54:43 -07:00
George Hotz	2a55d7402b	clean up ops, refactor pool backward. add stride test	2020-10-26 08:47:11 -07:00
George Hotz	93dceb4bee	fix kernel_size bug, name like torch, add test	2020-10-26 08:38:53 -07:00
Timothy Mc Alister	15e5988323	make default parameters work for functions	2020-10-26 12:43:36 +01:00
George Hotz	2d37fd686b	test ops	2020-10-25 19:03:49 -07:00
George Hotz	2eebbd32c6	ops test speed	2020-10-25 19:01:02 -07:00
George Hotz	b27bcbe4b4	avgpool and test refactor	2020-10-25 18:40:01 -07:00
George Hotz	4c42676cb6	400 -> 200	2020-10-25 17:19:59 -07:00
George Hotz	567707a5f6	rename max_pool2d to match torch, remove more fast conv crap	2020-10-25 17:16:47 -07:00
George Hotz	ea41f5e1c1	seems more generic	2020-10-25 16:40:37 -07:00
George Hotz	2333c4dea7	no tqdm in actions	2020-10-25 16:40:08 -07:00
George Hotz	ad48061927	better sort in torch profiler	2020-10-25 16:07:49 -07:00
George Hotz	82f8e10813	no hacks in that test	2020-10-25 15:52:05 -07:00
George Hotz	4baa4c041f	it's crazy how much faster pytorch is than numpy	2020-10-25 15:42:33 -07:00
George Hotz	5ddbd7f04b	2 to 3x slower than torch	2020-10-25 15:27:33 -07:00
George Hotz	f8311f5ecd	print fp/bp mnist	2020-10-25 15:08:18 -07:00
George Hotz	5c179d18ad	add profiling for mnist net	2020-10-25 14:20:55 -07:00
George Hotz	8fcada8071	faster and better convnet	2020-10-25 13:48:44 -07:00
George Hotz	96f9cdb8a0	woah, fastconv is wrong	2020-10-25 12:56:42 -07:00
George Hotz	bb98cdfef7	improve conv testing	2020-10-25 12:46:04 -07:00
George Hotz	ef24aac09e	finally, fast convs	2020-10-25 12:39:44 -07:00
George Hotz	67506eb6ba	fast im2col	2020-10-25 11:49:35 -07:00
George Hotz	c9968756d1	allow the line profiler to work	2020-10-25 11:13:40 -07:00
George Hotz	5062c2c8ff	profile conv better	2020-10-25 11:11:00 -07:00
George Hotz	c74764bac3	oops, set to None	2020-10-25 08:28:18 -07:00
George Hotz	935f5ddaaa	always keep batch size out front	2020-10-25 08:14:07 -07:00
George Hotz	b91fd3afad	maxpool	2020-10-25 07:43:34 -07:00
George Hotz	5216a1d9f3	refactor into tensor and ops	2020-10-23 10:34:21 -07:00
George Hotz	9b9e47f369	added conv profile test	2020-10-23 09:46:10 -07:00
George Hotz	5756115e57	anyone else let down by the fast conv?	2020-10-23 09:09:29 -07:00
George Hotz	bcb60e0b7c	wow, you have to name them test	2020-10-23 06:33:18 -07:00
George Hotz	2259c9faa1	low lr improves rmsprop	2020-10-23 06:22:32 -07:00
George Hotz	eda29fa0e0	clean up test	2020-10-23 06:11:38 -07:00
George Hotz	373b4e341b	Merge pull request #15 from f0ti/master added RMSprop optim	2020-10-23 06:08:20 -07:00
f0ti	0b87aaca1e	update rsmprop	2020-10-23 14:46:45 +02:00
f0ti	c5f726ec2e	all three	2020-10-23 11:53:01 +02:00
f0ti	6a38ccb6b0	update rmsprop and readme	2020-10-23 11:49:43 +02:00
George Hotz	21ebb0b769	if you wait 24 seconds, that gets 98%	2020-10-22 21:49:14 -07:00
George Hotz	816f648161	chans doesn't need to be in self	2020-10-22 21:19:35 -07:00
George Hotz	77251cc6c3	7x7 conv = more accuracy	2020-10-22 21:10:27 -07:00
f0ti	7e1eddb0c5	added RMSprop optim	2020-10-23 02:50:02 +02:00
0xNaN	d95adbddb4	`gradcheck` now returns only a bool, refactoring of test_gradcheck	2020-10-22 01:28:52 +02:00
0xNaN	adbfc67456	test `jacobian` and `numerical_jacobian` against torch.autograd.functional.jacobian	2020-10-22 01:28:52 +02:00
0xNaN	1561d3b9c0	extracting `jacobian` and `test_jacobian`	2020-10-22 01:28:52 +02:00
0xNaN	93bc3c22a0	tiny gradcheck	2020-10-22 01:28:52 +02:00
Adrian Garcia Badaracco	9a8be135a7	incorporate changes	2020-10-21 13:21:44 -05:00
Adrian Garcia Badaracco	02adb0ac3a	Make test_mnist runnable by pytest and directly	2020-10-21 11:30:08 -05:00
Adrian Garcia Badaracco	5afe6b1f68	rename files	2020-10-21 11:28:03 -05:00
George Hotz	d91902948b	add reshape support and OMG the CONVS are SO SLOW	2020-10-21 09:12:19 -07:00
George Hotz	e3110c9922	backward pass for conv2d, lol i mostly guessed and made shapes match	2020-10-21 08:45:35 -07:00
George Hotz	5c2ac48c11	write forward pass for convolution	2020-10-19 09:33:06 -07:00
George Hotz	2681c79bc5	simple tests, repr not str	2020-10-18 14:55:20 -07:00
George Hotz	4019c38942	more readme	2020-10-18 14:38:20 -07:00
George Hotz	cc9054e3ec	refactor into utils	2020-10-18 14:36:29 -07:00
George Hotz	0c3dd12b3b	i hate tabs	2020-10-18 14:33:13 -07:00
George Hotz	a139f34bb6	fix nll loss in example	2020-10-18 14:27:54 -07:00
George Hotz	26ce2d93c3	add support for adam	2020-10-18 13:50:23 -07:00
George Hotz	6532233d24	refactor better	2020-10-18 13:33:02 -07:00
George Hotz	92fd23df66	refactor into a few files	2020-10-18 13:30:25 -07:00
George Hotz	118c2eebe3	write sgd class	2020-10-18 13:27:59 -07:00
George Hotz	54eafe6c12	update readme	2020-10-18 13:08:14 -07:00
George Hotz	83417d4b4c	readme and dirs	2020-10-18 12:48:17 -07:00

2686 Commits