Commit Graph

2686 Commits

Author SHA1 Message Date
George Hotz 46bbbcf7f0 model touchups 2021-11-30 11:13:34 -05:00
George Hotz bd21304e3c linear takes in weight and bias 2021-11-30 00:38:47 -05:00
George Hotz de938c2d9d vit is now tested 2021-11-30 00:23:06 -05:00
George Hotz 58ed46963e fix broadcastdot 2021-11-29 18:54:57 -05:00
George Hotz dca076dbf1 remove dumb nn ops 2021-11-29 18:05:31 -05:00
George Hotz f909ab194f gelu with broken test 2021-11-29 15:00:50 -05:00
George Hotz c752033283 fix GPU OOM in test 2021-11-29 13:05:59 -05:00
George Hotz 99b6051467 add ff_dim to transformer 2021-11-29 12:40:52 -05:00
George Hotz 29dee59368 cat: forward only not required 2021-11-29 00:14:56 -05:00
George Hotz 3cdc77f526 add cat support 2021-11-28 23:21:49 -05:00
George Hotz ce3d198bb7 less lines and fix default device 2021-11-27 11:18:49 -05:00
George Hotz 7ae14179d3 refactor ops 2021-11-27 11:12:23 -05:00
George Hotz c162e748f5 fix float64 warning on training 2021-10-30 20:07:31 -07:00
George Hotz b0f14b4af8 move datasets into datasets 2021-10-30 19:55:50 -07:00
George Hotz 7472a7ebe2 not forcing 3.9 for a stupid type 2021-10-30 16:52:40 -07:00
George Hotz fc6597a6d9 only resnet18, it's too slow otherwise 2021-10-30 16:48:39 -07:00
Evan Mays 285621aeda
Cherry backprop for conv2d (#281)
* quick math: 0 + x = x.

* gradient w.r.t. x using cherry for conv

* gradient w.r.t. w for conv on cherry but doing vector dot products

* small optimization

* [cherry] optimize conv backpass for large channel count

* get rid of numpy einsum
2021-10-30 16:12:19 -07:00
Sebastian Kreft 8113eec4cf
feat: add efficientnet test (#285)
A simple test using the chicken image from https://upload.wikimedia.org/wikipedia/commons/4/41/Chicken.jpg and the image preprocessing from example/efficientnet.py.

Note that EfficientNet loads its weights from the internet, so running the tests may be slow the first time. We could speed up the tests by caching the /tmp folder.

Fixes #234
2021-10-30 15:53:51 -07:00
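
The caching suggestion above can be sketched as follows; fetch_cached is a hypothetical helper for illustration, not an API from the repo:

```python
import hashlib
import os
import tempfile
from urllib.request import urlopen

def fetch_cached(url):
    # cache downloads under /tmp keyed by a hash of the URL, so repeated
    # test runs skip the network entirely (hypothetical helper)
    fp = os.path.join(tempfile.gettempdir(), hashlib.md5(url.encode("utf-8")).hexdigest())
    if os.path.isfile(fp):
        with open(fp, "rb") as f:
            return f.read()
    dat = urlopen(url).read()
    with open(fp, "wb") as f:
        f.write(dat)
    return dat
```
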
Guglielmo Camporese 2b7589db64
Added ResNet-{18, 34, 50, 101, 152} (#271)
* added resnets

* fix minor

* fix minor

* resnet in models

* added resnet test

* added resnet train test

* added linear, conv2d nn tests

* fix minor in extra/training

* resnet in models

* fix minor

* fix tolerance for linear in nn test

* fix eval, this was causing CPU and GPU UT failures

* revert transformer test

* fix minor for CPU test

* improved model get_params for sequential layer

* fix minor for params counting

* commented broken ops tests

* improved train for resnet
2021-06-21 09:37:24 -07:00
George Hotz 89798d2f43 some flags 2021-06-19 11:46:31 -07:00
George Hotz d3f169b267 move good models to models, add a training step test 2021-06-19 11:24:15 -07:00
Jacky Lee 3a91d5434f
Add dropout test (#265)
* Add dropout test

* Remove condition where training is false

* Skip dropout test when on GPU

* Revert changes to tensor.py and fix test case

* Revert change on whitespace

* Convert Tensor to cpu for testing

* Fix whitespace in tensor.py
2021-06-19 08:49:13 -07:00
George Hotz 2affd226b3 speed up sum 2021-06-17 16:38:34 -07:00
George Hotz c1d469d440 sum op 2021-06-17 16:19:35 -07:00
George Hotz 2075fdeb4f
FPGA Based Accelerator for Tinygrad (#258)
* ops_risk

* risk sim

* guessing is for winners

* minor

* better

* matmul with risk

* conv doesn't work

* closer

* conv2d works

* ops_risk

* opt2 works

* opt1 may not be possible

* opt1 is a mulacc

* arty

* attosoc example building on mac

* minor

* riscv assembler

* gucci gang

* we got C code

* not a scam

* hello

* make risk mergeable into master

* unop support
2021-06-07 17:45:09 -07:00
Skosh 81bf933a91
Improved __getitem__ (#254)
* Improved __getitem__

* Updated

* Updated __getitem__

* Linebreaks

* Maybe this works?

* Added MNIST locally, tests run now
2021-05-05 22:15:22 -07:00
Skosh 78aa147b39
[WIP] YOLO working on tinygrad! (#245)
* Some progress on yolov3

* Removed some debugging comments… Also, the forward pass eats all RAM for some reason

* forward pass almost runs

* forward pass runs almost

* forward pass runs, now we gotta load the weights

* loading weights works

* fetches config and weights

* everything kind of works, postprocessing of output still needs to be implemented, temp_process_results kind of works, but it's kind of terrible, and not how things should be done

* some changes

* fixed some bugs in the forward pass and load_weights function; it now outputs more correct values, though some values are still loaded incorrectly

* Something is wrong with the forward pass, Conv2d tests added

* forward pass almost outputs correct values, gotta fix one more thing

* yolo works

* some final changes

* reverting changes

* removed dataloader

* fixed some indentation

* comment out failing test, somehow it fails CI even though it passes on my computer…

* fixed wrong probabilities

* added webcam option to YOLO, now just need to add bounding boxes and speed it up

* some progress towards adding bounding boxes

* trying to speed up yolo layer on GPU, still faster on CPU but with 30GB RAM usage

* Faster inference times, bounding boxes added correctly, webcam works but is slow, and there is a memory leak when running on CPU... Also added tinygrad's output on the classic dog image

* removed some debugging print statements

* updated result image

* something weird is going on, mean op on GPU tensor randomly faults, copying a tensor from GPU->CPU takes 10+ seconds…
2021-04-25 18:06:52 -07:00
George Hotz 62e3a8558c fix tolerance maybe 2021-01-05 07:45:47 -08:00
George Hotz 8a38e0d207 only mish failed 2021-01-03 09:47:11 -08:00
George Hotz 1a4487965a remove negative from things w/o negative 2021-01-03 09:43:34 -08:00
George Hotz 0702e0c763 nah, no sign, it's not what you want. use relu 2021-01-03 09:30:33 -08:00
George Hotz c2eeb6950b add support for sign. technically relu can be second class now 2021-01-03 08:29:57 -08:00
NeuralLink 0825cf7f79
Added softplus and mish non stable (#220)
* Added softplus and mish CPU

* 🔨 refactor

* 🔨 second class softplus and mish

* 🔨 test fix

* no need of device in testing
2021-01-03 08:08:41 -08:00
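
For context on the "non stable" qualifier: the naive softplus overflows for large inputs, and mish is defined on top of softplus. A minimal numpy sketch of both forms (not the PR's code):

```python
import numpy as np

def softplus_naive(x):
    # the "non stable" form: np.exp(x) overflows for large x
    return np.log(1 + np.exp(x))

def softplus_stable(x):
    # equivalent rewrite that never overflows: max(x, 0) + log1p(exp(-|x|))
    return np.maximum(x, 0) + np.log1p(np.exp(-np.abs(x)))

def mish(x):
    # mish(x) = x * tanh(softplus(x))
    return x * np.tanh(softplus_stable(x))
```
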
Liam ebd72ff437
Test split (#231)
* Split tests

Split tests into "Test CPU" and "Test GPU".

Add test flag "TEST_DEVICES", which is a comma-separated list of devices:
CPU,GPU,ANE

* Run tests based on provided TEST_DEVICES flag

By default, all of "CPU,GPU,ANE" will run.

* fix bad quote

* Revert changes and use GPU=1

This is done by setting the default Tensor device to Device.CPU unless
GPU=1 is set.

Run GPU tests: GPU=1 pytest -s -v
2021-01-01 09:19:03 -05:00
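
A minimal sketch of how such a comma-separated TEST_DEVICES flag could be parsed and used to gate a test class; the names here are illustrative, not the repo's exact code:

```python
import os
import unittest

# comma-separated device list, e.g. TEST_DEVICES=CPU,GPU (default runs everything)
TEST_DEVICES = os.getenv("TEST_DEVICES", "CPU,GPU,ANE").split(",")

@unittest.skipUnless("GPU" in TEST_DEVICES, "GPU not in TEST_DEVICES")
class TestMNISTGPU(unittest.TestCase):
    def test_mnist_trains(self):
        pass  # placeholder body
```
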
George Hotz 4291002881 reorder GPU ops 2020-12-31 09:46:39 -05:00
Marcel Bischoff e2f833f58f
max to behave on ties like torch (#229)
* checkpoint

* fixing pow

* undo pow

* backward max on GPU and CPU rewrite

* indentation

* changing seed for curiosity

* max replaced equality

* undo seed

* rebase

* fixed tests

* merge error
2020-12-30 18:52:50 -05:00
George Hotz fcfe3dae01 write slice for CPU 2020-12-30 10:32:53 -05:00
George Hotz f9170505b3 if you like your transformers twice as slow, use the GPU 2020-12-29 17:14:23 -05:00
George Hotz 6a6a82e999 support multidot on GPU 2020-12-29 16:56:30 -05:00
George Hotz 27208d729b add GPU max thanks to marcelbischoff 2020-12-29 16:44:14 -05:00
George Hotz 02655c07d5 break maxpool2d on GPU 2020-12-29 13:05:57 -05:00
George Hotz 061e37de39 touchups 2020-12-29 12:41:21 -05:00
George Hotz a2e6562330 fix max op, less lines 2020-12-29 10:47:04 -05:00
Marcel Bischoff dc8fa7999c
Transpose on GPU (#221)
* transformer eval

* axis=-1

* transpose

* test for permutation using torch.movedims

* another test

* line
2020-12-29 10:40:11 -05:00
George Hotz 36579f66bf max op 2020-12-28 23:54:52 -05:00
George Hotz fafece9db7 avgpool2d is a second class op 2020-12-28 10:41:59 -05:00
George Hotz 593233b668 log and exp are first class ops 2020-12-28 10:00:30 -05:00
George Hotz a361ef6861 fixup training loop 2020-12-27 18:35:56 -05:00
George Hotz f15bec6dbc make multidot work on CPU 2020-12-27 17:25:37 -05:00
George Hotz 131e04c90c cpu only decorator 2020-12-27 17:18:55 -05:00
George Hotz 2f1b2c0a3b add transpose, start on transformer 2020-12-27 16:59:12 -05:00
iainwo 56d44637f3
fixed pylint, formatted python files with cblack on localhost (#204)
* fixed pylint, formatted python files with cblack on localhost

* Revert "fixed pylint, formatted python files with cblack on localhost"

This reverts commit 07e2b88466fa53399ad78d962ffb2ad55bc45344.

* dedented 4-spaces, added linter

Co-authored-by: Iain Wong <iainwong@outlook.com>
2020-12-17 14:37:31 -08:00
Liam bcf1518309
All devices are equal! (#196)
* Update all devices to be tested

ANE, CPU and OCL all now support all tests.

However, tests are not currently passing on GPU and I cannot test on CPU.

Failing GPU tests are not an issue caused by this update; they have not
been passing due to a missing required installation, "six".

OpenCL tests have not been run since commit 1a1c63a08b.

Devices have 3 types and are handled by a new DeviceTypes enum. (The goal
is to revert to Tensor.<type>, but the current setup allows for keyword
argument defaults: `device=DeviceType.CPU`.)

All references to Tensor.GPU/CPU/ANE have been converted to the
corresponding `DeviceTypes` enum.

Refactored the conversion code to allow any-device-to-any-device
conversion.

* Add six dependency in requirements.txt

* Resolve failure to run tests

Move six into gpu required installs. Remove six from standard
installation.

* Remove repeated data conversion

* Refactor method names

Also reduce code with .to and .to_

* Dynamic device handlers

* Refactor DeviceTypes -> Device

* Add mem copy profiling back

* test_backward_pass_diamond_model passing

* Resolve Sum issue on GPU

* Revert batchnorm2d tests

* Update README with upadated API

* ANE testing with

* Last minute line gains
2020-12-15 23:44:08 -08:00
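
The device handling described above might look roughly like this sketch; the names are illustrative, and convert_buffer is a stand-in for the real conversion code:

```python
from enum import Enum

class Device(Enum):  # the PR later renames DeviceTypes -> Device
    CPU = 0
    GPU = 1
    ANE = 2

def convert_buffer(buf, src, dst):
    # stub: a real implementation would copy between host/GPU/ANE memory
    return buf

class Tensor:
    def __init__(self, data, device=Device.CPU):
        self.data, self.device = data, device

    def to_(self, device):
        # in-place device move
        self.data = convert_buffer(self.data, self.device, device)
        self.device = device

    def to(self, device):
        # out-of-place move built on to_, reducing duplicated conversion code
        t = Tensor(self.data, self.device)
        t.to_(device)
        return t
```
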
Marcel Bischoff da72a0eed4
Big MNIST model with PIL augmentation and load/save (#160)
* 2serious

* load/save

* fixing GPU

* added DEBUG

* needs BatchNorm or doesn't learn anything

* old file not needed

* added conv biases

* added extra/training.py and checkpoint

* assert in test only

* save

* padding

* num_classes

* checkpoint

* checkpoints for padding

* training was broken

* merge

* rotation augmentation

* more aug

* needs testing

* streamline augment, augment is fast thus bicubic

* tidying up
2020-12-13 20:45:55 -08:00
George Hotz 1d10559d1d tinygrad.utils -> extra.utils 2020-12-12 15:26:07 -08:00
James Roberts 8e8cbc74b3
Minor clean up (#184)
* Removes unused imports

* Minor clean up
2020-12-11 14:25:29 -08:00
Daulet c7e95ddb21
Add diamond model test (#181)
* add backward pass test for diamond model

* fix train_efficientnet example
2020-12-11 09:21:36 -08:00
Marcel Bischoff 5d46df638a
abs as non-first class operation using relu (#171)
* abs (non-first class)

* whitespace
2020-12-09 12:20:34 -08:00
George Hotz ffb96b2d0b batchnorm by marcelbischoff 2020-12-09 03:23:04 -08:00
NeuralLink 00e376f36c
leaky relu as geohot suggested (#167) 2020-12-09 02:58:35 -08:00
George Hotz c225e62dd2 touchups 2020-12-09 02:52:28 -08:00
Liam 89d0ff6989
Consistent testing (#137)
* Consistent GPU classes

Convert the existing GPU classes into one standard format.

Remove duplicated functions in `test_mnist` and create a TestMNISTGPU
class. This reduces line count and ensures consistency.

Use `@unittest.skipUnless(GPU, "Requires GPU")` instead of `if GPU:` to
skip GPU testing. This will ensure that skipped tests are displayed
accordingly in the pytest output.

* Optim Testing now supports GPU

* Tensor testing now supports GPU

jacobian and gradcheck auto skipped until GPU float64 support added.

* GPU support for custom constructor methods

* Remove GPU flag from Model constructors

It was requested that the `gpu` kwarg be removed from the model
constructor. GPU conversion is now handled in the train function.

This also required the conversion of Optimizer parameters, as they are
constructed prior to execution of the `train` function and are dependent
on the model GPU state.

* Fix typo: float32->float64

* Clean `get_parameters` utility

Just a quick refactor w/ the new support for optimizers.

* Remove GPU kwarg from TinyNet

Remove `gpu` kwarg from tiny net to match test_mnist `train` function.
2020-12-09 02:25:27 -08:00
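
The skipUnless pattern quoted above, contrasted with the old if-guard; here GPU is assumed to come from an env var:

```python
import os
import unittest

GPU = os.getenv("GPU", "0") == "1"  # assumption: GPU=1 enables GPU tests

# old style: when GPU is false, the class is never defined, so the tests
# silently vanish from the report
# if GPU:
#     class TestOptimGPU(unittest.TestCase): ...

# new style: skipped tests still show up (as "s") in the pytest output
@unittest.skipUnless(GPU, "Requires GPU")
class TestOptimGPU(unittest.TestCase):
    def test_sgd(self):
        pass  # placeholder body
```
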
Daulet 24d688c184
win more lines for core library (#158)
...and sacrifice test speed
2020-12-08 14:18:45 -08:00
George Hotz 4e1a0de392 fix rsub 2020-12-08 10:05:21 -08:00
George Hotz c4540f1b8c Support scalars by kartik4949 2020-12-08 09:52:07 -08:00
George Hotz 97fd9c1237 zero_grad there to match readme 2020-12-07 23:12:18 -08:00
George Hotz b355cd2571
Mean axis (doesn't work) (#154)
* mean axis

* fixed
2020-12-07 22:58:34 -08:00
Marcel Bischoff 58ccebd7cd
Sum with axis (#153)
* sum with axis and tests

* broken

* works again

* clean up

* Update test_ops.py
2020-12-07 21:49:18 -08:00
George Hotz 3b982f2f7a get_parameters 2020-12-06 13:47:28 -08:00
George Hotz 102e6356e9 replace layer_init_uniform with .uniform 2020-12-06 13:44:31 -08:00
George Hotz 51daaa43d4 fix memory leaks, add gc test 2020-12-06 10:34:40 -08:00
George Hotz 17659f7dd7 gpu speedup, tests work on M1 2020-12-06 09:05:49 -08:00
adamritter f190ca446d
Detach (#123)
* Detach

* Torch.detach reuses the buffer in the

* Fix test

* wakey wakey GitHub Actions

Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
2020-11-19 19:03:42 -08:00
dustcollector12 ee99d016e9
tensor implementation for rmsprop and adam (#121)
* tensor implementation for rmsprop and adam

* test_mnist.py extended to cover sgd, rmsprop and adam on cpu and gpu

* number of steps reduced for adam from 1000 to 200
2020-11-16 15:07:49 -08:00
George Hotz 17bf90dbe4 unbroadcasting works on the GPU 2020-11-16 09:16:55 -08:00
George Hotz 17eab716b6 unbroadcast GPU template 2020-11-16 08:16:36 -08:00
George Hotz 13d34373d1 move gradcheck to extra, clean up unbroadcast 2020-11-16 08:03:31 -08:00
adamritter 5ea3d76dfb
Topological sort, zero_grads (#119)
* Topological sort, zero_grads

* Bug fix, add test

* Add zero_grads

* Put deepwalk function in backward

* Move zero_grad to optim

* Fix gradcheck hack

Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
2020-11-15 20:25:29 -08:00
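
A minimal sketch of the deepwalk/topological-sort idea named in this PR, in the micrograd style; _parents is an assumed attribute, not the repo's exact field:

```python
def deepwalk(node):
    # post-order DFS: a node is appended only after all of its inputs,
    # so iterating the result in reverse visits each node exactly once
    # with its output gradient already accumulated
    visited, order = set(), []
    def walk(n):
        if n in visited:
            return
        visited.add(n)
        for parent in getattr(n, "_parents", []):
            walk(parent)
        order.append(n)
    walk(node)
    return order

# backward() would then run: for n in reversed(deepwalk(loss)): n._backward()
```
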
Marcel Bischoff c7b7f8ccc8
Backwards ops supporting broadcasting (#118)
* streamlined numerical_jacobian

* Got rid of the g loop in Conv2D.forward

* erased stupid line

* nothing

* no loops in Conv2D forward

* Conv2D backprop improved

* stupid things in examples

* alternative to einsum

* Conv2D backward einsum alternative

* tidying up

* tidied up

* no ravel

* got rid of print

* Update efficientnet.py

* Update efficientnet.py

* Update efficientnet.py

* only tensordot

* 255.0

* whitespace

* aspect ratio error in efficientnet

* noprint

* efficient net wrong strides

* broadcasting for backward ops

* Update ops.py

* Update ops.py

- was wrong

* broadcast test for backward enabled

* function adBC + not summing over already 1 axis

* spacing

Co-authored-by: Marcel Bischoff <marcel@Marcels-iMac.local>
2020-11-15 15:21:10 -08:00
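
The backward half of broadcasting reduces to an "unbroadcast": sum the upstream gradient over every axis that was expanded in forward, skipping axes that were already size 1 (the "not summing over already 1 axis" bullet). A numpy sketch under that reading:

```python
import numpy as np

def unbroadcast(grad, shape):
    # collapse the gradient of a broadcast result back to the input's shape
    while grad.ndim > len(shape):          # drop padded leading axes
        grad = grad.sum(axis=0)
    for i, s in enumerate(shape):
        if s == 1 and grad.shape[i] != 1:  # axis was expanded in forward
            grad = grad.sum(axis=i, keepdims=True)
    return grad

assert unbroadcast(np.ones((3, 4, 5)), (4, 1)).shape == (4, 1)
```
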
dustcollector12 28474949b8
refactoring of forward in reshape (#115)
* refactoring of forward in reshape

* test case for reshape added
2020-11-13 13:20:43 -08:00
pb1729 420af82888
General broadcasting of binary operations (#114)
* Allow for general broadcasting of binary operations. Can handle any situation where corresponding dimensions between the tensors match, or at least one of them is of size 1. If a tensor has fewer dimensions than the other, its shape is padded with 1s until both have the same number of dimensions. Also refactored buffer_zeros() by creating a function buff() that makes a buffer from a numpy array.

* remove extra tabs

Co-authored-by: phillip <phillip_bement@reedbement.com>
2020-11-12 22:27:48 -08:00
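
The matching rule this PR describes, as a shape-only sketch:

```python
def broadcast_shape(s1, s2):
    # pad the shorter shape with leading 1s, then each dim pair must
    # match or contain a 1 (exactly the rule described above)
    n = max(len(s1), len(s2))
    s1 = (1,) * (n - len(s1)) + tuple(s1)
    s2 = (1,) * (n - len(s2)) + tuple(s2)
    out = []
    for a, b in zip(s1, s2):
        if a != b and 1 not in (a, b):
            raise ValueError(f"cannot broadcast {s1} with {s2}")
        out.append(max(a, b))
    return tuple(out)

assert broadcast_shape((3, 1, 5), (4, 5)) == (3, 4, 5)
```
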
adamritter 08aa60d9d0
broadcasting 1s at the start, 1 kernel/4 divs version (#110)
* Pad2d backward pass on GPU

* Faster Pad2D GPU backward pass (no zeroing needed)

* Fix out of bounds error

* Don't save prg

* Let compiler optimize division by 1

* More generic broadcasting (1s at the start)

* Bug fix

* Add comment

* Try to fix flaky test with other method

* Add mixed broadcast support

* 1kernel

* Separate broadcast tests

Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
2020-11-12 13:33:35 -08:00
NeuralLink f773ef3996
tanh non first class op (#111)
* tanh non first class op

* tanh test with 1e-6 tol

Co-authored-by: Kartik Sharma <kartik.sharma@claimgenius.com>
2020-11-12 13:32:50 -08:00
Ryan Neph 608bdd4872
adds broadcasting test cases (#106)
refs: #80, #90, #104, #105
2020-11-12 07:08:28 -08:00
adamritter f1d21afe88
Somewhat more generic broadcasting (#105)
* Somewhat more generic broadcasting

* Add TODO

* Set Torch to deterministic in test

Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
2020-11-11 20:33:00 -08:00
Ryan Neph 8827a536e0
GPU MaxPool2D.backward(); TinyConvNet train passes (#103)
* no trailing whitespace

* GPU MaxPool2D.backward(); TinyConvNet train passes!

* Fix GPU avgpool.forward() init_val

Doesn’t change result but is simpler.

* Fix MaxPool GPU init_val

Tests only cover random non-negative inputs. This fixes issues if negative inputs are fed to GPU MaxPool2D. Test update to follow.
2020-11-11 07:58:43 -08:00
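
The init_val bug is easy to state: a running max seeded with 0 is wrong whenever a pooling window is all-negative, so the seed should be -inf. A tiny sketch:

```python
import math

window = [-3.0, -1.0, -2.0]           # an all-negative pooling window
print(max([0.0] + window))            # 0.0  -> wrong: 0 was never in the window
print(max([-math.inf] + window))      # -1.0 -> correct init_val
```
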
George Hotz d1284fa817 stride tests and i32 2020-11-10 16:10:14 -08:00
Marcel Bischoff 7bb803c5e0
Conv2D backward on GPU (#93)
* to make it work locally

* definitely not working

* Conv2D GPU passes some of the tests

* Conv2D GPU passes more of the tests

* passes some tests and mnist

* removed unnecessary code

* Conv2D Backpass works

* wrong test_ops.py

* white space + test backward

* erased useless code

* removed default argument

* long lines
2020-11-10 16:07:33 -08:00
George Hotz 52ee913c98 move the mnist loader out of tinygrad proper 2020-11-10 15:37:39 -08:00
George Hotz 58e703d099 fix tests 2020-11-10 09:49:19 -08:00
George Hotz 866b759d3b match torch api for pad2d 2020-11-09 17:48:56 -08:00
Ryan Neph 16d564a53c
finish unsupporting strided pool, add global avg pool test (#92) 2020-11-09 17:31:22 -08:00
George Hotz 870b84a893 test pad2d backward on GPU 2020-11-09 15:50:43 -08:00
George Hotz e46d122f65 not supporting stride 2020-11-09 15:06:58 -08:00
Ryan Neph c21c2a0b62
revert b0c0c5d: Strided Pool funcs (#74) (#87)
Strided CPU pooling was introduced assuming a small kernel size
(<=(10,10)), but efficientnet.py feeds kernel_size=(112,112).

This causes a huge array buffer allocation in stack_for_pool() that
hangs inference for a long time or until system OOM.

Revert CPU Pooling for now, and re-introduce #74 later with a new
global-average-pooling op that can be used instead of avgpool2d with
large kernel size for efficientnet inference.

Co-authored-by: Ryan Neph <ryanneph@google.com>
2020-11-09 14:58:18 -08:00
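
The proposed fix is cheap because global average pooling needs no stacked-window buffer at all; with kernel_size equal to the full spatial extent it is just a mean over the spatial axes (numpy sketch):

```python
import numpy as np

x = np.random.randn(1, 32, 112, 112).astype(np.float32)  # NCHW activations

# equivalent to avgpool2d(kernel_size=(112, 112)) on a 112x112 input,
# without materializing any stacked windows
gap = x.mean(axis=(2, 3), keepdims=True)  # shape (1, 32, 1, 1)
```
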
Ryan Neph 7e515308a5
label op subtests by params (#83) 2020-11-09 06:25:06 -08:00
Ryan Neph 5bedf566d1
tests should use rtol unless special case (#82) 2020-11-08 17:25:11 -08:00
Ryan Neph 04b9312a34
Fix GPU Pooling bug at boundary + better Pooling test coverage (#81)
* fixed Pooling bug

* Clarify Pooling tests
2020-11-08 17:25:01 -08:00
Ryan Neph b0c0c5d0d6
strided Pool funcs (#74)
* *Pool2D GPU forward supports stride

* kernel_size from ctx instead of saved_tensors

* *Pool2D CPU forward supports stride

* update ctx.stride properly
2020-11-08 11:45:55 -08:00
ziofil db3eccc16b
implemented backward for Pad2D & test (#73) 2020-11-07 21:58:42 -08:00
Ryan Neph 5265f6c578
add AvgPool2D backward pass on GPU (#68) 2020-11-07 12:27:29 -08:00
George Hotz 30442a086a some broadcasting, pool test is fail 2020-11-07 11:29:42 -08:00
George Hotz 94d44c97bf add pad2d on GPU 2020-11-07 10:46:36 -08:00
George Hotz fbff6ab2e5 fix strided convs, GPU env var for enet 2020-11-07 10:26:37 -08:00
George Hotz ec03eb44bd tinygrad does forward pass convs on GPU 2020-11-07 10:15:56 -08:00
George Hotz bc7758cc5b getting convs to work on gpu 2020-11-07 09:17:57 -08:00
George Hotz 3302286e68 yayay test_sgd_gpu passes 2020-11-07 08:48:17 -08:00
George Hotz 38e112cccd logsoftmax test 2020-11-07 07:26:53 -08:00
Rene Delgado cd54697fd8
fix gpu sum forward (#61)
* ignore venv

* add sum test

* fix sum forward
2020-11-05 21:59:16 -08:00
NeuralLink cc605da36d
Stable Sigmoid op (#59)
* 🔨 Added stable sigmoid

* added sigmoid test

* 🔧 suppressed overflow warning

* 🔧 clean up
2020-11-05 21:57:50 -08:00
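
One common way to make sigmoid numerically stable, presumably what the suppressed overflow warning was about (a sketch, not necessarily the PR's exact formulation): never exponentiate a positive number.

```python
import numpy as np

def sigmoid_naive(x):
    # np.exp(-x) overflows for very negative x
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_stable(x):
    # both branches only ever compute exp(-|x|), which cannot overflow
    e = np.exp(-np.abs(x))
    return np.where(x >= 0, 1.0 / (1.0 + e), e / (1.0 + e))
```
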
George Hotz f178d23ff3 gpu relu is good 2020-11-02 08:25:32 -08:00
George Hotz 231c1134bd cute trick for GPU test 2020-11-02 08:17:17 -08:00
George Hotz 5201a8e89f matmul on GPU 2020-11-01 08:54:20 -08:00
George Hotz 41e7d59aed test dot 2020-11-01 07:51:35 -08:00
George Hotz 1f544d6ece test mnist on GPU 2020-11-01 07:46:17 -08:00
George Hotz 9ac1ad40d6
Add GPU Support! (do not merge yet) (#41)
* copy tensors to and from gpu

* add on GPU

* adding works

* we stick shapes in

* works on cpu and gpu

* test changes, not passing yet

* something else

* op tests pass

* add, mean, and sum have working forward/backward

* mul ops test

* no gpu support, no problem

* test pass, clean up later

* gpu cleanup

* cleanup test ops, don't let div fail

* revert more

* simpler dispatcher

* clean up grad

* GPU and

* grad is a Tensor now

* gate test on GPU

* cleanups

* late loading gpu

* GPU as input option

* last cleanups
2020-11-01 07:00:49 -08:00
George Hotz 2c7e75d733
group conv: forward pass works (#34)
* forward pass works

* got the backward pass

* okay, it's now a coho
2020-10-30 09:19:20 -07:00
George Hotz 339a35b081 div needs help 2020-10-30 08:32:16 -07:00
George Hotz c14473f87d unit test for batchnorm2d 2020-10-30 08:19:58 -07:00
George Hotz 5e7e359706 fix tests 2020-10-29 08:19:07 -07:00
George Hotz 9ae3e9daf3 shape has to be a kwarg now, idk why this didn't break before 2020-10-29 08:13:05 -07:00
George Hotz f84f6c1edd write sqrt and div using pow 2020-10-29 07:57:25 -07:00
Göktuğ Karakaşlı 4b163ee270
efficient version of adam (#20)
* counteracted bias initialization

* test new adam

* add optimizer tests

* rename helper functions to fix the test

* remove redundant import
2020-10-27 15:54:40 -07:00
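
"Efficient version of adam" with "counteracted bias initialization" most likely refers to the rewrite from the Adam paper that folds both bias corrections into the step size instead of materializing m_hat and v_hat; a numpy sketch under that assumption:

```python
import numpy as np

def adam_step(params, grads, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # fold the 1/(1-b1^t) and 1/(1-b2^t) corrections into one scalar;
    # note eps lands in a slightly different spot than the textbook form
    a_t = lr * np.sqrt(1.0 - b2**t) / (1.0 - b1**t)
    for p, g, mi, vi in zip(params, grads, m, v):
        mi[:] = b1 * mi + (1.0 - b1) * g
        vi[:] = b2 * vi + (1.0 - b2) * g * g
        p -= a_t * mi / (np.sqrt(vi) + eps)
```
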
George Hotz f9788eba14 parameters, and start on efficientnet 2020-10-27 08:53:35 -07:00
George Hotz 1654008c1f conv stride support 2020-10-26 08:54:43 -07:00
George Hotz 2a55d7402b clean up ops, refactor pool backward. add stride test 2020-10-26 08:47:11 -07:00
George Hotz 93dceb4bee fix kernel_size bug, name like torch, add test 2020-10-26 08:38:53 -07:00
Timothy Mc Alister 15e5988323 make default parameters work for functions 2020-10-26 12:43:36 +01:00
George Hotz 2d37fd686b test ops 2020-10-25 19:03:49 -07:00
George Hotz 2eebbd32c6 ops test speed 2020-10-25 19:01:02 -07:00
George Hotz b27bcbe4b4 avgpool and test refactor 2020-10-25 18:40:01 -07:00
George Hotz 4c42676cb6 400 -> 200 2020-10-25 17:19:59 -07:00
George Hotz 567707a5f6 rename max_pool2d to match torch, remove more fast conv crap 2020-10-25 17:16:47 -07:00
George Hotz ea41f5e1c1 seems more generic 2020-10-25 16:40:37 -07:00
George Hotz 2333c4dea7 no tqdm in actions 2020-10-25 16:40:08 -07:00
George Hotz ad48061927 better sort in torch profiler 2020-10-25 16:07:49 -07:00
George Hotz 82f8e10813 no hacks in that test 2020-10-25 15:52:05 -07:00
George Hotz 4baa4c041f it's crazy how much faster pytorch is than numpy 2020-10-25 15:42:33 -07:00
George Hotz 5ddbd7f04b 2 to 3x slower than torch 2020-10-25 15:27:33 -07:00
George Hotz f8311f5ecd print fp/bp mnist 2020-10-25 15:08:18 -07:00
George Hotz 5c179d18ad add profiling for mnist net 2020-10-25 14:20:55 -07:00
George Hotz 8fcada8071 faster and better convnet 2020-10-25 13:48:44 -07:00
George Hotz 96f9cdb8a0 woah, fastconv is wrong 2020-10-25 12:56:42 -07:00
George Hotz bb98cdfef7 improve conv testing 2020-10-25 12:46:04 -07:00
George Hotz ef24aac09e finally, fast convs 2020-10-25 12:39:44 -07:00
George Hotz 67506eb6ba fast im2col 2020-10-25 11:49:35 -07:00
George Hotz c9968756d1 allow the line profiler to work 2020-10-25 11:13:40 -07:00
George Hotz 5062c2c8ff profile conv better 2020-10-25 11:11:00 -07:00
George Hotz c74764bac3 oops, set to None 2020-10-25 08:28:18 -07:00
George Hotz 935f5ddaaa always keep batch size out front 2020-10-25 08:14:07 -07:00
George Hotz b91fd3afad maxpool 2020-10-25 07:43:34 -07:00
George Hotz 5216a1d9f3 refactor into tensor and ops 2020-10-23 10:34:21 -07:00
George Hotz 9b9e47f369 added conv profile test 2020-10-23 09:46:10 -07:00
George Hotz 5756115e57 anyone else let down by the fast conv? 2020-10-23 09:09:29 -07:00
George Hotz bcb60e0b7c wow, you have to name them test 2020-10-23 06:33:18 -07:00
George Hotz 2259c9faa1 low lr improves rmsprop 2020-10-23 06:22:32 -07:00
George Hotz eda29fa0e0 clean up test 2020-10-23 06:11:38 -07:00
George Hotz 373b4e341b
Merge pull request #15 from f0ti/master
added RMSprop optim
2020-10-23 06:08:20 -07:00
f0ti 0b87aaca1e update rmsprop 2020-10-23 14:46:45 +02:00
f0ti c5f726ec2e all three 2020-10-23 11:53:01 +02:00
f0ti 6a38ccb6b0 update rmsprop and readme 2020-10-23 11:49:43 +02:00
George Hotz 21ebb0b769 if you wait 24 seconds, that gets 98% 2020-10-22 21:49:14 -07:00
George Hotz 816f648161 chans doesn't need to be in self 2020-10-22 21:19:35 -07:00
George Hotz 77251cc6c3 7x7 conv = more accuracy 2020-10-22 21:10:27 -07:00
f0ti 7e1eddb0c5 added RMSprop optim 2020-10-23 02:50:02 +02:00
0xNaN d95adbddb4 `gradcheck` now returns only a bool; refactored test_gradcheck 2020-10-22 01:28:52 +02:00
0xNaN adbfc67456 test `jacobian` and `numerical_jacobian` against torch.autograd.functional.jacobian 2020-10-22 01:28:52 +02:00
0xNaN 1561d3b9c0 extracting `jacobian` and `test_jacobian` 2020-10-22 01:28:52 +02:00
0xNaN 93bc3c22a0 tiny gradcheck 2020-10-22 01:28:52 +02:00
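
A minimal sketch of the numerical-jacobian / boolean-gradcheck pattern these commits describe (numpy, central differences; not the repo's exact implementation):

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    # J[i, j] = d f(x)_i / d x_j, estimated by central differences
    y = f(x)
    J = np.zeros((y.size, x.size))
    for j in range(x.size):
        d = np.zeros_like(x)
        d.flat[j] = eps
        J[:, j] = ((f(x + d) - f(x - d)) / (2 * eps)).ravel()
    return J

def gradcheck(f, x, analytic_J, atol=1e-4):
    # as in the commit above: return only a bool
    return np.allclose(analytic_J, numerical_jacobian(f, x), atol=atol)
```
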
Adrian Garcia Badaracco 9a8be135a7
incorporate changes 2020-10-21 13:21:44 -05:00
Adrian Garcia Badaracco 02adb0ac3a
Make test_mnist runnable by pytest and directly 2020-10-21 11:30:08 -05:00
Adrian Garcia Badaracco 5afe6b1f68
rename files 2020-10-21 11:28:03 -05:00
George Hotz d91902948b add reshape support and OMG the CONVS are SO SLOW 2020-10-21 09:12:19 -07:00
George Hotz e3110c9922 backward pass for conv2d, lol i mostly guessed and made shapes match 2020-10-21 08:45:35 -07:00
George Hotz 5c2ac48c11 write forward pass for convolution 2020-10-19 09:33:06 -07:00
George Hotz 2681c79bc5 simple tests, repr not str 2020-10-18 14:55:20 -07:00
George Hotz 4019c38942 more readme 2020-10-18 14:38:20 -07:00
George Hotz cc9054e3ec refactor into utils 2020-10-18 14:36:29 -07:00
George Hotz 0c3dd12b3b i hate tabs 2020-10-18 14:33:13 -07:00
George Hotz a139f34bb6 fix nll loss in example 2020-10-18 14:27:54 -07:00
George Hotz 26ce2d93c3 add support for adam 2020-10-18 13:50:23 -07:00
George Hotz 6532233d24 refactor better 2020-10-18 13:33:02 -07:00
George Hotz 92fd23df66 refactor into a few files 2020-10-18 13:30:25 -07:00
George Hotz 118c2eebe3 write sgd class 2020-10-18 13:27:59 -07:00
George Hotz 54eafe6c12 update readme 2020-10-18 13:08:14 -07:00
George Hotz 83417d4b4c readme and dirs 2020-10-18 12:48:17 -07:00