Commit Graph

2686 Commits

Author SHA1 Message Date
George Hotz 73b0471b25 join expands 2022-07-17 13:42:05 -07:00
George Hotz cfabbbd6bb more crap to remove without convs 2022-07-17 13:02:27 -07:00
George Hotz 5e96ed523a fix opencl bug, no training on opencl 2022-07-17 12:55:26 -07:00
George Hotz f93e297804 fix bug caused by rounding 2022-07-17 12:49:58 -07:00
George Hotz cff297ef9d w/e, that's a later prob 2022-07-17 12:32:50 -07:00
George Hotz 6375e7129a opencl not imported 2022-07-17 12:14:39 -07:00
George Hotz bf299802f8 fixup tests 2022-07-17 12:11:53 -07:00
George Hotz 3c4565fa21 SLICE -> PAD,SHRINK 2022-07-17 11:33:59 -07:00
George Hotz cca089b11d Revert "more expand -> repeat"
This reverts commit 2e7b1630a8.
2022-07-17 08:41:48 -07:00
George Hotz 2e7b1630a8 more expand -> repeat 2022-07-17 08:40:49 -07:00
George Hotz d04b274cd2 noop removal can replace with reshape 2022-07-16 08:32:42 -07:00
George Hotz bcf422dfdd
Device2 (#358)
* option for matmul

* fixups

* fast like a nascar

* running

* thneed runner

* no buffer id makes no backing buffer

* move constant folding to the top

* runs on mac

* folded biases

* was v slow

* maybe just that

* elu touchup

* speed and float32

Co-authored-by: Comma Device <device@comma.ai>
2022-07-16 07:26:19 -07:00
George Hotz 5e46561f7e no_grad = NOT backward 2022-07-10 20:54:57 -07:00
George Hotz b34ae7876f lol chr(10) not chr(13) 2022-07-10 20:03:11 -07:00
George Hotz 44848ee5dc prints show we can precompute from the outside 2022-07-08 10:59:20 -07:00
George Hotz 04e7e4104c track graph children and make lazycache use weak references 2022-07-07 11:01:18 -07:00
George Hotz 001cfe83a2 local 2022-07-07 10:05:26 -07:00
George Hotz 2720ef49ca extra and test and tuple 2022-07-07 10:01:33 -07:00
George Hotz 81b73f97a3
Optiimzation (#355)
* constant folding into kernels

* that opt worth it?

* fix mypy

* ast one kernel

* save 2 lines in conv kernel

* debug print kernel count

* cl debugging

* early realize inputs

* refactor Device
2022-07-04 08:58:57 -07:00
George Hotz e6e43e820e should fix tests 2022-07-03 16:06:11 -07:00
George Hotz d7aad46758 test lazy also, make TestMNIST faster 2022-07-03 15:19:19 -07:00
George Hotz 93c378dffc add test for slice_one 2022-07-03 12:14:20 -07:00
George Hotz f9a8412b68 make contiguous ops yellow 2022-07-02 17:54:04 -07:00
George Hotz 207b9e1df3 padding is now a param to conv2d 2022-07-02 17:11:12 -07:00
George Hotz cde137d163 simple shapetracker tests 2022-07-02 16:02:15 -07:00
George Hotz 368c0ce2f6 NUM=-2 for ants 2022-07-02 15:47:10 -07:00
George Hotz 7276f8d6bf improve constant folding, detach before moving tensor 2022-07-02 15:29:40 -07:00
George Hotz e55a9833fb a little more readable 2022-06-27 08:54:04 -07:00
George Hotz 3a414d7f50 cleanup, add flops tracking 2022-06-26 22:43:39 -07:00
George Hotz dffde3de5a support both asymmetric and negative padding 2022-06-26 17:59:25 -07:00
George Hotz 49c954b389 comments 2022-06-26 17:20:25 -07:00
George Hotz 8c483fbdc9 maxpool lazy fix 2022-06-26 17:07:03 -07:00
George Hotz 98a730dd00 benchmark on different inputs 2022-06-21 20:20:58 -07:00
George Hotz 83d50e2687 move to extra.onnx 2022-06-21 19:43:44 -07:00
George Hotz c833886bf5 improved shapetracker 2022-06-21 19:17:25 -07:00
George Hotz 159a2d1a80
Simple Lazy (#340)
* simple lazy

* simple

* fix graph and make realize simpler

* SHUFFLE_MOVEMENT_OPS already works

* MERGE_MOVEMENT_OPS and REMOVE_MOVEMENT_NOPS

* it works, but it's slow

* constant inlining

* cache misses are the reason for loss

* fix non determinism

* cleanup, a few tests fail

* profile

* cache lazyop

* cleanups

* create namedtuple once

* bunch of caches

* it's not deleting

* nograd

* caching allocator

* reduce_op

* fromCPU if you want fromCPU

* complain

* nvidia fix

* realized on Tensor

* numpy is very slow

* no loads in second run

* caching in View

* 10ms speedups on batman

* remove old profiler

* bunch of refactors

* contiguous on view

* elementwise_op_compile for conv

* support ewop after processing op

* this still works

* conv folding works

* all we do is conv conv conv no matter what

* all args to the conv

* still works

* unify conv and ewop

* ops_gpu cleanup

* move around ops_gpu

* remove caching allocator

* remove unused

* find_conv shorten

* gpu refactors

* simpler gpu

* and that

* cmp is fast

* 18ms on mac

* it's a lot of lines, but it's faster

* minor

* tests pass

* LoadOps.CONTIGUOUS

* remove dups

* torch converter doesn't support slice

* move lazy out for merge

* LoadOps are only for lazy
2022-06-20 22:45:11 -07:00
George Hotz a3538e225a
Simple Lazy Pieces (#343)
* simple lazy

* simple

* fix graph and make realize simpler

* SHUFFLE_MOVEMENT_OPS already works

* MERGE_MOVEMENT_OPS and REMOVE_MOVEMENT_NOPS

* it works, but it's slow

* constant inlining

* cache misses are the reason for loss

* fix non determinism

* cleanup, a few tests fail

* profile

* cache lazyop

* cleanups

* create namedtuple once

* bunch of caches

* it's not deleting

* nograd

* caching allocator

* reduce_op

* fromCPU if you want fromCPU

* complain

* nvidia fix

* realized on Tensor

* numpy is very slow

* no loads in second run

* caching in View

* 10ms speedups on batman

* remove old profiler

* bunch of refactors

* contiguous on view

* elementwise_op_compile for conv

* support ewop after processing op

* this still works

* conv folding works

* all we do is conv conv conv no matter what

* all args to the conv

* still works

* unify conv and ewop

* ops_gpu cleanup

* move around ops_gpu

* remove caching allocator

* remove unused

* find_conv shorten

* gpu refactors

* simpler gpu

* mergable without this

* ops torch
2022-06-20 20:28:10 -07:00
George Hotz a7131b6a46
Non contig (#339)
* contiguous_view

* non contig reduce too

* conv fast

* maybe faster valid

* improve test_onnx

* improve params

* elementwise_op

* draw non contig

* improve contiguous
2022-06-19 22:40:48 -07:00
George Hotz d05e7c291a
contiguous_view (#336)
* contiguous_view

* non contig reduce too

* conv fast

* maybe faster valid

* improve test_onnx

* improve params

* elementwise_op

* draw non contig
2022-06-19 20:37:28 -07:00
George Hotz fb72ea3fbd
gpu uses shapetracker (fix tests) (#335)
* shapetracker

* movement_op

* hmm, that's why repr failed
2022-06-19 17:32:07 -07:00
George Hotz ce2e20b768 fix test 2022-06-19 17:07:09 -07:00
George Hotz 6b652dafb2 touchups 2022-06-19 16:57:14 -07:00
George Hotz e364849b3b stuff from lazy 2022-06-19 09:57:16 -07:00
George Hotz 8d08e41c21 print time in test 2022-06-19 00:59:09 -07:00
George Hotz 77f5cef8a6
First batch from lazy branch (#332)
* test and helpers from lazy

* lazy pt2
2022-06-18 17:26:59 -07:00
George Hotz a11deb5150 shapetracker check for noop 2022-06-16 16:29:18 -07:00
George Hotz 52505faaf4 minor 2022-06-16 15:53:45 -07:00
George Hotz d5b3e18540
Accelerate with CL (#325)
* accelerated opencl

* it's running, it's just wrong

* bugfix

* model is correct in opencl

* lazy image convert

* add padding support to convolution

* that stuff was all upstreamed

* remove HEAD

* oops

* test_simple_conv2d_4 passes, add dilation support

* put logic in ops_opencl

* fix crash

* hmm, stride seems okay

* padding for batched inputs

* just an issue now with cout%4

* op model still passes

* fix startPackedInputChannel

* pre and post processing ops for graph

* don't break other llops

* shapetrackering

* reshapes are free

* lazy movement ops
2022-06-16 15:40:52 -07:00
George Hotz bd7068f635 fix tests hopefully 2022-06-16 14:07:37 -07:00
George Hotz ce15bf2bdb the big memory gradient didn't even need to be computed 2022-06-16 11:41:29 -07:00
George Hotz 2e58948f6a Revert "can put that test back"
This reverts commit 51b082b41a.
2022-06-16 11:25:49 -07:00
George Hotz 51b082b41a can put that test back 2022-06-16 11:18:14 -07:00
George Hotz 85fe25e27b add stride support to shapetracker 2022-06-15 17:48:41 -07:00
George Hotz 3d4657167b fix tests hopefully 2022-06-15 17:26:37 -07:00
George Hotz 2a14befb74 support padding 2022-06-15 14:46:44 -07:00
George Hotz fef6c82491 wow dilation support was simple 2022-06-15 11:38:23 -07:00
George Hotz 0b182029dd support dilated convolution in torch 2022-06-14 18:03:35 -07:00
George Hotz a690ba4588 add test for padding 2022-06-14 17:41:22 -07:00
George Hotz e057ca23bb add flip 2022-06-14 17:28:43 -07:00
George Hotz 6261a0639b
ShapeTracker (#328)
* start shapetracker

* that late reshape is crushing our hopes

* simple failure

* DumbShapeTracker passes tests

* improve st tests

* stacked view tracker works

* flip works

* tests pass

* shapetracker works

* use ShapeTracker in ops_gpu

* a couple lines

* fix 0 shape

* less lines

* use shapetracker for new_shape in ops.py

* simpler still

* padding with a ZeroView

* gamed it a little
2022-06-14 16:08:22 -07:00
George Hotz dcbca4fdf1
Expand Operator (#327)
* replace broadcasting with expand

* Tensor, not self

* remove broadcasting from mlops

* delete useless A operator

* expand, not repeat

* remove A op

* expand on gpu

* binary_op doesn't broadcast anymore

* expand is still total junk, but the tests should pass
2022-06-12 12:31:48 -07:00
George Hotz 33f18c61a1 test_broadcasted_add 2022-06-12 10:19:58 -07:00
George Hotz af300b121b refactor to pass conv args into llops 2022-06-11 23:08:46 -07:00
George Hotz d747a4b9e2 add padding to conv2d function, other minor things 2022-06-11 22:29:42 -07:00
George Hotz 9a3c048724 skip broken tests, no float64 allowed 2022-06-11 17:12:04 -07:00
George Hotz 9ebd472375 move ops to ops.py 2022-06-11 15:58:56 -07:00
George Hotz b5b68e75ff simpler onnx 2022-06-11 15:35:45 -07:00
George Hotz 2305a5347b test_onnx works with enet also 2022-06-11 14:30:26 -07:00
George Hotz 6fdb276886 flip batchnorm function order 2022-06-11 13:20:41 -07:00
George Hotz 85d17a2acd running resnet onnx 2022-06-11 13:17:15 -07:00
George Hotz 0225360191 fixed with one return x 2022-06-11 12:08:53 -07:00
George Hotz db5a632e8c multicat + test onnx is generic onnx 2022-06-11 11:50:47 -07:00
George Hotz a710b3a210 it's a real test now 2022-06-11 11:33:33 -07:00
George Hotz 8440dbfa5d support inputs 2022-06-11 11:21:45 -07:00
George Hotz 08de1aa636 add flatten to tinygrad 2022-06-11 11:15:16 -07:00
George Hotz aee251cc41 op model test 2022-06-11 11:06:03 -07:00
George Hotz d061ce8d5e add ELU support 2022-06-11 10:47:23 -07:00
George Hotz 8864b37333 fix torch convdw 2022-06-10 15:04:39 -07:00
George Hotz aac1a9b419 this breaks tests 2022-06-10 12:20:42 -07:00
George Hotz e01ed64d7c restore that naming 2022-06-09 08:38:34 -07:00
George Hotz 60a48455ad still over line count, maybe test pass 2022-06-08 09:51:28 -07:00
George Hotz 70561f3d90 way over the line limit 2022-06-08 09:36:31 -07:00
George Hotz 4f7ee235c5 not a real test now 2022-06-08 09:00:59 -07:00
George Hotz ae33060dae early float4 stuff for binary 2022-06-08 08:59:54 -07:00
George Hotz 82f29b5dbf better GPU block 2022-06-08 08:01:04 -07:00
George Hotz 42ae78241e only run test on GPU 2022-06-08 07:54:40 -07:00
George Hotz cdf4b5f142 opencl perf test 2022-06-08 07:49:08 -07:00
George Hotz d8ee8a39ac sgd threestep graph is so pretty 2022-06-06 09:45:37 -07:00
George Hotz c143c92828 adam threestep 2022-06-06 09:38:28 -07:00
George Hotz d302049e53 don't use div 2022-06-06 09:25:31 -07:00
George Hotz a1dff4061b minor cleanups 2022-06-06 08:14:52 -07:00
George Hotz 3dac8fa728 this fix the gc 2022-06-05 17:16:40 -07:00
George Hotz 0ee21ba115 add ViT test and car 2022-06-05 17:12:43 -07:00
George Hotz 1de75b67d5 fix bug in graph with use of id 2022-06-05 16:31:20 -07:00
George Hotz f0fe37bd34 simpler graph demo 2022-06-05 12:40:12 -07:00
George Hotz 88de42fb6e document graph mode 2022-06-05 12:13:05 -07:00
George Hotz 845bb1fc34 bs 4 -> 2 in training test 2022-01-15 21:34:21 -08:00
George Hotz c0d1254003 don't run unneeded grads 2022-01-15 21:32:13 -08:00
George Hotz 8ba3d1f803 fix bn test, affine is True 2022-01-15 19:52:15 -08:00
George Hotz e28cdfb0cf clean up resnet 2021-11-30 16:14:54 -05:00
George Hotz 46bbbcf7f0 model touchups 2021-11-30 11:13:34 -05:00
George Hotz bd21304e3c linear takes in weight and bias 2021-11-30 00:38:47 -05:00
George Hotz de938c2d9d vit is now tested 2021-11-30 00:23:06 -05:00
George Hotz 58ed46963e fix broadcastdot 2021-11-29 18:54:57 -05:00
George Hotz dca076dbf1 remove dumb nn ops 2021-11-29 18:05:31 -05:00
George Hotz f909ab194f gelu with broken test 2021-11-29 15:00:50 -05:00
George Hotz c752033283 fix GPU OOM in test 2021-11-29 13:05:59 -05:00
George Hotz 99b6051467 add ff_dim to transformer 2021-11-29 12:40:52 -05:00
George Hotz 29dee59368 cat: forward only not required 2021-11-29 00:14:56 -05:00
George Hotz 3cdc77f526 add cat support 2021-11-28 23:21:49 -05:00
George Hotz ce3d198bb7 less lines and fix default device 2021-11-27 11:18:49 -05:00
George Hotz 7ae14179d3 refactor ops 2021-11-27 11:12:23 -05:00
George Hotz c162e748f5 fix float64 warning on training 2021-10-30 20:07:31 -07:00
George Hotz b0f14b4af8 move datasets into datasets 2021-10-30 19:55:50 -07:00
George Hotz 7472a7ebe2 not forcing 3.9 for a stupid type 2021-10-30 16:52:40 -07:00
George Hotz fc6597a6d9 only resnet18, it's too slow otherwise 2021-10-30 16:48:39 -07:00
Evan Mays 285621aeda
Cherry backprop for conv2d (#281)
* quick math: 0 + x = x.

* gradient w.r.t. x using cherry for conv

* gradient w.r.t. w for conv on cherry but doing vector dot products

* small optimization

* [cherry] optimize conv backpass for large channel count

* get rid of numpy einsum
2021-10-30 16:12:19 -07:00
Sebastian Kreft 8113eec4cf
feat: add efficientnet test (#285)
Simple test using the Chicken example from https://upload.wikimedia.org/wikipedia/commons/4/41/Chicken.jpg and the image preprocessing from example/efficientnet.py

Note that EfficientNet loads the weights from the internet so running the tests may be slow the first time. We could speed up the tests by caching the /tmp folder.

Fixes #234
2021-10-30 15:53:51 -07:00
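
A minimal sketch of the /tmp caching idea mentioned in that commit message, assuming a hypothetical fetch_cached helper (the actual test in the repo may fetch differently):

    import os, hashlib, urllib.request

    def fetch_cached(url, cache_dir="/tmp"):
      # cache downloads (weights, the test image) so repeat test runs skip the network
      fp = os.path.join(cache_dir, hashlib.md5(url.encode()).hexdigest())
      if not os.path.isfile(fp):
        urllib.request.urlretrieve(url, fp)
      with open(fp, "rb") as f:
        return f.read()

    chicken_jpg = fetch_cached("https://upload.wikimedia.org/wikipedia/commons/4/41/Chicken.jpg")
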
Guglielmo Camporese 2b7589db64
Added ResNet-{18, 34, 50, 101, 152} (#271)
* added resnets

* fix minor

* fix minor

* resnet in models

* added resnet test

* added resnet train test

* added linear, conv2d nn tests

* fix minor in extra/training

* resnet in models

* fix minor

* fix tolerance for linear in nn test

* fix eval, this causes cpu and gpu UT failing

* revert transformer test

* fix minor for CPU test

* improved model get_params for sequential layer

* fix minor for params counting

* commented broken ops tests

* improved train for resnet
2021-06-21 09:37:24 -07:00
George Hotz 89798d2f43 some flags 2021-06-19 11:46:31 -07:00
George Hotz d3f169b267 move good models to models, add a training step test 2021-06-19 11:24:15 -07:00
Jacky Lee 3a91d5434f
Add dropout test (#265)
* Add dropout test

* Remove condition where training is false

* Skip dropout test when on GPU

* Revert changes to tensor.py and fix test case

* Revert change on whitespace

* Convert Tensor to cpu for testing

* Fix whitespace in tensor.py
2021-06-19 08:49:13 -07:00
George Hotz 2affd226b3 speed up sum 2021-06-17 16:38:34 -07:00
George Hotz c1d469d440 sum op 2021-06-17 16:19:35 -07:00
George Hotz 2075fdeb4f
FPGA Based Accelerator for Tinygrad (#258)
* ops_risk

* risk sim

* guessing is for winners

* minor

* better

* matmal with risk

* conv doesn't work

* closer

* conv2d works

* ops_risk

* opt2 works

* opt1 may not be possible

* opt1 is a mulacc

* arty

* attosoc example building on mac

* minor

* riscv assembler

* gucci gang

* we got C code

* not a scam

* hello

* make risk mergeable into master

* unop support
2021-06-07 17:45:09 -07:00
Skosh 81bf933a91
Improved __getitem__ (#254)
* Some progress on yolov3

* Removed some debugging comments… Also, the forward pass eats all RAM for some reason

* forward pass almost runs

* forward pass runs almost

* forward pass runs, now we gotta load the weights

* loading weights works

* fetches config and weights

* everything kind of works, postprocessing of output still needs to be implemented, temp_process_results kind of works, but its kind of terrible, and not how things should be done

* some changes

* fixed some bugs in the forward pass and load_weights function, now outputs more correct values, however some values are still loaded incorrectly

* Something is wrong with the forward pass, Conv2d tests added

* forward pass almost outputs correct values, gotta fix one more thign

* yolo works

* some final changes

* reverting changes

* removed dataloader

* fixed some indentation

* comment out failing test, somehow it fails CI even though it passes on my computer…

* fixed wrong probabilities

* added webcam option to YOLO, now just need to add bounding boxes and speed it up

* some progress towards adding bounding boxes

* trying to speed up yolo layer on GPU, still faster on CPU but with 30GB ram usage

* Faster inference times, bounding boxes added correctly, webcam works, but is slow, and there is a memory leak when running on CPU... Also added tinygrads output on the classic dog image

* removed some debugging print statements

* updated result image

* something weird is going on, mean op on GPU tensor randomly faults, copying a tensor from GPU->CPU takes 10+ seconds…

* Improved __getitem__

* Updated

* Updated __getitem__

* Linebreaks

* Maybe this works?

* Added MNIST locally, tests run now
2021-05-05 22:15:22 -07:00
Skosh 78aa147b39
[WIP] YOLO working on tinygrad! (#245)
* Some progress on yolov3

* Removed some debugging comments… Also, the forward pass eats all RAM for some reason

* forward pass almost runs

* forward pass runs almost

* forward pass runs, now we gotta load the weights

* loading weights works

* fetches config and weights

* everything kind of works, postprocessing of output still needs to be implemented, temp_process_results kind of works, but its kind of terrible, and not how things should be done

* some changes

* fixed some bugs in the forward pass and load_weights function, now outputs more correct values, however some values are still loaded incorrectly

* Something is wrong with the forward pass, Conv2d tests added

* forward pass almost outputs correct values, gotta fix one more thign

* yolo works

* some final changes

* reverting changes

* removed dataloader

* fixed some indentation

* comment out failing test, somehow it fails CI even though it passes on my computer…

* fixed wrong probabilities

* added webcam option to YOLO, now just need to add bounding boxes and speed it up

* some progress towards adding bounding boxes

* trying to speed up yolo layer on GPU, still faster on CPU but with 30GB ram usage

* Faster inference times, bounding boxes added correctly, webcam works, but is slow, and there is a memory leak when running on CPU... Also added tinygrads output on the classic dog image

* removed some debugging print statements

* updated result image

* something weird is going on, mean op on GPU tensor randomly faults, copying a tensor from GPU->CPU takes 10+ seconds…
2021-04-25 18:06:52 -07:00
George Hotz 62e3a8558c fix tolerance maybe 2021-01-05 07:45:47 -08:00
George Hotz 8a38e0d207 only mish failed 2021-01-03 09:47:11 -08:00
George Hotz 1a4487965a remove negative from things w/o negative 2021-01-03 09:43:34 -08:00
George Hotz 0702e0c763 nah, no sign, it's not what you want. use relu 2021-01-03 09:30:33 -08:00
George Hotz c2eeb6950b add support for sign. technically relu can be second class now 2021-01-03 08:29:57 -08:00
NeuralLink 0825cf7f79
Added softplus and mish non stable (#220)
* Added softplus and mish CPU

* 🔨 refactor

* 🔨 second class softplus and mish

* 🔨 test fix

* no need of device in testing
2021-01-03 08:08:41 -08:00
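
For reference, the two activations named in that PR in their textbook, "non stable" (overflow-prone for large x) forms; a numpy sketch, not the repo's code:

    import numpy as np

    def softplus(x): return np.log(1.0 + np.exp(x))   # exp(x) overflows for large x, hence "non stable"
    def mish(x): return x * np.tanh(softplus(x))
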
Liam ebd72ff437
Test split (#231)
* Split tests

Split tests into "Test CPU" and "Test GPU".

Add test flag "TEST_DEVICES" which is a comma separated list of devices:
CPU,GPU,ANE

* Run tests based on provided TEST_DEVICES flag

By default will run all "CPU,GPU,ANE"

* fix bad quote

* Revert changes and use GPU=1

This is done through setting the default Tensor Device to Device.CPU of
GPU=1 is set.

Run GPU tests: GPU=1 pytest -s -v
2021-01-01 09:19:03 -05:00
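
A rough sketch of the environment gates described above; the names mirror the PR discussion (TEST_DEVICES was the first attempt, GPU=1 is what the PR settled on), but the exact parsing code is an assumption:

    import os

    # final form: run GPU tests with `GPU=1 pytest -s -v`
    GPU = os.getenv("GPU", "0") == "1"

    # earlier iteration: comma-separated device list, e.g. TEST_DEVICES=CPU,GPU,ANE
    TEST_DEVICES = os.getenv("TEST_DEVICES", "CPU,GPU,ANE").split(",")
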
George Hotz 4291002881 reorder GPU ops 2020-12-31 09:46:39 -05:00
Marcel Bischoff e2f833f58f
max to behave on ties like torch (#229)
* checkpoint

* fixing pow

* undo pow

* backward max on GPU and CPU rewrite

* indentation

* changing seed for curiosity

* max replaced equality

* undo seed

* rebase

* fixed tests

* merge error
2020-12-30 18:52:50 -05:00
George Hotz fcfe3dae01 write slice for CPU 2020-12-30 10:32:53 -05:00
George Hotz f9170505b3 if you like your transformers twice as slow, use the GPU 2020-12-29 17:14:23 -05:00
George Hotz 6a6a82e999 support multidot on GPU 2020-12-29 16:56:30 -05:00
George Hotz 27208d729b add GPU max thanks to marcelbischoff 2020-12-29 16:44:14 -05:00
George Hotz 02655c07d5 break maxpool2d on GPU 2020-12-29 13:05:57 -05:00
George Hotz 061e37de39 touchups 2020-12-29 12:41:21 -05:00
George Hotz a2e6562330 fix max op, less lines 2020-12-29 10:47:04 -05:00
Marcel Bischoff dc8fa7999c
Transpose on GPU (#221)
* 2serious

* load/save

* fixing GPU

* added DEBUG

* needs BatchNorm or doesn't learn anything

* old file not needed

* added conv biases

* added extra/training.py and checkpoint

* assert in test only

* save

* padding

* num_classes

* checkpoint

* checkpoints for padding

* training was broken

* merge

* rotation augmentation

* more aug

* needs testing

* streamline augment, augment is fast thus bicubic

* tidying up

* transformer eval

* axis=-1

* transpose

* test for permutation using torch.movedims

* another test

* line
2020-12-29 10:40:11 -05:00
George Hotz 36579f66bf max op 2020-12-28 23:54:52 -05:00
George Hotz fafece9db7 avgpool2d is a second class op 2020-12-28 10:41:59 -05:00
George Hotz 593233b668 log and exp are first class ops 2020-12-28 10:00:30 -05:00
George Hotz a361ef6861 fixup training loop 2020-12-27 18:35:56 -05:00
George Hotz f15bec6dbc make multidot work on CPU 2020-12-27 17:25:37 -05:00
George Hotz 131e04c90c cpu only decorator 2020-12-27 17:18:55 -05:00
George Hotz 2f1b2c0a3b add transpose, start on transformer 2020-12-27 16:59:12 -05:00
iainwo 56d44637f3
fixed pylint, formatted python files iwth cblack on localhost (#204)
* fixed pylint, formatted python files iwth cblack on localhost

* Revert "fixed pylint, formatted python files iwth cblack on localhost"

This reverts commit 07e2b88466fa53399ad78d962ffb2ad55bc45344.

* dedented 4-spaces added linter

Co-authored-by: Iain Wong <iainwong@outlook.com>
2020-12-17 14:37:31 -08:00
Liam bcf1518309
All devices are equal! (#196)
* Update all devices to be tested

ANE, CPU and OCL all now support all tests.

However tests are not currently passing on GPU and I cannot test on CPU.

Failing GPU test are not an issue caused by this update. Tests have not
been passing due to a missing "six" required installation.

OpenCL Tests have not been run since commit: 1a1c63a08b

devices have 3 types and are handle by a new DeviceTypes enum. (The goal
is to revert to Tensor.<type>, but this current setup allows for keyword
argument defaults: `device=DeviceType.CPU`)

All references to Tensor.GPU/CPU/ANE as been converted to the
corresponding `DeviceTypes` enum.

Refactor of the conversion code to allow for any device to any device
conversion.

* Add six dependency in requirements.txt

* Resolve failure to run tests

Move six into gpu required installs. Remove six from standard
installation.

* Remove repeated data conversion

* Refactor method names

Also reduce code with .to and .to_

* Dynamic device handlers

* Refactor DeviceTypes -> Device

* Add mem copy profiling back

* test_backward_pass_diamond_model passing

* Resolve Sum issue on GPU

* Revert batchnorm2d tests

* Update README with upadated API

* ANE testing with

* Last minute line gains
2020-12-15 23:44:08 -08:00
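
A hedged sketch of the enum-style device handle described in that PR body (later renamed DeviceTypes -> Device); the member values and the factory function here are illustrative only:

    from enum import Enum

    class Device(Enum):
      CPU = 0
      GPU = 1
      ANE = 2

    # the point of the enum: keyword-argument defaults like device=Device.CPU,
    # instead of separate Tensor.CPU/GPU/ANE code paths
    def zeros(shape, device=Device.CPU):
      print(f"allocating {shape} on {device.name}")
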
Marcel Bischoff da72a0eed4
Big MNIST model with PIL augmentation and load/save (#160)
* 2serious

* load/save

* fixing GPU

* added DEBUG

* needs BatchNorm or doesn't learn anything

* old file not needed

* added conv biases

* added extra/training.py and checkpoint

* assert in test only

* save

* padding

* num_classes

* checkpoint

* checkpoints for padding

* training was broken

* merge

* rotation augmentation

* more aug

* needs testing

* streamline augment, augment is fast thus bicubic

* tidying up
2020-12-13 20:45:55 -08:00
George Hotz 1d10559d1d tinygrad.utils -> extra.utils 2020-12-12 15:26:07 -08:00
James Roberts 8e8cbc74b3
Minor clean up (#184)
* Removes unused imports

* Minor clean up
2020-12-11 14:25:29 -08:00
Daulet c7e95ddb21
Add diamond model test (#181)
* add backward pass test for diamond model

* fix train_efficientnet example
2020-12-11 09:21:36 -08:00
Marcel Bischoff 5d46df638a
abs as non-first class operation using relu (#171)
* abs (non-first class)

* whitespace
2020-12-09 12:20:34 -08:00
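
The identity behind that commit, which keeps abs out of the first-class op set; a one-line sketch assuming the Tensor API exposes relu and negation, as it does elsewhere in this log:

    def abs_(x):
      # |x| = relu(x) + relu(-x)
      return x.relu() + (-x).relu()
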
George Hotz ffb96b2d0b batchnorm by marcelbischoff 2020-12-09 03:23:04 -08:00
NeuralLink 00e376f36c
leaky relu as geohot suggested (#167) 2020-12-09 02:58:35 -08:00
George Hotz c225e62dd2 touchups 2020-12-09 02:52:28 -08:00
Liam 89d0ff6989
Consistent testing (#137)
* Consistent GPU classes

Convert the existing GPU classes into one standard format.

Remove duplicated functions in `test_mnist` and create a TestMNISTGPU
class. This reduces line count and ensures consistency.

Use `@unittest.skipUnless(GPU, "Requires GPU")` instead of `if GPU:` to
skip GPU testing. This will ensure that skipped tests are displayed
accordingly in the pytest output.

* Optim Testing now supports GPU

* Tensor testing now supports GPU

jacobian and gradcheck auto skipped until GPU float64 support added.

* GPU support for custom constructor methods

* Remove GPU flag from Model constructors

It was requested that the `gpu` kwarg be removed from the model
constructor. GPU conversion is now handled in the train function.

This also required the conversion of Optimizer parameters as they are
constructed prior to execution of the `train` function and are dependant
on the model GPU state.

* Fix typo: float32->float64

* Clean `get_parameters` utility

Just a quick refactor w/ the new support for optimizers.

* Remove GPU kwarg from TinyNet

Remove `gpu` kwarg from tiny net to match test_mnist `train` function.
2020-12-09 02:25:27 -08:00
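
The skip pattern called out in that PR, in a minimal standalone form (the test class and body are illustrative):

    import os, unittest

    GPU = os.getenv("GPU", "0") == "1"

    class TestMNISTGPU(unittest.TestCase):
      @unittest.skipUnless(GPU, "Requires GPU")
      def test_train_gpu(self):
        # skipped tests now show up explicitly in the pytest output
        pass
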
Daulet 24d688c184
win more lines for core library (#158)
...and sacrifice test speed
2020-12-08 14:18:45 -08:00
George Hotz 4e1a0de392 fix rsub 2020-12-08 10:05:21 -08:00
George Hotz c4540f1b8c Support scalars by kartik4949 2020-12-08 09:52:07 -08:00
George Hotz 97fd9c1237 zero_grad there to match readme 2020-12-07 23:12:18 -08:00
George Hotz b355cd2571
Mean axis (doesn't work) (#154)
* mean axis

* fixed
2020-12-07 22:58:34 -08:00
Marcel Bischoff 58ccebd7cd
Sum with axis (#153)
* sum with axis and tests

* broken

* works again

* clean up

* Update test_ops.py
2020-12-07 21:49:18 -08:00
George Hotz 3b982f2f7a get_parameters 2020-12-06 13:47:28 -08:00
George Hotz 102e6356e9 replace layer_init_uniform with .uniform 2020-12-06 13:44:31 -08:00
George Hotz 51daaa43d4 fix memory leaks, add gc test 2020-12-06 10:34:40 -08:00
George Hotz 17659f7dd7 gpu speedup, tests work on M1 2020-12-06 09:05:49 -08:00
adamritter f190ca446d
Detach (#123)
* Detach

* Torch.detach reuses the buffer in the

* Fix test

* wakey wakey GitHub Actions

Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
2020-11-19 19:03:42 -08:00
dustcollector12 ee99d016e9
tensor implementation for rmsprop and adam (#121)
* tensor implementation for rmsprop and adam

* test_mnist.py extended to cover sgd, rmsprop and adam on cpu and gpu

* number of steps reduced for adam from 1000 to 200
2020-11-16 15:07:49 -08:00
George Hotz 17bf90dbe4 unbroadcasting works on the GPU 2020-11-16 09:16:55 -08:00
George Hotz 17eab716b6 unbroadcast GPU template 2020-11-16 08:16:36 -08:00
George Hotz 13d34373d1 move gradcheck to extra, clean up unbroadcast 2020-11-16 08:03:31 -08:00
adamritter 5ea3d76dfb
Topological sort, zero_grads (#119)
* Topological sort, zero_grads

* Bug fix, add test

* Add zero_grads

* Put deepwalk function in backward

* Move zero_grad to optim

* Fix gradcheck hack

Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
2020-11-15 20:25:29 -08:00
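
A sketch of the depth-first topological sort that "put deepwalk function in backward" implies: each node is appended only after all of its parents, and backward() then walks the list in reverse. The attribute names (_ctx, parents) are assumptions:

    def deepwalk(node, visited=None, nodes=None):
      visited, nodes = visited or set(), nodes or []
      visited.add(node)
      if node._ctx:
        for parent in node._ctx.parents:
          if parent not in visited:
            deepwalk(parent, visited, nodes)
        nodes.append(node)
      return nodes  # reversed(deepwalk(loss)) gives the backward visit order
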
Marcel Bischoff c7b7f8ccc8
Backwards ops supporting broadcasting (#118)
* streamlined numerical_jacobian

* Got rid of the g loop in Conv2D.forward

* ereased stupid line

* nothing

* no loops in Conv2D forward

* Conv2D backprop improved

* stupid things in examples

* alternative to einsum

* Conv2D backward einsum alternative

* tidying up

* tidied up

* no ravel

* got rid of print

* Update efficientnet.py

* Update efficientnet.py

* Update efficientnet.py

* only tensordot

* 255.0

* whitespace

* aspect ratio error in efficientnet

* noprint

* efficient net wrong strides

* broadcasting for backward ops

* Update ops.py

* Update ops.py

- was wrong

* broadcast test for backward enabled

* function adBC + not summing over already 1 axis

* spacing

Co-authored-by: Marcel Bischoff <marcel@Marcels-iMac.local>
2020-11-15 15:21:10 -08:00
dustcollector12 28474949b8
refactoring of forward in reshape (#115)
* refactoring of forward in reshape

* test case for reshape added
2020-11-13 13:20:43 -08:00
pb1729 420af82888
General broadcasting of binary operations (#114)
* allow for general broadcasting of binary operations. can handle any situation where corresponding dimensions between the tensors match, or at least one of them is of size 1. if a tensor has fewer dimensions than the other, then its size is padded with 1s until they match have the same number. also refactored buffer_zeros() by creating a function buff() that makes a buffer from a numpy array

* remove extra tabs

Co-authored-by: phillip <phillip_bement@reedbement.com>
2020-11-12 22:27:48 -08:00
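
The shape rule described in that commit body, written out as a small helper; an illustrative sketch of the rule, not the GPU kernel code:

    def broadcast_shape(a, b):
      a, b = list(a), list(b)
      # pad the shorter shape with leading 1s until the ranks match
      while len(a) < len(b): a.insert(0, 1)
      while len(b) < len(a): b.insert(0, 1)
      out = []
      for x, y in zip(a, b):
        assert x == y or x == 1 or y == 1, f"cannot broadcast {x} and {y}"
        out.append(max(x, y))
      return tuple(out)

    assert broadcast_shape((4, 1, 3), (5, 3)) == (4, 5, 3)
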
adamritter 08aa60d9d0
broadcasting 1s at the start, 1 kernel/4 divs version (#110)
* Pad2d backward pass on GPU

* Faster Pad2D GPU backward pass (no zeroing needed)

* Fix out of bounds error

* Don't save prg

* Let compiler optimize division by 1

* More generic broadcasting (1s at the start)

* Bug fix

* Add comment

* Try to fix flaky test with other method

* Add mixed broadcast support

* 1kernel

* Separate broadcast tests

Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
2020-11-12 13:33:35 -08:00
NeuralLink f773ef3996
tanh non first class op (#111)
*  tanh non first class op

* tanh test with 1e-6 tol

Co-authored-by: Kartik Sharma <kartik.sharma@claimgenius.com>
2020-11-12 13:32:50 -08:00
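
Same flavor as the abs-via-relu commit above: tanh can be composed from sigmoid, so it need not be first class. The identity, as a sketch (the PR's actual composition may differ):

    def tanh_(x):
      # tanh(x) = 2 * sigmoid(2x) - 1
      return 2.0 * (2.0 * x).sigmoid() - 1.0
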
Ryan Neph 608bdd4872
adds broadcasting test cases (#106)
refs: #80, #90, #104, #105
2020-11-12 07:08:28 -08:00
adamritter f1d21afe88
Somewhat more generic broadcasting (#105)
* Somewhat more generic broadcasting

* Add TODO

* Set Torch to deterministic in test

Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
2020-11-11 20:33:00 -08:00
Ryan Neph 8827a536e0
GPU MaxPool2D.backward(); TinyConvNet train passes (#103)
* no trailing whitespace

* GPU MaxPool2D.backward(); TinyConvNet train passes!

* Fix GPU avgpool.forward() init_val

Doesn’t change result but is simpler.

* Fix MaxPool GPU init_val

Tests only cover random non-negative inputs. This fixes issues if negative inputs are fed to GPU MaxPool2D. Test update to follow.
2020-11-11 07:58:43 -08:00
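
The init_val fix alluded to above comes down to seeding the max reduction with negative infinity instead of 0, so all-negative pooling windows still return the true maximum; a numpy sketch:

    import numpy as np

    window = [-3.0, -1.0, -7.0]
    bad  = max(0.0, *window)       # init_val = 0   -> wrongly returns 0.0
    good = max(-np.inf, *window)   # init_val = -inf -> returns -1.0
    assert good == -1.0
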
George Hotz d1284fa817 stride tests and i32 2020-11-10 16:10:14 -08:00
Marcel Bischoff 7bb803c5e0
Conv2D backward on GPU (#93)
* to make it work locally

* definitely not working

* Conv2D GPU passes some of the tests

* Conv2D GPU passes more of the tests

* passes some tests and mnist

* removed unecessary code

* Conv2D Backpass works

* wrong test_ops.py

* white space + test backward

* ereased useless code

* removed default argument

* long lines
2020-11-10 16:07:33 -08:00
George Hotz 52ee913c98 move the mnist loader out of tinygrad proper 2020-11-10 15:37:39 -08:00
George Hotz 58e703d099 fix tests 2020-11-10 09:49:19 -08:00
George Hotz 866b759d3b match torch api for pad2d 2020-11-09 17:48:56 -08:00
Ryan Neph 16d564a53c
finish unsupporting strided pool, add global avg pool test (#92) 2020-11-09 17:31:22 -08:00
George Hotz 870b84a893 test pad2d backward on GPU 2020-11-09 15:50:43 -08:00
George Hotz e46d122f65 not supporting stride 2020-11-09 15:06:58 -08:00
Ryan Neph c21c2a0b62
revert b0c0c5d: Strided Pool funcs (#74) (#87)
Strided CPU Pooling was introduced but assumes small kernel size
(<=(10,10)), but efficientnet.py feeds kernel_size=(112,112).

This causes a huge array buffer allocation in stack_for_pool() that
hangs inference for a long time or until system OOM.

Revert CPU Pooling for now, and re-introduce #74 later with a new
global-average-pooling op that can be used instead of avgpool2d with
large kernel size for efficientnet inference.

Co-authored-by: Ryan Neph <ryanneph@google.com>
2020-11-09 14:58:18 -08:00
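
The replacement hinted at in that revert: a global average pool is just a mean over the spatial axes, so it needs no stacked pooling buffer at all. A numpy sketch with an illustrative name:

    import numpy as np

    def global_avg_pool2d(x):
      # NCHW -> NC11; equivalent to avgpool2d with kernel_size equal to the full
      # spatial extent, without materializing anything like stack_for_pool()
      return x.mean(axis=(2, 3), keepdims=True)

    x = np.random.randn(1, 32, 112, 112).astype(np.float32)
    assert global_avg_pool2d(x).shape == (1, 32, 1, 1)
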
Ryan Neph 7e515308a5
label op subtests by params (#83) 2020-11-09 06:25:06 -08:00
Ryan Neph 5bedf566d1
tests should use rtol unless special case (#82) 2020-11-08 17:25:11 -08:00
Ryan Neph 04b9312a34
Fix GPU Pooling bug at boundary + better Pooling test coverage (#81)
* fixed Pooling bug

* Clarify Pooling tests
2020-11-08 17:25:01 -08:00
Ryan Neph b0c0c5d0d6
strided Pool funcs (#74)
* *Pool2D GPU forward supports stride

* kernel_size from ctx instead of saved_tensors

* *Pool2D CPU forward supports stride

* update ctx.stride properly
2020-11-08 11:45:55 -08:00
ziofil db3eccc16b
implemented backward for Pad2D & test (#73) 2020-11-07 21:58:42 -08:00
Ryan Neph 5265f6c578
add AvgPool2D backward pass on GPU (#68) 2020-11-07 12:27:29 -08:00
George Hotz 30442a086a some broadcasting, pool test is fail 2020-11-07 11:29:42 -08:00
George Hotz 94d44c97bf add pad2d on GPU 2020-11-07 10:46:36 -08:00
George Hotz fbff6ab2e5 fix strided convs, GPU env var for enet 2020-11-07 10:26:37 -08:00
George Hotz ec03eb44bd tinygrad does forward pass convs on GPU 2020-11-07 10:15:56 -08:00
George Hotz bc7758cc5b getting convs to work on gpu 2020-11-07 09:17:57 -08:00
George Hotz 3302286e68 yayay test_sgd_gpu passes 2020-11-07 08:48:17 -08:00
George Hotz 38e112cccd logsoftmax test 2020-11-07 07:26:53 -08:00
Rene Delgado cd54697fd8
fix gpu sum forward (#61)
* ignore venv

* add sum test

* fix sum forward
2020-11-05 21:59:16 -08:00
NeuralLink cc605da36d
Stable Sigmoid op (#59)
* 🔨 Added stable sigmoid

* added sigmoid test

* 🔧 suppressed overflow warning

* 🔧 clean up
2020-11-05 21:57:50 -08:00
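
The usual trick behind a "stable" sigmoid is to keep every exp() argument non-positive so it cannot overflow; a numpy sketch of that idea (not necessarily the exact formulation in the PR):

    import numpy as np

    def stable_sigmoid(x):
      pos = 1.0 / (1.0 + np.exp(-np.maximum(x, 0)))   # used where x >= 0
      ex  = np.exp(np.minimum(x, 0))
      neg = ex / (1.0 + ex)                           # used where x <  0
      return np.where(x >= 0, pos, neg)
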
George Hotz f178d23ff3 gpu relu is good 2020-11-02 08:25:32 -08:00
George Hotz 231c1134bd cute trick for GPU test 2020-11-02 08:17:17 -08:00
George Hotz 5201a8e89f matmul on GPU 2020-11-01 08:54:20 -08:00
George Hotz 41e7d59aed test dot 2020-11-01 07:51:35 -08:00
George Hotz 1f544d6ece test mnist on GPU 2020-11-01 07:46:17 -08:00
George Hotz 9ac1ad40d6
Add GPU Support! (do not merge yet) (#41)
* copy tensors to and from gpu

* add on GPU

* adding works

* we stick shapes in

* works on cpu and gpu

* test changes, not passing yet

* something else

* op tests pass

* add, mean, and sum have working forward/backward

* mul ops test

* no gpu support, no problem

* test pass, clean up later

* gpu cleanup

* cleanup test ops, don't let div fail

* revert more

* aimpler dispatcher

* clean up grad

* GPU and

* grad is a Tensor now

* gate test on GPU

* cleanups

* late loading gpu

* GPU as input option

* last cleanups
2020-11-01 07:00:49 -08:00
George Hotz 2c7e75d733
group conv: forward pass works (#34)
* forward pass works

* got the backward pass

* okay, it's now a coho
2020-10-30 09:19:20 -07:00
George Hotz 339a35b081 div needs help 2020-10-30 08:32:16 -07:00
George Hotz c14473f87d unit test for batchnorm2d 2020-10-30 08:19:58 -07:00
George Hotz 5e7e359706 fix tests 2020-10-29 08:19:07 -07:00
George Hotz 9ae3e9daf3 shape has to be a kwarg now, idk why this didn't break before 2020-10-29 08:13:05 -07:00
George Hotz f84f6c1edd write sqrt and div using pow 2020-10-29 07:57:25 -07:00
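
The identities behind that one-liner, with hypothetical helper names:

    def sqrt_(x): return x ** 0.5            # sqrt via pow
    def div_(a, b): return a * (b ** -1.0)   # division via pow and mul
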
Göktuğ Karakaşlı 4b163ee270
efficient version of adam (#20)
* counteracted bias initialization

* test new adam

* add optimizer tests

* rename helper function names to fix the test

* remove redundant import
2020-10-27 15:54:40 -07:00
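
The standard trick behind an "efficient" Adam like this PR's: fold the bias correction into the step size instead of correcting m and v separately. A numpy sketch with the usual hyperparameter names (not necessarily the PR's):

    import numpy as np

    def adam_step(param, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
      m[:] = b1 * m + (1 - b1) * grad
      v[:] = b2 * v + (1 - b2) * grad * grad
      # bias correction folded into one scalar step size per iteration
      a = lr * np.sqrt(1 - b2 ** t) / (1 - b1 ** t)
      param -= a * m / (np.sqrt(v) + eps)
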
George Hotz f9788eba14 parameters, and start on efficientnet 2020-10-27 08:53:35 -07:00
George Hotz 1654008c1f conv stride support 2020-10-26 08:54:43 -07:00
George Hotz 2a55d7402b clean up ops, refactor pool backward. add stride test 2020-10-26 08:47:11 -07:00
George Hotz 93dceb4bee fix kernel_size bug, name like torch, add test 2020-10-26 08:38:53 -07:00
Timothy Mc Alister 15e5988323 make default parameters work for functions 2020-10-26 12:43:36 +01:00
George Hotz 2d37fd686b test ops 2020-10-25 19:03:49 -07:00
George Hotz 2eebbd32c6 ops test speed 2020-10-25 19:01:02 -07:00
George Hotz b27bcbe4b4 avgpool and test refactor 2020-10-25 18:40:01 -07:00
George Hotz 4c42676cb6 400 -> 200 2020-10-25 17:19:59 -07:00
George Hotz 567707a5f6 rename max_pool2d to match torch, remove more fast conv crap 2020-10-25 17:16:47 -07:00
George Hotz ea41f5e1c1 seems more generic 2020-10-25 16:40:37 -07:00
George Hotz 2333c4dea7 no tqdm in actions 2020-10-25 16:40:08 -07:00
George Hotz ad48061927 better sort in torch profiler 2020-10-25 16:07:49 -07:00
George Hotz 82f8e10813 no hacks in that test 2020-10-25 15:52:05 -07:00
George Hotz 4baa4c041f it's crazy how much faster pytorch is than numpy 2020-10-25 15:42:33 -07:00
George Hotz 5ddbd7f04b 2 to 3x slower than torch 2020-10-25 15:27:33 -07:00
George Hotz f8311f5ecd print fp/bp mnist 2020-10-25 15:08:18 -07:00
George Hotz 5c179d18ad add profiling for mnist net 2020-10-25 14:20:55 -07:00
George Hotz 8fcada8071 faster and better convnet 2020-10-25 13:48:44 -07:00
George Hotz 96f9cdb8a0 woah, fastconv is wrong 2020-10-25 12:56:42 -07:00
George Hotz bb98cdfef7 improve conv testing 2020-10-25 12:46:04 -07:00
George Hotz ef24aac09e finally, fast convs 2020-10-25 12:39:44 -07:00
George Hotz 67506eb6ba fast im2col 2020-10-25 11:49:35 -07:00
George Hotz c9968756d1 allow the line profiler to work 2020-10-25 11:13:40 -07:00
George Hotz 5062c2c8ff profile conv better 2020-10-25 11:11:00 -07:00
George Hotz c74764bac3 oops, set to None 2020-10-25 08:28:18 -07:00
George Hotz 935f5ddaaa always keep batch size out front 2020-10-25 08:14:07 -07:00
George Hotz b91fd3afad maxpool 2020-10-25 07:43:34 -07:00
George Hotz 5216a1d9f3 refactor into tensor and ops 2020-10-23 10:34:21 -07:00
George Hotz 9b9e47f369 added conv profile test 2020-10-23 09:46:10 -07:00
George Hotz 5756115e57 anyone else let down by the fast conv? 2020-10-23 09:09:29 -07:00
George Hotz bcb60e0b7c wow, you have to name them test 2020-10-23 06:33:18 -07:00
George Hotz 2259c9faa1 low lr improves rmsprop 2020-10-23 06:22:32 -07:00
George Hotz eda29fa0e0 clean up test 2020-10-23 06:11:38 -07:00
George Hotz 373b4e341b
Merge pull request #15 from f0ti/master
added RMSprop optim
2020-10-23 06:08:20 -07:00
f0ti 0b87aaca1e update rsmprop 2020-10-23 14:46:45 +02:00
f0ti c5f726ec2e all three 2020-10-23 11:53:01 +02:00
f0ti 6a38ccb6b0 update rmsprop and readme 2020-10-23 11:49:43 +02:00
George Hotz 21ebb0b769 if you wait 24 seconds, that gets 98% 2020-10-22 21:49:14 -07:00
George Hotz 816f648161 chans doesn't need to be in self 2020-10-22 21:19:35 -07:00
George Hotz 77251cc6c3 7x7 conv = more accuracy 2020-10-22 21:10:27 -07:00
f0ti 7e1eddb0c5 added RMSprop optim 2020-10-23 02:50:02 +02:00
0xNaN d95adbddb4 `gradcheck` now returns only a bool, refactoring of test_gradcheck 2020-10-22 01:28:52 +02:00
0xNaN adbfc67456 test `jacobian` and `numerical_jacobian` against torch.autograd.functional.jacobian 2020-10-22 01:28:52 +02:00
0xNaN 1561d3b9c0 extracting `jacobian` and `test_jacobian` 2020-10-22 01:28:52 +02:00
0xNaN 93bc3c22a0 tiny gradcheck 2020-10-22 01:28:52 +02:00
Adrian Garcia Badaracco 9a8be135a7
incorporate changes 2020-10-21 13:21:44 -05:00
Adrian Garcia Badaracco 02adb0ac3a
Make test_mnist runnable by pytest and directly 2020-10-21 11:30:08 -05:00
Adrian Garcia Badaracco 5afe6b1f68
rename files 2020-10-21 11:28:03 -05:00
George Hotz d91902948b add reshape support and OMG the CONVS are SO SLOW 2020-10-21 09:12:19 -07:00
George Hotz e3110c9922 backward pass for conv2d, lol i mostly guessed and made shapes match 2020-10-21 08:45:35 -07:00
George Hotz 5c2ac48c11 write forward pass for convolution 2020-10-19 09:33:06 -07:00
George Hotz 2681c79bc5 simple tests, repr not str 2020-10-18 14:55:20 -07:00
George Hotz 4019c38942 more readme 2020-10-18 14:38:20 -07:00
George Hotz cc9054e3ec refactor into utils 2020-10-18 14:36:29 -07:00
George Hotz 0c3dd12b3b i hate tabs 2020-10-18 14:33:13 -07:00
George Hotz a139f34bb6 fix nll loss in example 2020-10-18 14:27:54 -07:00
George Hotz 26ce2d93c3 add support for adam 2020-10-18 13:50:23 -07:00
George Hotz 6532233d24 refactor better 2020-10-18 13:33:02 -07:00
George Hotz 92fd23df66 refactor into a few files 2020-10-18 13:30:25 -07:00
George Hotz 118c2eebe3 write sgd class 2020-10-18 13:27:59 -07:00
George Hotz 54eafe6c12 update readme 2020-10-18 13:08:14 -07:00
George Hotz 83417d4b4c readme and dirs 2020-10-18 12:48:17 -07:00