Commit Graph

484 Commits

Author SHA1 Message Date
George Hotz c71a8ef222 remove unused Div op 2020-12-06 13:02:12 -08:00
George Hotz 20f95de408 less lines 2020-12-06 12:56:03 -08:00
George Hotz 629efb391f lose a few lines 2020-12-06 12:42:41 -08:00
George Hotz 521098cc2f se optional, track time better 2020-12-06 12:29:42 -08:00
George Hotz 609d11e699 trainer works with CIFAR 2020-12-06 12:20:14 -08:00
George Hotz 80a9c777ba requires grad, optim in train enet 2020-12-06 11:10:30 -08:00
George Hotz c66c27d22e get parameters 2020-12-06 10:45:04 -08:00
George Hotz 51daaa43d4 fix memory leaks, add gc test 2020-12-06 10:34:40 -08:00
George Hotz 1717daa859 reshape doesn't copy anymore 2020-12-06 09:51:09 -08:00
George Hotz 62ee47fef8 add GPUBuffer class 2020-12-06 09:45:13 -08:00
George Hotz 17659f7dd7 gpu speedup, tests work on M1 2020-12-06 09:05:49 -08:00
George Hotz b8deb36e56 train BS=16 for 32 steps 2020-12-04 10:00:32 -08:00
George Hotz ad1b225722 oops, i broke fill 2020-12-04 09:53:38 -08:00
George Hotz cb79c9838e make the GPU 25% faster by not recreating kernels 2020-12-04 09:51:00 -08:00
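
A minimal sketch of the kernel-caching idea in cb79c9838e, assuming a pyopencl context; the cache dict and `get_program` helper are illustrative, not tinygrad's actual code:

```python
import pyopencl as cl

# hypothetical module-level cache: building an OpenCL program is expensive,
# so reuse the compiled kernel whenever the same source comes around again
_kernel_cache = {}

def get_program(ctx, src):
  # key on the kernel source; identical source means an identical binary
  if src not in _kernel_cache:
    _kernel_cache[src] = cl.Program(ctx, src).build()
  return _kernel_cache[src]
```
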
George Hotz df81bf5985 it's the default, but make it explicit 2020-12-04 09:43:41 -08:00
George Hotz 888689b57b protip 2020-12-04 09:24:46 -08:00
George Hotz 2862b42bac install from github 2020-12-04 09:06:25 -08:00
George Hotz 1290e01e2c all ops supported on GPU now 2020-12-03 10:43:11 -08:00
George Hotz 621a93b777 ane in readme 2020-12-03 10:40:31 -08:00
George Hotz 1dcaecacc4
Support for Apple Neural Engine (#130)
* ane query is success

* cite and build instructions

* low level access, need to disable AMFI

* coreml_ane works

* coreml fun

* more work

* compiled example

* progress

* compiler works

* model flow

* TODOs in the readme

* put some real weights in

* we are learning objc

* much progress i think

* signed model still doesn't work

* working example

* there are float16

* clean up: part 1

* h11ane header, more cleanup

* cleanup DeviceController creation

* remove the stupid sleep

* notes

* start a hwx parser

* no tabs

* compare stuff

* hmm, why don't inputs work

* cache doesn't seem to fix it

* hmm, the issue was the compiler

* fix the compiler, guess i didn't put in weights

* logging for compiler

* uselessness in plist

* remove hwx before compile, weights are converted to float16

* better compare

* better compare

* last line in compare

* opcodes from compiler

* notes
2020-12-03 10:32:26 -08:00
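
The float16 bullet above is the load-bearing detail: the ANE computes in half precision, so weights get downcast. A hedged numpy illustration (the array shape and names are made up):

```python
import numpy as np

# the ANE operates on float16, so float32 model weights have to be downcast
w32 = np.random.randn(256, 256).astype(np.float32)
w16 = w32.astype(np.float16)

# downcasting loses precision; a quick look at the worst-case error
print(np.abs(w32 - w16.astype(np.float32)).max())
```
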
baplou c83cebccda
Made the readme more consistent (#136) 2020-11-28 08:20:02 -06:00
Marcel Bischoff 541330c42a
Update README.md (#133)
Should we put `ipython3`? Otherwise the path doesn't work, or we have to add the env; not sure which is nicer.
2020-11-25 07:53:54 -08:00
Mufeed VH 0bbf66627c
Define `ProfileOp` class once (#131)
* define `ProfileOp` class once

* clean `ProfileOp` class

* removed `else: pass`
2020-11-24 19:39:13 -08:00
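
A plausible shape for a once-defined `ProfileOp`, sketched as a context manager; the `DEBUG` gating and output format are assumptions, not the PR's exact code:

```python
import os, time

DEBUG = int(os.getenv("DEBUG", "0"))

class ProfileOp:
  # defined once and shared by all ops: times a block, prints when DEBUG is set
  def __init__(self, name):
    self.name = name
  def __enter__(self):
    self.st = time.time()
    return self
  def __exit__(self, *args):
    if DEBUG:
      print(f"{self.name} took {(time.time() - self.st) * 1000:.2f} ms")
```

Usage is just `with ProfileOp("add"): ...` around the op's body.
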
George Hotz 03994e0011 load torch files without torch 2020-11-21 13:43:53 -08:00
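
Loading torch files without torch mostly means decoding the pickle stream while dodging the torch imports it references. A heavily simplified sketch (a real loader must also materialize tensor storages from the persistent IDs):

```python
import pickle

class FakeTorchUnpickler(pickle.Unpickler):
  # decode a (legacy-format) torch checkpoint without importing torch:
  # intercept the torch globals the pickle stream references
  def find_class(self, module, name):
    if module.startswith("torch"):
      # placeholder: capture the rebuild args instead of real tensor classes
      return lambda *args: (module, name, args)
    return super().find_class(module, name)

  def persistent_load(self, pid):
    # legacy torch pickles refer to tensor storages by persistent ID;
    # a complete loader reads the raw storage bytes that follow the pickle
    return pid
```
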
Marcel Bischoff 26899869a2
Update tensor.py (#128)
Otherwise `.cpu()` is broken if default is GPU
2020-11-21 09:16:03 -08:00
adamritter f190ca446d
Detach (#123)
* Detach

* Torch.detach reuses the buffer in the

* Fix test

* wakey wakey GitHub Actions

Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
2020-11-19 19:03:42 -08:00
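
The detach idea in #123, sketched: reuse the buffer, drop the graph. A toy `Tensor` stands in for tinygrad's here:

```python
class Tensor:
  def __init__(self, data, requires_grad=True):
    self.data = data
    self.requires_grad = requires_grad
    self.grad = None

  def detach(self):
    # reuse the same underlying buffer (no copy), but cut the graph:
    # the detached tensor takes no part in backprop
    return Tensor(self.data, requires_grad=False)
```
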
Colin Manko 8383ff40ad
fix pyopencl (#125) 2020-11-19 19:03:04 -08:00
adamritter 5797e63d9b
Train efficientnet should respect NUM environment variable (#122)
Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
2020-11-16 20:02:31 -08:00
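
One plausible reading of "respect NUM", sketched below; that NUM selects the EfficientNet variant is an assumption from the script's context, not stated in the PR:

```python
import os

# pick the EfficientNet variant (B0..B7) from the environment, defaulting to B0
num = int(os.getenv("NUM", "0"))
print(f"training EfficientNet-B{num}")
```
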
dustcollector12 ee99d016e9
tensor implementation for rmsprop and adam (#121)
* tensor implementation for rmsprop and adam

* test_mnist.py extended to cover sgd, rmsprop and adam on cpu and gpu

* number of steps reduced for adam from 1000 to 200
2020-11-16 15:07:49 -08:00
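
For reference, the Adam update this PR implements with tensor ops, written here as a hedged numpy sketch (one array op per statement, no per-element Python loops; not the PR's actual code):

```python
import numpy as np

def adam_step(params, grads, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
  # one Adam step over a list of parameter arrays and their gradients
  t += 1
  for i, (p, g) in enumerate(zip(params, grads)):
    m[i] = b1 * m[i] + (1 - b1) * g          # first-moment running average
    v[i] = b2 * v[i] + (1 - b2) * g * g      # second-moment running average
    mhat = m[i] / (1 - b1 ** t)              # bias correction
    vhat = v[i] / (1 - b2 ** t)
    p -= lr * mhat / (np.sqrt(vhat) + eps)   # in-place parameter update
  return t
```
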
George Hotz 17bf90dbe4 unbroadcasting works on the GPU 2020-11-16 09:16:55 -08:00
George Hotz 17eab716b6 unbroadcast GPU template 2020-11-16 08:16:36 -08:00
George Hotz 2ffb8de1ea move efficientnet to extra 2020-11-16 08:08:07 -08:00
George Hotz 13d34373d1 move gradcheck to extra, clean up unbroadcast 2020-11-16 08:03:31 -08:00
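
Unbroadcast is the backward-pass counterpart of broadcasting: the gradient is summed back down to the input's original shape. A numpy sketch of the standard rule:

```python
import numpy as np

def unbroadcast(grad, in_shape):
  # axes added on the left by broadcasting get summed away entirely;
  # axes that were size 1 in the input get summed with keepdims
  while grad.ndim > len(in_shape):
    grad = grad.sum(axis=0)
  for i, s in enumerate(in_shape):
    if s == 1 and grad.shape[i] != 1:
      grad = grad.sum(axis=i, keepdims=True)
  return grad
```
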
George Hotz ed4c35e2e9 channels on the inside 2020-11-15 21:19:59 -08:00
adamritter fb1df81c7d
Fix train_efficientnet (#120)
Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
2020-11-15 20:50:31 -08:00
George Hotz 1207fe4c7d cleanup LogSoftmax 2020-11-15 20:49:57 -08:00
George Hotz d1441de3a6 minor cleanups 2020-11-15 20:39:19 -08:00
George Hotz 37a210f868 touchups and lines 2020-11-15 20:26:52 -08:00
adamritter 5ea3d76dfb
Topological sort, zero_grads (#119)
* Topological sort, zero_grads

* Bug fix, add test

* Add zero_grads

* Put deepwalk function in backward

* Move zero_grad to optim

* Fix gradcheck hack

Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
2020-11-15 20:25:29 -08:00
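
The PR's core move: a depth-first post-order walk builds a topological order of the graph, and backward then visits it in reverse. A sketch (the `_ctx.parents` naming follows tinygrad's autograd convention, but treat it as an assumption):

```python
def deepwalk(node, visited=None, nodes=None):
  # post-order DFS: a node is appended only after all of its parents,
  # so reversed(deepwalk(out)) is a valid order for applying the chain rule
  if visited is None: visited = set()
  if nodes is None: nodes = []
  visited.add(node)
  if getattr(node, "_ctx", None):
    for parent in node._ctx.parents:
      if parent not in visited:
        deepwalk(parent, visited, nodes)
    nodes.append(node)
  return nodes
```
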
George Hotz a35425189d binop fast path for no broadcast 2020-11-15 19:12:14 -08:00
Marcel Bischoff c7b7f8ccc8
Backwards ops supporting broadcasting (#118)
* streamlined numerical_jacobian

* Got rid of the g loop in Conv2D.forward

* erased stupid line

* nothing

* no loops in Conv2D forward

* Conv2D backprop improved

* stupid things in examples

* alternative to einsum

* Conv2D backward einsum alternative

* tidying up

* tidied up

* no ravel

* got rid of print

* Update efficientnet.py

* Update efficientnet.py

* Update efficientnet.py

* only tensordot

* 255.0

* whitespace

* aspect ratio error in efficientnet

* noprint

* efficient net wrong strides

* broadcasting for backward ops

* Update ops.py

* Update ops.py

- was wrong

* broadcast test for backward enabled

* function adBC + not summing over already 1 axis

* spacing

Co-authored-by: Marcel Bischoff <marcel@Marcels-iMac.local>
2020-11-15 15:21:10 -08:00
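
The "only tensordot" bullet marks the einsum-free direction this PR took. A numpy sketch of a loop-free Conv2D forward in that spirit (stride 1, no padding; not the PR's actual code):

```python
import numpy as np

def conv2d_forward(x, w):
  # x: (N, Cin, H, W), w: (Cout, Cin, KH, KW)
  N, Cin, H, W = x.shape
  Cout, _, KH, KW = w.shape
  OH, OW = H - KH + 1, W - KW + 1
  s = x.strides
  # zero-copy sliding-window view: (N, Cin, OH, OW, KH, KW)
  win = np.lib.stride_tricks.as_strided(
    x, (N, Cin, OH, OW, KH, KW), (s[0], s[1], s[2], s[3], s[2], s[3]))
  # contract over Cin, KH, KW -> (N, OH, OW, Cout), then move Cout forward
  out = np.tensordot(win, w, axes=([1, 4, 5], [1, 2, 3]))
  return out.transpose(0, 3, 1, 2)
```
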
adamritter 55d93017e4
Simplify more (#117)
Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
2020-11-14 06:15:31 -08:00
dustcollector12 28474949b8
refactoring of forward in reshape (#115)
* refactoring of forward in reshape

* test case for reshape added
2020-11-13 13:20:43 -08:00
dustcollector12 6f033ea30a
enable local images for efficientnet.py (#116) 2020-11-13 07:00:12 -08:00
pb1729 420af82888
General broadcasting of binary operations (#114)
* allow for general broadcasting of binary operations. Can handle any situation where corresponding dimensions between the tensors match, or where at least one of them is of size 1. If a tensor has fewer dimensions than the other, its shape is padded with leading 1s until both have the same number of dimensions (sketched after this entry). Also refactored buffer_zeros() by creating a function buff() that makes a buffer from a numpy array

* remove extra tabs

Co-authored-by: phillip <phillip_bement@reedbement.com>
2020-11-12 22:27:48 -08:00
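
The shape rule this PR describes, as a standalone sketch (the function name is illustrative):

```python
def broadcast_shape(shp1, shp2):
  # pad the shorter shape with leading 1s, then each pair of dimensions
  # must either match or have one side equal to 1
  n = max(len(shp1), len(shp2))
  shp1 = (1,) * (n - len(shp1)) + tuple(shp1)
  shp2 = (1,) * (n - len(shp2)) + tuple(shp2)
  out = []
  for a, b in zip(shp1, shp2):
    if a != b and a != 1 and b != 1:
      raise ValueError(f"shapes {shp1} and {shp2} don't broadcast")
    out.append(max(a, b))
  return tuple(out)
```

For example, `broadcast_shape((3, 1, 5), (4, 5))` gives `(3, 4, 5)`.
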
damianzim 2b1286eef6
Don't wrap np.int32 in a function, use an alias (#113) 2020-11-12 19:32:19 -08:00
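
The whole of #113 in miniature (the `i32` name is made up for illustration):

```python
import numpy as np

# before: a needless wrapper
# def i32(x): return np.int32(x)

# after: an alias does the same job with no extra call overhead
i32 = np.int32
```
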
adamritter 08aa60d9d0
broadcasting 1s at the start, 1 kernel/4 divs version (#110)
* Pad2d backward pass on GPU

* Faster Pad2D GPU backward pass (no zeroing needed)

* Fix out of bounds error

* Don't save prg

* Let compiler optimize division by 1

* More generic broadcasting (1s at the start)

* Bug fix

* Add comment

* Try to fix flaky test with other method

* Add mixed broadcast support

* 1kernel

* Separate broadcast tests

Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
2020-11-12 13:33:35 -08:00
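
The "no zeroing needed" bullet has a neat explanation: the gradient of zero-padding is just a crop, so the backward pass can slice instead of writing zeros. A hedged numpy sketch (the padding order is illustrative):

```python
import numpy as np

def pad2d_backward(grad_out, padding):
  # gradient flows only through the unpadded interior; cropping it out
  # needs no zeroing pass, matching the commit above
  pl, pr, pt, pb = padding  # left, right, top, bottom (assumed order)
  h, w = grad_out.shape[-2:]
  return grad_out[..., pt:h - pb, pl:w - pr]
```
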
NeuralLink f773ef3996
tanh non first class op (#111)
* tanh non first class op

* tanh test with 1e-6 tol

Co-authored-by: Kartik Sharma <kartik.sharma@claimgenius.com>
2020-11-12 13:32:50 -08:00
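
"Non first class op" here means tanh is derived from existing ops rather than getting its own kernel. A numpy sketch of the identity that makes that work, checked at the PR's 1e-6 tolerance:

```python
import numpy as np

def sigmoid(x):
  return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
  # not a first-class op: derived via the identity tanh(x) = 2*sigmoid(2x) - 1
  return 2.0 * sigmoid(2.0 * x) - 1.0

xs = np.linspace(-3, 3, 7)
assert np.allclose(tanh(xs), np.tanh(xs), atol=1e-6)
```
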
Ryan Neph 608bdd4872
adds broadcasting test cases (#106)
refs: #80, #90, #104, #105
2020-11-12 07:08:28 -08:00
adamritter f1d21afe88
Somewhat more generic broadcasting (#105)
* Somewhat more generic broadcasting

* Add TODO

* Set Torch to deterministic in test

Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
2020-11-11 20:33:00 -08:00