* ane query succeeds
* cite and build instructions
* low level access, need to disable AMFI
* coreml_ane works
* coreml fun
* more work
* compiled example
* progress
* compiler works
* model flow
* TODOs in the readme
* put some real weights in
* we are learning objc
* much progress i think
* signed model still doesn't work
* working example
* there are float16
* clean up: part 1
* h11ane header, more cleanup
* cleanup DeviceController creation
* remove the stupid sleep
* notes
* start a hwx parser
* no tabs
* compare stuff
* hmm, why don't inputs work
* cache doesn't seem to fix it
* hmm, the issue was the compiler
* fix the compiler, guess i didn't put in weights
* logging for compiler
* uselessness in plist
* remove hwx before compile, weights are converted to float16
* better compare
* better compare
* last line in compare
* opcodes from compiler
* notes
* Detach
* Torch.detach reuses the buffer in the
* Fix test
* wakey wakey GitHub Actions
Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
* tensor implementation for rmsprop and adam
* test_mnist.py extended to cover sgd, rmsprop and adam on cpu and gpu
* number of steps reduced for adam from 1000 to 200
* streamlined numerical_jacobian
* Got rid of the g loop in Conv2D.forward
* erased stupid line
* nothing
* no loops in Conv2D forward
* Conv2D backprop improved
* stupid things in examples
* alternative to einsum
* Conv2D backward einsum alternative
* tidying up
* tidied up
* no ravel
* got rid of print
* Update efficientnet.py
* Update efficientnet.py
* Update efficientnet.py
* only tensordot
* 255.0
* whitespace
* aspect ratio error in efficientnet
* noprint
* efficient net wrong strides
* broadcasting for backward ops
* Update ops.py
* Update ops.py
- was wrong
* broadcast test for backward enabled
* function adBC + not summing over axes that are already size 1
* spacing
Co-authored-by: Marcel Bischoff <marcel@Marcels-iMac.local>
* allow for general broadcasting of binary operations. can handle any situation where corresponding dimensions between the tensors match, or at least one of them is of size 1. if a tensor has fewer dimensions than the other, then its shape is padded with 1s until both have the same number of dimensions. also refactored buffer_zeros() by creating a function buff() that makes a buffer from a numpy array
* remove extra tabs
Co-authored-by: phillip <phillip_bement@reedbement.com>
* Pad2d backward pass on GPU
* Faster Pad2D GPU backward pass (no zeroing needed)
* Fix out of bounds error
* Don't save prg
* Let compiler optimize division by 1
* More generic broadcasting (1s at the start)
* Bug fix
* Add comment
* Try to fix flaky test with other method
* Add mixed broadcast support
* 1kernel
* Separate broadcast tests
Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
* Somewhat more generic broadcasting
* Add TODO
* Set Torch to deterministic in test
Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
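
The broadcasting commits above describe a rule: two shapes are compatible when each pair of corresponding dimensions matches or one of them is 1, and the shorter shape is padded with 1s at the start. The following is a minimal pure-Python sketch of that rule, plus the axes a backward pass would sum over to undo the broadcast. It is not the code from these commits; the names `broadcast_shape` and `reduce_axes` are hypothetical, and NumPy-style leading padding is assumed.

```python
from itertools import zip_longest


def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    result = []
    # Walk from the trailing dimension; a missing dimension counts as size 1,
    # which is equivalent to padding the shorter shape with 1s at the start.
    pairs = zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1)
    for a, b in pairs:
        if a != b and a != 1 and b != 1:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
        result.append(b if a == 1 else a)
    return tuple(reversed(result))


def reduce_axes(input_shape, output_shape):
    """Axes of the output gradient to sum over to get back to input_shape.

    Axes where the output is also size 1 are skipped, since summing over
    them would change nothing.
    """
    pad = len(output_shape) - len(input_shape)
    padded = (1,) * pad + tuple(input_shape)
    return tuple(
        axis
        for axis, (p, o) in enumerate(zip(padded, output_shape))
        if p == 1 and o != 1
    )


out = broadcast_shape((3, 1, 5), (4, 5))
axes_a = reduce_axes((3, 1, 5), out)
axes_b = reduce_axes((4, 5), out)
```

Here `out` is the shape of the forward result, and `axes_a` and `axes_b` are the axes over which each operand's gradient would be summed (followed by a reshape back to the operand's original shape).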