* conv2d is an hlop
* shorter conv
* KOPT=-1
* alt imp
* MULACC
* smarter mulacc
* pop conv
* 7x7 -> 5x5
* didn't fix, that's not going to work
* this is faster and matches old behavior
* oh, non lazy just won't work with mulacc
* mulacc in torch
* bool types were creeping in
* optimizer is actually better with hlop conv
* fix pushing permutes issue
* refactor einsum_mulacc
* fix up readme
* update readme
* _image_conv2d
* fix bias addition location
* pushing permutes gets back to 200 kernels
* conv cleanup
* disable hlop conv
* don't hide that in helpers
This commit resolves issue https://github.com/geohot/tinygrad/issues/453
When the example code in README.md is run, tinygrad prints the tensors as:
<Tensor <LB (3, 3) op:MovementOps.RESHAPE> with grad None>
<Tensor <LB (1, 3) op:MovementOps.RESHAPE> with grad None>
But to be equivalent to the output of the Torch example, we need
to call numpy() so it shows:
[[ 2. 2. 2.]
[ 0. 0. 0.]
[-2. -2. -2.]]
[[1. 1. 1.]]
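For reference, a minimal sketch of the corrected snippet, assuming the `Tensor.eye`/`matmul` example from that README revision; the only change is calling `.numpy()` on the gradients:

```python
from tinygrad.tensor import Tensor

x = Tensor.eye(3, requires_grad=True)
y = Tensor([[2.0, 0, -2.0]], requires_grad=True)
z = y.matmul(x).sum()
z.backward()

# .numpy() realizes the lazy buffers and prints the gradient values,
# matching the Torch example's output instead of the LazyBuffer repr above
print(x.grad.numpy())  # dz/dx
print(y.grad.numpy())  # dz/dy
```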
* make lazy the default
* always float32
* while the lazy framework should be default, laziness itself shouldn't be (for now)
* bugfixes
* remove the need for the ops class
* fxn_for_op
* hmm, my contiguous asserts went away
* move small shape thing
* refactor reduce
* remove the weird unused new functions
* only that install works
* that's broken
* unused imports, should be good if it passes
* replace broadcasting with expand
* Tensor, not self
* remove broadcasting from mlops
* delete useless A operator
* expand, not repeat
* remove A op
* expand on gpu
* binary_op doesn't broadcast anymore
* expand is still total junk, but the tests should pass
* Update all devices to be tested
ANE, CPU and OCL all now support all tests.
However, tests are not currently passing on GPU, and I cannot test on CPU.
The failing GPU tests are not caused by this update; they have not been
passing due to a missing required "six" installation.
OpenCL tests have not been run since commit: 1a1c63a08b
Devices now have 3 types and are handled by a new DeviceTypes enum. (The goal
is to revert to Tensor.<type>, but this current setup allows for keyword
argument defaults: `device=DeviceType.CPU`.)
All references to Tensor.GPU/CPU/ANE have been converted to the
corresponding `DeviceTypes` enum.
The conversion code was refactored to allow conversion from any device to
any device.
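A minimal sketch of the shape of this change (the `DeviceTypes` name and the `device=` keyword default come from this commit; the enum members' values and the constructor shown are assumptions):

```python
from enum import Enum

class DeviceTypes(Enum):
    CPU = 0
    GPU = 1
    ANE = 2

# hypothetical constructor: device becomes a keyword argument with a default,
# rather than separate Tensor.CPU / Tensor.GPU / Tensor.ANE entry points
def tensor_zeros(*shape, device=DeviceTypes.CPU):
    ...
```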
* Add six dependency in requirements.txt
* Resolve failure to run tests
Move six into the GPU required installs. Remove six from the standard
installation.
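A sketch of the packaging change, assuming a standard setuptools `extras_require` layout (every package name other than six is a placeholder):

```python
# hypothetical setup.py fragment: six is only pulled in by the GPU extra,
# not by the standard installation
from setuptools import setup

setup(
    name="tinygrad",
    install_requires=["numpy"],
    extras_require={"gpu": ["pyopencl", "six"]},
)
```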
* Remove repeated data conversion
* Refactor method names
Also reduce code with .to and .to_
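A sketch of the intended usage (the `.to`/`.to_` names come from this commit; the call sites shown are hypothetical, and the enum is renamed to `Device` a few commits later):

```python
# continuing the DeviceTypes sketch above: hypothetical conversion calls
t = Tensor.zeros(2, 2, device=DeviceTypes.CPU)
g = t.to(DeviceTypes.GPU)   # returns a copy of the tensor on the target device
t.to_(DeviceTypes.GPU)      # converts this tensor to the target device in place
```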
* Dynamic device handlers
* Refactor DeviceTypes -> Device
* Add mem copy profiling back
* test_backward_pass_diamond_model passing
* Resolve Sum issue on GPU
* Revert batchnorm2d tests
* Update README with updated API
* ANE testing with
* Last minute line gains