tinygrad

You like pytorch? You like micrograd? You love tinygrad! ❤️

Latest commit: d5b3e18540 by George Hotz, 2022-06-16 15:40:52 -07:00: Accelerate with CL (#325)
For something in between a pytorch and a karpathy/micrograd

This may not be the best deep learning framework, but it is a deep learning framework.

The sub-1000-line core of it is in tinygrad/

Due to its extreme simplicity, it aims to be the easiest framework to add new accelerators to, with support for both inference and training. Support the simple basic ops, and you get SOTA vision (models/efficientnet.py) and language (models/transformer.py) models.

We are working on support for the Apple Neural Engine and the Google TPU in the accel/ folder. Eventually, we will build custom hardware for tinygrad, and it will be blindingly fast. Now, it is slow.

Installation

pip3 install git+https://github.com/geohot/tinygrad.git --upgrade

# or for development
git clone https://github.com/geohot/tinygrad.git
cd tinygrad
python3 setup.py develop

Example

from tinygrad.tensor import Tensor

x = Tensor.eye(3)
y = Tensor([[2.0,0,-2.0]])
z = y.matmul(x).sum()
z.backward()

print(x.grad)  # dz/dx
print(y.grad)  # dz/dy

Same example in torch

import torch

x = torch.eye(3, requires_grad=True)
y = torch.tensor([[2.0,0,-2.0]], requires_grad=True)
z = y.matmul(x).sum()
z.backward()

print(x.grad)  # dz/dx
print(y.grad)  # dz/dy
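
Both snippets can be sanity-checked by hand. Since z = sum over j and k of y[0][k] * x[k][j], the gradient dz/dx[k][j] is y[0][k], and dz/dy[0][k] is the sum of row k of x. The dependency-free sketch below confirms this with central finite differences. The helpers `forward` and `numeric_grad` are made up for illustration and are not part of tinygrad or torch.

```python
# Pure-Python check of the gradients in the example above.
# `forward` and `numeric_grad` are illustrative helpers, not tinygrad or torch APIs.

def forward(x, y):
    """z = sum(y @ x) for a 1x3 row vector y and a 3x3 matrix x."""
    return sum(y[0][k] * x[k][j] for k in range(3) for j in range(3))

def numeric_grad(f, m, eps=1e-3):
    """Central finite-difference gradient of f with respect to each entry of m."""
    grad = [[0.0] * len(row) for row in m]
    for i, row in enumerate(m):
        for j in range(len(row)):
            old = row[j]
            row[j] = old + eps
            hi = f()
            row[j] = old - eps
            lo = f()
            row[j] = old          # restore the entry before moving on
            grad[i][j] = (hi - lo) / (2 * eps)
    return grad

x = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]  # eye(3)
y = [[2.0, 0.0, -2.0]]

dz_dx = numeric_grad(lambda: forward(x, y), x)
dz_dy = numeric_grad(lambda: forward(x, y), y)

# Round away floating-point noise from the finite differences.
dz_dx = [[round(v, 6) for v in row] for row in dz_dx]
dz_dy = [[round(v, 6) for v in row] for row in dz_dy]

print(dz_dx)  # row k is filled with y[0][k]
print(dz_dy)  # entry k is the sum of row k of x
```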

Neural networks?

It turns out, a decent autograd tensor library is 90% of what you need for neural networks. Add an optimizer (SGD, RMSprop, and Adam implemented) from tinygrad.optim, write some boilerplate minibatching code, and you have all you need.
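
The "boilerplate minibatching code" mostly amounts to sampling a random subset of the data each step and applying the optimizer's update rule. Here is a dependency-free sketch of plain SGD on a one-parameter model. The toy dataset, the `sgd_step` helper and the hyperparameters are all invented for illustration; tinygrad.optim operates on Tensors instead.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Toy dataset for y = 3 * x; invented for this sketch.
xs = [i / 10 for i in range(-20, 21)]
ys = [3.0 * x for x in xs]

def sgd_step(w, grad, lr):
    """Plain SGD update: move the parameter against its gradient."""
    return w - lr * grad

w = 0.0          # single trainable weight
lr = 0.1         # learning rate
batch_size = 8

for step in range(200):
    # Minibatching: draw a random batch of sample indices.
    idx = random.sample(range(len(xs)), batch_size)
    # Gradient of the batch mean squared error (w*x - y)^2 with respect to w.
    grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in idx) / batch_size
    w = sgd_step(w, grad, lr)

print(w)  # should end up close to 3.0
```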

Neural network example (from test/test_mnist.py)

from tinygrad.tensor import Tensor
import tinygrad.optim as optim

class TinyBobNet:
  def __init__(self):
    self.l1 = Tensor.uniform(784, 128)
    self.l2 = Tensor.uniform(128, 10)

  def forward(self, x):
    return x.dot(self.l1).relu().dot(self.l2).logsoftmax()

model = TinyBobNet()
optimizer = optim.SGD([model.l1, model.l2], lr=0.001)  # separate name, so the optim module is not shadowed

# ... and complete like pytorch, with (x,y) data

out = model.forward(x)
loss = out.mul(y).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
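
The loss line relies on the model ending in logsoftmax. If y holds -num_classes at each sample's target class and 0 elsewhere (an assumption about the label encoding, which the snippet above does not show), then out.mul(y).mean() equals the mean negative log-likelihood. A pure-Python sketch of that identity, with made-up logits and a hypothetical `logsoftmax` helper:

```python
import math

def logsoftmax(row):
    """Numerically stable log-softmax of one row of logits."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_sum for v in row]

# Made-up logits for 2 samples and 3 classes, with their target classes.
logits = [[1.0, 2.0, 0.5], [0.1, -1.0, 3.0]]
targets = [1, 2]
num_classes = 3

out = [logsoftmax(row) for row in logits]

# Assumed label encoding: -num_classes at the target class, 0 elsewhere.
y = [[-float(num_classes) if c == t else 0.0 for c in range(num_classes)]
     for t in targets]

# Equivalent of out.mul(y).mean(): elementwise product, mean over all entries.
products = [o * l for o_row, y_row in zip(out, y) for o, l in zip(o_row, y_row)]
loss = sum(products) / len(products)

# Mean negative log-likelihood of the target classes, for comparison.
nll = -sum(out[i][t] for i, t in enumerate(targets)) / len(targets)

print(loss, nll)  # the two values agree up to rounding
```

The factor num_classes cancels the extra division that mean() performs over the class axis.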

GPU and Accelerator Support

tinygrad supports GPUs through PyOpenCL.

from tinygrad.tensor import Tensor
(Tensor.ones(4,4).gpu() + Tensor.ones(4,4).gpu()).cpu()

ANE Support?! (broken)

If all you want to do is ReLU, you are in luck! You can do very fast ReLU (at least 30 MEGAReLUs/sec confirmed)

This requires your Python to be signed with ane/lib/sign_python.sh to add the com.apple.ane.iokit-user-access entitlement, which in turn requires amfi_get_out_of_my_way=0x1 in your boot-args. Build the library with ane/lib/build.sh.

from tinygrad.tensor import Tensor

a = Tensor([-2,-1,0,1,2]).ane()
b = a.relu()
print(b.cpu())

Warning: do not rely on the ANE port. It segfaults sometimes. So if you were doing something important with tinygrad and wanted to use the ANE, you might have a bad time.

hlops (in tensor.py)

hlops are syntactic sugar around mlops. They support most things torch does.

mlops

mlops are mid-level ops; there are 15 of them. They understand memory allocation and derivatives.

Relu, Log, Exp                          # unary ops
Sum, Max                                # reduce ops (with axis argument)
Add, Sub, Mul, Pow                      # binary ops (no broadcasting, use expand)
Reshape, Permute, Slice, Expand, Flip   # movement ops
Conv2D(NCHW)                            # processing op (Matmul is also Conv2D)

You no longer need to write mlops for a new accelerator
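
The note "Matmul is also Conv2D" can be seen by treating a matrix product as a 1x1 convolution: rows of the input become the batch dimension N, columns become channels C, and H = W = 1. The sketch below is a naive pure-Python illustration of that equivalence with made-up helper names. It is not necessarily how tinygrad lowers matmul.

```python
def matmul(a, b):
    """Reference matrix product of a (n x k) and b (k x m), as nested lists."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def conv2d_nchw(x, w):
    """Naive stride-1, no-padding convolution.
    x has shape (N, Cin, H, W); w has shape (Cout, Cin, KH, KW)."""
    n, cin, h, wd = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    cout, kh, kw = len(w), len(w[0][0]), len(w[0][0][0])
    return [[[[sum(x[b][c][i + di][j + dj] * w[o][c][di][dj]
                   for c in range(cin) for di in range(kh) for dj in range(kw))
               for j in range(wd - kw + 1)]
              for i in range(h - kh + 1)]
             for o in range(cout)]
            for b in range(n)]

a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # shape (3, 2)
b = [[1.0, 0.0, -1.0], [2.0, 1.0, 0.0]]    # shape (2, 3)

# View a as an image batch of shape (N=3, Cin=2, H=1, W=1).
x = [[[[v]] for v in row] for row in a]
# View b as 1x1 filters of shape (Cout=3, Cin=2, 1, 1): filter o is column o of b.
w = [[[[b[c][o]]] for c in range(len(b))] for o in range(len(b[0]))]

y = conv2d_nchw(x, w)                       # shape (3, 3, 1, 1)
as_matrix = [[y[i][o][0][0] for o in range(len(w))] for i in range(len(x))]

print(as_matrix)
print(matmul(a, b))  # same values as the line above
```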

Adding an accelerator (llops)

The autodiff stuff is all in mlops now, so you can focus on the raw operations.

Buffer                                               # class of memory on this device
unary_op  (RELU, EXP, LOG, NEG, SIGN)                # A -> A
reduce_op (SUM, MAX)                                 # A -> B (smaller size, B has 1 in shape)
binary_op (ADD, SUB, MUL, DIV, POW, CMPEQ)           # A + B -> C (all the same size)
movement_op (RESHAPE, PERMUTE, SLICE, EXPAND, FLIP)  # A -> B (different size)
processing_op (CONV)                                 # A + B -> C

When tinygrad moves to lazy evaluation, optimizations will happen here.
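
To make the shape contracts in the llops list concrete, here is a toy backend over flat Python lists. Everything in it (the `ToyBuffer` class, the function signatures, the string op names) is hypothetical and only mirrors the comments in the list above; tinygrad's real backends have their own interfaces. Movement and processing ops are left out to keep it short.

```python
import math
from dataclasses import dataclass

@dataclass
class ToyBuffer:
    """Hypothetical device buffer: a shape plus row-major flat data."""
    shape: tuple
    data: list

UNARY = {"RELU": lambda v: max(v, 0.0), "EXP": math.exp, "LOG": math.log,
         "NEG": lambda v: -v, "SIGN": lambda v: float((v > 0) - (v < 0))}
BINARY = {"ADD": lambda a, b: a + b, "SUB": lambda a, b: a - b,
          "MUL": lambda a, b: a * b, "DIV": lambda a, b: a / b,
          "POW": lambda a, b: a ** b, "CMPEQ": lambda a, b: float(a == b)}

def unary_op(op, a):
    """A -> A: apply op to every element; the shape is unchanged."""
    return ToyBuffer(a.shape, [UNARY[op](v) for v in a.data])

def binary_op(op, a, b):
    """A + B -> C: elementwise, no broadcasting, so the shapes must match."""
    assert a.shape == b.shape, "expand the inputs to a common shape first"
    return ToyBuffer(a.shape, [BINARY[op](p, q) for p, q in zip(a.data, b.data)])

def reduce_op(op, a, axis):
    """A -> B for a 2-D buffer: the reduced axis is kept with size 1."""
    rows, cols = a.shape
    fn = sum if op == "SUM" else max
    if axis == 1:   # reduce each row
        return ToyBuffer((rows, 1),
                         [fn(a.data[r * cols:(r + 1) * cols]) for r in range(rows)])
    # axis == 0: reduce each column
    return ToyBuffer((1, cols), [fn(a.data[c::cols]) for c in range(cols)])

a = ToyBuffer((2, 3), [-1.0, 2.0, -3.0, 4.0, -5.0, 6.0])
relu = unary_op("RELU", a)
doubled = binary_op("ADD", a, a)
row_sums = reduce_op("SUM", a, axis=1)
col_max = reduce_op("MAX", a, axis=0)
print(relu.data, doubled.data, row_sums, col_max)
```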

ImageNet inference

Despite being tiny, tinygrad supports the full EfficientNet. Pass in a picture to discover what it is.

ipython3 examples/efficientnet.py https://media.istockphoto.com/photos/hen-picture-id831791190

Or, if you have a webcam and cv2 installed

ipython3 examples/efficientnet.py webcam

PROTIP: Set "GPU=1" environment variable if you want this to go faster.

PROPROTIP: Set "DEBUG=1" environment variable if you want to see why it's slow.

tinygrad supports GANs

See examples/mnist_gan.py

tinygrad supports yolo

See examples/yolov3.py

The promise of small

tinygrad will always be below 1000 lines. If it isn't, we will revert commits until tinygrad becomes smaller.

Drawing Execution Graph

Nodes are Tensors
Black edge is a forward pass
Blue edge is a backward pass
Red edge is data the backward pass depends on
Purple edge is intermediates created in the forward

GRAPH=1 python3 test/test_mnist.py TestMNIST.test_sgd_onestep
dot -Tsvg /tmp/net.dot -o /tmp/net.svg && open /tmp/net.svg

Running tests

python3 -m pytest