Getting the core instruction set correct is the value of tinygrad

The max tensor size is 6-D, needed for pool2d

Unary Ops
===

These are the simplest to reason about, and have pointwise mem access.
A and B are always the same size

Forward : A -> B
Backward (binary): (B', A) -> A'
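
A minimal numpy sketch of this pattern using ReLU (illustrative names, not tinygrad's API):

import numpy as np

# Unary op pattern: forward is pointwise A -> B, backward is a pointwise
# binary op (B', A) -> A'. ReLU is just an example unary op.

def relu_forward(A):
  return np.maximum(A, 0)          # B, same shape as A

def relu_backward(grad_B, A):
  return grad_B * (A > 0)          # A', same shape as A

A = np.random.randn(3, 4)
B = relu_forward(A)
grad_A = relu_backward(np.ones_like(B), A)
assert A.shape == B.shape == grad_A.shape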

Reduce Ops (with axis)
===

These take in an axis argument. B is smaller than A
Max and Sum are pretty different; do we really need Max?

Forward : A -> B
Backward : B' -> A'
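
A minimal numpy sketch with Sum (illustrative names). Note the backward here only needs B'; Max's backward would also need A to build an argmax mask, which is part of why Max and Sum are so different:

import numpy as np

# Reduce op pattern: forward shrinks A -> B along an axis, backward
# broadcasts B' back up to A's shape.

def sum_forward(A, axis):
  return A.sum(axis=axis, keepdims=True)         # B is smaller than A

def sum_backward(grad_B, A_shape):
  return np.broadcast_to(grad_B, A_shape)        # A' has A's shape

A = np.random.randn(2, 3, 4)
B = sum_forward(A, axis=1)                       # shape (2, 1, 4)
grad_A = sum_backward(np.ones_like(B), A.shape)
assert grad_A.shape == A.shape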

Binary Ops (with broadcasting)
===

Pointwise mem access here too.
Broadcasting adds complexity: the inputs are aliased.
Unbroadcasting for the grad is a sum, but it should be combined with the ternary op.

Forward : (A, B) -> C
Backward (ternary): (C', A, B) -> (A', B')

C.shape = max(A.shape, B.shape)
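
A minimal numpy sketch with Add (illustrative names, not tinygrad's API); unbroadcast is the grad-summing step mentioned above:

import numpy as np

# Binary op pattern under broadcasting. Forward: (A, B) -> C with
# C.shape = max(A.shape, B.shape). Backward: (C', A, B) -> (A', B'),
# where each grad is summed back down to its input's shape.

def unbroadcast(grad, shape):
  # sum over leading axes added by broadcasting
  while grad.ndim > len(shape):
    grad = grad.sum(axis=0)
  # sum over axes that were size 1 in the input
  for i, s in enumerate(shape):
    if s == 1:
      grad = grad.sum(axis=i, keepdims=True)
  return grad

def add_forward(A, B):
  return A + B

def add_backward(grad_C, A, B):
  return unbroadcast(grad_C, A.shape), unbroadcast(grad_C, B.shape)

A = np.random.randn(4, 1)
B = np.random.randn(1, 5)
C = add_forward(A, B)                      # shape (4, 5)
grad_A, grad_B = add_backward(np.ones_like(C), A, B)
assert grad_A.shape == A.shape and grad_B.shape == B.shape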

Movement Ops (2 or 1)
===

Reshape, Transpose, Slice

Depending on your Tensor implementation, these can be free.
Reshape is almost always free.
Slice can be made free, but probably shouldn't be.
Transpose is hard to make free except in trivial cases.

Regardless, these are "reindexings" of existing arrays.
Transpose and Slice are similar enough that I think they can be merged.
They should use a DMA engine
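
A numpy sketch of the "reindexing" claim, with numpy's strided views standing in for a Tensor implementation:

import numpy as np

# Movement ops as reindexings: each result shares A's buffer, only the
# shape/strides/offset change, so no data is copied.

A = np.arange(12, dtype=np.float32).reshape(3, 4).copy()

R = A.reshape(4, 3)   # reshape: same buffer, new shape
T = A.T               # transpose: same buffer, swapped strides
S = A[1:, :2]         # slice: same buffer, new offset and shape

for view in (R, T, S):
  assert view.base is A              # no copy was made

# the free transpose is no longer contiguous, which is why it's hard to
# keep free once a later op wants linear memory access
assert not T.flags['C_CONTIGUOUS']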

Processing Ops (4)
===

Matmul is 1 matmul for the forward pass, 2 for the backward pass.
* It's actually three matmuls, transposed (sketched below)
* cublasSgemm()
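
A numpy sketch of the three matmuls (illustrative names; in cuBLAS each one maps to a cublasSgemm() call with the appropriate transpose flags):

import numpy as np

# Matmul pattern: 1 matmul forward, 2 transposed matmuls backward.

def matmul_forward(A, B):
  return A @ B                  # C = A B

def matmul_backward(grad_C, A, B):
  grad_A = grad_C @ B.T         # A' = C' B^T
  grad_B = A.T @ grad_C         # B' = A^T C'
  return grad_A, grad_B

A = np.random.randn(2, 3)
B = np.random.randn(3, 4)
C = matmul_forward(A, B)
grad_A, grad_B = matmul_backward(np.ones_like(C), A, B)
assert grad_A.shape == A.shape and grad_B.shape == B.shape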

Conv2D is very complex. It seems to need three kernels (sketched below):
* cudnnConvolutionForward()
* cudnnConvolutionBackwardData()
* cudnnConvolutionBackwardFilter()
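
A direct-loop numpy sketch of those three kernels (stride 1, no padding, NCHW layout; the names are illustrative and the loops bear no resemblance to cuDNN's actual implementations):

import numpy as np

# The three conv2d kernels: forward, backward wrt data, backward wrt filter.
# x: (N, Cin, H, W), w: (Cout, Cin, KH, KW), y: (N, Cout, OH, OW)

def conv2d_forward(x, w):
  N, Cin, H, W = x.shape
  Cout, _, KH, KW = w.shape
  y = np.zeros((N, Cout, H-KH+1, W-KW+1))
  for i in range(y.shape[2]):
    for j in range(y.shape[3]):
      patch = x[:, :, i:i+KH, j:j+KW]
      y[:, :, i, j] = np.tensordot(patch, w, axes=([1, 2, 3], [1, 2, 3]))
  return y

def conv2d_backward_data(grad_y, w, x_shape):
  grad_x = np.zeros(x_shape)
  KH, KW = w.shape[2:]
  for i in range(grad_y.shape[2]):
    for j in range(grad_y.shape[3]):
      # scatter each output grad back through the filter
      grad_x[:, :, i:i+KH, j:j+KW] += np.tensordot(grad_y[:, :, i, j], w, axes=([1], [0]))
  return grad_x

def conv2d_backward_filter(grad_y, x, w_shape):
  grad_w = np.zeros(w_shape)
  KH, KW = w_shape[2:]
  for i in range(grad_y.shape[2]):
    for j in range(grad_y.shape[3]):
      patch = x[:, :, i:i+KH, j:j+KW]
      grad_w += np.tensordot(grad_y[:, :, i, j], patch, axes=([0], [0]))
  return grad_w

x, w = np.random.randn(1, 2, 5, 5), np.random.randn(3, 2, 3, 3)
y = conv2d_forward(x, w)
gx = conv2d_backward_data(np.ones_like(y), w, x.shape)
gw = conv2d_backward_filter(np.ones_like(y), x, w.shape)
assert gx.shape == x.shape and gw.shape == w.shape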

NOTE: Tensor Cores require that the tensors be in the NHWC data layout