Commit Graph

1270 Commits

Author SHA1 Message Date
George Hotz 6fe9edf30f torch cuda is very fast 2023-01-23 16:24:46 -08:00
George Hotz a949de873b
reduce 2.0 (#469)
* reduce 2.0

* works

* hacks

* DEBUG=3 for shapes

* fix types

* 0s weren't being folded

* cleaner

* last_reduce is no longer needed

* comments and cleanup
2023-01-23 15:11:13 -08:00
George Hotz a6de94b444 test partial sum 2023-01-22 21:28:40 -08:00
George Hotz f1196984e6 harmless to intertwine the math and the stores 2023-01-21 09:31:56 -08:00
George Hotz 708215d06b
Typing (#468)
* we typing

* types look good in theory

* most tests pass

* gpu tests pass

* TEST_AST

* delete comments

* i must have written that bug so many times

* bugfix

* don't merge the small ones

* add f to constants

* commits from reduce

* don't GCD the mod nodes

* broken and a hack IMAGE=3

* group for reduce

* fix linter + mypy

* move out test ast

* insource TENSOR_TYPE_TO_NP_TYPE

* does this fix it?

* move imports out
2023-01-21 09:09:22 -08:00
George Hotz b29614592a first conv/second conv 2023-01-19 13:26:11 -08:00
George Hotz 3d697577b2 print_ast 2023-01-19 13:22:03 -08:00
George Hotz 325a440cb5 pass in op_estimate in opencl 2023-01-19 11:02:23 -08:00
George Hotz 844c645834 add flops for processing op 2023-01-19 10:58:44 -08:00
George Hotz 0881d504c1
move shapetracker (#466)
* move shapetracker

* shapetracker test

* move ast

* move a few things

* fix print kernel

* fix test

* symbolic fixups
2023-01-19 09:56:31 -08:00
George Hotz 2b47ee401f
Symbolic for indexes (#464)
* indexer

* works

* all use indexer

* boolean in the indexer too

* symbolic is a better name than indexer

* better symbolic API

* min and max

* symbolic tests

* work

* more tests

* fix demodder

* __str__ in the superclass

* NumNode

* awesome that works

* still works

* fix up parens

* fix zeroviews

* dead lines

* expr_node

* works

* still works

* refactor to not use __new__ methods

* ugh something went wrong a while ago

* this fixes it

* mod and div at the end

* test

* symbolic

* working

* one linter issue fixed

* other division

* more simplifys

* works

* validhacks

* VALIDHACKS passes thneed

* no str replace stuff

* inline indexes

* NATIVE_EXPLOG and factoring

* factor both ways

* cl indexing

* split on mod, not just full

* onnxlimit

* fix output shape

* op_estimate is a function of the program

* no ones in the index

* four_float4

* ALLOW_4FLOAT4

* test passes

* compute then store

* loads first

* bugfix

* better, but doesn't match

* select xb in smart way

* new test and bugfix

* no change to lazy

* Node fixes linter

* fix opencl with op_estimate

* fix mypy

* revert valid

* remove unused
2023-01-19 07:21:30 -08:00
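The symbolic engine this PR lands replaces string-munged index expressions with small algebraic nodes carrying proven min/max bounds, so rewrites like (4*idx + 2) % 4 -> 2 can be proven safe instead of pattern-matched on strings. A toy sketch of the bound-tracking idea (hypothetical names; tinygrad's real classes such as NumNode live in its symbolic module):

```python
# A linear index kept as sum(coeff*var) + const, with known variable ranges.
class Lin:
  def __init__(self, terms, const=0, ranges=None):
    self.terms, self.const = dict(terms), const   # {"idx": 4} means 4*idx
    self.ranges = ranges or {}                    # var -> (min, max)
  def bounds(self):
    lo = self.const + sum(c * self.ranges[v][0 if c > 0 else 1] for v, c in self.terms.items())
    hi = self.const + sum(c * self.ranges[v][1 if c > 0 else 0] for v, c in self.terms.items())
    return lo, hi
  def mod(self, m):
    # terms whose coefficient is a multiple of m contribute nothing mod m
    kept = {v: c for v, c in self.terms.items() if c % m != 0}
    r = Lin(kept, self.const % m, self.ranges)
    lo, hi = r.bounds()
    assert 0 <= lo and hi < m, "residue not provably < m; a real engine keeps the mod node"
    return r

e = Lin({"idx": 4}, 2, ranges={"idx": (0, 31)})
print(e.mod(4).terms, e.mod(4).const)  # {} 2 -- the mod folded to a constant
```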
George Hotz 3a3400e3a2 more from indexer 2023-01-18 18:11:51 -08:00
George Hotz 9245f4650a indexer changes for master 2023-01-18 18:02:02 -08:00
George Hotz 15d04f13ce expr_idxs 2023-01-15 10:41:59 -08:00
George Hotz 70b771a175 idx idy 2023-01-15 09:39:22 -08:00
George Hotz 7ea89779fa add returns between views 2023-01-15 08:58:10 -08:00
George Hotz 287699c32c simplify ones after axis splitting 2023-01-14 10:51:43 -08:00
George Hotz 1b5def5b9d flip image x/y to match OPENCL 2023-01-12 17:45:37 -08:00
George Hotz 49c6e6d472
Latest attempt to add image (#462)
* add image

* load + store + boring stuff

* image tests pass

* thneed print GFLOPS

* op conv test

* more debugging

* hack for multiview image

* shapetracker creates fewer views

* disable image tests

* working better

* ugh, lkey not key

* print in DEBUG, and allow views

* works

* simple padding conv2d

* use index for image

* that was bad code

* debug print

* fix types

* less lines

* save lines
2023-01-12 17:36:30 -08:00
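The image work here stores tensor data in OpenCL image2d_t textures (float4 texels) rather than plain buffers, the fast path on some mobile GPUs and the motivation for tinygrad's IMAGE flag. A minimal standalone illustration of image loads and stores with pyopencl, assuming a device with image support (a sketch of the mechanism, not tinygrad's generated code):

```python
import numpy as np
import pyopencl as cl

# a toy copy kernel using image reads/writes
SRC = """
__constant sampler_t smp = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST;
__kernel void copy(read_only image2d_t src, write_only image2d_t dst) {
  int2 p = (int2)(get_global_id(0), get_global_id(1));
  float4 v = read_imagef(src, smp, p);   // image loads always return a float4 texel
  write_imagef(dst, p, v);
}
"""

ctx = cl.create_some_context()
q = cl.CommandQueue(ctx)
fmt = cl.ImageFormat(cl.channel_order.RGBA, cl.channel_type.FLOAT)
host = np.random.rand(16, 16, 4).astype(np.float32)          # height x width x RGBA
img_in = cl.Image(ctx, cl.mem_flags.READ_ONLY | cl.mem_flags.COPY_HOST_PTR, fmt,
                  shape=(16, 16), hostbuf=host)
img_out = cl.Image(ctx, cl.mem_flags.WRITE_ONLY, fmt, shape=(16, 16))
cl.Program(ctx, SRC).build().copy(q, (16, 16), None, img_in, img_out)
out = np.empty_like(host)
cl.enqueue_copy(q, out, img_out, origin=(0, 0), region=(16, 16))
assert np.allclose(out, host)
```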
George Hotz 698143aebc shapetracker creates fewer views 2023-01-12 14:04:04 -08:00
George Hotz 281b0db773 three from image 2023-01-12 12:26:58 -08:00
George Hotz 7824ba256e no fake contiguous 2023-01-11 21:27:53 -08:00
George Hotz 795285ce43 actually fix early alloc 2023-01-11 21:12:50 -08:00
George Hotz 0724fd61f1 fix early alloc 2023-01-11 21:07:27 -08:00
George Hotz f1378b3ea1 fix linter and force allocation on hostbuf 2023-01-11 21:04:21 -08:00
George Hotz 3ea38cac72 IMAGE == 1, add reshape to the ast 2023-01-11 20:56:03 -08:00
George Hotz 9ff6c532eb
Prereqs for IMAGE=1 (#461)
* contig

* move ast, debug prog

* add Token

* cleanup reduce

* exec_ast
2023-01-11 20:18:42 -08:00
George Hotz 74fb772e2a rawcpu is junk, replace with llvm 2023-01-10 19:18:45 -08:00
George Hotz fff1f046b0
Simple version of the new GPU backend (#458)
* newgpu

* more to delete

* hmm, tests pass with constant folding

* fix lint/type

* fix constant folding

* comment and rerun tests

* lazy touchups

* fix graph_batchnorm test

* smaller transformer to fix OOM

* Revert "smaller transformer to fix OOM"

This reverts commit a44ef8edc275a4b3c78ee711ba188e220b7a879f.

* no func cache

* introspect

* touchups

* CLASTKernel

* ugh, it was lru_cache

* codegen

* spacing

* old gpu still in opencl

* typing fix
2023-01-10 19:16:02 -08:00
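On the constant-folding bullets above: the idea is to evaluate ops whose inputs are all known constants while the graph is built, so no kernel is ever launched for them. A toy illustration of the mechanism (hypothetical names, not the PR's code):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
  op: str                     # "CONST", "ADD", "MUL"
  srcs: tuple = ()
  value: float = 0.0

def fold(n: Node) -> Node:
  # fold children first, then collapse any op whose inputs are all constants
  srcs = tuple(fold(s) for s in n.srcs)
  fns = {"ADD": lambda a, b: a + b, "MUL": lambda a, b: a * b}
  if n.op in fns and all(s.op == "CONST" for s in srcs):
    return Node("CONST", value=fns[n.op](srcs[0].value, srcs[1].value))
  return Node(n.op, srcs, n.value)

tree = Node("ADD", (Node("CONST", value=2.0),
                    Node("MUL", (Node("CONST", value=3.0), Node("CONST", value=4.0)))))
print(fold(tree))   # Node(op='CONST', srcs=(), value=14.0)
```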
George Hotz 66123c99b9 gpubuffer repr match llvmbuffer 2023-01-09 20:02:22 -08:00
George Hotz 0a7d2b1a2e fix kernel_cnt type 2023-01-09 19:34:57 -08:00
George Hotz 4356683081 gpu: rename kernels 2023-01-09 19:32:22 -08:00
George Hotz 1e1abb450e fromcpu 2023-01-09 19:18:57 -08:00
George Hotz 90121482fa oops, don't assign self 2023-01-09 18:02:12 -08:00
George Hotz fad7cba590 move batchnorm to Tensor 2023-01-09 18:00:16 -08:00
George Hotz 27211103ae docker: no -it 2023-01-09 12:49:59 -08:00
George Hotz d6e86a29a8 docker: forgot to checkout code 2023-01-09 12:48:03 -08:00
George Hotz 73ce9a771e that fix it 2023-01-09 12:46:33 -08:00
George Hotz bfd4f4e35c testdocker 2023-01-09 12:41:52 -08:00
George Hotz 4885fce56e
shapetracker from newgpu (#456)
* shapetracker from newgpu

* touchup ops

* test

* testst

* thneed deletes unused inputs

* test

* bugfix
2023-01-09 12:40:01 -08:00
Faisal Memon 538b1d7f5b
Print out the tensor using numpy(). (#454)
This commit resolves issue https://github.com/geohot/tinygrad/issues/453

In the example code in the README.md, when run, tinygrad prints the tensors as:
<Tensor <LB (3, 3) op:MovementOps.RESHAPE> with grad None>
<Tensor <LB (1, 3) op:MovementOps.RESHAPE> with grad None>

But to match the output of the Torch example, we need to call numpy(), which shows:
[[ 2.  2.  2.]
 [ 0.  0.  0.]
 [-2. -2. -2.]]
[[1. 1. 1.]]
2023-01-09 10:08:05 -08:00
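For reference, the README example in question is approximately the following (reconstructed; the fix is the added .numpy() calls, which realize the lazy buffers before printing):

```python
from tinygrad.tensor import Tensor

x = Tensor.eye(3, requires_grad=True)
y = Tensor([[2.0, 0, -2.0]], requires_grad=True)
z = y.matmul(x).sum()
z.backward()

print(x.grad.numpy())  # dz/dx -> [[ 2.  2.  2.] [ 0.  0.  0.] [-2. -2. -2.]]
print(y.grad.numpy())  # dz/dy -> [[1. 1. 1.]]
```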
nogira 2e744ef2f2
confirmed (#449)
with a bunch of print statements in the official model here: ce05de2819/ldm/modules/diffusionmodules/openaimodel.py (L413)
2023-01-07 08:41:06 -08:00
Nicolai Stoianov 8dbf76268d
Add step for setting up Stable Diffusion (#452) 2023-01-07 08:40:12 -08:00
cloud11665 4fb97b8de0
don't fail when termcolor is not installed (#436) 2022-11-14 16:45:06 -08:00
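termcolor is purely cosmetic, so the usual fix is a guarded import with a no-op fallback; a minimal sketch of the pattern (not necessarily the PR's exact code):

```python
try:
  from termcolor import colored
except ImportError:
  # optional dependency missing: fall back to uncolored text
  def colored(text, color=None, *args, **kwargs):
    return text
```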
George Hotz 5e07d4669d
the speedy chonker is going to replace the old chonker (#432)
* bringing back reshape and permute

* done with E701

* 4x4 works in generic way

* max and sum not vectorizing...

* special case single float

* support comparing to MPS

* improve matmul speed, consider generic principles

* GlobalCounter

* fix op tracking

* faster

* comment that out for now

* err, it needs that

* fix minor issues

* fix global_mem
2022-11-11 18:34:24 -08:00
George Hotz d2273d2cc4 s/contiguous_op/contiguous 2022-11-11 00:07:05 -08:00
George Hotz b8c94a67c9
Simple chonker (#431)
* chonker will make llvm fast

* work

* better speed tests, we will make them fast

* with the cache add is the same speed

* relu and neg are fast

* fix sum speed

* maximum maxnum?

* hack for gemm opt

* gemm very slow

* zeros like

* test_permute

* shapetracker returns self

* fix shapetracker factorization

* err, int strides

* permutes are faster now in tinygrad than pytorch

* support -1 in expand

* gemm unrolled

* improve final test case

* WIP GEMM

* why isn't GEMM fast?

* revert cache dim

* ffp contract works on clang, not llvm?

* ignore llvm ir

* this makes fma work at least, but no faster

* USE_4x4

* 63 GFLOPS

* 87 GFLOPS

* that wasn't matmul, 44 GFLOPS now

* 82 GFLOPS permuted

* this permute too

* a little speed for the convs

* 45 GFLOPS

* speed tests pass again

* clean up prints

* fix FMA WHAT A WASTE OF TIME

* colors

* moar fair

* GPU

* useless on chonker

* cleanups

* improve factorized shapetracker

* better threshold

* label conv

* work

* ops test pass again

* hot load the index

* run the last view, no need to create

* ZeroView needs a repr for the key to work

* fix segfault on out of bounds

* one more test

* start amx, and llvm.initialize_native_asmparser

* amx works

* nice AMX class

* nicer AMX class

* refactor get_idxs

* amx working

* is slower...

* useless flip

* cache

* SZ_X

* AMX_SZ_X/Y work alone

* Contiguous mlop

* test gemm packed

* PREPARE in packed

* use_amx factor

* prefetch isn't faster

* loop

* same 3ms

* 2.24 ms

* allow double on store in TG

* amx reduce is the same speed as non amx reduce

* include memory bandwidth

* clean up shapetracker

* flip returns stride

* prepare for upstream

* Update ops_llvm.py (#426)

* permutes are yellow and green now

* faster conv

* llvm cleanups

* Show optimised IR under debug 4 (#428)

* ASTKernel class

* Make tinygrad work with older python version (#427)

* Make tinygrad work with older python version

* Use partialmethod instead of partial (see the sketch after this entry)

* simple chonker is chonking

* remove junk from test speed vs torch

* fix linter and types

* AMX is only here now

* add LLVM tests, it's a valid backend now

* oops, run llvm test

* contiguous_op

* fix loadops compare

* dedup reduceops

Co-authored-by: calledit <1573053+calledit@users.noreply.github.com>
2022-11-10 23:17:09 -08:00
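On the partialmethod bullet above: functools.partial objects are not descriptors, so assigning one as a class attribute never binds self; functools.partialmethod exists for exactly this case. A minimal illustration (hypothetical class, not tinygrad's code):

```python
from functools import partial, partialmethod

class Ops:
  def binop(self, op, other):
    return f"{op}({other})"
  add_broken = partial(binop, "add")   # not a descriptor: self is never bound
  add = partialmethod(binop, "add")    # binds self, then prepends "add"

inst = Ops()
print(inst.add(3))        # -> add(3)
# inst.add_broken(3) raises TypeError: "add" lands in `self` and `other` is missing
```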
George Hotz bff47e9dc1 contiguous, and no strided for matmul 2022-11-09 16:56:26 -08:00
George Hotz 1271f19a2b factorizing shapetracker from chonker 2022-11-09 16:36:38 -08:00
Daniel Davis 64ff1ddc10
Reduce line count (#424)
* save a line, save a life

* save a line, save a life

* change order of ternary
2022-11-09 10:07:22 -08:00