
14 Commits

Author SHA1 Message Date
chenyu c71627fee6
move GlobalCounter to helpers (#4002)
break circular import between ops and buffer
2024-03-30 00:30:30 -04:00
George Hotz 9a6ac2a50a
create the buffer with the LazyBuffer (#3977)
* create the buffer with the LazyBuffer

* fixes

* hack underlying buffer when we change dtype

* we only care about allocated buffers

* asserts
2024-03-28 19:31:28 -07:00
George Hotz c81ce9643d
move globalcounters to ops (#2960)
* move globalcounters to ops

* missed a few

* sick of that failing
2024-01-01 14:21:02 -08:00
George Hotz 1765849937
new lazy, benchmark (#2878)
* lazy rewrite, try 2

* min fix tests

* pass contig test

* put broken pads back

* move that to realize

* no contig child fixes array packing

* so wrong

* now that's correct

* base children

* fix bind issues

* disable to_image_idx

* fix tests

* that failure shouldn't break other tests

* more fixes

* fix torch

* skip failing tests in CI

* 1e-7

* half is broken

* 1e-6 margin of error
2023-12-20 14:33:21 -08:00
chenyu 73cadfbb3c
Remove pytest markers (#2831)
* remove pytest marker

* fix some, skip some

* tweak

* fix

* skip slow

* skip more
2023-12-18 18:53:28 -05:00
George Hotz 2f7aab3d13
move optimize_local_size (#2221)
* move optimize_local_size

* interpret_ast
2023-11-05 21:00:52 -08:00
George Hotz f5467cfedc
Devicebufferless (#708)
* runs one metal kernel

* conv2d works

* ops tests are passing

* const folding

* all ops work

* pre commit always passes

* torch works

* working still

* fix graph test

* tests passing

* image almost works

* image conv works

* most images

* fix custom

* fix assignment

* fix compile enet

* clean up comments

* fix realize return value

* include shapetracker in LB repr

* copy should make a copy

* reenable method cache

* fix lna

* dtypes in graph

* forward only for IMAGE=2

* simple realize

* getting close

* fixup new api, it's good except the kernel count

* back to 197 kernels

* tests should pass

* go to a real float

* no type_on_cpu

* fix the docs

* put shapetracker back in its proper place
2023-03-18 14:40:23 -07:00
George Hotz bfcec234a2
Refactor ASTs (#622)
* ugh worst branch name

* compiler refactor continues

* scc -> cloc

* buf -> _buf

* finish _buf, and program -> runtime

* gpu is still working, clang isn't

* clang in new style

* ops_metal

* something broke it

* improve metal

* clean up tons of cl crap

* hack fix sync

* cleaner gpu

* gpu metal clang

* cleanups

* minor refactor

* GPUCodegen

* fix up LLVM

* blind CUDA refactor

* codegen / runtime

* keep ops naming

* linter passes

* woah, llvm was allocing 4x what it needed to

* bugfixes

* fix openpilot compiler

* fix compile_efficientnet

* method cache should fix tests

* deal with duped functions
2023-03-01 18:57:29 -08:00
George Hotz 643e8b0388
fix tests, test bn evaluate too
2023-02-27 10:39:47 -08:00
George Hotz fed95119dc
CL.mem_used -> GlobalCounters.mem_used
2023-02-10 23:13:29 -06:00
George Hotz 3d63934995
refactor to keep cl in the runtime (#545)
* refactor to keep cl in the runtime

* fix thneed, rename cl to _cl

* bugfix + _cuda

* fix tests

* thneed more correct
2023-02-08 16:46:09 -06:00
George Hotz fff1f046b0
Simple version of the new GPU backend (#458)
* newgpu

* more to delete

* hmm, tests pass with constant folding

* fix lint/type

* fix constant folding

* comment and rerun tests

* lazy touchups

* fix graph_batchnorm test

* smaller transformer to fix OOM

* Revert "smaller transformer to fix OOM"

This reverts commit a44ef8edc275a4b3c78ee711ba188e220b7a879f.

* no func cache

* introspect

* touchups

* CLASTKernel

* ugh, it was lru_cache

* codegen

* spacing

* old gpu still in opencl

* typing fix
2023-01-10 19:16:02 -08:00
George Hotz 6a8fb53304
move ops.py into lazy.py (#402)
* move ops.py into lazy.py

* fix graph and linter

* ugh, didn't add
2022-10-25 13:58:03 -07:00
George Hotz 0516359af8
fix stupid OPENCL=1 OOM
2022-09-06 14:29:23 -07:00