Commit Graph

114 Commits

Author SHA1 Message Date
George Hotz 8564e28a1b
new memory scheduler with explicit refcounts (#4198)
* new memory scheduler with explict refcounts

* move central memory planner

* typo + use central memory planner in openpilot

* cleanups

* include lb_refcount in pickle

* replace PlaceHolder with memory planner

* cleaner
2024-04-17 08:46:47 +04:00
George Hotz e4a1858471
revert command queue (#4097) 2024-04-06 08:58:18 -07:00
chenyu e3c0ac9fbf
remove old envvar "OPT" (#4060) 2024-04-03 14:55:21 -04:00
George Hotz 7425a0c646
CommandQueue is the future (#3950)
* start of command queue

* cq work

* runs

* cleanup

* outs set

* read is gone

* future buffer work

* command queue is better

* command queue works

* loadops

* delete unneeded

* command queue works

* upd

* fix tests

* use CommandQueue in compile

* delay sync
2024-04-01 17:35:48 -07:00
chenyu c71627fee6
move GlobalCounter to helpers (#4002)
break circular import between ops and buffer
2024-03-30 00:30:30 -04:00
George Hotz 9eef44521b
ScheduleItem uses Buffer (#3995)
* schedule Buffer

* update

* update tests

* master

* works

* remove LoadOps.WAIT

* fix compile2

* bad test

* rename and note
2024-03-29 20:50:27 -07:00
George Hotz 68ca4d4276
split to schedule.py (#3949)
* split to schedule.py

* split
2024-03-26 21:02:46 -07:00
George Hotz 150ea2eb76
create engine folder and move code (#3948)
* retry

* older tf

* that
2024-03-26 20:38:03 -07:00
qazal 337cd53444
multioutput ScheduleItem (#3699)
* refactor realize.py

* update docs

* update test_sched

* update runners and devices

* update openpilot and unit tests

* cleanup runner lowering

* update more tests
2024-03-13 08:59:38 -07:00
George Hotz 2e60012bcf
move create schedule and delete old API (#3377)
* move create schedule and delete old API

* fix test multitensor
2024-02-12 18:10:45 +01:00
George Hotz 1f9aee8b6f
remove numpy from device (#3123)
* remove numpy from device

* fix tests

* np item

* cleanups

* simplify with as_buffer

* no toCPU

* tinygradic

* cast to scalar
2024-01-14 19:36:05 -08:00
George Hotz a280cfe169
move dtypes to dtype.py (#2964)
* move dtypes to dtype.py

* fix urllib
2024-01-01 14:58:48 -08:00
George Hotz c81ce9643d
move globalcounters to ops (#2960)
* move globalcounters to ops

* missed a few

* sick of that failing
2024-01-01 14:21:02 -08:00
chenyu 50927defad
s/lazydata.realized/lazydata.base.realized/g (#2914)
* s/lazydata.realized/lazydata.base.realized/g

* not that
2023-12-22 14:45:13 -05:00
George Hotz 877c78b4ce
lazy tests (#2796)
* tests

* mini sd is very mini
2023-12-16 08:24:21 -08:00
George Hotz c6eb618013
tests from new lazy branch (#2774)
* tests from new lazy branch

* fix lin 11

* that was needed

* doesn't fail

* mark

* meant that

* llvm passes
2023-12-14 23:06:39 -08:00
George Hotz 27481b9206
Switch ops_gpu -> gpuctypes (#2532)
* ops_gpu is go

* fix size 0

* fix image, and add more tests

* nerf openpilot test, doesn't test thneed

* run the schedule

* better

* oops, new inputs

* delete pyopencl

* Update ops_gpu.py
2023-12-01 22:30:21 -08:00
George Hotz bfdce1f0e7 hotfix: make openpilot test deterministic 2023-12-01 15:37:23 -08:00
George Hotz 2c363b5f0b
new style device (#2530)
* cpu tests pass

* torch works

* works

* metal works

* fix ops_disk

* metal jit works

* fix openpilot

* llvm and clang work

* fix webgpu

* docs are rly broken

* LRU works on metal

* delete comment

* revert name to ._buf. LRU only on Compiled

* changes

* allocator

* allocator, getting closer

* lru alloc

* LRUAllocator

* all pass

* metal

* cuda

* test examples

* linearizer

* test fixes

* fix custom + clean realize

* fix hip

* skip tests

* fix tests

* fix size=0

* fix MOCKHIP

* fix thneed

* copy better

* simple

* old style metal copy

* fix thneed

* np reshape

* give cuda a device
2023-11-30 17:07:16 -08:00
George Hotz 889acefe85
Support weird loads in Image (#2498)
* image support weird loads

* umm, that was always wrong

* openpilot compile fails with a weird error

* image test passes

* we have valids now

* clean that up

* no more required opts

* add fastvits test, fix bug

* minor cleanups
2023-11-29 08:30:46 -08:00
George Hotz d87a246439
move to new cached fetch (#2493)
* move to new cached fetch

* extra.utils is over

* loads

* bump download cache

* bump timeout
2023-11-28 17:36:55 -08:00
George Hotz 9e07824542
move device to device.py (#2466)
* move device to device.py

* pylint test --disable R,C,W,E --enable E0611

* fix tests
2023-11-27 11:34:37 -08:00
George Hotz 8e9cdef61f
clean up the buffers (#2447)
* clean up the buffers

* remove allocate_output

* functools.lru_cache is methodcache

* add TestShapeTrackerSize

* cache_clear

* no 0 sz buffer, add _ on functions that shouldn't be imported

* fix size

* if -> while
2023-11-26 11:02:29 -08:00
George Hotz 8656eebb42
jit doesn't use named tensors (#2393)
* jit doesn't use named tensors

* move to compile2

* remove broken single root junk

* explicit float32

* skip slow test
2023-11-23 00:13:18 -08:00
George Hotz b1f7f29525
metal indirect command buffers (#2285)
* metal indirect command buffers

* sub 1ms gpt

* metal batch exec is good

* remove whitespace

* input_replace

* fix ci

* useResources

* very simple cacheallocator

* update_stats

* fix CI

* minor

* remove that from jit
2023-11-13 17:58:26 -08:00
chenyu a753c8e071
examples of new GPT2 and JIT change (#2261)
* var_vals are global

* working with global ish

* better

* fix export model

* fix tests

* better kv cache

* does it run?

* use where for kvmask

* fix excessive var_vals

* fix import

* how does multigpu use this?

* llama kinda work

* faster and simpler

* cleanup

* fix conversation mode

* test cleanups

* fix one more test

* test cleanup

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2023-11-10 15:07:02 -05:00
George Hotz 2f7aab3d13
move optimize_local_size (#2221)
* move optimize_local_size

* interpret_ast
2023-11-05 21:00:52 -08:00
George Hotz 5aaa8a0cc1 fix shape 2023-10-31 11:36:19 -07:00
George Hotz a27c9f9de5
openpilot compile2 (#2189)
* try compile2

* pass to thneed

* fix tanh onnx
2023-10-31 11:08:58 -07:00
George Hotz 881fd7c141
add mops to graph, refactor IMAGE (#2100)
* add mops to graph, refactor IMAGE

* no reshape pushing

* add todo

* fix openpilot model alt

* push reshapes reduces kernels in new op

* IMAGE=2 is a first class citizen now
2023-10-17 21:27:51 -07:00
George Hotz 5a4a62ecae
Disable logging in early compile2 and lower kernel counts (#2090)
* Revert "Revert "openpilot kernel fix from 209 to 207 (#2006)" (#2065)"

This reverts commit 924ecc4d6a.

* gate behind OPT >= 4

* disable_logging in schedule

* simple

* from master

* more images

* revert that

* 206 kernels
2023-10-16 20:15:24 -07:00
George Hotz a7b18ac325
try beam search on device (#2085)
* try beam search on device

* fix beam with nolocals

* ops too

---------

Co-authored-by: Comma Device <device@comma.ai>
2023-10-16 12:52:42 -07:00
George Hotz 5472a14544
openpilot compile2 (#1977)
* start compile2

* tweak

* why are there two more kernels?

* minor cleanups

* don't break onnx tests

* add __metadata__ support to safetensors

* no early realize in onnx

* cleanups

* bugfix

* clean up image type, add optimize

* opt to match old

* try that

* opt work

* run compile2

* optimizer

* prt more

* prerealize

* imp

* NOLOCALS works

* no locals means no locals

* support fractional globals

* all locals welcome

* int that

* cleanups

* show gemv regression

* clean up diff

* use idx for the cond

* nolocals

---------

Co-authored-by: Comma Device <device@comma.ai>
2023-10-15 20:39:46 -07:00
George Hotz 2d0c1037b1
Fix up latest openpilot model (#1976)
* fix gemv triggering for gemm

* fixup_openpilot

* external test issues
2023-10-05 05:24:28 -07:00
Umut Zengin 3987280daf
Fix VALIDHACKS for Images and make it default (#1832)
* valid hacks

* valid hacks

* valid hacks

* new method

* new method

* handtune

* is gate load breaking?

* lint

ruff

less junk

new approach?

maybe this?

* Make it more clear

* Make it more clear

* Will deal with the linter later

* hack for linter

* subs the idx but dont touch the valid

* Updated the mod rules

* lint hack

* I believe bug fix lets see

* Mod Node left

* revert

* Maybe this wont break?

* revert

* implemented "handtuned garbage"

* revert and use VALIDHACKS

* Lets see the CI

* still broken?

* currently its jungle

* maybe this jungle ?

* This works for everything somehow

* Added test for symbolic

* lint

* final touch

* This still works

* lint

* midway clean

* less garbage

* lint

* final form

* Slow but working way

* lint and other stuff

* lint

* mypy

* Make sure CI test Openpilot valid checks

* test if CI break

* Convert back

* refactor

* refactor

* Managed to reduce openpilot time from 30 secs to 5 secs

* Refactor

* Substitute a node with variable

* flake8

* Comment and refactor

* More comprehensive mod

* refactor

* bug fix

* More shave off

* remove not sure part
2023-09-23 07:34:43 +08:00
Karan Handa a8aa13dc91
[ready] Replacing os with pathlib (#1708)
* replace os.path with pathlib

* safe convert dirnames to pathlib

* replace all os.path.join

* fix cuda error

* change main chunk

* Reviewer fixes

* fix vgg

* Fixed everything

* Final fixes

* ensure consistency

* Change all parent.parent... to parents
2023-08-30 10:41:08 -07:00
nimlgen 1c0449e190
add cache collector (#1595)
* init cache collector

* add test_cache_collector.py

* switch GlobalCounters.cache to CacheCollector

* init jit models test

* jitted SD

* add debug msg to print loaded bufs count

* moved cache collctor to jit

* clearer SD

* no double device import
2023-08-28 19:59:55 -07:00
chenyu b5d700adae
update openpilot supercombo.onnx to 0.9.4 (#1681)
* update openpilot supercombo.onnx to 0.9.4

* update tests for the new model

* comment out comma models from external_model_benchmark
2023-08-26 19:16:08 -04:00
chenyu ae39cf84ab
Symbolic Shape JIT main PR (#1353)
* Symbolic Shape JIT

update tests

2 variables symbolic ops, adding more tests

test passing

cleanup

* more test cases

* single flag

* review update

* jit attention one piece

* realize

* symbolic_jit test for cuda

* old artifact

* works with cuda gpu but failed ci

* CUDACPU
2023-08-18 14:39:55 -07:00
George Hotz 18892242b0
global -> group (#1007)
* global -> group

* allow None for local_size in custom function

* lil local

* comment on shape

* fix cuda

* smart local cast

* better local heuristic

* fix ptx, and work_dim cleanup

* fix metal

* fix ops test

* fix openpilot jit

* no more optlocal

* might fix metal tests

* try metal now

* see generated metal code

* test free removal. REVERT THIS

* mergable
2023-06-21 11:50:43 -07:00
George Hotz 7fbf96b992 jit: TODO, use abstractions 2023-05-05 22:51:30 -07:00
George Hotz 7ecf4dff68
multi cl_queue (#762)
* multi cl_queue

* only platforms 1

* gpus first, then cpus

* put device on underlying buffer

* cl_queue array
2023-05-03 12:15:28 -07:00
George Hotz b12b60af20
fix binop, other tests failure (#723)
* fix binop, other tests failure

* that was a bad idea

* better layernorm

* inference kernel count tests

* new style reshape pushing

* fixup replacement

* 199 kernels is okay. fix flops

* push reshape through unaryops only

* GRAPH=2 draws the phantom ops

* found resnet issue

* non working test

* mul is cheaper than div

* OPT inflation

* SHUFFLE_PAD_OPS in OPT=2
2023-03-22 18:15:07 -07:00
George Hotz f5467cfedc
Devicebufferless (#708)
* runs one metal kernel

* conv2d works

* ops tests are passing

* const folding

* all ops work

* pre commit always passes

* torch works

* working still

* fix graph test

* tests passing

* image almost works

* image conv works

* most images

* fix custom

* fix assignment

* fix compile enet

* clean up comments

* fix realize return value

* include shapetracker in LB repr

* copy should make a copy

* reenable method cache

* fix lna

* dtypes in graph

* forward only for IMAGE=2

* simple realize

* getting close

* fixup new api, it's good except the kernel count

* back to 197 kernels

* tests should pass

* go to a real float

* no type_on_cpu

* fix the docs

* put shapetracker back in it's proper place
2023-03-18 14:40:23 -07:00
George Hotz d8dda2af3a openpilot fixups 2023-03-06 14:14:44 -08:00
George Hotz 382f346523
clean up opt (#649)
* clean up opt

* don't let global kernels get too small

* 8192 -> 1024

* disable local shape for clang

* fix can_merge

* unroll the 5x5 depthwise convs in op

* load float4 check
2023-03-05 20:49:36 -08:00
George Hotz c53efb3635
optimize for CL (#633)
* required opt

* simplify

* works

* shift_to_last

* required is fine

* print shape in colored

* better shape

* args was wrong

* debugs

* fix empty shape

* colored shape printer
2023-03-03 22:00:09 -08:00
George Hotz 1a84976d4d fix thneed gflops 2023-03-03 16:52:59 -08:00
George Hotz b9ce20c374 openpilot test wasn't running, factor out image idx 2023-03-03 07:41:53 -08:00
George Hotz 2e26286294
speed like you wouldn't believe (#626)
* speed like you wouldn't believe

* fix tests
2023-03-02 07:49:19 -08:00