25 Commits

Author SHA1 Message Date
chenyu 7fec966b5e
bye bye NOOP (#2534)
* bye bye NOOP

* SIN

* NEG
2023-11-30 23:10:35 -08:00
George Hotz 5629fc368c
Use Buffer.STORE at the end of ASTs (#2494)
* work

* store broken

* interpreteds work

* this passes

* symbolic cpu

* fix tests

* fix opt tests

* images fail

* fix InterpretedFlopCounter

* stupid hack for images
2023-11-28 20:11:37 -08:00
George Hotz ab5d14d4ba
MEM -> LOAD (#2492)
* MEM -> LOAD

* keep legacy working
2023-11-28 16:46:37 -08:00
chenyu 8798d120bb
autopad shapetracker for BEAM (#2375)
* autopad shapetracker for BEAM

* OptOps.PADTO

* skip that test for now

* correct padding reduce axis

* just 32

* avoid more than double the FLOPs

* cleanups

* test case

* no support for triton and llvm yet

* typos

* symbolic shape would not work

* cannot PADTO with MAX kernel

* advance db version

* no breaking change - don't advance db version

* is triton just python?

* Revert "is triton just python?"

This reverts commit 17e776c25587615e33a3634c2fb0bb8591ce65d4.

* Revert "Revert "is triton just python?""

This reverts commit 6c434c01e1c4b0ea0431ec18632cd859fb3cf260.

* support llvm

* is it really passing in CI only?

* update tests

* oh triton test passed

* simpler

* revert that, with a test

* check if st are the same

* Revert "check if st are the same"

This reverts commit d2a5eac110a5da1af82a2728c883779ef69c3cad.

* update the db version

* rebase artifact
2023-11-22 21:05:25 -05:00
chenyu 822d6e6f18
Simpler mops verify (#2325)
* rewrite the to_movement_ops check using symbolic

* tweak
2023-11-15 21:47:18 -05:00
nimlgen 4e0d47533e
beam works with var vals (#2296)
* beam works with var vals

* test passes now

* better comment

* linter happy
2023-11-14 13:03:19 -05:00
George Hotz 7103b716c4
merge kernel and optimizer (#2200)
* merge kernel and optimizer

* linearize is reentrant

* move global/local size

* clean up linearizer copy

* remove unneeded lin copies

* stop linearizing twice

* oops, that should be None
2023-11-01 15:20:01 -07:00
forcefieldsovereign f294bdd681
fixed imports (#2185)
2023-10-30 22:07:17 -07:00
chenyu 6c58bf3e9c
in time_linearizer, allocate a scratch buffer if output buffer is also input (#2152)
* in time_linearizer, allocate a scratch buffer if output buffer is also input

* move scratch buffer creation outside search
2023-10-28 07:17:41 -10:00
George Hotz e0201922e3
Q network for pruning BEAM / uops deduping / BEAM_ESTIMATE (#2142)
* stable diffusion < 324ms

* revert swap action

* fix tests due to more sum splitting

* REDUCEOP_SPLIT_THRESHOLD env var

* added from unaligned np test (#2134)

* align cpu buffer before copy into cl buffer (#2135)

* remove shelve from handcode_resnet50_opt.py (#2139)

* Add dictionary keys to reduce db size (#2131)

* work

* ignore beam cache

* dictionary keys are generic

* minor db cleanups

* fix baseline and extract dataset

* fix training

* log likelihood

* more lin to feats

* sts

* training policynet

* net sort of works

* dedup

* refactor, stupid new actions

* fix uops deduping

* BEAM_ESTIMATE

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
Co-authored-by: imaolo <56898718+imaolo@users.noreply.github.com>
2023-10-27 10:53:06 -10:00
chenyu 0ca0e9ee5e
exclude ast with variables from beam search (#2140)
* exclude ast with variables from beam search

* test that

* add to CI
2023-10-25 16:35:29 -04:00
George Hotz cea2bc7964
Add dictionary keys to reduce db size (#2131)
* work

* ignore beam cache

* dictionary keys are generic

* minor db cleanups

* fix baseline and extract dataset

* fix training

* log likelihood
2023-10-24 10:49:22 -04:00
George Hotz abeba8f1fc
optimization: get actions in CI (#2125)
* get actions in CI

* actually run the test

* pythonpath
2023-10-20 12:22:01 -07:00
George Hotz c36d306606
KOPT is over, BEAM is upstream (#2071)
* create cache for q learning

* make linter happy

* global beam

* where it belongs

* bugfix

* ditch the kopt, use the beam

* faster lin and DEBUG=2 okay

* remove kopt, move search to features
2023-10-16 09:46:03 -07:00
George Hotz 49bcfec383
0s in the action space (#2070)
* 0s in the action space

* simpler

* skip duplicate actions
2023-10-14 11:22:48 -07:00
George Hotz 4124cf1df5
cleanup tensor cores, expose exclude local upcast (#2064)
* expose exclude_local_upcast

* convert apply tensor cores to ops

* update comment

* put LOCAL back to what it was, BEAM is better than way
2023-10-14 09:21:03 -07:00
George Hotz 90c777d815
remove apply_auto_opt (#2063)
2023-10-13 07:44:14 -07:00
George Hotz 6f1810af2d
with unroll, the action space goes from 161 -> 127 (#2060)
* with unroll, the action space goes from 161 -> 127

* more reliable instrumentation

* beam search is so op

* beam bugfix
2023-10-12 20:52:23 -07:00
George Hotz c5edb3c374
train value net, improve API, add BCE (#2047)
* api cleanups, BCE losses

* valuenet

* fixup examples

* learning okay

* add valuenet runner

* net improvements

* net improvements

* 40% win rate
2023-10-12 07:56:38 -07:00
George Hotz 0ba629c7b9
add world dataset (#2045)
2023-10-11 15:54:30 -07:00
George Hotz 0c3b6f13a8
Latest opt (#2044)
* split out actions

* rl algorithm
2023-10-11 15:46:14 -07:00
George Hotz 41bfeb2c1e
start work on auto opt (#2034)
* start work on auto opt

* lin failure

* not beating hcopt

* greedy

* timing is fast

* codegen.search

* greedy search in handcode_opt

* track running gflops

* clean up those files

* no failure
2023-10-11 12:54:53 -07:00
chenyu 1c980517c5
s/var_vals_from_ast/vars_from_ast (#2038)
2023-10-10 20:21:55 -07:00
George Hotz f139060103
Rewrite hand coded opt with action space (#2030)
* tests passing

* hand coded opt with new abstractions

* simpler opts

* split out tensor cores
2023-10-10 07:38:38 -07:00
George Hotz 16ca8410f8
op logger + replay (#2021)
* logops

* fix dtype printing

* needs inf

* ops dataset

* minor improvements

* 12k kernels

* opt can compile

* graph flops
2023-10-08 15:10:18 -07:00