George Hotz
ff64bcab69
move graph/search to engine (#4596)
2024-05-14 23:12:59 -07:00
George Hotz
347a3acb37
add renderer class (#4524)
* add renderer class
* tests pass
* fix pylint
* fix tensor cores
2024-05-10 21:40:02 -07:00
George Hotz
17faae091b
optimizer shouldn't be run without training (#4460)
* optimizer shouldn't be run without training
* set training in relevant tests
* fix multitensor
* that too
2024-05-06 15:34:12 -07:00
George Hotz
f4e49a7c1a
resnet 50 opt: correct loop + LARS (#4449)
* correct loop + LARS
* ops
2024-05-06 08:01:26 -07:00
George Hotz
fc995d4446
add backward to handcode_resnet50_opt
2024-05-06 06:42:26 -07:00
chenyu
aa093efa43
fix handcode_resnet50_opt flops count (#4184)
2024-04-15 22:13:45 -04:00
chenyu
b47f6cebb2
LinearizerOptions -> CompilerOptions (#3978)
2024-03-28 17:50:23 -04:00
George Hotz
68ca4d4276
split to schedule.py (#3949)
* split to schedule.py
* split
2024-03-26 21:02:46 -07:00
George Hotz
150ea2eb76
create engine folder and move code (#3948)
* retry
* older tf
* that
2024-03-26 20:38:03 -07:00
Anurag Lamsal
4e0819e40b
fixing the benchmark not printing in handcode resnet50 opt example (#3850)
2024-03-21 00:55:31 -04:00
qazal
337cd53444
multioutput ScheduleItem (#3699)
* refactor realize.py
* update docs
* update test_sched
* update runners and devices
* update openpilot and unit tests
* cleanup runner lowering
* update more tests
2024-03-13 08:59:38 -07:00
qazal
aec4c4f01b
linearizer ast as a tuple of lazyops (#3689)
* multi store op linearizer
* currently we do only one output per kernel
* named opts
2024-03-11 15:39:04 -07:00
George Hotz
2e60012bcf
move create schedule and delete old API (#3377)
* move create schedule and delete old API
* fix test multitensor
2024-02-12 18:10:45 +01:00
chenyu
77251336d5
fix handcode_resnet50_opt.py (#3289)
linearizer_opts has moved. also update the logging to print after total_tm update
2024-01-31 19:01:08 -05:00
chenyu
53afec2841
add HALF to handcode_resnet50_opt.py (#3202)
use this to study tensor cores on HIP
2024-01-21 23:03:59 -05:00
chenyu
58d3d5030b
vars_from_ast -> LazyOp.vars (#2965)
2024-01-01 18:12:38 -05:00
George Hotz
00d9eda961
FROM -> COPY, move vars_from_ast (#2675)
2023-12-07 16:32:30 -08:00
chenyu
05a5357dd9
fix handcode_resnet50_opt.py (#2558)
2023-12-01 20:51:21 -05:00
Akshay Kashyap
a031afb2f6
Update display_name in resnet50 example (#2454)
2023-11-26 16:07:36 -08:00
George Hotz
0cbf6c1811
move things, clean up extra (#2292)
* move things
* idk why pylint needs that now
* delete unused
2023-11-13 20:18:40 -08:00
George Hotz
7103b716c4
merge kernel and optimizer (#2200)
* merge kernel and optimizer
* linearize is reentrant
* move global/local size
* clean up linearizer copy
* remove unneeded lin copies
* stop linearizing twice
* oops, that should be None
2023-11-01 15:20:01 -07:00
George Hotz
e0201922e3
Q network for pruning BEAM / uops deduping / BEAM_ESTIMATE (#2142)
* stable diffusion < 324ms
* revert swap action
* fix tests due to more sum splitting
* REDUCEOP_SPLIT_THRESHOLD env var
* added from unaligned np test (#2134)
* align cpu buffer before copy into cl buffer (#2135)
* remove shelve from handcode_resnet50_opt.py (#2139)
* Add dictionary keys to reduce db size (#2131)
* work
* ignore beam cache
* dictionary keys are generic
* minor db cleanups
* fix baseline and extract dataset
* fix training
* log likelihood
* more lin to feats
* sts
* training policynet
* net sort of works
* dedup
* refactor, stupid new actions
* fix uops deduping
* BEAM_ESTIMATE
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
Co-authored-by: imaolo <56898718+imaolo@users.noreply.github.com>
2023-10-27 10:53:06 -10:00
chenyu
d5e2fdea22
remove shelve from handcode_resnet50_opt.py (#2139)
2023-10-24 10:37:30 -04:00
George Hotz
c36d306606
KOPT is over, BEAM is upstream (#2071)
* create cache for q learning
* make linter happy
* global beam
* where it belongs
* bugfix
* ditch the kopt, use the beam
* faster lin and DEBUG=2 okay
* remove kopt, move search to features
2023-10-16 09:46:03 -07:00
George Hotz
49bcfec383
0s in the action space (#2070)
* 0s in the action space
* simpler
* skip duplicate actions
2023-10-14 11:22:48 -07:00
George Hotz
6f1810af2d
with unroll, the action space goes from 161 -> 127 (#2060)
* with unroll, the action space goes from 161 -> 127
* more reliable instrumentation
* beam search is so op
* beam bugfix
2023-10-12 20:52:23 -07:00
George Hotz
c5edb3c374
train value net, improve API, add BCE (#2047)
* api cleanups, BCE losses
* valuenet
* fixup examples
* learning okay
* add valuenet runner
* net improvements
* net improvements
* 40% win rate
2023-10-12 07:56:38 -07:00
George Hotz
41bfeb2c1e
start work on auto opt (#2034)
* start work on auto opt
* lin failure
* not beating hcopt
* greedy
* timing is fast
* codegen.search
* greedy search in handcode_opt
* track running gflops
* clean up those files
* no failure
2023-10-11 12:54:53 -07:00
George Hotz
44ed94ef5c
use the device abstraction in handcode_resnet50_opt
2023-10-07 13:22:20 -07:00
George Hotz
121f7aa8c5
Schedule item (#2012)
* ScheduleItem
* put var_vals in the schedule
* fix tests, wow that proliferated quickly
* not ready to be in the schedule
2023-10-07 08:59:25 -07:00
George Hotz
f54959e5cd
move print tree into graph (#2003)
* move print tree into graph
* add winograd profiling test
* change pre-commit to run ruff first
2023-10-07 04:39:21 -07:00
nimlgen
2ea1dd3e87
no process() in Linearizer (#1966)
* no process() in Linearizer
* more process() clean up
2023-10-04 07:18:42 -07:00
George Hotz
90326dbdc3
resnet50 hand coded optimization (#1945)
* resnet50 hand coded opt
* hand optimize one kernel
* opt in both places to fix test
2023-09-29 09:34:51 -07:00