Commit Graph

2711 Commits

George Hotz 90c777d815
remove apply_auto_opt (#2063) 2023-10-13 07:44:14 -07:00
nimlgen bd42fa0b73
kernel cache (#2035)
* init compiled cache

* clang not compile to stdout

* use kwargs in compile

* remove some useless lines

* slimmer

* fix

* tabs

* retry

* remove decorator

* no race in hip

* smaller hip

* unused import

* unused pathlib

* path to str

* add test

* fix linter

* less lines?

* decorator is back

* update tests

* no hip version

* better comments

* a bit better test

* linter

* work wo decorator

* linter happy

* simpler return type

* more tests

* better comment

* readable

* readable

* readable

* compile returns bytes

* no unused imports

* readable
2023-10-13 06:32:01 -07:00
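The kernel-cache entry above memoizes compiled kernel binaries so repeated runs skip compilation; since "compile returns bytes", a disk cache keyed by a hash of the kernel source suffices. A minimal sketch under that assumption — the names and cache layout here are illustrative, not tinygrad's actual implementation:

```python
import hashlib
import tempfile
from pathlib import Path

# hypothetical cache location; the real cache path and key scheme may differ
CACHE_DIR = Path(tempfile.mkdtemp())

def cached_compile(src: str, compiler) -> bytes:
    """Compile `src` with `compiler`, memoizing the resulting binary on disk."""
    key = hashlib.sha256(src.encode()).hexdigest()  # cache key = hash of kernel source
    path = CACHE_DIR / key
    if path.exists():
        return path.read_bytes()   # cache hit: skip compilation entirely
    binary = compiler(src)         # "compile returns bytes"
    path.write_bytes(binary)
    return binary
```

On a second call with the same source, the compiler is never invoked — the binary comes straight off disk.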
George Hotz 6f1810af2d
with unroll, the action space goes from 161 -> 127 (#2060)
* with unroll, the action space goes from 161 -> 127

* more reliable instrumentation

* beam search is so op

* beam bugfix
2023-10-12 20:52:23 -07:00
Umut Zengin 6b7ac5c431
ModNode __mod__ rule (#2039)
* Implement mod rule

* mypy

* feat: New test added
2023-10-12 11:30:10 -07:00
Yixiang Gao 3187962476
CIFAR HALF mode (#2041)
* load weights in fp16

* add dtype option in nn

* fix test

* no need for dtype in nn

* add option to load weights in FP16, but NaN

* change loss scaler

* cast to float32 for norm layer

* add a todo for the forward pass padding

* fix transform
2023-10-12 10:19:51 -07:00
George Hotz c5edb3c374
train value net, improve API, add BCE (#2047)
* api cleanups, BCE losses

* valuenet

* fixup examples

* learning okay

* add valuenet runner

* net improvements

* net improvements

* 40% win rate
2023-10-12 07:56:38 -07:00
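The BCE added in this commit is standard binary cross-entropy: for a prediction p in (0, 1) and target y, the loss is -(y·log p + (1-y)·log(1-p)). A scalar illustration of the formula (not tinygrad's API):

```python
import math

def bce(pred: float, target: float, eps: float = 1e-7) -> float:
    # clamp the prediction away from 0 and 1 so log() stays finite
    pred = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))
```

A prediction of 0.5 yields log 2 regardless of the target; the loss grows as the prediction moves away from the target.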
George Hotz 0ba629c7b9
add world dataset (#2045) 2023-10-11 15:54:30 -07:00
George Hotz 0c3b6f13a8
Latest opt (#2044)
* split out actions

* rl algorithm
2023-10-11 15:46:14 -07:00
geohotstan 8d6cecb25c
Torch eq fix (#1562)
* init

* Revert "init"

This reverts commit 682bf2073a8b4eca111596c67cf6ebd79f59e585.

* kids dont do drugs

* one way to fix

* resolve merge conflict

* no more or

* clean up
2023-10-11 12:57:11 -07:00
George Hotz 41bfeb2c1e
start work on auto opt (#2034)
* start work on auto opt

* lin failure

* not beating hcopt

* greedy

* timing is fast

* codegen.search

* greedy search in handcode_opt

* track running gflops

* clean up those files

* no failure
2023-10-11 12:54:53 -07:00
chenyu 1c980517c5
s/var_vals_from_ast/vars_from_ast (#2038) 2023-10-10 20:21:55 -07:00
Francis Lam 81c7d750db
test: fix test_linearizer.test_tensor_core test (#2036)
must use apply_tensor_core instead of hand_coded_optimizations
2023-10-10 14:48:28 -07:00
chenyu e2b83f1b42
Variable.bind newer (#2017)
* Variable.bind attempt 2

* ShapeTracker.unbind

* fix llama

* fix types

* test case

* View.vars cleanup

* include mask in symbolic source

* mask can be sint

* st.unbind in bufferops

* assert ast contain free Variable only

* cleanup

* conservative unbinding reduce op arg

* move reduceop unbind

* fix llama JIT arg behavior
2023-10-10 10:03:01 -07:00
qazal 71d93ffd79
Refactor GPU and Metal languages in their own separate renderers (#2033)
* Refactor GPU and Metal languages in their own separate renderers

* remove CStyleLanguage imports

* move renderers too
2023-10-10 07:46:41 -07:00
George Hotz f139060103
Rewrite hand coded opt with action space (#2030)
* tests passing

* hand coded opt with new abstractions

* simpler opts

* split out tensor cores
2023-10-10 07:38:38 -07:00
Ahmed Harmouche e27fedfc7b
Fix stable diffusion output error on WebGPU (#2032)
* Fix stable diffusion on WebGPU

* Remove hack, numpy cast only on webgpu

* No-copy numpy cast
2023-10-10 06:40:51 -07:00
qazal e40f141203
Refactor and add more unit tests for disktensors (#2022)
* testing with the test_ops pattern

* add assign test

* flake8 complaining about single line fn

* slice 2d and minor cleanup

* make assign_slice a one-liner

* we don't need to repeat the same lambda twice, default tinygrad_fxn to be np_fxn

* back assign fn for np array

* implement __setitem__ in tensor.py

* don't re-slice the ret tensor

* one liner assign

* drop the permute test
2023-10-09 18:46:29 -07:00
chenyu 45f0891a8f
use "<" instead of "<=" in codegen for loop (#2027) 2023-10-09 17:26:36 -07:00
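The codegen change above switches the generated loop to a half-open bound ("i < n" rather than "i <= n - 1"), the conventional C-style form. A toy loop renderer showing the convention (a hypothetical helper, not tinygrad's codegen):

```python
def render_loop(var: str, n: int) -> str:
    # half-open bound: iterate while var < n, never var <= n - 1
    return f"for (int {var} = 0; {var} < {n}; {var}++) {{"
```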
chenyu 25555c836f
llama default to JIT only if device supports JIT (#2028) 2023-10-09 17:26:02 -07:00
George Hotz 16ca8410f8
op logger + replay (#2021)
* logops

* fix dtype printing

* needs inf

* ops dataset

* minor improvements

* 12k kernels

* opt can compile

* graph flops
2023-10-08 15:10:18 -07:00
calledit 46f354b49f
Fix comment to describe code (#2023) 2023-10-08 14:28:14 -07:00
qazal 0e2e041faf
CI for using tinygrad as an external pkg (#2019)
* create workflow

* unify with test.yml
2023-10-08 10:50:48 -07:00
George Hotz 8db92bd060 fix tvm gemm example 2023-10-08 05:57:41 -07:00
mmmkkaaayy af6e2f31ca
whisper: cast model output token to int32 (#2013)
Co-authored-by: mmmkkaaayy <mmmkkaaayy@users.noreply.github.com>
2023-10-08 05:56:22 -07:00
Luca Sciarpa e93e240a6c
adapting test/external/external_osx_profiling.py to the new code base (#2002)
* adapting external osx profiling

* fixing dtype

* fixing buffer size
2023-10-08 05:55:00 -07:00
wozeparrot c4e8ea73bd
feat: add tinygrad.features to setup.py (#2016) 2023-10-07 21:55:50 -07:00
Francis Lam dece9958f8
wmma: clean up to make WMMA arg order consistent (#2014)
also add cache defeat to extra/gemm/simple_matmul.py
2023-10-07 17:45:40 -07:00
George Hotz cea4cbfc7a
move image+kopt to features (#2015)
* move image+kopt to features

* fix tests

* debug prints (unrelated)
2023-10-07 15:41:08 -07:00
George Hotz 44ed94ef5c use the device abstraction in handcode_resnet50_opt 2023-10-07 13:22:20 -07:00
George Hotz 6ee9cae44f don't extract CIFAR every time / use the cache 2023-10-07 12:33:50 -07:00
nimlgen d07ac379f9
add var_vals to kopt with symbolic (#2008)
* add var_vals to kopt with symbolic again

* no copies
2023-10-07 09:34:21 -07:00
George Hotz 121f7aa8c5
Schedule item (#2012)
* ScheduleItem

* put var_vals in the schedule

* fix tests, wow that proliferated quickly

* not ready to be in the schedule
2023-10-07 08:59:25 -07:00
George Hotz f1f64bc88d
remove val_vars from the linearizer (#2009)
* remove val_vars from the linearizer

* no need to store var vals
2023-10-07 07:47:28 -07:00
George Hotz dea8bb0938
triton isn't tested, and allows this refactor (#2007)
* triton isn't tested

* cuda buffer
2023-10-07 07:29:59 -07:00
George Hotz 23de1db727 strip whitespace 2023-10-07 06:06:27 -07:00
Roelof van Dijk 26fcc8dff6
fix: remove runtime imports (#1982)
fix: import what is used

probably monkeypatched

fix: import

revert selective import
2023-10-07 05:23:08 -07:00
George Hotz f54959e5cd
move print tree into graph (#2003)
* move print tree into graph

* add winograd profiling test

* change pre-commit to run ruff first
2023-10-07 04:39:21 -07:00
Ahmed Harmouche 2114dc13d1
Allow multi-input model export (#1995)
* Allow multi-input model export

* Add model export unit test

* Fix efficientnet compilation

* Only run model export test on JIT supported devices

* Skip export model test if not EXPORT_SUPPORTED_DEVICE
2023-10-07 04:13:34 -07:00
George Hotz ffa33d743a
good changes from openpilot_compile2 (#2000)
* good changes from openpilot_compile2

* float32 image type was wrong

* cleaner way to write that + a test
2023-10-06 13:33:24 -07:00
chenyu 05be57f57f
Fix llama with empty prompt (#1997)
* fix llama with one token prompt

* llama is all_jitted
2023-10-06 06:48:07 -07:00
George Hotz 7a68060422
Revert "allow local + grouped reduce in hand_coded (#1996)" (#1998)
This reverts commit 219a1f7063.
2023-10-06 06:43:28 -07:00
nimlgen 219a1f7063
allow local + grouped reduce in hand_coded (#1996)
* allow local + grouped reduce in hand_coded

* allowed loop size based on global_dims

* fix const

* fix const one more time

* better divisor

* a bit fix

* can take 2, why not

* fix linter

* better comments

* start with 2

* not always pick group reduce

* fix images

* better images

* better
2023-10-06 06:11:28 -07:00
George Hotz fa9945dac0 remove stale tests 2023-10-06 02:14:56 -07:00
Vidhan Bhatt 94b21c41a7
ci: use `mypy.ini` (#1993) 2023-10-06 01:45:28 -07:00
George Hotz e43d8977f8
Revert "chore: add `py.typed` marker. (#1991)" (#1994)
This reverts commit 6d581e8911.
2023-10-06 01:44:34 -07:00
Vidhan Bhatt 6d581e8911
chore: add `py.typed` marker. (#1991)
* chore: add `py.typed` marker.

* fix: add comma
2023-10-05 16:27:33 -07:00
chenyu da2b3e55f4
simpler llama - don't shrink twice (#1981) 2023-10-05 14:31:46 -07:00
Roelof van Dijk 972d9ea215
fix: PRUNEGRAPH is unused (#1985) 2023-10-05 14:28:43 -07:00
George Hotz 21a2c5df73
fix up contiguous (#1978) 2023-10-05 07:22:05 -07:00
chenyu c99fa58dd2
simplify gpt2 example (#1973)
* simplify gpt2 example

* kernel_jitted_count and jit tests

* Revert "kernel_jitted_count and jit tests"

This reverts commit 31a3c26dd061dbcf6c43c295a265813ccb35b9e9.

* all_jitted test in test_real_world
2023-10-05 07:09:29 -07:00