George Hotz
90c777d815
remove apply_auto_opt ( #2063 )
2023-10-13 07:44:14 -07:00
nimlgen
bd42fa0b73
kernel cache ( #2035 )
...
* init compiled cache
* clang not compile to stdout
* use kwargs in compile
* remove some useless lines
* slimmer
* fix
* tabs
* retry
* remove decorator
* no race in hip
* smaller hip
* unused import
* unused pathlib
* path to str
* add test
* fix linter
* less lines?
* decorator is back
* update tests
* no hip version
* better comments
* a bit better test
* linter
* work wo decorator
* linter happy
* simpler return type
* more tests
* better comment
* readable
* readable
* readable
* compile returns bytes
* no unused imports
* readable
2023-10-13 06:32:01 -07:00
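The "kernel cache" change above (note the bullet "compile returns bytes") can be illustrated with a minimal, generic sketch: cache compiler output on disk, keyed by a hash of the kernel source, so repeated compiles of the same source are skipped. This is only an illustration of the idea, not tinygrad's actual implementation; `fake_compile`, `cached_compile`, and `CACHE_DIR` are hypothetical names.

```python
import hashlib
import pathlib
import tempfile

# hypothetical cache location, not tinygrad's real path
CACHE_DIR = pathlib.Path(tempfile.gettempdir()) / "kernel_cache_sketch"

def fake_compile(src: str) -> bytes:
    # stand-in for a real compiler backend; returns bytes, as in the log above
    return ("BINARY:" + src).encode()

def cached_compile(src: str, compiler=fake_compile) -> bytes:
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(src.encode()).hexdigest()
    path = CACHE_DIR / key
    if path.exists():
        return path.read_bytes()   # cache hit: skip compilation
    binary = compiler(src)
    path.write_bytes(binary)       # cache miss: compile once, then store
    return binary
```

Keying on a content hash rather than a filename means any backend ("no hip version" above hints at dropping version strings from the key) can share the same scheme.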
George Hotz
6f1810af2d
with unroll, the action space goes from 161 -> 127 ( #2060 )
...
* with unroll, the action space goes from 161 -> 127
* more reliable instrumentation
* beam search is so op
* beam bugfix
2023-10-12 20:52:23 -07:00
Umut Zengin
6b7ac5c431
ModNode __mod__ rule ( #2039 )
...
* Implement mod rule
* mypy
* feat: New test added
2023-10-12 11:30:10 -07:00
Yixiang Gao
3187962476
CIFAR HALF mode ( #2041 )
...
* load weights in fp16
* add dtype option in nn
* fix test
* no need for dtype in nn
* add option to load weights in FP16, but NaN
* change loss scaler
* cast to float32 for norm layer
* add a todo for the forward pass padding
* fix transform
2023-10-12 10:19:51 -07:00
George Hotz
c5edb3c374
train value net, improve API, add BCE ( #2047 )
...
* api cleanups, BCE losses
* valuenet
* fixup examples
* learning okay
* add valuenet runner
* net improvements
* net improvements
* 40% win rate
2023-10-12 07:56:38 -07:00
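The "add BCE" entry above refers to binary cross-entropy losses. A minimal, generic sketch of the formula (plain Python, not tinygrad's API; `bce` is a hypothetical name):

```python
import math

def bce(pred, target, eps=1e-7):
    # binary cross-entropy, averaged over elements:
    #   -mean(t*log(p) + (1-t)*log(1-p)), with eps clamping for stability
    return -sum(t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps))
                for p, t in zip(pred, target)) / len(pred)
```

For a totally uncertain prediction of 0.5 the loss is log(2) ≈ 0.693 regardless of the label, which is a handy sanity check.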
George Hotz
0ba629c7b9
add world dataset ( #2045 )
2023-10-11 15:54:30 -07:00
George Hotz
0c3b6f13a8
Latest opt ( #2044 )
...
* split out actions
* rl algorithm
2023-10-11 15:46:14 -07:00
geohotstan
8d6cecb25c
Torch eq fix ( #1562 )
...
* init
* Revert "init"
This reverts commit 682bf2073a8b4eca111596c67cf6ebd79f59e585.
* kids dont do drugs
* one way to fix
* resolve merge conflict
* no more or
* clean up
2023-10-11 12:57:11 -07:00
George Hotz
41bfeb2c1e
start work on auto opt ( #2034 )
...
* start work on auto opt
* lin failure
* not beating hcopt
* greedy
* timing is fast
* codegen.search
* greedy search in handcode_opt
* track running gflops
* clean up those files
* no failure
2023-10-11 12:54:53 -07:00
chenyu
1c980517c5
s/var_vals_from_ast/vars_from_ast ( #2038 )
2023-10-10 20:21:55 -07:00
Francis Lam
81c7d750db
test: fix test_linearizer.test_tensor_core test ( #2036 )
...
must use apply_tensor_core instead of hand_coded_optimizations
2023-10-10 14:48:28 -07:00
chenyu
e2b83f1b42
Variable.bind newer ( #2017 )
...
* Variable.bind attempt 2
* ShapeTracker.unbind
* fix llama
* fix types
* test case
* View.vars cleanup
* include mask in symbolic source
* mask can be sint
* st.unbind in bufferops
* assert ast contain free Variable only
* cleanup
* conservative unbinding reduce op arg
* move reduceop unbind
* fix llama JIT arg behavior
2023-10-10 10:03:01 -07:00
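The "Variable.bind" / "ShapeTracker.unbind" entries above follow a common bind/unbind pattern for symbolic shapes: a variable carries a name and a range, a bound copy additionally carries a concrete value, and unbinding recovers the free variable plus that value. A generic sketch of the pattern (illustrative only, not tinygrad's actual `Variable` class):

```python
class Var:
    """A symbolic integer in [vmin, vmax]; `val` is None while unbound."""
    def __init__(self, name, vmin, vmax):
        self.name, self.vmin, self.vmax, self.val = name, vmin, vmax, None

    def bind(self, val):
        # return a bound copy; the original stays free
        assert self.vmin <= val <= self.vmax, "value out of range"
        bound = Var(self.name, self.vmin, self.vmax)
        bound.val = val
        return bound

    def unbind(self):
        # recover the free variable and the concrete value separately
        assert self.val is not None, "variable is not bound"
        return Var(self.name, self.vmin, self.vmax), self.val
```

Keeping the AST free of bound values (see "assert ast contain free Variable only" above) lets one compiled kernel be reused with different concrete values at launch time.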
qazal
71d93ffd79
Refactor GPU and Metal languages in their own separate renderers ( #2033 )
...
* Refactor GPU and Metal languages in their own separate renderers
* remove CStyleLanguage imports
* move renderers too
2023-10-10 07:46:41 -07:00
George Hotz
f139060103
Rewrite hand coded opt with action space ( #2030 )
...
* tests passing
* hand coded opt with new abstractions
* simpler opts
* split out tensor cores
2023-10-10 07:38:38 -07:00
Ahmed Harmouche
e27fedfc7b
Fix stable diffusion output error on WebGPU ( #2032 )
...
* Fix stable diffusion on WebGPU
* Remove hack, numpy cast only on webgpu
* No-copy numpy cast
2023-10-10 06:40:51 -07:00
qazal
e40f141203
Refactor and add more unit tests for disktensors ( #2022 )
...
* testing with the test_ops pattern
* add assign test
* flake8 complaining about single line fn
* slice 2d and minor cleanup
* make assign_slice a one-liner
* we dont need to repeat the same lambda twice, default tinygrad_fxn to be np_fxn
* back assign fn for np array
* implement __setitem__ in tensor.py
* don't re-slice the ret tensor
* one liner assign
* drop the permute test
2023-10-09 18:46:29 -07:00
chenyu
45f0891a8f
use "<" instead of "<=" in codegen for loop ( #2027 )
2023-10-09 17:26:36 -07:00
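The change above swaps the generated loop bound from inclusive to exclusive. A tiny sketch of what a C-style loop renderer emits with the exclusive form, which runs exactly n iterations over [0, n) (`render_loop` is a hypothetical name, not tinygrad's codegen function):

```python
def render_loop(var: str, n: int) -> str:
    # exclusive upper bound: i takes the values 0, 1, ..., n-1
    return f"for (int {var} = 0; {var} < {n}; {var}++)"
```

With `<=` the same header would execute n+1 iterations, so `<` matches the usual half-open-range convention.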
chenyu
25555c836f
llama default to JIT only if device supports JIT ( #2028 )
2023-10-09 17:26:02 -07:00
George Hotz
16ca8410f8
op logger + replay ( #2021 )
...
* logops
* fix dtype printing
* needs inf
* ops dataset
* minor improvements
* 12k kernels
* opt can compile
* graph flops
2023-10-08 15:10:18 -07:00
calledit
46f354b49f
Fix comment to describe code ( #2023 )
2023-10-08 14:28:14 -07:00
qazal
0e2e041faf
CI for using tinygrad as an external pkg ( #2019 )
...
* create workflow
* unify with test.yml
2023-10-08 10:50:48 -07:00
George Hotz
8db92bd060
fix tvm gemm example
2023-10-08 05:57:41 -07:00
mmmkkaaayy
af6e2f31ca
whisper: cast model output token to int32 ( #2013 )
...
Co-authored-by: mmmkkaaayy <mmmkkaaayy@users.noreply.github.com>
2023-10-08 05:56:22 -07:00
Luca Sciarpa
e93e240a6c
adapting test/external/external_osx_profiling.py to the new code base ( #2002 )
...
* adapting external osx profiling
* fixing dtype
* fixing buffer size
2023-10-08 05:55:00 -07:00
wozeparrot
c4e8ea73bd
feat: add tinygrad.features to setup.py ( #2016 )
2023-10-07 21:55:50 -07:00
Francis Lam
dece9958f8
wmma: clean up to make WMMA arg order consistent ( #2014 )
...
also add cache defeat to extra/gemm/simple_matmul.py
2023-10-07 17:45:40 -07:00
George Hotz
cea4cbfc7a
move image+kopt to features ( #2015 )
...
* move image+kopt to features
* fix tests
* debug prints (unrelated)
2023-10-07 15:41:08 -07:00
George Hotz
44ed94ef5c
use the device abstraction in handcode_resnet50_opt
2023-10-07 13:22:20 -07:00
George Hotz
6ee9cae44f
don't extract CIFAR every time / use the cache
2023-10-07 12:33:50 -07:00
nimlgen
d07ac379f9
add var_vals to kopt with symbolic ( #2008 )
...
* add var_vals to kopt with symbolic again
* no copies
2023-10-07 09:34:21 -07:00
George Hotz
121f7aa8c5
Schedule item ( #2012 )
...
* ScheduleItem
* put var_vals in the schedule
* fix tests, wow that proliferated quickly
* not ready to be in the schedule
2023-10-07 08:59:25 -07:00
George Hotz
f1f64bc88d
remove val_vars from the linearizer ( #2009 )
...
* remove val_vars from the linearizer
* no need to store var vals
2023-10-07 07:47:28 -07:00
George Hotz
dea8bb0938
triton isn't tested, and allows this refactor ( #2007 )
...
* triton isn't tested
* cuda buffer
2023-10-07 07:29:59 -07:00
George Hotz
23de1db727
strip whitespace
2023-10-07 06:06:27 -07:00
Roelof van Dijk
26fcc8dff6
fix: remove runtime imports ( #1982 )
...
fix: import what is used
probably monkeypatched
fix: import
revert selective import
2023-10-07 05:23:08 -07:00
George Hotz
f54959e5cd
move print tree into graph ( #2003 )
...
* move print tree into graph
* add winograd profiling test
* change pre-commit to run ruff first
2023-10-07 04:39:21 -07:00
Ahmed Harmouche
2114dc13d1
Allow multi-input model export ( #1995 )
...
* Allow multi-input model export
* Add model export unit test
* Fix efficientnet compilation
* Only run model export test on JIT supported devices
* Skip export model test if not EXPORT_SUPPORTED_DEVICE
2023-10-07 04:13:34 -07:00
George Hotz
ffa33d743a
good changes from openpilot_compile2 ( #2000 )
...
* good changes from openpilot_compile2
* float32 image type was wrong
* cleaner way to write that + a test
2023-10-06 13:33:24 -07:00
chenyu
05be57f57f
Fix llama with empty prompt ( #1997 )
...
* fix llama with one token prompt
* llama is all_jitted
2023-10-06 06:48:07 -07:00
George Hotz
7a68060422
Revert "allow local + grouped reduce in hand_coded ( #1996 )" ( #1998 )
...
This reverts commit 219a1f7063.
2023-10-06 06:43:28 -07:00
nimlgen
219a1f7063
allow local + grouped reduce in hand_coded ( #1996 )
...
* allow local + grouped reduce in hand_coded
* allowed loop size based on global_dims
* fix const
* fix const one more time
* better divisor
* a bit fix
* can take 2, why not
* fix linter
* better comments
* start with 2
* not always pick group reduce
* fix images
* better images
* better
2023-10-06 06:11:28 -07:00
George Hotz
fa9945dac0
remove stale tests
2023-10-06 02:14:56 -07:00
Vidhan Bhatt
94b21c41a7
ci: use `mypy.ini` ( #1993 )
2023-10-06 01:45:28 -07:00
George Hotz
e43d8977f8
Revert "chore: add `py.typed` marker. ( #1991 )" ( #1994 )
...
This reverts commit 6d581e8911.
2023-10-06 01:44:34 -07:00
Vidhan Bhatt
6d581e8911
chore: add `py.typed` marker. ( #1991 )
...
* chore: add `py.typed` marker.
* fix: add comma
2023-10-05 16:27:33 -07:00
chenyu
da2b3e55f4
simpler llama - don't shrink twice ( #1981 )
2023-10-05 14:31:46 -07:00
Roelof van Dijk
972d9ea215
fix: PRUNEGRAPH is unused ( #1985 )
2023-10-05 14:28:43 -07:00
George Hotz
21a2c5df73
fix up contiguous ( #1978 )
2023-10-05 07:22:05 -07:00
chenyu
c99fa58dd2
simplify gpt2 example ( #1973 )
...
* simplify gpt2 example
* kernel_jitted_count and jit tests
* Revert "kernel_jitted_count and jit tests"
This reverts commit 31a3c26dd061dbcf6c43c295a265813ccb35b9e9.
* all_jitted test in test_real_world
2023-10-05 07:09:29 -07:00