George Hotz
cd534dee11
cstyle changes that don't pass process replay ( #6734 )
...
* cstyle changes that don't pass process replay
* add constant folder back there
* cleanups
* const
* fix some tests
* bfloat16 too
* complete set of types
* that cast shouldn't be needed
* that was a questionable test
2024-09-25 17:33:34 +08:00
George Hotz
232edcfd4f
cast bool for type verify [run_process_replay] ( #6742 )
2024-09-25 17:12:16 +08:00
George Hotz
cb22ef379a
truncate consts early ( #6741 )
...
* truncate consts early
* ptx still fails
* Update dtype.py
2024-09-25 16:49:51 +08:00
nimlgen
e31552e2e0
qcom reinit queue on exec ( #6728 )
...
* qcom setup on exec as gpu=1
* linter
* gpulike
* offsets
2024-09-25 16:08:50 +08:00
George Hotz
882339f729
remove parens from neg ( #6738 )
2024-09-25 15:38:20 +08:00
qazal
5ad2f95d01
process replay diff stats ( #6736 )
...
* process replay diff stats
* fix tuples
2024-09-25 15:19:56 +08:00
nimlgen
56979aa3ed
qcom ioctl log levels ( #6735 )
2024-09-25 14:59:27 +08:00
chenyu
66af8bb54c
use UOp.replace and UOp.define_var in validhack ( #6730 )
...
easier to see the diff in replacement
[run_process_replay]
2024-09-25 02:51:34 -04:00
chenyu
ff25bfb1b0
conv backward tests in test_simplify_valid_idx ( #6727 )
...
the backward idx is pretty ugly now
2024-09-25 02:51:07 -04:00
qazal
6c69fec1ef
viz more info for rewrite location ( #6729 )
2024-09-25 14:49:40 +08:00
George Hotz
39f78619ff
cstyle replay [run_process_replay] ( #6731 )
...
* real minimum cstyle change
* make it match
* bring back DEFINE_GLOBAL store marking writable
* bump line count to 9800
* closer
* precompute don't render
* cast/bitcast too
* smem_align
* vectorize
* more pr match
* remove that test
* less PR diff
* cstyle changes that [run_process_replay]
2024-09-25 14:26:05 +08:00
nimlgen
e1caa24a92
qcom fix binded queue might be overwritten ( #6712 )
2024-09-25 12:45:23 +08:00
George Hotz
dd575da7ee
real minimum cstyle change ( #6709 )
...
* real minimum cstyle change
* make it match
* bring back DEFINE_GLOBAL store marking writable
* bump line count to 9800
* closer
* precompute don't render
* cast/bitcast too
* smem_align
* vectorize
* more pr match
* remove that test
* less PR diff
2024-09-25 12:40:46 +08:00
chenyu
e6a1b5aa8f
more test_simplify_valid_idx cleanup ( #6726 )
...
moved UOps.VECTORIZE of idx into the helper
2024-09-24 23:47:42 -04:00
chenyu
14524eeddc
test_image_valid.py -> test_simplify_valid_idx.py ( #6724 )
...
restructure the tests, will use the same file for non-image tests
2024-09-24 23:32:27 -04:00
qazal
e0d8685c99
test_masked_upcast_wino check device buf_max ( #6723 )
2024-09-25 11:26:53 +08:00
George Hotz
f45d178a55
hotfix: support JIT_BATCH_SIZE=0, make that the default
2024-09-25 10:36:04 +08:00
George Hotz
52e7f1c108
add new model CI
2024-09-25 10:23:06 +08:00
ttomsa
76bd4c7d5f
advanced setitem ( #6262 )
...
* advanced setitem draft
* add setitem tests
* fix for tests
* small change
* handle repeated indices with test
* fix v broadcasting to mask
* clean up a bit
* open more tests
* clean up, fixes issue with scalar tensor index
* fix
* fix index_put_ and linter
* add type annotation
* done
* remove non contiguous hack
* woops linter
* name fix
* add back type notation
* more type notation
* final
* linter
* check lazydata not shared
* no numpy
* no numpy
* rename
* index benchmark
* linter
* no cloning time
* rm benchmark
* new function
* rm contiguous and cast early
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-09-24 22:14:59 -04:00
qazal
3bf25aae78
start work on global buffer count limit [run_process_replay] ( #6722 )
...
* add a bufs_max option
* simple spec
2024-09-25 09:51:56 +08:00
George Hotz
b0ffe2452b
bump line count to 9800
2024-09-25 09:15:30 +08:00
chenyu
5c240c34aa
split validhack into simplify idx and drop valids ( #6719 )
...
* split validhack into simplify idx and drop valids
will be using the simplify idx for non-image buffer
[run_process_replay]
* shorter
2024-09-24 09:40:27 -04:00
qazal
cefc3e9382
make all schedules immutable [run_process_replay] ( #6718 )
...
* compute inputs and outputs in LBScheduleItem [run_process_replay]
* simpler metadata, delete __hash__
* no dynamic field
* test_diff_schedule
2024-09-24 21:08:16 +08:00
qazal
29330014ab
give FUZZ_SCHEDULE views a base ( #6717 )
...
* memoryview to bytes
* give FUZZ_SCHEDULE views a base
2024-09-24 19:20:37 +08:00
nimlgen
f0019ad29c
bump ci test timeout for test_speed_exec_time ( #6715 )
...
* bump ci test timeout for test_speed_exec_time
* more
2024-09-24 18:44:09 +08:00
qazal
1c03fb69c9
viz dedup assert groupby ctx [run_process_replay] ( #6714 )
2024-09-24 18:17:21 +08:00
chenyu
8d75326cb5
do not fold var with min==max ( #6713 )
...
not really used, want it to keep as a var for valid simplification
[run_process_replay]
2024-09-24 06:16:34 -04:00
chenyu
9e51879019
fix idx setup in image_valid test_openpilot_conv3 ( #6710 )
...
* fix idx setup in image_valid test_openpilot_conv3
* corrected output and sad
2024-09-24 05:49:04 -04:00
qazal
ae3f3fec38
refactor DEFINE_GLOBAL inputs to list [run_process_replay] ( #6711 )
2024-09-24 17:43:24 +08:00
wozeparrot
f932116e05
feat: small things from default_threefry ( #6708 )
2024-09-24 17:00:47 +08:00
chenyu
f2700ac58a
construct a candidate set to attempt valid idx rewrite ( #6706 )
...
preparation for the brute force attempt for some valids
2024-09-24 04:12:21 -04:00
wozeparrot
2be0b26a1f
rand only supports single device ( #6682 )
2024-09-24 16:07:44 +08:00
nimlgen
75b7627db7
qcom do not recreate memoryviews on updates ( #6701 )
2024-09-24 15:36:22 +08:00
chenyu
a6078c099f
simpler idx rewrite structure in simplify_valid_image_load ( #6704 )
...
express valid into things to check when rewriting idx. it's the same for single clause or a simplex
[run_process_replay]
2024-09-24 03:35:39 -04:00
nimlgen
d3ed50c769
fix typo in 'Too many resources requested for launch' ( #6705 )
2024-09-24 15:33:01 +08:00
wozeparrot
ef7a74bfa0
feat: use /raid/downloads on tinybox ( #6702 )
2024-09-24 15:26:31 +08:00
nimlgen
ca66b11e07
qcom fix disasm ( #6703 )
2024-09-24 15:23:43 +08:00
nimlgen
a473bf4ba9
do not always update float dims ( #6699 )
...
* do not always update float dims
* linter
* isinstance
2024-09-24 14:40:45 +08:00
qazal
048483ee0b
viz fold const nodes and UOp/float4 syntax highlight ( #6695 )
...
* fold const nodes
* show rewrite count
* hotfix: cpp
* more syntax highlight
* custom language definitions
* only cpp
* small fixups for UPat
* extend python
* cleanups
* rewrites helper
* better message
2024-09-24 14:36:59 +08:00
chenyu
4bb1694f49
more tests about bounds of UOp divs ( #6700 )
2024-09-24 00:41:43 -04:00
chenyu
79aef64d70
update tests in test_image_valid ( #6698 )
2024-09-24 00:04:21 -04:00
Anurag Lamsal
568757e087
fix model_eval.py in the mlperf folder searching for bert vocab in the wrong directory ( #6649 )
2024-09-24 11:20:44 +08:00
chenyu
4a2fa0b627
clean up apply OptOps.PADTO [run_process_replay] ( #6694 )
2024-09-23 23:13:50 -04:00
chenyu
f703180356
hotfix missed cast in cstyle code_for_workitem ( #6693 )
...
`NOLOCALS=1 python -c "from tinygrad import Tensor; Tensor.randn((5, 5)).realize()"` works on green box with this fix #6687
2024-09-23 22:18:18 -04:00
samm393
19c11792fd
Flux.1 ( #6334 )
...
* initial commit
* whitespace
* get rid of torch import
* indentation
* less hardcoding
* add flux.1-dev
* jit
* no double
* t5 tidy up
* validation image
* reuse sdxl autoencoder
* typing changes
* empty lines
* remove unneeded comments
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-09-24 10:08:04 +08:00
chenyu
31b9c74c77
tiny import cleanup and fix typo ( #6692 )
2024-09-23 21:48:23 -04:00
qazal
02c0c09fb9
VIZ syntax highlighting and new colors ( #6686 )
...
* VIZ syntax highlighting
* more work
2024-09-24 09:41:07 +08:00
ignaciosica
0ffbd75af8
Refactor TC [run_process_replay] ( #6456 )
...
* unify _apply_tc_opt
* refactor tc pt2
* hotfix: remove blank line
* refactor upcast_axes
* simplify check before using tensor_cores
* rename upcast_axes
* fix amx and remove counting hack
* AMX cleanup
* hotfix: bug
* skip hand-coded TC opts if AMX to also skip if emulating
* hotfix: AMX bug
* hotfix: AMX tests
* minor format change
* hotfix: minor var name change
* hotfix: minor refactor
* hotfix: hand-coded tc bug
* hotfix: simple change
* fix comment
* hotfix: refactor attempt to local N
* hotfix: AMD TC spacing
* refactor tensor core options in kernel.py to include opt order
* hotfix: add comments to TensorCore dataclass
* hotfix: improve comment on TC dataclass
* hotfix: refactor opt_seq loop
* hotfix: add comments in hand-coded TC opts
* hotfix: upcast_axes comment
* hotfix: remove unroll from opt_seq
* hotfix: bug + remove unroll from opt_seq
* hotfix: rename opt_seq into opts_seq
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-09-24 09:05:29 +08:00
George Hotz
b9e6d42a1f
Revert "gated native math in OpenCL ( #6683 )" ( #6691 )
...
This reverts commit 2fe3eeed17.
2024-09-24 08:48:10 +08:00
Harald Schäfer
382938ab41
Add command to show default backend in README ( #6688 )
...
* Update README.md
* Update README.md
* Update README.md
2024-09-24 08:42:18 +08:00