6154 Commits

Author SHA1 Message Date
George Hotz cd534dee11
cstyle changes that don't pass process replay (#6734)
* cstyle changes that don't pass process replay

* add constant folder back there

* cleanups

* const

* fix some tests

* bfloat16 too

* complete set of types

* that cast shouldn't be needed

* that was a questionable test
2024-09-25 17:33:34 +08:00
George Hotz 232edcfd4f
cast bool for type verify [run_process_replay] (#6742) 2024-09-25 17:12:16 +08:00
George Hotz cb22ef379a
truncate consts early (#6741)
* truncate consts early

* ptx still fails

* Update dtype.py
2024-09-25 16:49:51 +08:00
nimlgen e31552e2e0
qcom reinit queue on exec (#6728)
* qcom setup on exec as gpu=1

* linter

* gpulike

* offsets
2024-09-25 16:08:50 +08:00
George Hotz 882339f729
remove parens from neg (#6738) 2024-09-25 15:38:20 +08:00
qazal 5ad2f95d01
process replay diff stats (#6736)
* process replay diff stats

* fix tuples
2024-09-25 15:19:56 +08:00
nimlgen 56979aa3ed
qcom ioctl log levels (#6735) 2024-09-25 14:59:27 +08:00
chenyu 66af8bb54c
use UOp.replace and UOp.define_var in validhack (#6730)
easier to see the diff in replacement
[run_process_replay]
2024-09-25 02:51:34 -04:00
chenyu ff25bfb1b0
conv backward tests in test_simplify_valid_idx (#6727)
the backward idx is pretty ugly now
2024-09-25 02:51:07 -04:00
qazal 6c69fec1ef
viz more info for rewrite location (#6729) 2024-09-25 14:49:40 +08:00
George Hotz 39f78619ff
cstyle replay [run_process_replay] (#6731)
* real minimum cstyle change

* make it match

* bring back DEFINE_GLOBAL store marking writable

* bump line count to 9800

* closer

* precompute don't render

* cast/bitcast too

* smem_align

* vectorize

* more pr match

* remove that test

* less PR diff

* cstyle changes that [run_process_replay]
2024-09-25 14:26:05 +08:00
nimlgen e1caa24a92
qcom fix binded queue might be overwritten (#6712) 2024-09-25 12:45:23 +08:00
George Hotz dd575da7ee
real minimum cstyle change (#6709)
* real minimum cstyle change

* make it match

* bring back DEFINE_GLOBAL store marking writable

* bump line count to 9800

* closer

* precompute don't render

* cast/bitcast too

* smem_align

* vectorize

* more pr match

* remove that test

* less PR diff
2024-09-25 12:40:46 +08:00
chenyu e6a1b5aa8f
more test_simplify_valid_idx cleanup (#6726)
moved UOps.VECTORIZE of idx into the helper
2024-09-24 23:47:42 -04:00
chenyu 14524eeddc
test_image_valid.py -> test_simplify_valid_idx.py (#6724)
restructure the tests, will use the same file for non-image tests
2024-09-24 23:32:27 -04:00
qazal e0d8685c99
test_masked_upcast_wino check device buf_max (#6723) 2024-09-25 11:26:53 +08:00
George Hotz f45d178a55 hotfix: support JIT_BATCH_SIZE=0, make that the default 2024-09-25 10:36:04 +08:00
George Hotz 52e7f1c108 add new model CI 2024-09-25 10:23:06 +08:00
ttomsa 76bd4c7d5f
advanced setitem (#6262)
* advanced setitem draft

* add setitem tests

* fix for tests

* small change

* handle repeated indices with test

* fix v broadcasting to mask

* clean up a bit

* open more tests

* clean up, fixes issue with scalar tensor index

* fix

* fix index_put_ and linter

* add type annotation

* done

* remove non contiguous hack

* woops linter

* name fix

* add back type notation

* more type notation

* final

* linter

* check lazydata not shared

* no numpy

* no numpy

* rename

* index benchmark

* linter

* no cloning time

* rm benchmark

* new function

* rm contiguous and cast early

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-09-24 22:14:59 -04:00
qazal 3bf25aae78
start work on global buffer count limit [run_process_replay] (#6722)
* add a bufs_max option

* simple spec
2024-09-25 09:51:56 +08:00
George Hotz b0ffe2452b bump line count to 9800 2024-09-25 09:15:30 +08:00
chenyu 5c240c34aa
split validhack into simplify idx and drop valids (#6719)
* split validhack into simplify idx and drop valids

will be using the simplify idx for non-image buffer
[run_process_replay]

* shorter
2024-09-24 09:40:27 -04:00
qazal cefc3e9382
make all schedules immutable [run_process_replay] (#6718)
* compute inputs and outputs in LBScheduleItem [run_process_replay]

* simpler metadata, delete __hash__

* no dynamic field

* test_diff_schedule
2024-09-24 21:08:16 +08:00
qazal 29330014ab
give FUZZ_SCHEDULE views a base (#6717)
* memoryview to bytes

* give FUZZ_SCHEDULE views a base
2024-09-24 19:20:37 +08:00
nimlgen f0019ad29c
bump ci test timeout for test_speed_exec_time (#6715)
* bump ci test timeout for test_speed_exec_time

* more
2024-09-24 18:44:09 +08:00
qazal 1c03fb69c9
viz dedup assert groupby ctx [run_process_replay] (#6714) 2024-09-24 18:17:21 +08:00
chenyu 8d75326cb5
do not fold var with min==max (#6713)
not really used, want it to keep as a var for valid simplification
[run_process_replay]
2024-09-24 06:16:34 -04:00
chenyu 9e51879019
fix idx setup in image_valid test_openpilot_conv3 (#6710)
* fix idx setup in image_valid test_openpilot_conv3

* corrected output and sad
2024-09-24 05:49:04 -04:00
qazal ae3f3fec38
refactor DEFINE_GLOBAL inputs to list [run_process_replay] (#6711) 2024-09-24 17:43:24 +08:00
wozeparrot f932116e05
feat: small things from default_threefry (#6708) 2024-09-24 17:00:47 +08:00
chenyu f2700ac58a
construct a candidate set to attempt valid idx rewrite (#6706)
preparation for the brute force attempt for some valids
2024-09-24 04:12:21 -04:00
wozeparrot 2be0b26a1f
rand only supports single device (#6682) 2024-09-24 16:07:44 +08:00
nimlgen 75b7627db7
qcom do not recreate memoryviews on updates (#6701) 2024-09-24 15:36:22 +08:00
chenyu a6078c099f
simpler idx rewrite structure in simplify_valid_image_load (#6704)
express valid into things to check when rewriting idx. it's the same for single clause or a simplex
[run_process_replay]
2024-09-24 03:35:39 -04:00
nimlgen d3ed50c769
fix typo in 'Too many resources requested for launch' (#6705) 2024-09-24 15:33:01 +08:00
wozeparrot ef7a74bfa0
feat: use /raid/downloads on tinybox (#6702) 2024-09-24 15:26:31 +08:00
nimlgen ca66b11e07
qcom fix disasm (#6703) 2024-09-24 15:23:43 +08:00
nimlgen a473bf4ba9
do not always update float dims (#6699)
* do not always update float dims

* linter

* isinstance
2024-09-24 14:40:45 +08:00
qazal 048483ee0b
viz fold const nodes and UOp/float4 syntax highlight (#6695)
* fold const nodes

* show rewrite count

* hotfix: cpp

* more syntax highlight

* custom language definitions

* only cpp

* small fixups for UPat

* extend python

* cleanups

* rewrites helper

* better message
2024-09-24 14:36:59 +08:00
chenyu 4bb1694f49
more tests about bounds of UOp divs (#6700) 2024-09-24 00:41:43 -04:00
chenyu 79aef64d70
update tests in test_image_valid (#6698) 2024-09-24 00:04:21 -04:00
Anurag Lamsal 568757e087
fix model_eval.py in the mlperf folder searching for bert vocab in the wrong directory (#6649) 2024-09-24 11:20:44 +08:00
chenyu 4a2fa0b627
clean up apply OptOps.PADTO [run_process_replay] (#6694) 2024-09-23 23:13:50 -04:00
chenyu f703180356
hotfix missed cast in cstyle code_for_workitem (#6693)
`NOLOCALS=1 python -c "from tinygrad import Tensor; Tensor.randn((5, 5)).realize()"` works on green box with this fix #6687
2024-09-23 22:18:18 -04:00
samm393 19c11792fd
Flux.1 (#6334)
* initial commit

* whitespace

* get rid of torch import

* indentation

* less hardcoding

* add flux.1-dev

* jit

* no double

* t5 tidy up

* validation image

* reuse sdxl autoencoder

* typing changes

* empty lines

* remove unneeded comments

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-09-24 10:08:04 +08:00
chenyu 31b9c74c77
tiny import cleanup and fix typo (#6692) 2024-09-23 21:48:23 -04:00
qazal 02c0c09fb9
VIZ syntax highlighting and new colors (#6686)
* VIZ syntax highlighting

* more work
2024-09-24 09:41:07 +08:00
ignaciosica 0ffbd75af8
Refactor TC [run_process_replay] (#6456)
* unify _apply_tc_opt

* refactor tc pt2

* hotfix: remove blank line

* refactor upcast_axes

* simplify check before using tensor_cores

* rename upcast_axes

* fix amx and remove counting hack

* AMX cleanup

* hotfix: bug

* skip hand-coded TC opts if AMX to also skip if emulating

* hotfix: AMX bug

* hotfix: AMX tests

* minor format change

* hotfix: minor var name change

* hotfix: minor refactor

* hotfix: hand-coded tc bug

* hotfix: simple change

* fix comment

* hotfix: refactor attempt to local N

* hotfix: AMD TC spacing

* refactor tensor core options in kernel.py to include opt order

* hotfix: add comments to TensorCore dataclass

* hotfix: improve comment on TC dataclass

* hotfix: refactor opt_seq loop

* hotfix: add comments in hand-coded TC opts

* hotfix: upcast_axes comment

* hotfix: remove unroll from opt_seq

* hotfix: bug + remove unroll from opt_seq

* hotfix: rename opt_seq into opts_seq

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-09-24 09:05:29 +08:00
George Hotz b9e6d42a1f
Revert "gated native math in OpenCL (#6683)" (#6691)
This reverts commit 2fe3eeed17.
2024-09-24 08:48:10 +08:00
Harald Schäfer 382938ab41
Add command to show default backend in README (#6688)
* Update README.md

* Update README.md

* Update README.md
2024-09-24 08:42:18 +08:00