samm393
|
19c11792fd
|
Flux.1 (#6334)
* initial commit
* whitespace
* get rid of torch import
* indentation
* less hardcoding
* add flux.1-dev
* jit
* no double
* t5 tidy up
* validation image
* reuse sdxl autoencoder
* typing changes
* empty lines
* remove unneeded comments
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
|
2024-09-24 10:08:04 +08:00 |
chenyu
|
31b9c74c77
|
tiny import cleanup and fix typo (#6692)
|
2024-09-23 21:48:23 -04:00 |
qazal
|
02c0c09fb9
|
VIZ syntax highlighting and new colors (#6686)
* VIZ syntax highlighting
* more work
|
2024-09-24 09:41:07 +08:00 |
ignaciosica
|
0ffbd75af8
|
Refactor TC [run_process_replay] (#6456)
* unify _apply_tc_opt
* refactor tc pt2
* hotfix: remove blank line
* refactor upcast_axes
* simplify check before using tensor_cores
* rename upcast_axes
* fix amx and remove counting hack
* AMX cleanup
* hotfix: bug
* skip hand-coded TC opts if AMX to also skip if emulating
* hotfix: AMX bug
* hotfix: AMX tests
* minor format change
* hotfix: minor var name change
* hotfix: minor refactor
* hotfix: hand-coded tc bug
* hotfix: simple change
* fix comment
* hotfix: refactor attempt to local N
* hotfix: AMD TC spacing
* refactor tensor core options in kernel.py to include opt order
* hotfix: add comments to TensorCore dataclass
* hotfix: improve comment on TC dataclass
* hotfix: refactor opt_seq loop
* hotfix: add comments in hand-coded TC opts
* hotfix: upcast_axes comment
* hotfix: remove unroll from opt_seq
* hotfix: bug + remove unroll from opt_seq
* hotfix: rename opt_seq into opts_seq
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
|
2024-09-24 09:05:29 +08:00 |
George Hotz
|
b9e6d42a1f
|
Revert "gated native math in OpenCL (#6683)" (#6691)
This reverts commit 2fe3eeed17.
|
2024-09-24 08:48:10 +08:00 |
Harald Schäfer
|
382938ab41
|
Add command to show default backend in README (#6688)
* Update README.md
* Update README.md
* Update README.md
|
2024-09-24 08:42:18 +08:00 |
George Hotz
|
46fab1f185
|
hotfix: curved edges in viz
|
2024-09-23 19:45:35 +08:00 |
qazal
|
ee050d31d7
|
viz more touchups (#6685)
* dont print if we're running VIZ
* 242424
|
2024-09-23 19:44:28 +08:00 |
George Hotz
|
2fe3eeed17
|
gated native math in OpenCL (#6683)
* gated native math
* Update cstyle.py
|
2024-09-23 19:22:13 +08:00 |
George Hotz
|
84072166db
|
move mul consts like add consts (#6684)
|
2024-09-23 19:21:53 +08:00 |
George Hotz
|
de259e3f09
|
hotfix: add compile3 to comma CI
|
2024-09-23 18:25:49 +08:00 |
George Hotz
|
7c38121280
|
load penalty (#6681)
* bias/bn loads after loops
* load penalty in fix_priority
* more generic test
|
2024-09-23 18:12:12 +08:00 |
George Hotz
|
431ffc4254
|
hotfix: delete float16 failing
|
2024-09-23 17:42:57 +08:00 |
qazal
|
aad7c9c883
|
viz adjustable metadata (#6679)
* move from grid to flexbox
* viz adjustable metadata
* w-size
|
2024-09-23 17:31:51 +08:00 |
George Hotz
|
2f2f933e50
|
fix buffer shape regression from onnx (#6678)
|
2024-09-23 16:58:42 +08:00 |
qazal
|
b438e3cc19
|
viz bugfix click in middle of UOps (#6676)
|
2024-09-23 16:44:19 +08:00 |
chenyu
|
f55459c98e
|
failed validhack test for a 0.9.7 conv (#6677)
|
2024-09-23 04:43:47 -04:00 |
nimlgen
|
94cbb1cd32
|
qcom image copyout (#6667)
* qcom copyout
* copyin
* linter
* fix
* linter
* mypy
|
2024-09-23 16:11:43 +08:00 |
George Hotz
|
417a19a292
|
uop priority inversion (#6670)
* make checks simpler [run_process_replay]
* reorder uops
* fix inversion [run_process_replay]
* no need to move SPECIALs
* Update uopgraph.py
|
2024-09-23 15:53:53 +08:00 |
qazal
|
49bf92afa2
|
schedule UOps.ASSIGN (#6661)
|
2024-09-23 15:44:12 +08:00 |
George Hotz
|
9f1f445a5f
|
reorder uops (#6672)
|
2024-09-23 15:21:59 +08:00 |
qazal
|
e2d6e10ddf
|
hotfix: reset benchmarks cache for process replay (#6671)
|
2024-09-23 15:13:02 +08:00 |
chenyu
|
0362dbbbe8
|
relax idx simplification given valid (#6669)
apply to kernels in op 0.9.7.
if a valid has a complicated expr, we cannot drop valid but it's possible to simplify idx given valid
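A minimal sketch of the idea, not the code from this commit: when a valid restricts the range of a variable, an index expression may collapse to something simpler on exactly those points, even though the valid itself has to stay. The expressions, the domain `range(8)`, and the helper `equal_under_valid` are hypothetical and chosen only for illustration.

```python
# Hypothetical example: x ranges over 0..7, the valid is x < 4.
# Under that valid, x // 4 is always 0 and x % 4 is always x,
# so the index x // 4 * 16 + x % 4 can be replaced by x.

def idx(x: int) -> int:
    return x // 4 * 16 + x % 4

def simplified_idx(x: int) -> int:
    return x

def valid(x: int) -> bool:
    return x < 4

def equal_under_valid(f, g, gate, domain) -> bool:
    # Compare f and g only on the points where the gate holds.
    return all(f(x) == g(x) for x in domain if gate(x))

domain = range(8)
same_when_valid = equal_under_valid(idx, simplified_idx, valid, domain)
same_everywhere = equal_under_valid(idx, simplified_idx, lambda x: True, domain)
```

Here `same_when_valid` is true while `same_everywhere` is false (at `x = 4` the original index is 16), which is why the simplification is only safe when the valid is kept as the gate.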
|
2024-09-23 03:04:57 -04:00 |
qazal
|
7ca9ffa494
|
misc UOp st cleanups (#6668)
|
2024-09-23 14:16:42 +08:00 |
chenyu
|
26ebb7cab4
|
don't use div_folding in lt_folding (#6666)
* don't use div_folding in lt_folding
valids 35 -> 13
* fails the same as before
|
2024-09-23 01:50:18 -04:00 |
qazal
|
e9248b9e27
|
viz highlight new nodes (#6665)
* p2
* ret adds and dels
* maybe that way
* add additions
* simpler test_viz
|
2024-09-23 13:46:18 +08:00 |
chenyu
|
da5b741656
|
removed valid in openpilot conv (#6619)
35 valids left
|
2024-09-23 00:30:18 -04:00 |
George Hotz
|
52c2c4df9c
|
fix match of sz 0 + dedup kernel ast [run_process_replay] (#6663)
* fix match of sz 0 [run_process_replay]
* empty graph rewrite to dedup st
|
2024-09-23 11:56:53 +08:00 |
chenyu
|
2d4d594994
|
validhack is_irreducible helper (#6664)
[run_process_replay]
|
2024-09-22 23:42:47 -04:00 |
chenyu
|
1923932339
|
canonicalize simplex lt (#6658)
(X := a0*x0 + a1*x1 + ...) > 0 is equivalent to x0 + x1 + ... > 0 if xi >= 0 and ai > 0 for ints
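The stated equivalence can be checked by brute force on small cases. The sketch below is only an illustration of the claim, not the rewrite rule itself; the helper names and the bound `max_x` are assumptions made for the example.

```python
from itertools import product

def weighted_gt_zero(coeffs, xs) -> bool:
    # a0*x0 + a1*x1 + ... > 0
    return sum(a * x for a, x in zip(coeffs, xs)) > 0

def plain_gt_zero(xs) -> bool:
    # x0 + x1 + ... > 0
    return sum(xs) > 0

def equivalence_holds(coeffs, max_x: int = 4) -> bool:
    # Enumerate every non-negative integer assignment up to max_x per variable.
    return all(
        weighted_gt_zero(coeffs, xs) == plain_gt_zero(xs)
        for xs in product(range(max_x + 1), repeat=len(coeffs))
    )

positive_ok = equivalence_holds((1, 2, 3))
negative_breaks = equivalence_holds((1, -1))
```

With all coefficients positive the two comparisons agree on every tested point. With a negative coefficient they do not: for `(1, -1)` and `x0 = x1 = 1` the weighted sum is 0 while the plain sum is 2, which shows why the `ai > 0` condition is needed.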
|
2024-09-22 23:04:47 -04:00 |
wozeparrot
|
46e360fdc0
|
check bfloat16 range with threefry (#6660)
|
2024-09-23 10:48:44 +08:00 |
qazal
|
d24e4b1042
|
viz more kernel view work (#6659)
|
2024-09-23 10:48:35 +08:00 |
qazal
|
6be1bf09f1
|
hotfix: bring COMPARE_SCHEDULE=0 back (#6657)
|
2024-09-23 10:39:43 +08:00 |
George Hotz
|
e945fa9c5c
|
put local on the PtrDtype [run_process_replay] (#6656)
* put local on the PtrDtype [run_process_replay]
* those are local too
|
2024-09-23 10:29:17 +08:00 |
chenyu
|
90c1ccc402
|
simpler drop valid check in simplify_valid_image_load (#6653)
* simpler drop valid check in simplify_valid_image_load
* update tests
|
2024-09-22 21:46:39 -04:00 |
qazal
|
99ed9fb75e
|
simpler verify_ast [run_process_replay] (#6654)
|
2024-09-23 09:40:09 +08:00 |
nimlgen
|
8a9195d86e
|
qcom texs refactor (#6613)
* qcom texs refactor
* fix
* linter
* qcombuf
* linter
|
2024-09-23 09:03:17 +08:00 |
qazal
|
d1bae42d35
|
viz lowerer and graph_rewrite dedup try 2 (#6652)
|
2024-09-22 21:09:46 +08:00 |
qazal
|
6b65d8c461
|
more process replay tracing work [run_process_replay] (#6650)
|
2024-09-22 16:16:58 +08:00 |
George Hotz
|
4fc5a34fe7
|
lowerer is just a graph rewrite, not a class [run_process_replay] (#6648)
|
2024-09-22 14:15:33 +08:00 |
George Hotz
|
0eb710de84
|
move WMMA out of lowerer [run_process_replay] (#6647)
|
2024-09-22 14:05:51 +08:00 |
George Hotz
|
84703d5b77
|
replace the lowerer with a contextual PatternMatcher [run_process_replay] (#6646)
* replace the lowerer with a contextual PatternMatcher [run_process_replay]
* todo
* it's REDUCE by the time it's in lowerer
|
2024-09-22 13:22:26 +08:00 |
qazal
|
4751159139
|
second iteration on viz/serve.py (#6643)
* small detail in checkStatus
* better abstractions for the api
* update test_viz
* ui updates
|
2024-09-22 08:49:44 +08:00 |
qazal
|
5bafed2f88
|
process replay traceback (#6642)
|
2024-09-21 16:53:34 +08:00 |
chenyu
|
9456a625bc
|
const_like type fix (#6641)
`Tuple[ConstType, ...]` instead of `Tuple[ConstType]`
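For context, the two annotations mean different things to a type checker: `Tuple[T]` is a tuple of exactly one element, while `Tuple[T, ...]` is a tuple of any length. A small sketch, where the `ConstType` alias is an assumption about how it is defined and `describe` is a hypothetical helper:

```python
from typing import Tuple, Union, get_args

# Assumed definition of the alias, for illustration only.
ConstType = Union[float, int, bool]

OneElement = Tuple[ConstType]        # exactly one element
AnyLength = Tuple[ConstType, ...]    # zero or more elements

def describe(tp) -> str:
    # A trailing Ellipsis in the type arguments marks a variable-length tuple.
    args = get_args(tp)
    if len(args) == 2 and args[1] is Ellipsis:
        return "variable-length"
    return f"fixed-length {len(args)}"

one = describe(OneElement)
many = describe(AnyLength)
```

A value such as `(1.0, 2.0, 3.0)` matches `AnyLength` but not `OneElement`, which is the mismatch the annotation change addresses.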
|
2024-09-21 03:44:08 -04:00 |
qazal
|
8edce82124
|
viz show server status (#6640)
|
2024-09-21 15:08:13 +08:00 |
qazal
|
982086f54c
|
UOps.VALID try 2 (#6623)
* make UOps.VALID compile
* fixable tests
* bufs dedup
* cleanup the CONST spec
* regenerate dataset with graph_rewrite
```py
def rewrite_const(const:UOp, st_src:UOp) -> UOp:
  st: ShapeTracker = st_src.arg
  return UOp(UOps.VALID, dtypes.bool, (st.to_uop(),)).where(UOp.const(const.dtype, const.arg), UOp.const(const.dtype, 0))
pm = PatternMatcher([(UPat(UOps.CONST, name="const", src=(UPat(UOps.SHAPETRACKER, name="st_src"),)), rewrite_const)])
```
* rm arg
* remove arg
* revert arg removal
This reverts commit 2c35c75c950075d38c9fb8572f14640fe8235f74.
* red test_pickle_define_var
|
2024-09-21 14:19:25 +08:00 |
qazal
|
dd05e27622
|
remove UOp from DEFINE_VAR arg [run_process_replay] (#6639)
* remove UOp from DEFINE_VAR arg [run_process_replay]
* that assert is in `spec`
* more .args to remove
|
2024-09-21 14:07:56 +08:00 |
qazal
|
d2351af019
|
fixup non-void SINKs in tests [run_process_replay] (#6624)
|
2024-09-21 13:29:18 +08:00 |
qazal
|
391d14438e
|
DEFINE_VAR prereqs for VALID [run_process_replay] (#6637)
|
2024-09-21 13:28:39 +08:00 |