qazal
d8e5d5c663
move VIZ=1 tests to fuzzers ( #6574 )
2024-09-18 12:12:03 +08:00
ethanreidel
ca8bad90a1
Fix typo in ops.py ( #6572 )
2024-09-17 21:25:00 -04:00
nimlgen
9894f20684
dsp offset buffer ( #6570 )
* dsp offset buffer
* view
2024-09-17 23:34:21 +08:00
George Hotz
28e565dc0d
prune independent kernels for openpilot [run_process_replay] ( #6569 )
* prune independent kernels for openpilot [run_process_replay]
* new pruning
* prune first, then memory plan
2024-09-17 20:02:38 +08:00
qazal
9295bc0189
viz more work [run_process_replay] ( #6568 )
* infra
* found it
* real work
* bring those back
* cleanup test_viz
* comment that out
2024-09-17 19:27:09 +08:00
qazal
455a27dd43
start viz unittests ( #6550 )
* test_viz
* more tests
2024-09-17 18:58:23 +08:00
George Hotz
67a03e72bb
remove expr_idxs [run_process_replay] ( #6567 )
* remove expr_idxs [run_process_replay]
* goodbye that test
2024-09-17 18:34:51 +08:00
George Hotz
9ebbedc37f
hotfix: remove expr_idxs from graph
2024-09-17 18:02:01 +08:00
chenyu
b947db3de1
don't fold mul mod for common factor ( #6566 )
it makes valid pattern more annoying
2024-09-17 06:01:27 -04:00
qazal
a2f446653e
add swizzle_st [run_process_replay] ( #6561 )
* add swizzle_st [run_process_replay]
* reduceop arg can stay
2024-09-17 15:37:39 +08:00
Gaétan Lepage
f214bb140d
test: relax tolerance of test_broadcastdot ( #6560 )
2024-09-17 03:26:39 -04:00
chenyu
5fb877c78c
generic valid match criteria of #6552 ( #6558 )
455 -> 364 valids.
generalize `idx < image bound` to `idx < image bound + c` for some `c`
2024-09-17 02:40:36 -04:00
George Hotz
0ab06d5840
push geps through wmma ( #6559 )
* push geps through wmma
* update tests
2024-09-17 14:38:40 +08:00
qazal
5a30a32af8
small viz fixups from the swizzle pads branch [run_process_replay] ( #6557 )
* small viz fixups from the swizzle pads branch [run_process_replay]
* handle indexed ones
2024-09-17 14:37:53 +08:00
George Hotz
ffce3ed896
add some new rules ( #6555 )
* add some new rules
* fix that
* non controversial
2024-09-17 13:59:55 +08:00
chenyu
c62b6fd8f0
match any statement in valid for simplification ( #6554 )
2024-09-17 01:39:47 -04:00
George Hotz
006c7c5747
remove unused rules in new expand [run_process_replay] ( #6553 )
2024-09-17 13:18:18 +08:00
George Hotz
a2239c812e
minimum new style expand ( #6534 )
* minimum new style expand [run_process_replay]
* float4 folding works
* fix uop graph
* if means or
* dtype.count idx overload
* fix test arange
* expand nope
* fix expand contract
* fix amd tensor core
* oh, that's a good test with a real failure
* remove prints
* early reduce
* tomorrow, we remove sorted on expand args
* fix wmma issue
* that makes test_arange pass
* vectorized folding
* no check
* broadcast
* fix clang with self assign rule
2024-09-17 13:02:41 +08:00
kormann
f5dd25d376
enable whisper batch for long sequences ( #6458 )
* long batch +test
* long batch +test
* cleanup
* rollback syntactic changes
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-09-17 00:42:10 -04:00
chenyu
7c942418a1
other side of simple out of bound valid case ( #6552 )
462 -> 455
2024-09-16 23:57:15 -04:00
chenyu
aeaf7894a7
more generic version of #6548 ( #6549 )
x*(-1)<0 can be generalized to x*(-1)<c, 473 -> 462 valids
2024-09-16 23:17:16 -04:00
chenyu
596f41eb46
simple drop image valid case ( #6548 )
* simple drop image valid case
started unit test, 530 -> 473 valids
* cleanup
2024-09-16 22:54:07 -04:00
chenyu
798be6bb74
add gated read_image count in openpilot compile2 ( #6546 )
530 to go
2024-09-16 21:17:00 -04:00
nimlgen
665b4203f8
dsp power management ( #6544 )
* dsp power management
* not needed
* oops
2024-09-16 23:34:01 +08:00
nimlgen
25d8f3046a
dsp do not flush libs to ds ( #6531 )
* dsp use sc
* no flush to fs
* ruff
* tiny nit
* shorter
2024-09-16 16:42:15 +08:00
qazal
dae3615008
replace viz graph when it's sink ( #6541 )
2024-09-16 16:00:27 +08:00
qazal
2a5a53c3db
remove extra scheduler graph call, VIZ does this [run_process_replay] ( #6540 )
2024-09-16 14:52:50 +08:00
George Hotz
c1b2472dea
reorder alu/vectorize ( #6538 )
2024-09-16 14:28:14 +08:00
George Hotz
42ba887daa
remove logic to vectorize reduces ( #6536 )
* remove logic to vectorize reduces
* fix tests
2024-09-16 14:04:48 +08:00
qazal
607113fcdf
fix vectorized dtype repr [run_process_replay] ( #6535 )
2024-09-16 13:42:55 +08:00
qazal
9b9b83b8b0
viz tests ( #6532 )
* viz fuzz tests
* caching
* print timings
* hotfix: update currentRewrite onClick
* import from typing
* indent into __main__
2024-09-16 13:08:42 +08:00
George Hotz
07bd6e070d
add more uops tests for vmin/vmax/const_factor/divides ( #6533 )
2024-09-16 13:06:31 +08:00
ignaciosica
c447ec2190
Fix amx shape [run_process_replay] ( #6524 )
* fix amx shape (sz,sz,sz) -> (sz,sz,1)
* revert check
2024-09-16 09:49:55 +08:00
chenyu
1683b274b6
main example we want the valid removed ( #6527 )
* main example we want the valid removed
* ast lines are long
2024-09-15 21:49:10 -04:00
George Hotz
e1b21879a7
minor changes from new expand [run_process_replay] ( #6528 )
* minor changes from new expand [run_process_replay]
* explain that
2024-09-16 09:48:37 +08:00
Tim Becker
3450382a77
Don't re-check patterns when uop.arg is None ( #6525 )
2024-09-16 09:46:59 +08:00
qazal
a104ecf79b
refactor for SWIZZLE with different st dims [run_process_replay] ( #6526 )
* refactor for supporting swizzles with different shape dims [run_process_replay]
* rename
2024-09-16 09:41:03 +08:00
George Hotz
21835fc08c
more graph rewrite tests ( #6521 )
2024-09-16 09:20:54 +08:00
chenyu
6be0cc387c
_get_add_chain(x) -> _get_chain(x, BinaryOps.ADD) ( #6523 )
need MUL for valid [run_process_replay]
2024-09-15 10:54:13 -04:00
chenyu
b2c286f567
fix typing for test_ops ( #6520 )
mostly passed TYPED=1 python3 -m pytest -n=auto test/test_ops.py.
one last test deliberately sets an invalid value to test the exception, and ignoring that requires importing typeguard. Getting a working version of typeguard would in turn require dropping the dependency on tensorflow_addons, which requires a very old version of typeguard
2024-09-15 06:18:36 -04:00
George Hotz
cd90092f14
graph rewrite tests ( #6519 )
* more graph rewrite tests
* more complex test cases
* more tests
* more tests
* cleanups
* 9600 lines
* cleanups
2024-09-15 17:29:16 +08:00
qazal
89b950c6b3
viz more work ( #6517 )
* infra
* actually replace the UOp
* extra per rewrite
* don't allow pyint
2024-09-15 16:42:17 +08:00
qazal
f69251c6b4
assert pyint in linearize_uop [run_process_replay] ( #6518 )
2024-09-15 16:29:05 +08:00
George Hotz
5132bab48d
hotfix: add TYPED=1 support
2024-09-15 14:44:26 +08:00
qazal
2d53e47b14
refactor viz saved context (prereq for tree view) ( #6516 )
* more styling
* warns
* refactor viz ctx to dataclass
* meh, fine for now
* name ctx
* allow smaller zooms
* more work
* fixup ctx.diffs
2024-09-15 14:08:55 +08:00
qazal
893a24f60f
viz minor stuff ( #6515 )
* some style cleanups
* wrap rewrites
2024-09-15 12:05:31 +08:00
qazal
d0262ac6ab
make ScheduleItem hashable [run_process_replay] ( #6512 )
2024-09-14 18:31:33 +08:00
qazal
4ffb722d4e
var_vals prereq for deleting LBScheduleItem [run_process_replay] ( #6511 )
2024-09-14 17:00:30 +08:00
George Hotz
9188245677
Viz ( #6502 )
* start viz tool
* start work
* more readme
* graceful shutdown that reloader
* add VIZ=1
* aesthetics
* typings
* more work
* work left
* more work on rewrites saving
* maybe try zoom
* add some metadata
* generic extra, show code and ast
* more tooling
* add rewritten graphs
* show graph_rewrites
* small details
* more diff cleanups
* differ as the cherry on top
* no useless styles
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-09-14 16:15:29 +08:00
nimlgen
052bf43ed4
dsp check buffers count ( #6509 )
2024-09-14 10:16:58 +03:00