6108 Commits

Author SHA1 Message Date
qazal d8e5d5c663
move VIZ=1 tests to fuzzers (#6574) 2024-09-18 12:12:03 +08:00
ethanreidel ca8bad90a1
Fix typo in ops.py (#6572) 2024-09-17 21:25:00 -04:00
nimlgen 9894f20684
dsp offset buffer (#6570)
* dsp offset buffer

* view
2024-09-17 23:34:21 +08:00
George Hotz 28e565dc0d
prune independent kernels for openpilot [run_process_replay] (#6569)
* prune independent kernels for openpilot [run_process_replay]

* new pruning

* prune first, then memory plan
2024-09-17 20:02:38 +08:00
qazal 9295bc0189
viz more work [run_process_replay] (#6568)
* infra

* found it

* real work

* bring those back

* cleanup test_viz

* comment that out
2024-09-17 19:27:09 +08:00
qazal 455a27dd43
start viz unittests (#6550)
* test_viz

* more tests
2024-09-17 18:58:23 +08:00
George Hotz 67a03e72bb
remove expr_idxs [run_process_replay] (#6567)
* remove expr_idxs [run_process_replay]

* goodbye that test
2024-09-17 18:34:51 +08:00
George Hotz 9ebbedc37f hotfix: remove expr_idxs from graph 2024-09-17 18:02:01 +08:00
chenyu b947db3de1
don't fold mul mod for common factor (#6566)
it makes valid pattern more annoying
2024-09-17 06:01:27 -04:00
qazal a2f446653e
add swizzle_st [run_process_replay] (#6561)
* add swizzle_st [run_process_replay]

* reduceop arg can stay
2024-09-17 15:37:39 +08:00
Gaétan Lepage f214bb140d
test: relax tolerance of test_broadcastdot (#6560) 2024-09-17 03:26:39 -04:00
chenyu 5fb877c78c
generic valid match criteria of #6552 (#6558)
455 -> 364 valids.
generalize `idx < image bound` to `idx < image bound + c` for some `c`
2024-09-17 02:40:36 -04:00
George Hotz 0ab06d5840
push geps through wmma (#6559)
* push geps through wmma

* update tests
2024-09-17 14:38:40 +08:00
qazal 5a30a32af8
small viz fixups from the swizzle pads branch [run_process_replay] (#6557)
* small viz fixups from the swizzle pads branch [run_process_replay]

* handle indexed ones
2024-09-17 14:37:53 +08:00
George Hotz ffce3ed896
add some new rules (#6555)
* add some new rules

* fix that

* non controversial
2024-09-17 13:59:55 +08:00
chenyu c62b6fd8f0
match any statement in valid for simplification (#6554) 2024-09-17 01:39:47 -04:00
George Hotz 006c7c5747
remove unused rules in new expand [run_process_replay] (#6553) 2024-09-17 13:18:18 +08:00
George Hotz a2239c812e
minimum new style expand (#6534)
* minimum new style expand [run_process_replay]

* float4 folding works

* fix uop graph

* if means or

* dtype.count idx overload

* fix test arange

* expand nope

* fix expand contract

* fix amd tensor core

* oh, that's a good test with a real failure

* remove prints

* early reduce

* tomorrow, we remove sorted on expand args

* fix wmma issue

* that makes test_arange pass

* vectorized folding

* no check

* broadcast

* fix clang with self assign rule
2024-09-17 13:02:41 +08:00
kormann f5dd25d376
enable whisper batch for long sequences (#6458)
* long batch +test

* long batch +test

* cleanup

* rollback syntactic changes

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-09-17 00:42:10 -04:00
chenyu 7c942418a1
other side of simple out of bound valid case (#6552)
462 -> 455
2024-09-16 23:57:15 -04:00
chenyu aeaf7894a7
more generic version of #6548 (#6549)
x*(-1)<0 can be generalized to x*(-1)<c, 473 -> 462 valids
2024-09-16 23:17:16 -04:00
chenyu 596f41eb46
simple drop image valid case (#6548)
* simple drop image valid case

started unit test, 530 -> 473 valids

* cleanup
2024-09-16 22:54:07 -04:00
chenyu 798be6bb74
add gated read_image count in openpilot compile2 (#6546)
530 to go
2024-09-16 21:17:00 -04:00
nimlgen 665b4203f8
dsp power management (#6544)
* dsp power management

* not needed

* oops
2024-09-16 23:34:01 +08:00
nimlgen 25d8f3046a
dsp do not flush libs to ds (#6531)
* dsp use sc

* no flush to fs

* ruff

* tiny nit

* shorter
2024-09-16 16:42:15 +08:00
qazal dae3615008
replace viz graph when it's sink (#6541) 2024-09-16 16:00:27 +08:00
qazal 2a5a53c3db
remove extra scheduler graph call, VIZ does this [run_process_replay] (#6540) 2024-09-16 14:52:50 +08:00
George Hotz c1b2472dea
reorder alu/vectorize (#6538) 2024-09-16 14:28:14 +08:00
George Hotz 42ba887daa
remove logic to vectorize reduces (#6536)
* remove logic to vectorize reduces

* fix tests
2024-09-16 14:04:48 +08:00
qazal 607113fcdf
fix vectorized dtype repr [run_process_replay] (#6535) 2024-09-16 13:42:55 +08:00
qazal 9b9b83b8b0
viz tests (#6532)
* viz fuzz tests

* caching

* print timings

* hotfix: update currentRewrite onClick

* import from typing

* indent into __main__
2024-09-16 13:08:42 +08:00
George Hotz 07bd6e070d
add more uops tests for vmin/vmax/const_factor/divides (#6533) 2024-09-16 13:06:31 +08:00
ignaciosica c447ec2190
Fix amx shape [run_process_replay] (#6524)
* fix amx shape (sz,sz,sz) -> (sz,sz,1)

* revert check
2024-09-16 09:49:55 +08:00
chenyu 1683b274b6
main example we want the valid removed (#6527)
* main example we want the valid removed

* ast lines are long
2024-09-15 21:49:10 -04:00
George Hotz e1b21879a7
minor changes from new expand [run_process_replay] (#6528)
* minor changes from new expand [run_process_replay]

* explain that
2024-09-16 09:48:37 +08:00
Tim Becker 3450382a77
Don't re-check patterns when uop.arg is None (#6525) 2024-09-16 09:46:59 +08:00
qazal a104ecf79b
refactor for SWIZZLE with different st dims [run_process_replay] (#6526)
* refactor for supporting swizzles with different shape dims [run_process_replay]

* rename
2024-09-16 09:41:03 +08:00
George Hotz 21835fc08c
more graph rewrite tests (#6521) 2024-09-16 09:20:54 +08:00
chenyu 6be0cc387c
_get_add_chain(x) -> _get_chain(x, BinaryOps.ADD) (#6523)
need MUL for valid [run_process_replay]
2024-09-15 10:54:13 -04:00
chenyu b2c286f567
fix typing for test_ops (#6520)
mostly passes `TYPED=1 python3 -m pytest -n=auto test/test_ops.py`.

One last test deliberately sets an invalid value to test the exception; ignoring it requires importing typeguard. Getting a working version of typeguard would in turn require dropping the dependency on tensorflow_addons, which requires a very old version of typeguard.
2024-09-15 06:18:36 -04:00
George Hotz cd90092f14
graph rewrite tests (#6519)
* more graph rewrite tests

* more complex test cases

* more tests

* more tests

* cleanups

* 9600 lines

* cleanups
2024-09-15 17:29:16 +08:00
qazal 89b950c6b3
viz more work (#6517)
* infra

* actually replace the UOp

* extra per rewrite

* dont allow pyint
2024-09-15 16:42:17 +08:00
qazal f69251c6b4
assert pyint in linearize_uop [run_process_replay] (#6518) 2024-09-15 16:29:05 +08:00
George Hotz 5132bab48d hotfix: add TYPED=1 support 2024-09-15 14:44:26 +08:00
qazal 2d53e47b14
refactor viz saved context (prereq for tree view) (#6516)
* more styling

* warns

* refactor viz ctx to dataclass

* meh, fine for now

* name ctx

* allow smaller zooms

* more work

* fixup ctx.diffs
2024-09-15 14:08:55 +08:00
qazal 893a24f60f
viz minor stuff (#6515)
* some style cleanups

* wrap rewrites
2024-09-15 12:05:31 +08:00
qazal d0262ac6ab
make ScheduleItem hashable [run_process_replay] (#6512) 2024-09-14 18:31:33 +08:00
qazal 4ffb722d4e
var_vals prereq for deleting LBScheduleItem [run_process_replay] (#6511) 2024-09-14 17:00:30 +08:00
George Hotz 9188245677
Viz (#6502)
* start viz tool

* start work

* more readme

* graceful shutdown that reloader

* add VIZ=1

* aesthetics

* typings

* more work

* work left

* more work on rewrites saving

* maybe try zoom

* add some metadata

* generic extra, show code and ast

* more tooling

* add rewritten graphs

* show graph_rewrites

* small details

* more diff cleanups

* differ as the cherry on top

* no useless styles

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-09-14 16:15:29 +08:00
nimlgen 052bf43ed4
dsp check buffers count (#6509) 2024-09-14 10:16:58 +03:00