6108 Commits

Author SHA1 Message Date
George Hotz 683857de5d
make match a method on UPat [run_process_replay] (#6634)
* make match a method on UPat [run_process_replay]

* remove class stuff

* a cleaner UPatAny
2024-09-20 20:00:03 +08:00
qazal dbe890b358
lowerer don't re-init UOp if we aren't rewriting [run_process_replay] (#6633) 2024-09-20 19:26:49 +08:00
nimlgen 21f2d79461
qcom match gpu impl for reg a6xx_sp_cs_unknown_a9b1 (#6631) 2024-09-20 18:14:00 +08:00
nimlgen 053c4dee55
qcom test for image pitch (#6621)
* qcom test for image pitch

* comment
2024-09-20 18:13:48 +08:00
chenyu 37ddd971e6
validhack explicitly check valid has CMPLT [run_process_replay] (#6630)
valid might be a const if gate folding is disabled
2024-09-20 06:13:05 -04:00
qazal 581a389a58
limit ctx tracking in TrackedPatternMatcher [run_process_replay] (#6629)
* limit ctx tracking in TrackedPatternMatcher [run_process_replay]

* add regression test
2024-09-20 18:06:05 +08:00
nimlgen 641586cb87
qcom updated ioctl (#6627) 2024-09-20 18:02:40 +08:00
qazal 98644a047b
use UOp.define_var in Variable shape [run_process_replay] (#6626) 2024-09-20 17:58:29 +08:00
chenyu acef3e67fa
add an example that idx is const and valid cannot be removed (#6625)
very weird
2024-09-20 05:46:27 -04:00
chenyu 5707503048
x//a<b -> x <a*b for positive a (#6622)
openpilot valids 47 -> 37
2024-09-20 04:38:47 -04:00
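The rewrite in #6622 (x//a<b -> x<a*b for positive a) can be sanity-checked by brute force. This is a minimal sketch, not tinygrad code: it assumes `//` is floor division (Python semantics), and the helper names `lhs`, `rhs`, `trunc_lhs` and `counterexamples` are hypothetical. The truncating variant shows why the division semantics matter for negative x.

```python
# Hypothetical brute-force check of: x//a < b  ->  x < a*b, for a > 0.
# Assumption: `//` is floor division (Python semantics), not tinygrad's own IDIV.

def lhs(x: int, a: int, b: int) -> bool:
    return x // a < b  # floor division, then compare

def rhs(x: int, a: int, b: int) -> bool:
    return x < a * b  # rewritten form, no division

def trunc_lhs(x: int, a: int, b: int) -> bool:
    # C-style division truncates toward zero; for a > 0 it matches floor only when x >= 0
    q = x // a if x >= 0 else -((-x) // a)
    return q < b

def counterexamples(lhs_fn, lo: int = -20, hi: int = 20, max_a: int = 8):
    """Collect every (x, a, b) in the search box where lhs_fn and rhs disagree."""
    return [(x, a, b)
            for a in range(1, max_a + 1)  # positive divisors only
            for b in range(lo, hi + 1)
            for x in range(lo, hi + 1)
            if lhs_fn(x, a, b) != rhs(x, a, b)]

print(counterexamples(lhs))  # -> []
print((-1, 2, 0) in counterexamples(trunc_lhs))  # -> True
```

Under floor division the identity holds for every integer x: for integer b, floor(x/a) < b is equivalent to x/a < b, and multiplying by a > 0 preserves the inequality. With truncation, x = -1, a = 2, b = 0 gives 0 < 0 (false) on the left but -1 < 0 (true) on the right.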
qazal 72c7087420
viz add kernel code (#6620)
* viz add kernel code

* no defaultdict

* ctxs
2024-09-20 16:31:47 +08:00
qazal 2dfb1e022c
UOp st prereqs for valid [run_process_replay] (#6618) 2024-09-20 15:55:35 +08:00
qazal 74f8f86631
viz kernel tree view (#6614)
* viz kernel tree view

* use get_kernel

* remove current_kernel

cleanup current_kernel

* unset kernel name
2024-09-20 15:52:12 +08:00
chenyu b14c1bc417
UOps.RANGE is_increasing (#6615)
* UOps.RANGE is_increasing

283 -> 47 valids

* test
2024-09-20 03:14:52 -04:00
Comma Device 76aa6416d7
qcom: add disassembler with DEBUG >= 5 2024-09-20 07:04:28 +00:00
chenyu 036c2f5b26
validhack use the new style ge for upper bound valid (#6612)
also relaxed the bound check to check vmin/vmax instead of just const.
valids 482 -> 283
2024-09-19 23:45:42 -04:00
George Hotz c4d5575c61
beat mlx at resnet 18 (#6611)
* work to beat mlx at resnet18 [run_process_replay]

* pruning

* wino sometimes

* shorter

* comment
2024-09-20 11:28:01 +08:00
qazal 785aaec67c
make VIZ more responsive for big graphs (#6610) 2024-09-20 08:56:13 +08:00
George Hotz 78699d9924
15% more folder speed [run_process_replay] (#6607)
* 15% more folder speed [run_process_replay]

* gep cleanups
2024-09-19 22:34:42 +08:00
chenyu a37e92081a
fix unrolled arange folding (#6606)
* fix unrolled arange folding

also added flop test to test_arange to make sure it's 0 flop

* skip PTX
2024-09-19 09:03:01 -04:00
qazal eebd23155c
move scheduler rewrites into full_ast_rewrite [run_process_replay] (#6609) 2024-09-19 20:03:28 +08:00
qazal 31748c72c4
refactor viz to parse_qs (#6608) 2024-09-19 19:51:41 +08:00
nimlgen 944cc46e11
qcom fix image pitch (#6600)
* qcom fix image pitch

* correct
2024-09-19 18:50:02 +08:00
George Hotz a1a882b006
arange folding with new ge (#6604)
* arange folding with new ge

* bump allowed gated

* bump allowed speed
2024-09-19 18:01:28 +08:00
George Hotz 224151a958
update indexing with UPat.any [run_process_replay] (#6605) 2024-09-19 17:40:17 +08:00
chenyu d148a62f8d
more generic simplify_valid_image_load (#6603)
use graph_rewrite to simplify the expression with narrowed variables, and check boundary conditions on monotonically increasing function to drop valid.
2024-09-19 05:33:37 -04:00
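The boundary-condition idea from #6603 can be illustrated in isolation: if an index expression is monotonically non-decreasing in x, its minimum and maximum over an integer range sit at the endpoints, so a bounds check only needs those two points. This is a hedged sketch of the general principle, not how simplify_valid_image_load is written; `always_in_bounds`, `always_in_bounds_slow` and `idx` are hypothetical names.

```python
from typing import Callable

def always_in_bounds(f: Callable[[int], int], lo: int, hi: int, size: int) -> bool:
    """Hypothetical helper: True iff 0 <= f(x) < size for every integer x in [lo, hi].
    Correct only when f is monotonically non-decreasing on [lo, hi]: the minimum
    is then f(lo) and the maximum f(hi), so the two endpoints decide the range."""
    return 0 <= f(lo) and f(hi) < size

def always_in_bounds_slow(f: Callable[[int], int], lo: int, hi: int, size: int) -> bool:
    """Reference version that evaluates every point, for comparison."""
    return all(0 <= f(x) < size for x in range(lo, hi + 1))

def idx(x: int) -> int:
    return 4 * x + 3  # an increasing index expression

print(always_in_bounds(idx, 0, 7, 32))  # -> True  (largest index is 31)
print(always_in_bounds(idx, 0, 7, 31))  # -> False (index 31 is out of range)
# a condition such as x < 4 narrows x to [0, 3] before the endpoint check
print(always_in_bounds(idx, 0, 3, 16))  # -> True  (largest index is 15)
```

The endpoint check costs two evaluations regardless of range size, which is what makes it attractive inside a rewrite pass; it is only sound once monotonicity has been established.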
George Hotz 718ecad2ee
add UPat.any support [run_process_replay] (#6602)
* add UPat.any support [run_process_replay]

* single arange pattern

* no loop_start and loop_end
2024-09-19 17:11:24 +08:00
qazal d06b36e527
viz open UPat links in editor (#6601)
* move the reloader

* open links in editor

* less things in ui
2024-09-19 16:48:09 +08:00
qazal 94effe2a71
simple VIZ=1 and get_location changes (#6599)
* simpler replace

* this get_location is fine?

* python things

* ctx location
2024-09-19 15:58:33 +08:00
chenyu eeee032b14
tiny cleanup of test_image_valid (#6597)
* tiny cleanup of test_image_valid

Special and Variable to set up UOp

* typo
2024-09-19 03:09:47 -04:00
George Hotz 012a2c449a
fix lt_folding VCONST issue [run_process_replay] (#6424)
* le and ge [run_process_replay]

* bugfix

* fix divides bug

* fix lt_folding issue
2024-09-19 14:59:20 +08:00
qazal 309ea63c03
include cached replaces in VIZ=1 (#6596)
* pick some work from vizmore branch

* fix the ctx location

* fix that loc
2024-09-19 14:48:31 +08:00
qazal 44c18a39a5
fix upat .location for the type verifier (#6592)
* fix upat .location for the type verifier

* get the last tinygrad file
2024-09-19 14:13:12 +08:00
chenyu 496806ce75
another example of openpilot conv with valid (#6595) 2024-09-19 01:54:01 -04:00
qazal 0c9b7c9167
more detailed UPat view in VIZ (#6594) 2024-09-19 13:18:11 +08:00
nimlgen 5e358cf179
qcom set ctx prio (#6593) 2024-09-19 12:30:00 +08:00
chenyu 7f9fd556b0
_min_max for WHERE (#6564)
prereq to gated load simplification

just for int
2024-09-18 23:47:48 -04:00
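#6564 adds _min_max for WHERE on ints. The usual interval rule is: if the condition is known, the result takes the chosen branch's bounds; otherwise it can be either branch, so its bounds are the union of both. This is a minimal sketch of that rule under that assumption, not tinygrad's implementation; `where_min_max` and the `Interval` alias are hypothetical.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # (vmin, vmax), both inclusive

def where_min_max(cond: Optional[bool], a: Interval, b: Interval) -> Interval:
    """Hypothetical sketch: integer bounds of where(cond, a, b).
    cond is True/False when known at compile time, None when unknown."""
    if cond is True:
        return a  # always takes the first branch
    if cond is False:
        return b  # always takes the second branch
    # unknown condition: the result may come from either branch
    return (min(a[0], b[0]), max(a[1], b[1]))

print(where_min_max(None, (0, 3), (5, 9)))  # -> (0, 9)
print(where_min_max(True, (0, 3), (5, 9)))  # -> (0, 3)
```

The union is conservative: for (0, 3) and (5, 9) it reports (0, 9) even though 4 is unreachable, which is safe for proving bounds but can miss some simplifications.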
chenyu 1b6eee02ad
failed test case for openpilot validhack conv (#6590)
* failed test case for openpilot validhack conv

can save 2ms once this is fixed

* fix order
2024-09-18 23:12:30 -04:00
George Hotz dfcc9c9aa3
remove unused view.expr [run_process_replay] (#6591) 2024-09-19 11:09:42 +08:00
George Hotz e015b41ce9
remove e( function just alu( [run_process_replay] (#6589)
* remove e( function just alu( [run_process_replay]

* missed two
2024-09-19 10:24:02 +08:00
George Hotz fa0f678d5a
use the PatternMatcher to validate UOps type [run_process_replay] (#6583)
* use the PatternMatcher to validate UOps type [run_process_replay]

* type check tests pass

* DEFINE_VAR

* fix precommit

* fix tests

* ptx

* type check tests pass

* ptx test

* int64

* ptx barrier

* delete old stuff
2024-09-19 09:59:06 +08:00
qazal d01e011a8c
start multi graph VIZ=1 (#6587)
* add all rewrites

* add a picker

* drop this here

* more work

* reset that

* start multigraph
2024-09-19 08:31:56 +08:00
nimlgen 5a7cb8d5a5
qcom set power to max (#6578) 2024-09-18 18:27:06 +08:00
chenyu bd40a26b8b
image valid test case that current approach does not work (#6584) 2024-09-18 06:06:03 -04:00
chenyu 1ec6bd5125
restructure simplify_valid_image_load [run_process_replay] (#6581)
* restructure simplify_valid_image_load [run_process_replay]

separated parsing valid / idx and simplification

* space

* type
2024-09-18 04:46:41 -04:00
George Hotz d02bb270b7
add copyin copyout for image on GPU [run_process_replay] (#6580)
* add copyin copyout for image on GPU [run_process_replay]

* add timing

* enqueue vs total run

* it's failing but that's fine
2024-09-18 16:06:20 +08:00
chenyu 162ead02a9
remove LOAD where valid is an empty set (#6579)
356 -> 354 valids
2024-09-18 03:49:41 -04:00
George Hotz d4b662c318
new openpilot compile (#6573)
* new openpilot compile

* note, copyout doesn't work for images
2024-09-18 14:22:50 +08:00
chenyu c3a70dbf0d
20 jitted steps in openpilot benchmark (#6577) 2024-09-18 02:15:16 -04:00
chenyu a72d51e277
brute force VALIDHACK matching (#6575)
* brute force VALIDHACK matching

* cleanup

* 9700
2024-09-18 01:59:50 -04:00