George Hotz
ded1b38b84
minor dtype cleanup [pr] ( #7124 )
* minor dtype cleanup [pr]
* use ptr() function
2024-10-17 17:41:23 +08:00
George Hotz
e7a0ffe46a
break out linearization [pr] ( #6994 )
2024-10-11 15:27:33 +08:00
George Hotz
b199b699ed
use shl everywhere ( #6744 )
* use shl everywhere
* fix parens
* late patterns
* works as an extra pass
* ptx
2024-09-26 09:59:36 +08:00
George Hotz
0ab06d5840
push geps through wmma ( #6559 )
* push geps through wmma
* update tests
2024-09-17 14:38:40 +08:00
George Hotz
a2239c812e
minimum new style expand ( #6534 )
* minimum new style expand [run_process_replay]
* float4 folding works
* fix uop graph
* if means or
* dype.count idx overload
* fix test arange
* expand nope
* fix expand contract
* fix amd tensor core
* oh, that's a good test with a real failure
* remove prints
* early reduce
* tomorrow, we remove sorted on expand args
* fix wmma issue
* that makes test_arange pass
* vectorized folding
* no check
* broadcast
* fix clang with self assign rule
2024-09-17 13:02:41 +08:00
CaltropHungerton
002f60b4c3
fix intel wmma flop counting, add flop counting tests for different tensor cores ( #6192 )
* fix wmma flop counting on intel, add count tests
* half
* add half gemm
* Update test.yml
* one test
* Update test_uops_stats.py
* Update test_uops_stats.py
* Update test_uops_stats.py
* smaller matrix, use unittest skipUnless decorator
2024-08-25 18:37:05 -07:00
George Hotz
16f420f7a7
split full_graph_rewrite and linearize_uop [run_process_replay] ( #6215 )
* split full_graph_rewrite and linearize_uop
* fix tests
* graph rewrite in test uops
* add types
2024-08-20 20:12:33 -07:00
qazal
5a266d5d0c
type verify ImageDType and PtrDType [run_process_replay] ( #6137 )
* type verify ImageDType and PtrDType [run_process_replay]
* fix tests
2024-08-17 16:37:07 +03:00
George Hotz
912f01ed4b
UOpGraph -> linearize_uop [run_process_replay] ( #6119 )
2024-08-16 19:48:39 -07:00
George Hotz
74ee9febec
remove iter from uopgraph ( #6110 )
* remove iter from uopgraph
* linearize returns uops
* fix tests
* linearize in linearize
* tests fix
* touchup
* test failures
2024-08-16 15:58:29 -07:00
qazal
28c75bf2a6
merge uops with ops ( #6111 )
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-08-16 18:17:57 -04:00
George Hotz
d73bc85ba9
UOpGraph not in renderer or Program [run_process_replay] ( #5867 )
* UOpGraph not in renderer or Program [run_process_replay]
* fix some tests
* fix ptx
2024-08-01 16:20:30 -07:00
George Hotz
72621d9e7c
count the specials in uops [run_process_replay] ( #5848 )
* count the specials in uops [run_process_replay]
* cleanups
2024-07-31 14:53:18 -07:00
George Hotz
7c4b177e3a
add tests for uops stats ( #5649 )
* add tests for uops stats
* no locals skip is fine
* eh
2024-07-22 21:57:03 -07:00
George Hotz
d0ab20a5e5
careful memory counting (with tests to specify behavior) ( #5587 )
2024-07-19 11:37:34 -07:00
George Hotz
2de82b8a5d
remove get_lazyop_info ( #5570 )
* don't use get_lazyop_info more
* keep that min
* no ptx for that test
2024-07-19 03:05:33 -07:00
George Hotz
d13654a820
move uopgraph to file [run_process_replay] ( #5364 )
* move uopgraph to file [run_process_replay]
* fix print tree test
2024-07-10 17:34:50 -07:00
George Hotz
63a8add2c2
move uops add logic to linearize ( #4952 )
* move logic to linearize
* idk how this should work
* empty
2024-06-14 03:52:37 -07:00
George Hotz
9823752397
make uops.add private ( #4950 )
* make uops.add private
* modernize all tests
2024-06-14 03:23:25 -07:00
Szymon Ożóg
1e7b7b2c3c
Fix flop coutning for mulacc ( #4640 )
* Fix flop coutning for mulacc
* add test_simple_mulacc
* Update test_uops_stats.py
* Update test_uops_stats.py
* revert test_mulacc
* Test for MULACC vs MUL+ADD
2024-05-20 12:06:00 -04:00
George Hotz
2f970a4fc2
all realize 2 ( #4527 )
* all realize 2
* tests fixup
* fix more tests
* fix openpilot
* fix tests
* unneeded
2024-05-10 22:43:09 -07:00
George Hotz
827058f030
update tests get_runner ( #4522 )
2024-05-10 20:09:22 -07:00
George Hotz
7425a0c646
CommandQueue is the future ( #3950 )
* start of command queue
* cq work
* runs
* cleanup
* outs set
* read is gone
* future buffer work
* command queue is better
* command queue works
* loadops
* delete unneeded
* command queue works
* upd
* fix tests
* use CommandQueue in compile
* delay sync
2024-04-01 17:35:48 -07:00
George Hotz
68ca4d4276
split to schedule.py ( #3949 )
* split to schedule.py
* split
2024-03-26 21:02:46 -07:00
George Hotz
150ea2eb76
create engine folder and move code ( #3948 )
* retry
* older tf
* that
2024-03-26 20:38:03 -07:00
George Hotz
1b6e890ef2
uops flop counter ( #3373 )
* factor out winograd functions
* test counter
* uops flop counter
* more correct
* ish
* correct
* cleanup
* tests for uops flop counter
* tests still fail
* fix symbolic uops flop cnt
* fix symbolic uops flop cnt
* hmm, it's an alu
* uops alu resolve
* relax that
2024-02-20 09:36:30 +01:00