Commit Graph

2586 Commits

Author SHA1 Message Date
qazal 12e4a4900a
hotfix: missing return in METAL dm benchmark (#6749) 2024-09-26 09:12:38 +08:00
qazal 8a15ccb414
start gc/mem usage tests for buffer schedule [run_process_replay] (#6737)
* gc tests for buffer schedule [run_process_replay]

* assert global counters, maybe del

* check init

* rm global counters
2024-09-26 08:26:31 +08:00
qazal b629a7998d
early assert buffer count limit [run_process_replay] (#6746)
* better error message for buffer count limit [run_process_replay]

* 3.9 needs that

* assert ScheduleItem

* new _test_buf_cnt
2024-09-26 08:24:26 +08:00
wozeparrot c100f3d406
default threefry (#6116) 2024-09-25 17:45:13 +08:00
mesozoic-egg 992cde05d7
Metal with CDLL instead of py-objc (#6545)
* Add CDLL interface for metal

* remove two unused functions

* Cover most of the API methods

* switch to cdll

* directly call objc message in ops_metal

* keep only obj interface

* Use direct message sending for graph

* may have found a solution to the memoryview on ctypes pointer

* buf indexing bug fixed

* fix c_int

* fix c int to bytes

* fix gpu time bug

* line savings for cdll metal core

* wip

* c int bug

* fix buf casting

* dedup for c_void_p

* dedup for c_void_p

* linter fix

* remove unused stuff

* my py fix

* more mypy error fix

* line savings

* line savings

* rename send_message to msg; add __hash__ and __eq__ for dedup

* wip

* refactor

* refactor

* remove named import from ctypes

* forgot to change variable name

* file reorg, put support.py to ops_metal

* refactor

* hash error

* remove to_ns_array

* test oom exception, fix exception change

* typevar for msg

* add back dedup

* test for compile error

* move constant to graph

* move header constant around

* get label for icb buffer

* check icb label using "in"

* wip fixing mypy reported error

* fixed mypy error

* code formatting

* all_resources dedup match previous

* code formatting

* code formatting; buffer set to objc_id

* revert changes on buf for the manual release, seems like _free is not always called

* skip unless on metal, for test_metal

* fix premature mem release causing seg fault

* test_metal check for device before importing

* Buffer should only be released under _free explicitly

* mypy fixes

* change object ownership

* test compile success

* lint fixes

* remove load_library

* wrap sel_register in cache

* simplify to_struct

* swap lines

* fix type error in to_struct

* bump line to 9800

* remove pyobjc from setup.py

* command buffer should be objc_instance and get released

* stringWithUTF8String: returns objc_instance

* Use constant for MTLPipelineOptionNone

* better explanation for [MTLBuffer contents:] return

* Use dyld_find in case the path differs

* trailing whitespace

* handle exception for methods that take error:

* load /System/Library instead of /Library

* Init c_void_p with None instead of zero for error objects

---------

Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.me>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-09-25 17:43:01 +08:00
George Hotz cd534dee11
cstyle changes that don't pass process replay (#6734)
* cstyle changes that don't pass process replay

* add constant folder back there

* cleanups

* const

* fix some tests

* bfloat16 too

* complete set of types

* that cast shouldn't be needed

* that was a questionable test
2024-09-25 17:33:34 +08:00
George Hotz cb22ef379a
truncate consts early (#6741)
* truncate consts early

* ptx still fails

* Update dtype.py
2024-09-25 16:49:51 +08:00
George Hotz 882339f729
remove parens from neg (#6738) 2024-09-25 15:38:20 +08:00
qazal 5ad2f95d01
process replay diff stats (#6736)
* process replay diff stats

* fix tuples
2024-09-25 15:19:56 +08:00
chenyu ff25bfb1b0
conv backward tests in test_simplify_valid_idx (#6727)
the backward idx is pretty ugly now
2024-09-25 02:51:07 -04:00
chenyu e6a1b5aa8f
more test_simplify_valid_idx cleanup (#6726)
moved UOps.VECTORIZE of idx into the helper
2024-09-24 23:47:42 -04:00
chenyu 14524eeddc
test_image_valid.py -> test_simplify_valid_idx.py (#6724)
restructure the tests, will use the same file for non-image tests
2024-09-24 23:32:27 -04:00
qazal e0d8685c99
test_masked_upcast_wino check device buf_max (#6723) 2024-09-25 11:26:53 +08:00
ttomsa 76bd4c7d5f
advanced setitem (#6262)
* advanced setitem draft

* add setitem tests

* fix for tests

* small change

* handle repeated indices with test

* fix v broadcasting to mask

* clean up a bit

* open more tests

* clean up, fixes issue with scalar tensor index

* fix

* fix index_put_ and linter

* add type annotation

* done

* remove non contiguous hack

* woops linter

* name fix

* add back type notation

* more type notation

* final

* linter

* check lazydata not shared

* no numpy

* no numpy

* rename

* index benchmark

* linter

* no cloning time

* rm benchmark

* new function

* rm contiguous and cast early

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-09-24 22:14:59 -04:00
qazal 3bf25aae78
start work on global buffer count limit [run_process_replay] (#6722)
* add a bufs_max option

* simple spec
2024-09-25 09:51:56 +08:00
qazal cefc3e9382
make all schedules immutable [run_process_replay] (#6718)
* compute inputs and outputs in LBScheduleItem [run_process_replay]

* simpler metadata, delete __hash__

* no dynamic field

* test_diff_schedule
2024-09-24 21:08:16 +08:00
qazal 29330014ab
give FUZZ_SCHEDULE views a base (#6717)
* memoryview to bytes

* give FUZZ_SCHEDULE views a base
2024-09-24 19:20:37 +08:00
nimlgen f0019ad29c
bump ci test timeout for test_speed_exec_time (#6715)
* bump ci test timeout for test_speed_exec_time

* more
2024-09-24 18:44:09 +08:00
chenyu 8d75326cb5
do not fold var with min==max (#6713)
not really used, want it to keep as a var for valid simplification
[run_process_replay]
2024-09-24 06:16:34 -04:00
chenyu 9e51879019
fix idx setup in image_valid test_openpilot_conv3 (#6710)
* fix idx setup in image_valid test_openpilot_conv3

* corrected output and sad
2024-09-24 05:49:04 -04:00
wozeparrot 2be0b26a1f
rand only supports single device (#6682) 2024-09-24 16:07:44 +08:00
chenyu 4bb1694f49
more tests about bounds of UOp divs (#6700) 2024-09-24 00:41:43 -04:00
chenyu 79aef64d70
update tests in test_image_valid (#6698) 2024-09-24 00:04:21 -04:00
George Hotz 7c38121280
load penalty (#6681)
* bias/bn loads after loops

* load penalty in fix_priority

* more generic test
2024-09-23 18:12:12 +08:00
George Hotz 431ffc4254 hotfix: delete float16 failing 2024-09-23 17:42:57 +08:00
chenyu f55459c98e
failed validhack test for a 0.9.7 conv (#6677) 2024-09-23 04:43:47 -04:00
chenyu 0362dbbbe8
relax idx simplification given valid (#6669)
apply to kernels in op 0.9.7.
if a valid has a complicated expr, we cannot drop valid but it's possible to simplify idx given valid
2024-09-23 03:04:57 -04:00
chenyu 26ebb7cab4
don't use div_folding in lt_folding (#6666)
* don't use div_folding in lt_folding

valids 35 -> 13

* fails the same as before
2024-09-23 01:50:18 -04:00
chenyu da5b741656
removed valid in openpilot conv (#6619)
35 valids left
2024-09-23 00:30:18 -04:00
George Hotz 52c2c4df9c
fix match of sz 0 + dedup kernel ast [run_process_replay] (#6663)
* fix match of sz 0 [run_process_replay]

* empty graph rewrite to dedup st
2024-09-23 11:56:53 +08:00
chenyu 1923932339
canonicalize simplex lt (#6658)
(X := a0*x0 + a1*x1 + ...) > 0 is equivalent to x0 + x1 + ... > 0 if xi >= 0 and ai > 0 for ints
2024-09-22 23:04:47 -04:00
wozeparrot 46e360fdc0
check bfloat16 range with threefry (#6660) 2024-09-23 10:48:44 +08:00
qazal d24e4b1042
viz more kernel view work (#6659) 2024-09-23 10:48:35 +08:00
qazal 6be1bf09f1
hotfix: bring COMPARE_SCHEDULE=0 back (#6657) 2024-09-23 10:39:43 +08:00
George Hotz e945fa9c5c
put local on the PtrDtype [run_process_replay] (#6656)
* put local on the PtrDtype [run_process_replay]

* those are local too
2024-09-23 10:29:17 +08:00
chenyu 90c1ccc402
simpler drop valid check in simplify_valid_image_load (#6653)
* simpler drop valid check in simplify_valid_image_load

* update tests
2024-09-22 21:46:39 -04:00
qazal 6b65d8c461
more process replay tracing work [run_process_replay] (#6650) 2024-09-22 16:16:58 +08:00
qazal 5bafed2f88
process replay traceback (#6642) 2024-09-21 16:53:34 +08:00
qazal 982086f54c
UOps.VALID try 2 (#6623)
* make UOps.VALID compile

* fixable tests

* bufs dedup

* cleanup the CONST spec

* regenerate dataset with graph_rewrite

```py
def rewrite_const(const:UOp, st_src:UOp) -> UOp:
  st: ShapeTracker = st_src.arg
  return UOp(UOps.VALID, dtypes.bool, (st.to_uop(),)).where(UOp.const(const.dtype, const.arg), UOp.const(const.dtype, 0))
pm = PatternMatcher([(UPat(UOps.CONST, name="const", src=(UPat(UOps.SHAPETRACKER, name="st_src"),)), rewrite_const)])
```

* rm arg

* remove arg

* revert arg removal

This reverts commit 2c35c75c950075d38c9fb8572f14640fe8235f74.

* red test_pickle_define_var
2024-09-21 14:19:25 +08:00
qazal d2351af019
fixup non-void SINKs in tests [run_process_replay] (#6624) 2024-09-21 13:29:18 +08:00
qazal 391d14438e
DEFINE_VAR prereqs for VALID [run_process_replay] (#6637) 2024-09-21 13:28:39 +08:00
nimlgen 053c4dee55
qcom test for image pitch (#6621)
* qcom test for image pitch

* comment
2024-09-20 18:13:48 +08:00
chenyu acef3e67fa
add an example that idx is const and valid cannot be removed (#6625)
very weird
2024-09-20 05:46:27 -04:00
chenyu 5707503048
x//a<b -> x <a*b for positive a (#6622)
openpilot valids 47 -> 37
2024-09-20 04:38:47 -04:00
chenyu b14c1bc417
UOps.RANGE is_increasing (#6615)
* UOps.RANGE is_increasing

283 -> 47 valids

* test
2024-09-20 03:14:52 -04:00
chenyu 036c2f5b26
validhack use the new style ge for upper bound valid (#6612)
also relaxed the bound check to check vmin/vmax instead just const.
valids 482 -> 283
2024-09-19 23:45:42 -04:00
chenyu a37e92081a
fix unrolled arange folding (#6606)
* fix unrolled arange folding

also added flop test to test_arange to make sure it's 0 flop

* skip PTX
2024-09-19 09:03:01 -04:00
George Hotz a1a882b006
arange folding with new ge (#6604)
* arange folding with new ge

* bump allowed gated

* bump allowed speed
2024-09-19 18:01:28 +08:00
chenyu d148a62f8d
more generic simplify_valid_image_load (#6603)
use graph_rewrite to simplify the expression with narrowed variables, and check boundry conditions on monotonically increasing function to drop valid.
2024-09-19 05:33:37 -04:00
chenyu eeee032b14
tiny cleanup of test_image_valid (#6597)
* tiny cleanup of test_image_valid

Sepcial and Variable to setup UOp

* typo
2024-09-19 03:09:47 -04:00