48 Commits

Author SHA1 Message Date
George Hotz ded1b38b84
minor dtype cleanup [pr] (#7124)
* minor dtype cleanup [pr]

* use ptr() function
2024-10-17 17:41:23 +08:00
qazal 20d3c2d113
unify UOps.SHAPETRACKER and UOps.SWIZZLE with UOps.VIEW (#6955)
* add UOps.VIEW

* update hardcoded asts

* update sops.gz
2024-10-09 02:00:17 +08:00
wozeparrot c100f3d406
default threefry (#6116)
2024-09-25 17:45:13 +08:00
George Hotz bdd0c06f29
add void type to uop (#6471)
* unwrap_dtype maybe

* uopgraph stuff that hardcoded None

* test_ops passes

* dtypes.py fixups

* update test_linearizer and friends

* more ast updates

* test_beam and test_schedule too

* add void type to uop [run_process_replay]

* remove dumb casts

* start making it green

* more cast cleanups

* more cls methods to fix

* regenerate dataset

* split UOp and NOp const

* maybe that too

* fix docs

* update test_uop_symbolic

* test_verify_ast

* new sops with no diff

* meh, type_ignore is alright

* remove that assert

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-09-11 18:16:28 +08:00
qazal 442150a8df
more ast_const for hardcoding consts [run_process_replay] (#6418)
2024-09-09 11:35:08 +08:00
gswangg 1dc6040877
migrate test_search.py to UOp AST (#6245)
* add imports and update test_kernel_count with UOp AST

* test_filter_global_buffer

* remove LazyOp

* remove extra.ops and ReduceOps

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-08-24 16:13:53 +03:00
qazal d1d41130cd
use membufs in ImageDType checks [run_process_replay] (#6136)
* use membufs in ImageDType checks

* set by key [run_process_replay]
2024-08-17 16:17:46 +03:00
qazal 28c75bf2a6
merge uops with ops (#6111)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-08-16 18:17:57 -04:00
qazal c23d44c779
AST is UOp (#6030)
* most of the work from the uops2 branch

* schedule

* realize

* kernel

* lowerer

* search

* green

* merge uops with ops

* Revert "merge uops with ops"

This reverts commit 1408a59f12c97e3466679884266b247cf9df46bc.

* fix benchmark

* remove extra dedup
2024-08-16 22:09:00 +03:00
qazal 4d38fec8c1
rename lazyops to parents [run_process_replay] (#6091)
2024-08-15 17:27:32 +03:00
George Hotz b399ccd6ef
BEAM bugfix, kernels dedup now (#5617)
* BEAM bugfix, kernels dedup now

* getenv is default
2024-07-20 19:43:50 -07:00
George Hotz fa7e734b49
MetaOps.KERNEL (#5543)
2024-07-17 19:41:23 -07:00
chenyu 28972418c4
s/get_linearizer/get_kernel [run_process_replay] (#5467)
2024-07-13 20:32:22 -04:00
George Hotz 03c2dc8bd7
lowerer is kernel [run_process_replay] (#5437)
2024-07-12 18:50:55 -07:00
George Hotz 870dc8c350
s/Linearizer/Lowerer [run_process_replay] (#5428)
2024-07-12 15:54:07 -07:00
George Hotz 6707c778d0
scheduleitem is not Tuple [run_process_replay] (#5425)
* scheduleitem is not Tuple [run_process_replay]

* fix tests

* fix op + fuzzers

* fix mop test
2024-07-12 15:13:19 -07:00
George Hotz f6ef283e6a
s/loadops/metaops [run_process_replay] (#5421)
2024-07-12 13:26:50 -07:00
qazal 28bf8d86d8
test_linearizer with multi output ASTs (#5115)
* ast is tuple

* run test_phi_simplification

* update reason

* more tc

* beam

* a few more

* use test_opt directly
2024-06-23 15:41:24 +03:00
George Hotz 9f875123b6
small changes from lowerer. [run_process_replay] [no_assert] (#5102)
2024-06-22 11:09:35 -07:00
nimlgen fd071ba27e
amd mockgpu correct timer resolution (#4942)
* amd mockgpu correct timer resolution

* test it
2024-06-13 10:07:34 +03:00
qazal 8b5bcf309a
process replay in all of CI (#4884)
2024-06-10 14:49:29 -04:00
qazal f64fa51a64
process replay for test/* (#4799)
* add input to unit tests [run_process_replay]

* add setup [run_process_replay]

* run tests [run_process_replay]

* add cuda and amd [run_process_replay]

* run everything but BEAM=2 [run_process_replay]

* skip export_model [run_process_replay]

* fix amd CI

* add concurrency back
2024-06-03 12:01:58 +03:00
chenyu 456aa0b656
update test_search kernel count (#4652)
integration test that beaming 1 kernel increments kernel count by 1, and moved existing test_kernel_count to TestTimeLinearizer
2024-05-19 13:54:52 -04:00
Léo 967e35f8b8
fix(beam): GlobalCounters kernel count increasing when clearing l2 (#4598)
* fix(beam): GlobalCounters kernel count increasing when clearing l2

* fix: removed the NOSTATS var by adding do_update_stats to Tensor.realize()

* test(search): regression test for _time_program, should not increment kernel_count

* fix(test_search): unused var and now properly checking when l2 is cleared

* fix(test_search): added assert message

* fix(test_search): now testing public beam api for kcount

* ruff fixes

---------

Co-authored-by: Léo Paillé <leo.paille@enseirb-matmeca.fr>
2024-05-19 10:03:47 -07:00
nimlgen daf57af3eb
move tc to renderers (#4631)
* move tc to renderers

* missed import

* fix typo

* fix

* fix imports

* remove from tests

* fix 4607

* nv emulate timestamp

* time is int

* correct time
2024-05-18 00:36:29 +03:00
chenyu c86adabe15
time with real global buffers in search (#4621)
* filter fake buffers in search

* test that

* update test
2024-05-17 12:36:23 -04:00
nimlgen eb9689336e
nv mockgpu (#4600)
* mockgpu nv

* works

* comment that out

* fix merge

* setup gpuocelot

* install packages

* not run all of them

* passes

* fix ci

* almost

* should pass

* linter

* linter 2

* try this?

* ugn, not supported

* ci

* remove ticket from description

* better descs
2024-05-15 23:46:08 +03:00
George Hotz ff64bcab69
move graph/search to engine (#4596)
2024-05-14 23:12:59 -07:00
George Hotz 1e843d495e
cleaning up search with Program (#4500)
* cleaning up search

* fix tests

* test fix

* minor compiler cleanup
2024-05-09 19:01:53 -07:00
George Hotz c9e84ed0da
refactor to Program class (#4476)
* refactor to Program class

* switch to Program

* fix tests

* smaller diff

* self.p

* more tests

* fix metal test

* tests

* fix openpilot

* move that to linearizer

* p.launchdims
2024-05-09 17:29:07 -07:00
Francis Lam 5c5b40880f
search: fix edge cases on screening potential ops (#4394)
* search: fix edge cases on screening potential ops

won't change correctness, but will save a little python time by
properly deduplicating potential actions

* check for de-duplication instead of exact valid actions

* refactor long line
2024-05-02 14:53:05 -04:00
George Hotz acf4ba5c9f
method cache respects beam option (#4261)
* method cache respects beam option

* cleanup get_runner
2024-04-23 09:00:41 +04:00
George Hotz 9eef44521b
ScheduleItem uses Buffer (#3995)
* schedule Buffer

* update

* update tests

* master

* works

* remove LoadOps.WAIT

* fix compile2

* bad test

* rename and note
2024-03-29 20:50:27 -07:00
George Hotz 42b9d999ea
Buffer isn't always allocated (#3974)
* buffer alloc

* allocate

* missing allocates

* last one
2024-03-28 13:33:47 -07:00
George Hotz 68ca4d4276
split to schedule.py (#3949)
* split to schedule.py

* split
2024-03-26 21:02:46 -07:00
George Hotz 150ea2eb76
create engine folder and move code (#3948)
* retry

* older tf

* that
2024-03-26 20:38:03 -07:00
qazal 337cd53444
multioutput ScheduleItem (#3699)
* refactor realize.py

* update docs

* update test_sched

* update runners and devices

* update openpilot and unit tests

* cleanup runner lowering

* update more tests
2024-03-13 08:59:38 -07:00
chenyu 906cc3a69b
cleanup tests Device[Device.DEFAULT] is always Compiled (#3645)
2024-03-07 11:15:42 -05:00
Jovan Sardinha 8978488565
add sanity tests for bufs_from_lin (#3586)
2024-03-02 14:17:43 -08:00
George Hotz 2e60012bcf
move create schedule and delete old API (#3377)
* move create schedule and delete old API

* fix test multitensor
2024-02-12 18:10:45 +01:00
George Hotz 655c6f61d3
St real size (#3046)
* track the size in the lazybuffer

* shapetracker real size

* lint
2024-01-08 14:44:53 -08:00
George Hotz c003be7309
Revert "track size in shapetracker" (#3043)
* Revert "track size in shapetracker (#3026)"

This reverts commit a8ba1ac08f.

* st.size
2024-01-08 13:13:39 -08:00
George Hotz a8ba1ac08f
track size in shapetracker (#3026)
* track size in shapetracker

* shapetracker adapter

* size is an int

* create Buffer with st.size

* only compare the views for the jit

* fix webgpu
2024-01-05 20:15:53 -08:00
George Hotz 2c363b5f0b
new style device (#2530)
* cpu tests pass

* torch works

* works

* metal works

* fix ops_disk

* metal jit works

* fix openpilot

* llvm and clang work

* fix webgpu

* docs are rly broken

* LRU works on metal

* delete comment

* revert name to ._buf. LRU only on Compiled

* changes

* allocator

* allocator, getting closer

* lru alloc

* LRUAllocator

* all pass

* metal

* cuda

* test examples

* linearizer

* test fixes

* fix custom + clean realize

* fix hip

* skip tests

* fix tests

* fix size=0

* fix MOCKHIP

* fix thneed

* copy better

* simple

* old style metal copy

* fix thneed

* np reshape

* give cuda a device
2023-11-30 17:07:16 -08:00
George Hotz 9e07824542
move device to device.py (#2466)
* move device to device.py

* pylint test --disable R,C,W,E --enable E0611

* fix tests
2023-11-27 11:34:37 -08:00
George Hotz f17bc16f46
simple runtime args (#2211)
* simple runtime args

* fix some tests

* fix abstractions and triton

* fix search
2023-11-03 12:31:29 -07:00
George Hotz 7103b716c4
merge kernel and optimizer (#2200)
* merge kernel and optimizer

* linearize is reentrant

* move global/local size

* clean up linearizer copy

* remove unneeded lin copies

* stop linearizing twice

* oops, that should be None
2023-11-01 15:20:01 -07:00
qazal 36d4001b4f
add test coverage for search (#2104)
* add test coverage for search

* only in compiled backends

* dont use device.default in decorator

* time_til is the other way around xd
2023-10-19 17:06:47 -07:00