Commit Graph

2630 Commits

Author SHA1 Message Date
wozeparrot 9eb6eef441
seed in tensor (#6869) 2024-10-06 14:46:58 -04:00
jeffzh4ng 19a7e41113
implement logcumsumexp (#6921)
* implement logcumsumexp

* change axis=None to axis=0
2024-10-06 10:45:36 -04:00
chenyu 75d9dcf000
support dtype in softmax and log_softmax (#6914)
matches torch. for mixed precision training, we would want to use float for softmax
2024-10-06 07:18:15 -04:00
chenyu 08414d7b7c
cleanup test_uop_symbolic.py (#6894)
no more test_symbolic for reference, so force expected output to be exact instead of a set
2024-10-04 20:53:10 -04:00
ignaciosica 555bcb5e54
static access for code_for_op (#6889) 2024-10-05 07:38:01 +08:00
vladov 5f6b6162b3
Suppress warnings in transcendental tests. (#6891) 2024-10-05 07:37:17 +08:00
George Hotz 4df5c7a4ef
move lazy to engine [pr] (#6886)
* move lazy to engine [pr]

* engine.lazy
2024-10-04 23:19:26 +08:00
George Hotz 6b063450df
move hcq device to runtime [pr] (#6879)
* things that are only used in one place don't belong in helpers [pr]

* start moving hcq device [pr]

* fix paths
2024-10-04 22:26:50 +08:00
George Hotz 8ca506ee37
remove the magic methods for moving between devices [pr] (#6881)
* remove the magic methods for moving between devices [pr]

* remove unneeded clang
2024-10-04 20:27:52 +08:00
George Hotz a0cb16ac61
node cleanup + local metal test speed [pr] (#6880)
* node cleanup [pr]

* fix tests, including the double one on metal

* no time tqdm tests
2024-10-04 18:14:23 +08:00
George Hotz cdff1d75b6
things that are only used in one place don't belong in helpers [pr] (#6878)
* things that are only used in one place don't belong in helpers [pr]

* pretty print moved
2024-10-04 17:27:38 +08:00
George Hotz f4ec39fe58
switch symbolic from old to uops, final PR (#6872)
* switch symbolic from old to uops, final PR

* two wrong answers

* not needed resolves

* symbolic ops passes

* symbolic ops passes

* progress

* tests pass (almost)

* fix last test

* fix some tests

* global binding and unbinding

* Revert "global binding and unbinding"

This reverts commit 9456725630316487509980af20c6d2981de00bec.

* that test works now

* vars on uop doesn't recurse

* fix fuzzer

* update

* fix type

* fix gpt, it's UOp now

* ssimplify symbolics
2024-10-04 16:42:27 +08:00
George Hotz 738a5794a9
last update for new symbolic [pr] (#6877) 2024-10-04 14:58:51 +08:00
qazal 17068410e6
give EXT schedules metadata [pr] (#6865) 2024-10-03 20:14:18 +08:00
George Hotz e10245909a
explore global uop cache [pr] (#6863)
* explore global uop cache

* wvd uops

* remove useless lru caches

* key is is

* simpler rewriter
2024-10-03 13:08:13 +08:00
chenyu c3c93f332a
symbolic bool raise ValueError when not sure [pr] (#6853) 2024-10-02 09:10:58 -04:00
George Hotz 7214450c23
little symbolic changes [pr] (#6849)
* little symbolic changes [pr]

* symbolic needs resolve too

* no resolve

* less change
2024-10-02 17:12:30 +08:00
George Hotz be12409b51
changes for symbolic (#6844)
* changes for symbolic

* only for ints

* check int first
2024-10-02 12:57:16 +08:00
George Hotz 100ce7a684 hotfix: min/max on CMPNE was wrong 2024-10-02 10:15:03 +08:00
George Hotz 1ac83aaa4b
lil sym changes (#6837)
* lil sym changes [pr]

* fix inf crap

* Update ops.py

* remove that, it's wrong
2024-10-02 09:54:17 +08:00
George Hotz 84726e8855
good changes from symbolic removal [run_process_replay] (#6835)
* good changes from symbolic removal [run_process_replay]

* fix __ne__
2024-10-01 18:49:09 +08:00
qazal c5b252cdb3
add pr alias [pr] (#6834) 2024-10-01 18:48:44 +08:00
George Hotz e907b25792
move some pm rules to uopgraph.py [run_process_replay] (#6831)
* move some pm rules to uopgraph.py [run_process_replay]

* move more

* move lt and clean

* end maybe

* put back
2024-10-01 18:28:41 +08:00
vladov 501cfde7e6
Fix GPT2 with OpenCL backend. (#6821)
* Fix GPT2 with OpenCL backend.

* Add test for unaligned copies into OpenCL buffers.
2024-10-01 16:57:22 +08:00
qazal a16a8c5958
color process replay stats [run_process_replay] (#6830) 2024-10-01 15:29:11 +08:00
George Hotz 547733e57c
stunning_mnist [run_process_replay] (#6828)
* stunning_mnist [run_process_replay]

* add loss to stunning mnist
2024-10-01 15:00:48 +08:00
qazal 391497a311
schedule independent of Device [run_process_replay] (#6829) 2024-10-01 14:46:26 +08:00
George Hotz 8a93c48901
pickle main pattern matcher [run_process_replay] (#6827)
* pickle main pattern matcher [run_process_replay]

* del line
2024-10-01 13:58:42 +08:00
George Hotz d726eb6f48
uop resolve [run_process_replay] (#6826)
* uop bool and int and stuff [run_process_replay]

* add ne support

* can't even be None anymore

* BinaryOps.AND support

* less compare
2024-10-01 13:11:42 +08:00
George Hotz 50dd6bd951
move cmp tuple out [run_process_replay] (#6825)
* move cmp tuple out [run_process_replay]

* was unneeded
2024-10-01 10:38:28 +08:00
George Hotz 0f28e93224
add pickle support for pattern matchers [run_process_replay] (#6816)
* add pickle support for pattern matchers [run_process_replay]

* cleaner and all

* no closures

* fix tests

* revert that

* final

* cleaner

* python 3.8 fix

* add round trip back

* this

* waste lines on this. that's the final line count

* max print better

* more targetted fix

* regrettably add 3.8 support
2024-09-30 21:54:46 +08:00
qazal 0c24fec9f4
test current behavior of const schedule [run_process_replay] (#6817) 2024-09-30 21:02:01 +08:00
qazal 4a4aa69b84
add a better dedup test for DEFINE_VAR with CONST arg (#6813) 2024-09-30 15:43:55 +08:00
qazal e7fcbe1a4d
refactor test_linearizer correctness asserts (#6812) 2024-09-30 15:31:02 +08:00
George Hotz 9dd9f71011
no global kernel stuff [run_process_replay] (#6808)
* use traceback instead of global metadata crap [run_process_replay]

* save the kernel

* correct, imports clean, no device

* UNPARENTED

* speed

* proudly unparented

* Update ops.py

* update tests for unparented

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-09-30 13:52:33 +08:00
qazal 2ec73d6f05
push swizzle through dim change (#6801)
* push swizzle through dim change

* can this be generic

* generic version

* cleanups
2024-09-30 09:04:59 +08:00
qazal dab05ff070
match dataclass.replace in UOp.replace [run_process_replay] (#6792)
* UOp replace matching dataclass replace

* p2

* replace creates a copy
2024-09-28 16:28:49 +08:00
George Hotz eaa1e0eeeb
rename constant_folder to sym [run_process_replay] (#6780) 2024-09-27 14:54:54 +08:00
George Hotz c178dc1071
faster uops ci [run_process_replay] (#6774) 2024-09-26 20:15:01 +08:00
nimlgen 3c56aeee70
add Tensor.from_blob (#6765)
* draft tensor from pointer init

* some docs and types

* comment

* cleaner

* test

* malloc

* qcom cl interop

* jit example

* cleaner

* dealoc

* wording

* docs
2024-09-26 18:33:19 +08:00
George Hotz 14ad47b515
rewrite to use uops if (#6764)
* rewrite to use uops if

* does this pass

* careful penalty

* fix tests

* remove unused stuff

* that's a cstyle rewrite

* Update test_linearizer_dumb.py
2024-09-26 18:09:09 +08:00
wozeparrot 2b899164c6
no numpy (#6751) 2024-09-26 16:40:18 +08:00
qazal ee4feedb77
delete test_variable_const [run_process_replay] (#6757)
* delete test_variable_const [run_process_replay]

* don't allow variable UPat
2024-09-26 12:27:11 +08:00
George Hotz b199b699ed
use shl everywhere (#6744)
* use shl everywhere

* fix parens

* late patterns

* works as an extra pass

* ptx
2024-09-26 09:59:36 +08:00
qazal 12e4a4900a
hotfix: missing return in METAL dm benchmark (#6749) 2024-09-26 09:12:38 +08:00
qazal 8a15ccb414
start gc/mem usage tests for buffer schedule [run_process_replay] (#6737)
* gc tests for buffer schedule [run_process_replay]

* assert global counters, maybe del

* check init

* rm global counters
2024-09-26 08:26:31 +08:00
qazal b629a7998d
early assert buffer count limit [run_process_replay] (#6746)
* better error message for buffer count limit [run_process_replay]

* 3.9 needs that

* assert ScheduleItem

* new _test_buf_cnt
2024-09-26 08:24:26 +08:00
wozeparrot c100f3d406
default threefry (#6116) 2024-09-25 17:45:13 +08:00
mesozoic-egg 992cde05d7
Metal with CDLL instead of py-objc (#6545)
* Add CDLL interface for metal

* remove two unused functions

* Cover most of the API methods

* switch to cdll

* directly call objc message in ops_metal

* keep only obj interface

* Use direct message sending for graph

* may have found a solution to the memoryview on ctypes pointer

* buf indexing bug fixed

* fix c_int

* fix c int to bytes

* fix gpu time bug

* line savings for cdll metal core

* wip

* c int bug

* fix buf casting

* dedup for c_void_p

* dedup for c_void_p

* linter fix

* remove unused stuff

* my py fix

* more mypy error fix

* line savings

* line savings

* rename send_message to msg; add __hash__ and __eq__ for dedup

* wip

* refactor

* refactor

* remove named import from ctypes

* forgot to change variable name

* file reorg, put support.py to ops_metal

* refactor

* hash error

* remove to_ns_array

* test oom exception, fix exception change

* typevar for msg

* add back dedup

* test for compile error

* move constant to graph

* move header constant around

* get label for icb buffer

* check icb label using "in"

* wip fixing mypy reported error

* fixed mypy error

* code formatting

* all_resources dedup match previous

* code formatting

* code formatting; buffer set to objc_id

* revert changes on buf for the manual release, seems like _free is not always called

* skip unless on metal, for test_metal

* fix premature mem release causing seg fault

* test_metal check for device before importing

* Buffer should only be released under _free explicitly

* mypy fixes

* change object ownership

* test compile success

* lint fixes

* remove load_library

* wrap sel_register in cache

* simplify to_struct

* swap lines

* fix type error in to_struct

* bump line to 9800

* remove pyobjc from setup.py

* command buffer should be objc_instance and get released

* stringWithUTF8String: returns objc_instance

* Use constant for MTLPipelineOptionNone

* better explanation for [MTLBuffer contents:] return

* Use dyld_find in case the path differs

* trailing whitespace

* handle exception for methods that take error:

* load /System/Library instead of /Library

* Init c_void_p with None instead of zero for error objects

---------

Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.me>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-09-25 17:43:01 +08:00
George Hotz cd534dee11
cstyle changes that don't pass process replay (#6734)
* cstyle changes that don't pass process replay

* add constant folder back there

* cleanups

* const

* fix some tests

* bfloat16 too

* complete set of types

* that cast shouldn't be needed

* that was a questionable test
2024-09-25 17:33:34 +08:00