* Add CDLL interface for metal
* remove two unused functions
* Cover most of the API methods
* switch to cdll
* directly call objc message in ops_metal
* keep only obj interface
* Use direct message sending for graph
* may have found a solution to the memoryview on ctypes pointer
* buf indexing bug fixed
* fix c_int
* fix c_int to bytes
* fix gpu time bug
* line savings for cdll metal core
* wip
* c_int bug
* fix buf casting
* dedup for c_void_p
* dedup for c_void_p
* linter fix
* remove unused stuff
* mypy fix
* more mypy error fix
* line savings
* line savings
* rename send_message to msg; add __hash__ and __eq__ for dedup
* wip
* refactor
* refactor
* remove named import from ctypes
* forgot to change variable name
* file reorg, move support.py into ops_metal
* refactor
* hash error
* remove to_ns_array
* test oom exception, fix exception change
* typevar for msg
* add back dedup
* test for compile error
* move constant to graph
* move header constant around
* get label for icb buffer
* check icb label using "in"
* wip fixing mypy reported error
* fixed mypy error
* code formatting
* all_resources dedup match previous
* code formatting
* code formatting; buffer set to objc_id
* revert changes on buf for the manual release, seems like _free is not always called
* skip unless on metal, for test_metal
* fix premature mem release causing seg fault
* test_metal check for device before importing
* Buffer should only be released under _free explicitly
* mypy fixes
* change object ownership
* test compile success
* lint fixes
* remove load_library
* wrap sel_register in cache
* simplify to_struct
* swap lines
* fix type error in to_struct
* bump line count to 9800
* remove pyobjc from setup.py
* command buffer should be objc_instance and get released
* stringWithUTF8String: returns objc_instance
* Use constant for MTLPipelineOptionNone
* better explanation for [MTLBuffer contents:] return
* Use dyld_find in case the path differs
* trailing whitespace
* handle exception for methods that take error:
* load /System/Library instead of /Library
* Init c_void_p with None instead of zero for error objects
---------
Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.me>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* cstyle changes that don't pass process replay
* add constant folder back there
* cleanups
* const
* fix some tests
* bfloat16 too
* complete set of types
* that cast shouldn't be needed
* that was a questionable test
* real minimum cstyle change
* make it match
* bring back DEFINE_GLOBAL store marking writable
* bump line count to 9800
* closer
* precompute don't render
* cast/bitcast too
* smem_align
* vectorize
* more pr match
* remove that test
* less PR diff
* cstyle changes that [run_process_replay]
* real minimum cstyle change
* make it match
* bring back DEFINE_GLOBAL store marking writable
* bump line count to 9800
* closer
* precompute don't render
* cast/bitcast too
* smem_align
* vectorize
* more pr match
* remove that test
* less PR diff
* advanced setitem draft
* add setitem tests
* fix for tests
* small change
* handle repeated indices with test
* fix v broadcasting to mask
* clean up a bit
* open more tests
* clean up, fixes issue with scalar tensor index
* fix
* fix index_put_ and linter
* add type annotation
* done
* remove non contiguous hack
* whoops, linter
* name fix
* add back type annotation
* more type annotation
* final
* linter
* check lazydata not shared
* no numpy
* no numpy
* rename
* index benchmark
* linter
* no cloning time
* rm benchmark
* new function
* rm contiguous and cast early
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>