* Add CDLL interface for metal
* remove two unused functions
* Cover most of the API methods
* switch to cdll
* directly call objc message in ops_metal
* keep only obj interface
* Use direct message sending for graph
* may have found a solution to the memoryview on ctypes pointer
* buf indexing bug fixed
* fix c_int
* fix c int to bytes
* fix gpu time bug
* line savings for cdll metal core
* wip
* c int bug
* fix buf casting
* dedup for c_void_p
* dedup for c_void_p
* linter fix
* remove unused stuff
* mypy fix
* more mypy error fixes
* line savings
* line savings
* rename send_message to msg; add __hash__ and __eq__ for dedup
* wip
* refactor
* refactor
* remove named import from ctypes
* forgot to change variable name
* file reorg, move support.py into ops_metal
* refactor
* hash error
* remove to_ns_array
* test oom exception, fix exception change
* typevar for msg
* add back dedup
* test for compile error
* move constant to graph
* move header constant around
* get label for icb buffer
* check icb label using "in"
* wip fixing mypy reported error
* fixed mypy error
* code formatting
* all_resources dedup match previous
* code formatting
* code formatting; buffer set to objc_id
* revert the manual-release changes on buf; it seems _free is not always called
* skip unless on metal, for test_metal
* fix premature mem release causing seg fault
* test_metal check for device before importing
* Buffer should only be released under _free explicitly
* mypy fixes
* change object ownership
* test compile success
* lint fixes
* remove load_library
* wrap sel_register in cache
* simplify to_struct
* swap lines
* fix type error in to_struct
* bump line to 9800
* remove pyobjc from setup.py
* command buffer should be objc_instance and get released
* stringWithUTF8String: returns objc_instance
* Use constant for MTLPipelineOptionNone
* better explanation for [MTLBuffer contents:] return
* Use dyld_find in case the path differs
* trailing whitespace
* handle exception for methods that take error:
* load /System/Library instead of /Library
* Init c_void_p with None instead of zero for error objects
---------
Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.me>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
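The bullets above swap pyobjc for direct Objective-C message sends through ctypes. Below is a minimal sketch of that pattern, assuming helper names `sel` and `msg` as in the bullets ("wrap sel_register in cache", "rename send_message to msg"); it is not tinygrad's actual ops_metal code, which resolves the library via dyld paths under /System/Library and wraps results in objc_id/objc_instance types.

```python
import ctypes, ctypes.util, functools

# hedged sketch: library lookup simplified to find_library("objc")
libobjc = ctypes.CDLL(ctypes.util.find_library("objc"))
libobjc.objc_getClass.restype, libobjc.objc_getClass.argtypes = ctypes.c_void_p, [ctypes.c_char_p]
libobjc.sel_registerName.restype, libobjc.sel_registerName.argtypes = ctypes.c_void_p, [ctypes.c_char_p]

@functools.lru_cache(maxsize=None)
def sel(name: str) -> ctypes.c_void_p:
  # selectors are interned by the ObjC runtime, so caching the registration is safe
  return ctypes.c_void_p(libobjc.sel_registerName(name.encode()))

def msg(obj, selector: str, *args, restype=ctypes.c_void_p, argtypes=()):
  # objc_msgSend is re-prototyped per call so ctypes marshals arguments and the return type correctly
  sender = ctypes.CFUNCTYPE(restype, ctypes.c_void_p, ctypes.c_void_p, *argtypes)(("objc_msgSend", libobjc))
  return sender(obj, sel(selector), *args)

# per the bullet above, stringWithUTF8String: returns an instance we keep sending messages to
NSString = ctypes.c_void_p(libobjc.objc_getClass(b"NSString"))
hello = msg(NSString, "stringWithUTF8String:", b"hello", argtypes=[ctypes.c_char_p])
```

For the error:-handling bullets, the same pattern adds a trailing out-argument: a ctypes.c_void_p initialized with None, passed by reference and checked after the call, with a Python exception raised when it comes back non-null.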
* tqdm replacement almost
* formatting
* formatting
* imports
* line len
* fix
* removed set description :(
* removed set description :(
* fix
* fix
* green check?
* rewrote as class, fixed several bugs
* types spacing
* removed imports
* fix
* iterable
* typing
* mypy disagreement
* imports
* more e2e tests vs tqdm
* removed seed setting
* robustness against time.sleep() flakiness
* flaky fix
* automatic bar closing when count==total
* cleanup
* clang error with tqdm
* tqdm back
* use os lib, print to stderr (fixes the clang bug where the bar was leaking into the generated C program)
* back to shutil
* unit_scale + unit_scale test
* custom unit to tests
* pretty
* clean
* removed flaky test
* less test iters
* empty line
* remove disable
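Several of the bullets above describe the tqdm replacement: a small class-based progress bar that prints to stderr (so the bar cannot leak into generated C code), sizes itself with shutil, supports unit_scale, and closes automatically once count == total. The sketch below only illustrates that shape; it is not tinygrad's implementation, and the constructor arguments are assumptions.

```python
import shutil, sys, time

class ProgressBar:
  def __init__(self, iterable=None, total=None, unit="it", unit_scale=False):
    self.iterable, self.unit, self.unit_scale = iterable, unit, unit_scale
    self.total = total if total is not None else (len(iterable) if iterable is not None else None)
    self.count, self.start = 0, time.perf_counter()
  def __iter__(self):
    for x in self.iterable:
      yield x
      self.update(1)
  def _fmt(self, n):
    if not self.unit_scale: return f"{n:.0f}"
    for suffix in ["", "k", "M", "G"]:
      if abs(n) < 1000: return f"{n:.1f}{suffix}"
      n /= 1000
    return f"{n:.1f}T"
  def update(self, n=1):
    self.count += n
    rate = self.count / max(time.perf_counter() - self.start, 1e-9)
    prog = f"{self._fmt(self.count)}/{self._fmt(self.total)}" if self.total is not None else self._fmt(self.count)
    line = f"\r{prog} [{self._fmt(rate)} {self.unit}/s]"
    # stderr, not stdout: this is what keeps the bar out of generated C programs
    print(line[:shutil.get_terminal_size().columns], end="", file=sys.stderr, flush=True)
    if self.total is not None and self.count >= self.total: self.close()
  def close(self): print(file=sys.stderr)  # the bar closes automatically once count == total
```

Usage mirrors tqdm's iterable wrapping, e.g. `for x in ProgressBar(range(100), unit_scale=True): ...`.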
* add support for train/val datasets for kits19
* split dataset into train and val sets
* add tests for kits19 dataloader
* add MLPerf dataset tests to CI
* update unet3d model_eval script
* fix linting
* add nibabel
* fix how mock dataset gets created
* update ref implementation with permalink and no edits
* clean up test and update rand_flip implementation
* cleanups
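For the kits19 bullets above, here is a hedged sketch of the kind of rand_flip augmentation being updated: flip the volume and its segmentation label together along randomly chosen axes. The flip probability, axis choice, and same-shape assumption are illustrative, not the MLPerf reference implementation.

```python
import numpy as np

def rand_flip(image: np.ndarray, label: np.ndarray, prob: float = 1/3):
  # flip the volume and its segmentation label together so they stay aligned
  assert image.shape == label.shape
  for axis in range(image.ndim):
    if np.random.rand() < prob:
      image, label = np.flip(image, axis=axis), np.flip(label, axis=axis)
  return np.ascontiguousarray(image), np.ascontiguousarray(label)
```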
* lars optimizer + tests
* fix skip list!
* use id to compare in skip list
* go back to using set
* Tensor(bool) * Tensor(bool) is and
* don't lint external/mlperf_resnet
* whitespace
* add external_test_optim to opencl tests
* give mlperf task a name
* mlperf under onnx
* remove track_gnorm
* contiguous instead of realize
* assert momentum and weight decay positive
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
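The LARS bullets above boil down to a layer-wise trust ratio that rescales each parameter's step, plus a skip list of parameters (matched by id) that fall back to plain momentum SGD. The numpy sketch below shows one common formulation of that update; the MLPerf reference and tinygrad's Tensor-based optimizer differ in detail.

```python
import numpy as np

def lars_step(w, g, v, lr, momentum=0.9, weight_decay=1e-4, tcoef=0.001, skip=False):
  assert momentum > 0 and weight_decay > 0  # per the bullet above
  if skip:
    # parameters matched (by id) against the skip list use plain momentum SGD, no weight decay
    trust, dw = 1.0, g
  else:
    w_norm, g_norm = np.linalg.norm(w), np.linalg.norm(g)
    dw = g + weight_decay * w
    trust = tcoef * w_norm / (g_norm + weight_decay * w_norm) if w_norm > 0 and g_norm > 0 else 1.0
  v = momentum * v + lr * trust * dw  # layer-wise scaled step with momentum
  return w - v, v
```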
* move gpuctypes in tree
* fix mypy
* regex exclude
* autogen sh
* mypy exclude
* does that fix it
* fix mypy
* add hip confirm
* verify all autogens
* build clang2py
* opencl headers
* gpu on 22.04
* dtypes alu test
* those types don't exist in torch
* floats
* more tests
* disable those
* a couple unary tests
* skip float16 tests in CI for GPU
* fix LLVM bool add True+True=1+1=2 which truncates to False in native LLVM
* remove hardcoded float for LLVM ALU fns
* less sensitive atol for fp32: 1e-10 is flaky and sometimes failed even if you revert the merge commit for non-fp32 math; nothing has changed in our kernels for fp32.
* return on overflows
* fix CUDA exp2
* compute results of op regardless of bounds in a python backend
* skip fp16 in GPU and CUDACPU
* fuzz a smaller range in the float_midcast_int32 test
I sampled this and we overflow ~70% of the time.
Because numpy behaves differently on different devices for overflows, and Metal seems to do the same, I'm opting to eliminate the non-determinism here.
* remove CUDA exp2 overload; it's already there now
---------
Co-authored-by: George Hotz <geohot@gmail.com>
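The LLVM bool-add bullet above is easiest to see with a tiny worked example: an i1 addition wraps modulo 2, so True + True truncates to 0 (False), while Python promotes bools to int and numpy keeps a logical result. One way to avoid the truncation is to do the add in a wider integer type.

```python
import numpy as np

print(True + True)                      # Python promotes bool to int: 2 (truthy)
print(np.bool_(True) + np.bool_(True))  # numpy stays logical: True
print((1 + 1) & 1)                      # what an i1 add does: wraps modulo 2 -> 0, i.e. False
```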
* ops_gpu is go
* fix size 0
* fix image, and add more tests
* nerf openpilot test, doesn't test thneed
* run the schedule
* better
* oops, new inputs
* delete pyopencl
* Update ops_gpu.py
* cuda with gpuctypes
* hip gpuctypes
* graphs
* rename + linter happy
* use cpu_time_execution
* no ji in build_kernel_node_params
* remove hip_wrapper
* hip fix
* no arc
* small changes
* no clean module in cudacpu
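One bullet above switches the gpuctypes backends to a cpu_time_execution helper for timing host-dispatched kernels. The sketch below shows what such a helper does; the exact signature is an assumption.

```python
import time
from typing import Callable, Optional

def cpu_time_execution(cb: Callable[[], None], enable: bool) -> Optional[float]:
  # run the host-side dispatch, returning wall-clock seconds only when timing was requested
  if enable: st = time.perf_counter()
  cb()
  if enable: return time.perf_counter() - st
```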
* add name support
* use fetch in gpt2
* remove requests from main lib, networkx also optional
* umm, keep that assert
* updates to fetch
* i love the walrus so much
* stop bundling mnist with tinygrad
* err, https
* download cache names
* add DOWNLOAD_CACHE_VERSION
* need env.
* ugh, wrong path
* replace get_child
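The last group of bullets reworks fetch: downloads cached under stable names, a DOWNLOAD_CACHE_VERSION environment variable, requests dropped from the main library, and a walrus-based read loop. A hedged sketch of that shape follows; the cache path and defaults are assumptions, not tinygrad's exact fetch.

```python
import hashlib, os, pathlib, urllib.request
from typing import Optional

def fetch(url: str, name: Optional[str] = None) -> pathlib.Path:
  cache = pathlib.Path(os.path.expanduser("~/.cache/downloads")) / ("v" + os.getenv("DOWNLOAD_CACHE_VERSION", "1"))
  fp = cache / (name if name is not None else hashlib.md5(url.encode()).hexdigest())
  if not fp.is_file():
    cache.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as r, open(fp, "wb") as f:
      while chunk := r.read(16384):  # the walrus keeps the streaming read loop to one line
        f.write(chunk)
  return fp
```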