* zero in shape start
* no assert for that
* if output size is 0, return without exec
* tweak
* strides
* reduce over non-zero
* shrink and expand
* fix import
* test_elementwise where
* cannot reshape from size 0 to size 1
* compiled backend reduce over 0
* zeros for numpy
* reduce over 0 and keepdim resulted in 1
* reduce empty set default values
* compare with same input
* pad test case
* cat test case
* torch does not support that?
* metal indirect command buffers
* sub 1ms gpt
* metal batch exec is good
* remove whitespace
* input_replace
* fix ci
* useResources
* very simple cacheallocator
* update_stats
* fix CI
* minor
* remove that from jit
* refactor/ci: delete many `# type: ignore`
* replace `axis.__class__ is int` with `isinstance(axis, int)` to make mypy happy
* add `--warn-unused-ignores` to mypy flag
refs #2240
* ci: move `--warn-unused-ignores` flag to mypy config
refs #2240
* var_vals are global
* working with global ish
* better
* fix export model
* fix tests
* better kv cache
* does it run?
* use where for kvmask
* fix excessive var_vals
* fix import
* how does multigpu use this?
* llama kinda work
* faster and simpler
* cleanup
* fix conversation mode
* test cleanups
* fix one more test
* test cleanup
---------
Co-authored-by: George Hotz <geohot@gmail.com>
* Change linearizer to parse CAST
* Oneliner renders for cstyle and triton
* LLVM cast and ALU implementation
* pylint fixes
* cast in gep
* remove printbufs
* use cast for post-load ops
* get rid of parse_cast
* partially supported vectorized dtypes for initial dev
* render phi as the dtype
* Revert "partially supported vectorized dtypes for initial dev"
This reverts commit 1bf1a818a3350d74314806f00f5aaacb075bdf51.
* Revert "render phi as the dtype"
This reverts commit d08cb270b42266f06e4a78b199f9937cb9dc4711.
* reenable triton tests
* no vstore_half if dtype is already half
* upcast max
* Change linearizer to parse CAST
* Oneliner renders for cstyle and triton
* LLVM cast and ALU implementation
* pylint fixes
* cast in gep
* remove printbufs
* use cast for post-load ops
* get rid of parse_cast
* partially supported vectorized dtypes for initial dev
* render phi as the dtype
* Revert "partially supported vectorized dtypes for initial dev"
This reverts commit 1bf1a818a3350d74314806f00f5aaacb075bdf51.
* Revert "render phi as the dtype"
This reverts commit d08cb270b42266f06e4a78b199f9937cb9dc4711.
* reenable triton tests
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* For cuda get current free space from device, and rery alloc failures
* type ignore for mypy
* add init to get free mem in cuda
* Move retry logic in common lib.
Fix typo in override _get_cur_free_space
* linter error fix in test file
* Not catch all, as it will catch KeyboardInterrupt
* fix unintened line changes
* fix test ops
* decompose the err from test_ops
* skipTest skips the entire test, we dont want that
* handle cases with the same priority
* add int16 to torch map
* fuzz linearizer transformation
* no standard normal for fp16
* work
* Interpreted start
* CPU and TORCH work
* fix MemBuffer with same idx
* id for failed kernels
* no image and variable for Interpreted
* symbolic shape
* IMAGE only for GPU
* Interpreted almost all good
* cleanup
* fix bufs_from_lin
* zero size
* some failed examples
* just Exception
* just test not pass