Commit Graph

2786 Commits

Author SHA1 Message Date
George Hotz 70a65c201e
JIT support in Interpreted (#2314)
* factor that out

* jit is supported everywhere

* fix some tests

* there's no jit supported device, the jit is everywhere

* fix test uops
2023-11-15 11:13:38 -08:00
chenyu 9a20bc08d6
Tensor(None) is Tensor([]) (#2316) 2023-11-15 13:49:18 -05:00
chenyu f1f863c953
allow 0-dim array to broadcast into zero shape tensor (#2315)
* allow 0-dim array to broadcast into zero shape tensor

* not in
2023-11-15 13:12:21 -05:00
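Commit #2315 lets a 0-dim array broadcast into a tensor whose shape contains a zero. The sketch below shows why that is consistent. It assumes NumPy-style broadcasting rules: dimensions are aligned from the right, and each pair must be equal or one of them must be 1. The helper `broadcast_shape` is hypothetical and only computes shapes. It is not tinygrad's API.

```python
from itertools import zip_longest

def broadcast_shape(a: tuple[int, ...], b: tuple[int, ...]) -> tuple[int, ...]:
    """Result shape of broadcasting `a` with `b` (NumPy-style rules, assumed)."""
    out = []
    # Align from the right; missing leading dims behave like size 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)  # a size-1 dim stretches to y, even when y == 0
        else:
            raise ValueError(f"cannot broadcast {a} with {b}")
    return tuple(reversed(out))

print(broadcast_shape((), (0,)))        # a 0-dim value fills a zero-size shape
print(broadcast_shape((1, 3), (0, 3)))
```

Because a size-1 dimension may stretch to any size, including 0, a 0-dim value needs no special case.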
George Hotz 4da2ddea6e
Interpreted cleanups (#2312)
* move the compiler out of ops

* don't return realized

* var_vals filter, fix custom

* typing
2023-11-15 09:02:23 -08:00
chenyu 123a0b86b2
support zero in shape (#2303)
* zero in shape start

* no assert for that

* if output size is 0, return without exec

* tweak

* strides

* reduce over non-zero

* shrink and expand

* fix import

* test_elementwise where

* cannot reshape from size 0 to size 1

* compiled backend reduce over 0

* zeros for numpy

* reduce over 0 and keepdim resulted in 1

* reduce empty set default values

* compare with same input

* pad test case

* cat test case

* torch does not support that?
2023-11-15 11:57:48 -05:00
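Two bullets in #2303 read "reduce empty set default values" and "reduce over 0 and keepdim resulted in 1". A minimal sketch of both ideas follows. It assumes that reducing an empty set returns the identity element of the operation: 0 for sum, and -inf for max, which is an assumption about the chosen default. It also assumes that `keepdim` keeps a reduced axis as size 1 even when that axis had size 0. `reduce_1d` and `reduced_shape` are hypothetical helpers.

```python
import math
from functools import reduce

# Identity elements: reducing an empty set yields the identity (max -> -inf is assumed).
IDENTITY = {"sum": 0.0, "max": -math.inf}
OPS = {"sum": lambda a, b: a + b, "max": max}

def reduce_1d(values: list[float], op: str) -> float:
    """Fold `values` with `op`, starting from its identity element."""
    return reduce(OPS[op], values, IDENTITY[op])

def reduced_shape(shape: tuple[int, ...], axis: int, keepdim: bool) -> tuple[int, ...]:
    """Shape after reducing `axis`; keepdim leaves a size-1 axis, even if it was size 0."""
    if keepdim:
        return tuple(1 if i == axis else s for i, s in enumerate(shape))
    return tuple(s for i, s in enumerate(shape) if i != axis)

print(reduce_1d([], "sum"), reduce_1d([], "max"))
print(reduced_shape((3, 0), axis=1, keepdim=True))
```

Starting the fold from the identity means the empty case needs no extra branch, and non-empty inputs are unaffected.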
qazal f113a0b83b
dtype promotion priorities (#2311) 2023-11-15 07:19:52 -08:00
geohotstan 3c5a51fb3a
aaaaaaa finally (#2310) 2023-11-15 07:12:38 -08:00
kormann cff8375aa2
make self referential AST fast too (#2278)
* cleanup

* linter

* linter

* linter

* rm .buffers

* linter

* linter

* huh?

* cleanup

* typo

* min diff

* property

* rev

* linter

* no matel hack

* minimal properties

* line

* checkout master

* copy_to_device

* idk

* revert

* type

* type

* faast

* speed test

* cleanup test

* softer test

* monotonic

* harder test

* clean code

* cleanup
2023-11-15 07:12:07 -08:00
George Hotz 4f7b1ac0d2
cleanups before interpreted jit (#2306)
* jit mnist

* InterpretedFlopCounter doesn't rely on Interpreted

* allocator for cpu and torch

* types for exec_ast

* fix type issues

* fix onnx, remove print

* always self.from_underlying
2023-11-14 21:44:25 -08:00
mmmkkaaayy 91546225f4
Add cache step for model weights in CI, re-enable whisper test (#2307) 2023-11-14 21:16:04 -08:00
chenyu 175cdbe815
fix pad None will value (#2308) 2023-11-14 23:57:05 -05:00
George Hotz 01f8781c26
fix CI (#2300)
* might work

* might work 2

* might work 3

* sneak that in to llama too

* pin them all
2023-11-14 11:02:59 -08:00
nimlgen 4e0d47533e
beam works with var vals (#2296)
* beam works with var vals

* test passes now

* better comment

* linter happy
2023-11-14 13:03:19 -05:00
chenyu fac8633ba8
explicit opts for test_linearizer_failures (#2299)
* explicit opts for test_linearizer_failures

* typo

* update the invalid check
2023-11-14 11:52:38 -05:00
George Hotz 8916028ddd
move BatchExecutor (#2297)
* move BatchExecutor

* refactor to get_optimized_program

* that changed
2023-11-14 08:08:51 -08:00
George Hotz 0cbf6c1811
move things, clean up extra (#2292)
* move things

* idk why pylint needs that now

* delete unused
2023-11-13 20:18:40 -08:00
George Hotz b1f7f29525
metal indirect command buffers (#2285)
* metal indirect command buffers

* sub 1ms gpt

* metal batch exec is good

* remove whitespace

* input_replace

* fix ci

* useResources

* very simple cacheallocator

* update_stats

* fix CI

* minor

* remove that from jit
2023-11-13 17:58:26 -08:00
chenyu d86ea188dd
support symbolic shape in Interpreted (#2289)
* support symbolic shape in Interpreted

* simpler

* no InterpretedFlopCounter

* tragic NumNode

* regex is hard
2023-11-13 20:13:18 -05:00
George Hotz 6960bcded0
back to 6.54GB for stable diffusion (#2288)
* back to 6.54GB for stable diffusion

* cleanups

* only outputs, not inputs

* err, restore hack for world
2023-11-13 16:50:04 -08:00
nimlgen 960535dfb8
get_linearizer_actions does not return illegal actions (#2287)
* fix some linearizer failures

* linter happy

* no new test class
2023-11-13 11:48:54 -05:00
rodfer 53c5baa8b6
add dilation to avg_pool2d (#2270)
* add dilation to avg_pool2d

* avg_pool_fix

* avg_pool_fix

* woo

* oops

* force it correct

---------

Co-authored-by: rodfer0x80 <rodfer0x80@proton.me>
Co-authored-by: zibokapi <zibokapi@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-13 08:47:56 -08:00
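Commit #2270 adds dilation to `avg_pool2d`. With dilation, the kernel samples every `dilation`-th element, so a kernel of size k covers `dilation * (k - 1) + 1` input positions. The 1-D sketch below assumes no padding and plain averaging over the k sampled elements. `avg_pool1d_dilated` is a hypothetical illustration and does not reproduce tinygrad's implementation.

```python
def avg_pool1d_dilated(x: list[float], kernel: int, stride: int = 1, dilation: int = 1) -> list[float]:
    """1-D average pooling with dilation and no padding (illustrative sketch)."""
    span = dilation * (kernel - 1) + 1  # effective receptive field of the dilated kernel
    n_out = (len(x) - span) // stride + 1 if len(x) >= span else 0
    out = []
    for o in range(n_out):
        start = o * stride
        # Sample `kernel` elements spaced `dilation` apart.
        window = [x[start + k * dilation] for k in range(kernel)]
        out.append(sum(window) / kernel)
    return out

print(avg_pool1d_dilated([0, 1, 2, 3, 4, 5], kernel=2, dilation=2))
```

The output length follows the usual pooling formula, `floor((L - dilation * (kernel - 1) - 1) / stride) + 1`, here with zero padding.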
chenyu a72b370066
llama take int and convert to Variable internally (#2284) 2023-11-12 17:11:37 -05:00
valar 123ea051e6
refactor/ci: delete many `# type: ignore` (#2281)
* refactor/ci: delete many `# type: ignore`

* replace `axis.__class__ is int` with `isinstance(axis, int)` to make mypy happy
* add `--warn-unused-ignores` to mypy flag

refs #2240

* ci: move `--warn-unused-ignores` flag to mypy config

refs #2240
2023-11-12 11:04:20 -08:00
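In #2281, `axis.__class__ is int` is replaced with `isinstance(axis, int)` to make mypy happy. mypy narrows a union type after an `isinstance` check but not after a `__class__` identity check. The function below is hypothetical and only shows the pattern. As a side note, the two checks also differ at runtime for `int` subclasses such as `bool`.

```python
from typing import Sequence, Union

def normalize_axis(axis: Union[int, Sequence[int], None], ndim: int) -> tuple[int, ...]:
    """Hypothetical helper: turn an axis argument into a tuple of non-negative axes."""
    if axis is None:
        return tuple(range(ndim))
    # isinstance() narrows `axis` to int for mypy; `axis.__class__ is int` would not.
    if isinstance(axis, int):
        axis = (axis,)
    return tuple(a % ndim for a in axis)

print(normalize_axis(-1, 3), normalize_axis(None, 2))
print(True.__class__ is int, isinstance(True, int))  # bool is a subclass of int
```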
George Hotz 2e2154ae4f bad hotfix for optimize_local_size, try again 2023-11-12 10:41:11 -08:00
George Hotz 270f747065 hotfix optimize_local_size (TODO: add regression test) 2023-11-12 10:29:00 -08:00
chenyu f5a62a1b42
fix some tests related to JitItem (#2279) 2023-11-11 23:00:35 -05:00
chenyu 5ef8d682e3
clean up attentions in stable diffusion (#2275) 2023-11-11 14:25:36 -05:00
chenyu 453f48ce02
pad None means (0,0) (#2273) 2023-11-11 09:50:26 -08:00
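Commit #2273 makes `None` in a padding spec mean `(0,0)`, which leaves that axis unpadded. The sketch below assumes that a padding spec has one entry per axis and that each entry is either `None` or a `(before, after)` pair. `normalize_padding` and `pad_2d` are hypothetical helpers that operate on nested lists, not tinygrad's `Tensor.pad`.

```python
from typing import Optional, Sequence

Pad = Optional[tuple[int, int]]

def normalize_padding(padding: Sequence[Pad]) -> tuple[tuple[int, int], ...]:
    """Replace each None with (0, 0), meaning no padding on that axis."""
    return tuple((0, 0) if p is None else (p[0], p[1]) for p in padding)

def pad_2d(rows: list[list[float]], padding: Sequence[Pad], value: float = 0.0) -> list[list[float]]:
    """Pad a 2-D nested list with `value`, given one (before, after) pair or None per axis."""
    (top, bottom), (left, right) = normalize_padding(padding)
    width = (len(rows[0]) if rows else 0) + left + right
    padded = [[value] * left + row + [value] * right for row in rows]
    return [[value] * width for _ in range(top)] + padded + [[value] * width for _ in range(bottom)]

print(normalize_padding([None, (1, 2)]))
print(pad_2d([[1.0, 2.0]], [(1, 0), None]))
```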
jxdv c5d70c1871
typo (#2271) 2023-11-11 07:18:04 -08:00
chenyu 880e693207
fix llama n_kv_heads in kvcache (#2267)
* fix llama n_kv_heads in kvcache

* trigger ci
2023-11-10 21:44:39 -05:00
George Hotz 78623ba204 two simple tests 2023-11-10 16:16:06 -08:00
George Hotz 70fb8a259d hotfix mypy 2023-11-10 15:43:30 -08:00
George Hotz 6ceea02e65 hotfix of onnx 2023-11-10 15:40:30 -08:00
geohotstan b853e9bb8c
Onnx 1.15.0 gogogo (#2217)
* lol

* lol

* add GELULULULUL

* onnx 1.50

* fuk torch bool neg

* exclude regex tests

* exclude dequantizelinear for now

* is sunny in philly

* damn it affinegrid

* fixed auto_pad VALID

* skip 0 shape tests

* add temporary cast in Reduces

* tests should pass now

* added comments and cleanup

* try moving dequantizelinear to onnx.py

* fixed dequantizedlinear?

* cleanup

* try?

* float16 segfaults LLVM CI..???

* cleanup comments

* pin to 1.50.0

* remove use of -np.inf cuz numpy is kill

* 1.50? lol I'm actually retarded

* thx for review, muhbad

* moved Gelu higher up
2023-11-10 15:36:48 -08:00
George Hotz 85d26ddc36
uops loop removal (#2262)
* remove the loop

* cleanups

* tests failing still

* global_loop_ctx wasn't needed

* replace_op is cleaner

* minor opt

* cast opt was wrong

* uop_num

* uop num was dumb

* tuplize_uops

* torch tests

* fix test_uops
2023-11-10 15:24:47 -08:00
chenyu a753c8e071
examples of new GPT2 and JIT change (#2261)
* var_vals are global

* working with global ish

* better

* fix export model

* fix tests

* better kv cache

* does it run?

* use where for kvmask

* fix excessive var_vals

* fix import

* how does multigpu use this?

* llama kinda work

* faster and simpler

* cleanup

* fix conversation mode

* test cleanups

* fix one more test

* test cleanup

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2023-11-10 15:07:02 -05:00
qazal b6aaf12df7
Internal cast 2 with more tests (#2257)
* Change linearizer to parse CAST

* Oneliner renders for cstyle and triton

* LLVM cast and ALU implementation

* pylint fixes

* cast in gep

* remove printbufs

* use cast for post-load ops

* get rid of parse_cast

* partially supported vectorized dtypes for initial dev

* render phi as the dtype

* Revert "partially supported vectorized dtypes for initial dev"

This reverts commit 1bf1a818a3350d74314806f00f5aaacb075bdf51.

* Revert "render phi as the dtype"

This reverts commit d08cb270b42266f06e4a78b199f9937cb9dc4711.

* reenable triton tests

* no vstore_half if dtype is already half

* upcast max
2023-11-10 10:42:39 -08:00
George Hotz c0f447d6f7
Inline barrier (#2255)
* put barrier inline for locals

* fix pre-commit on m3

* gate if through barrier
2023-11-10 08:17:10 -08:00
chenyu 75f6e9ab54
one more fuzz linearizer failed example (#2260) 2023-11-10 09:17:37 -05:00
George Hotz 330484c072
Revert "Internal casting support (#2046)" (#2256)
This reverts commit 7e1d08b2ae.
2023-11-09 21:27:13 -08:00
qazal 7e1d08b2ae
Internal casting support (#2046)
* Change linearizer to parse CAST

* Oneliner renders for cstyle and triton

* LLVM cast and ALU implementation

* pylint fixes

* cast in gep

* remove printbufs

* use cast for post-load ops

* get rid of parse_cast

* partially supported vectorized dtypes for initial dev

* render phi as the dtype

* Revert "partially supported vectorized dtypes for initial dev"

This reverts commit 1bf1a818a3350d74314806f00f5aaacb075bdf51.

* Revert "render phi as the dtype"

This reverts commit d08cb270b42266f06e4a78b199f9937cb9dc4711.

* reenable triton tests

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-09 21:02:32 -08:00
vish-pr 6051f0ce82
For cuda get current free space from device, and retry alloc failures (#2197)
* For cuda get current free space from device, and rery alloc failures

* type ignore for mypy

* add init to get free mem in cuda

* Move retry logic in common lib.

Fix typo in override _get_cur_free_space

* linter error fix in test file

* Not catch all, as it will catch KeyboardInterrupt

* fix unintened line changes
2023-11-09 15:53:50 -08:00
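The body of #2197 notes that the retry logic should not catch everything, because a catch-all would also catch `KeyboardInterrupt`. The sketch below shows the general pattern. It assumes that an allocation failure is raised as `MemoryError` (the exception type the CUDA backend actually raises is not shown here) and that freeing a cache may make room for a second attempt. `alloc_with_retry` and its parameters are hypothetical.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

def alloc_with_retry(alloc: Callable[[int], T], size: int,
                     free_cache: Callable[[], None], retries: int = 1) -> T:
    """Try `alloc(size)`; on MemoryError, free cached buffers and retry (sketch)."""
    for attempt in range(retries + 1):
        try:
            return alloc(size)
        except MemoryError:
            # Catch only allocation failures; a bare `except:` would also swallow KeyboardInterrupt.
            if attempt == retries:
                raise
            free_cache()
    raise AssertionError("unreachable")
```

Catching one specific exception keeps Ctrl-C and unrelated errors propagating normally. Only the failure that a retry can plausibly fix is handled.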
qazal 2465d5d267
fix ops tests in test_dtype (#2237)
* fix test ops

* decompose the err from test_ops

* skipTest skips the entire test, we dont want that

* handle cases with the same priority

* add int16 to torch map
2023-11-09 15:17:43 -08:00
George Hotz 80bf0b8586
proper wmma (#2245)
* proper wmma

* hip cast

* bugfixes

* bugfix

* that bug is fixed

---------

Co-authored-by: George Hotz <george@tinygrad.org>
2023-11-09 15:15:18 -08:00
wozeparrot b7a31fb708
remove tokei badge from readme (#2251) 2023-11-09 13:53:31 -05:00
2-5 50bf0703aa
fix sqlite cache path on Windows (#2250) 2023-11-09 10:32:34 -08:00
chenyu 10d642e174
fuzz linearizer transformation (#2188)
* fuzz linearizer transformation

* no standard normal for fp16

* work

* Interpreted start

* CPU and TORCH work

* fix MemBuffer with same idx

* id for failed kernels

* no image and variable for Interpreted

* symbolic shape

* IMAGE only for GPU

* Interpreted almost all good

* cleanup

* fix bufs_from_lin

* zero size

* some failed examples

* just Exception

* just test not pass
2023-11-09 08:03:27 -08:00
chenyu 794122781d
Merge pull request #2242 from chenyuxyz/mypy-casts
mypy check warn_redundant_casts
2023-11-08 20:04:46 -05:00
George Hotz 38b7f5a7fd
less phi, proper phi (#2241)
* less phi, proper phi

* disable flaky whisper test
2023-11-08 16:13:43 -08:00
chenyu b9fe133af8 mypy check warn_redundant_casts 2023-11-08 15:06:55 -08:00