Commit Graph

chenyu 8e22c0d95c
everything can jit now (#2338) 2023-11-16 23:54:57 -05:00
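
A minimal sketch of the TinyJit usage this commit makes universal, assuming this era's import path (`tinygrad.jit`) and the convention that JITted functions take Tensor inputs and return realized Tensors:

```python
from tinygrad.tensor import Tensor
from tinygrad.jit import TinyJit  # import path assumed for this era of the repo

@TinyJit
def step(x: Tensor) -> Tensor:
  return (x * 2 + 1).realize()  # outputs must be realized inside the JIT

for _ in range(4):
  out = step(Tensor.randn(4, 4))  # early calls trace; later calls replay the captured kernels
```
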
Friedrich Carl Eichenroth a8875bd770
add types to lazy (#2327)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-16 20:48:41 -08:00
George Hotz 1d5501594e
force rebuild of ocelot (#2334)
* force rebuild of ocelot

* SzymonOzog gpuocelot

* delete that

* downgrade that

* non parallel

* force rebuild

* use llvm

* nauto

* less mem maybe

* print test

* helper_test_exception skip CUDACPU

* helper_test_exception

* shippable
2023-11-16 20:44:14 -08:00
imaolo 0d0c74bac9
Assert for memory allocation failures (#2337)
* assert adequate memory has been freed

* cleaned up runtime error message

* improved metal buffer alloc error catching and reporting

* decreased lines and altered messages

* removed unnecessary _get_cur_free_space() call

* improved assert message

* added allocate massive buffer test

* added test_lru_allocator_metal_max_buffer_length

* split into two asserts and removed walrus assignment from assert expression

* update assert message and use byte data type for clarity
2023-11-16 20:14:16 -08:00
chenyu aa01a63b3f
cleanup of lines / unused / types (#2336) 2023-11-16 21:15:32 -05:00
chenyu 3971259832
fix test_real_world llama (#2335) 2023-11-16 19:50:08 -05:00
chenyu 3b9dd3330c
add device to beam search cache key (#2333) 2023-11-16 18:35:08 -05:00
Friedrich Carl Eichenroth 75676ab8e1
Profiling-helper (#2321)
* change profiler

* remove unused imports

* remove unused imports

* change lazybuffer references

* remove unused line

* remove unused import

* remove unused stuff

* add types

* typing

* typing

* typing

* trigger actions

* -1 loc

* fixup

* trigger actions

* revert lazy typing changes

* WIP profiler helper

* replace old start & stop profiler

* fixup

* linting

* Update llama.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-16 14:15:56 -08:00
mmmkkaaayy 8235da11dd
whisper: support batch inference, add librispeech WER test (#2074)
* whisper: support batch inference, add librispeech WER test, add kv caching and JIT

* remove JIT_SUPPORTED_DEVICE

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-16 13:50:08 -08:00
George Hotz 3baaf298d6
two stage cumsum in tensor.py (#2331)
* two stage cumsum in tensor.py

* 2 more kernels for llama cumsum

* gpt-2 and llama use fast multinomial
2023-11-16 12:09:53 -08:00
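
A sketch of the two-stage prefix-sum idea behind this change (not the tensor.py code itself): cumsum each chunk locally, then scan the chunk totals and add them back as offsets, turning one long scan into two short ones:

```python
import numpy as np

def two_stage_cumsum(x: np.ndarray, chunk: int = 4) -> np.ndarray:
  pad = (-len(x)) % chunk
  xp = np.pad(x, (0, pad)).reshape(-1, chunk)
  local = xp.cumsum(axis=1)                                   # stage 1: per-chunk scans
  offsets = np.concatenate([[0], np.cumsum(local[:-1, -1])])  # stage 2: scan the chunk totals
  return (local + offsets[:, None]).reshape(-1)[:len(x)]

assert np.array_equal(two_stage_cumsum(np.arange(10)), np.arange(10).cumsum())
```
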
chenyu 163b2bc26a
wgpu.utils._device -> wgpu.utils.device (#2330)
* wgpu.utils._device -> wgpu.utils.device

* can i do this?

* no need to specify metal
2023-11-16 12:52:13 -05:00
chenyu 27f4c26312
fix getitem slice when end < start (#2329) 2023-11-16 11:20:27 -05:00
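
The expected semantics here, assuming numpy-style slicing: an end before the start yields an empty tensor rather than an error.

```python
from tinygrad.tensor import Tensor

t = Tensor([1, 2, 3, 4])
print(t[3:1].shape)  # (0,): empty slice, matching numpy, not an exception
```
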
chenyu 822d6e6f18
Simpler mops verify (#2325)
* rewrite the to_movement_ops check using symbolic

* tweak
2023-11-15 21:47:18 -05:00
George Hotz ef67d7ff5d shapetracker whitespace 2023-11-15 15:24:09 -08:00
chenyu a98511561c
fuzz_linearizer same api for interpreted and compiled (#2320) 2023-11-15 17:40:22 -05:00
George Hotz 294e71de15
remove lines (unused code) (#2319)
* remove lines

* uhh, i'm tired

* that function never worked

* types for ast_parse
2023-11-15 14:36:11 -08:00
George Hotz 628365eab6
JIT cleanups (#2317)
* cleanup cleanup

* dedup update_stats
2023-11-15 13:34:52 -08:00
forcefieldsovereign b64738e1d6
Remove AS_STRIDED from shapetracker (#2216)
* very close

* remove comment

* negative strides working

* almost everything passes

* calculate offset with list comprehension

* some cleanup

* got disk load working

* review suggestions

* fix after merge

* overlap working

* did it

* clean

* fixed disk load

* lint

* mypy

* removed as_strided

* trying without simplify

* added back simplify

* make sure expanding to smaller shape

* cleanup

* removed comment

* removed env file

* trying whisper test again

* onnx test sqlite issue

* working on test

* finished test

* eliminate unnecessary shrink-then-pad

* don't shrink buffer

* added strides check

* added to ci under linters

* switch issue

* allow symbolic stride

* removed .env

* isinstance

* adjust strides for double expand

* cleanup

* needed to add type hint for mypy

* set pythonpath
2023-11-15 15:50:17 -05:00
Marcello Fuschi b8d460d203
Add Tensor.multinomial (#2295)
* add Tensor.multinomial only with replacement

* add support for 2D input in Tensor.multinomial

* fix multinomial output shape

* allow passing replacement=False to Tensor.multinomial when num_samples=1

* improve tests for Tensor.multinomial

* fix edge case in Tensor.multinomial

* Tensor.multinomial no more staticmethod
2023-11-15 11:38:39 -08:00
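
A usage sketch for the new Tensor.multinomial, with a torch-style signature assumed (num_samples, replacement); per the bullets above, replacement=False is only allowed when num_samples=1:

```python
from tinygrad.tensor import Tensor

weights = Tensor([0.1, 0.3, 0.6])  # unnormalized class weights
samples = weights.multinomial(num_samples=5, replacement=True)
print(samples.numpy())  # five indices in {0, 1, 2}, drawn in proportion to the weights
```
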
taher cb6cfcc8f8
add icb support check for metal device (#2313) 2023-11-15 11:37:28 -08:00
George Hotz 70a65c201e
JIT support in Interpreted (#2314)
* factor that out

* jit is supported everywhere

* fix some tests

* there's no jit supported device, the jit is everywhere

* fix test uops
2023-11-15 11:13:38 -08:00
chenyu 9a20bc08d6
Tensor(None) is Tensor([]) (#2316) 2023-11-15 13:49:18 -05:00
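
In other words, a one-line sketch assuming the behavior the title describes:

```python
from tinygrad.tensor import Tensor

assert Tensor(None).shape == Tensor([]).shape == (0,)  # both construct the empty tensor
```
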
chenyu f1f863c953
allow 0-dim array to broadcast into zero shape tensor (#2315)
* allow 0-dim array to broadcast into zero shape tensor

* not in
2023-11-15 13:12:21 -05:00
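
A sketch of the new behavior, assuming numpy broadcasting rules: a 0-dim scalar tensor now broadcasts against a zero-length axis.

```python
from tinygrad.tensor import Tensor

scalar = Tensor(2.0)        # shape ()
empty = Tensor.zeros(0, 3)  # shape (0, 3)
print((scalar * empty).shape)  # (0, 3): the scalar broadcasts into the zero-shape tensor
```
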
George Hotz 4da2ddea6e
Interpreted cleanups (#2312)
* move the compiler out of ops

* don't return realized

* var_vals filter, fix custom

* typing
2023-11-15 09:02:23 -08:00
chenyu 123a0b86b2
support zero in shape (#2303)
* zero in shape start

* no assert for that

* if output size is 0, return without exec

* tweak

* strides

* reduce over non-zero

* shrink and expand

* fix import

* test_elementwise where

* cannot reshape from size 0 to size 1

* compiled backend reduce over 0

* zeros for numpy

* reduce over 0 and keepdim resulted in 1

* reduce empty set default values

* compare with same input

* pad test case

* cat test case

* torch does not support that?
2023-11-15 11:57:48 -05:00
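
A sketch of the zero-size semantics this lands, assuming numpy parity (the bullets above call out reduce-over-0, keepdim, and empty-set defaults):

```python
from tinygrad.tensor import Tensor

t = Tensor.zeros(0, 4)
print(t.sum(axis=0).shape)                # (4,): reducing over the zero-length axis
print(t.sum(axis=0, keepdim=True).shape)  # (1, 4): keepdim turns the reduced 0 into 1
print(t.sum().numpy())                    # 0.0: the empty-set default for sum is its identity
```
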
qazal f113a0b83b
dtype promotion priorities (#2311) 2023-11-15 07:19:52 -08:00
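
A sketch of what a promotion-priority table decides, assuming this era's dtypes namespace (`tinygrad.helpers.dtypes`):

```python
from tinygrad.tensor import Tensor
from tinygrad.helpers import dtypes  # dtypes lived in helpers at this point in the repo

a = Tensor([1, 2], dtype=dtypes.int32)
b = Tensor([1.0, 2.0], dtype=dtypes.float32)
assert (a + b).dtype == dtypes.float32  # the higher-priority (float) dtype wins
```
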
geohotstan 3c5a51fb3a
aaaaaaa finally (#2310) 2023-11-15 07:12:38 -08:00
kormann cff8375aa2
make self referential AST fast too (#2278)
* cleanup

* linter

* linter

* linter

* rm .buffers

* linter

* linter

* huh?

* cleanup

* typo

* min diff

* property

* rev

* linter

* no metal hack

* minimal properties

* line

* checkout master

* copy_to_device

* idk

* revert

* type

* type

* faast

* speed test

* cleanup test

* softer test

* monotonic

* harder test

* clean code

* cleanup
2023-11-15 07:12:07 -08:00
George Hotz 4f7b1ac0d2
cleanups before interpreted jit (#2306)
* jit mnist

* InterpretedFlopCounter doesn't rely on Interpreted

* allocator for cpu and torch

* types for exec_ast

* fix type issues

* fix onnx, remove print

* always self.from_underlying
2023-11-14 21:44:25 -08:00
mmmkkaaayy 91546225f4
Add cache step for model weights in CI, re-enable whisper test (#2307) 2023-11-14 21:16:04 -08:00
chenyu 175cdbe815
fix pad None with value (#2308) 2023-11-14 23:57:05 -05:00
George Hotz 01f8781c26
fix CI (#2300)
* might work

* might work 2

* might work 3

* sneak that in to llama too

* pin them all
2023-11-14 11:02:59 -08:00
nimlgen 4e0d47533e
beam works with var vals (#2296)
* beam works with var vals

* test passes now

* better comment

* linter happy
2023-11-14 13:03:19 -05:00
chenyu fac8633ba8
explicit opts for test_linearizer_failures (#2299)
* explicit opts for test_linearizer_failures

* typo

* update the invalid check
2023-11-14 11:52:38 -05:00
George Hotz 8916028ddd
move BatchExecutor (#2297)
* move BatchExecutor

* refactor to get_optimized_program

* that changed
2023-11-14 08:08:51 -08:00
George Hotz 0cbf6c1811
move things, clean up extra (#2292)
* move things

* idk why pylint needs that now

* delete unused
2023-11-13 20:18:40 -08:00
George Hotz b1f7f29525
metal indirect command buffers (#2285)
* metal indirect command buffers

* sub 1ms gpt

* metal batch exec is good

* remove whitespace

* input_replace

* fix ci

* useResources

* very simple cacheallocator

* update_stats

* fix CI

* minor

* remove that from jit
2023-11-13 17:58:26 -08:00
chenyu d86ea188dd
support symbolic shape in Interpreted (#2289)
* support symbolic shape in Interpreted

* simpler

* no InterpretedFlopCounter

* tragic NumNode

* regex is hard
2023-11-13 20:13:18 -05:00
George Hotz 6960bcded0
back to 6.54GB for stable diffusion (#2288)
* back to 6.54GB for stable diffusion

* cleanups

* only outputs, not inputs

* err, restore hack for world
2023-11-13 16:50:04 -08:00
nimlgen 960535dfb8
get_linearizer_actions does not return illegal actions (#2287)
* fix some linearizer failures

* linter happy

* no new test class
2023-11-13 11:48:54 -05:00
rodfer 53c5baa8b6
add dilation to avg_pool2d (#2270)
* add dilation to avg_pool2d

* avg_pool_fix

* avg_pool_fix

* woo

* oops

* force it correct

---------

Co-authored-by: rodfer0x80 <rodfer0x80@proton.me>
Co-authored-by: zibokapi <zibokapi@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-13 08:47:56 -08:00
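
A usage sketch for the new dilation argument, with parameter names assumed to mirror torch's avg_pool2d:

```python
from tinygrad.tensor import Tensor

x = Tensor.randn(1, 1, 8, 8)
y = x.avg_pool2d(kernel_size=(2, 2), dilation=2)  # taps spread 2 apart: effective window 3x3
print(y.shape)
```
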
chenyu a72b370066
llama take int and convert to Variable internally (#2284) 2023-11-12 17:11:37 -05:00
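
What "convert to Variable internally" plausibly looks like at the call site, assuming this era's symbolic API (`Variable(name, min, max).bind(val)`); the bounds below are illustrative:

```python
from tinygrad.shape.symbolic import Variable  # import path assumed for this era

start_pos = 16  # plain int from the caller
v = Variable("start_pos", 1, 1024).bind(start_pos)  # bounded symbol, bound to the concrete value
```
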
valar 123ea051e6
refactor/ci: delete many `# type: ignore` (#2281)
* refactor/ci: delete many `# type: ignore`

* replace `axis.__class__ is int` with `isinstance(axis, int)` to make mypy happy
* add `--warn-unused-ignores` to mypy flag

refs #2240

* ci: move `--warn-unused-ignores` flag to mypy config

refs #2240
2023-11-12 11:04:20 -08:00
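
The pattern behind the second bullet: isinstance narrows the type for mypy, where an `axis.__class__ is int` check does not, which is what lets the `# type: ignore`s go.

```python
from typing import Tuple, Union

axis: Union[int, Tuple[int, ...]] = 0
if isinstance(axis, int):  # mypy narrows axis to int here; `axis.__class__ is int` would not
  axis = (axis,)
```
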
George Hotz 2e2154ae4f bad hotfix for optimize_local_size, try again 2023-11-12 10:41:11 -08:00
George Hotz 270f747065 hotfix optimize_local_size (TODO: add regression test) 2023-11-12 10:29:00 -08:00
chenyu f5a62a1b42
fix some tests related to JitItem (#2279) 2023-11-11 23:00:35 -05:00
chenyu 5ef8d682e3
clean up attentions in stable diffusion (#2275) 2023-11-11 14:25:36 -05:00
chenyu 453f48ce02
pad None means (0,0) (#2273) 2023-11-11 09:50:26 -08:00
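
In other words, a sketch assuming the title's shorthand:

```python
from tinygrad.tensor import Tensor

x = Tensor.ones(2, 3)
print(x.pad((None, (1, 1))).shape)  # (2, 5): None is shorthand for (0, 0) on that axis
```
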
jxdv c5d70c1871
typo (#2271) 2023-11-11 07:18:04 -08:00
chenyu 880e693207
fix llama n_kv_heads in kvcache (#2267)
* fix llama n_kv_heads in kvcache

* trigger ci
2023-11-10 21:44:39 -05:00