Commit Graph

2972 Commits

qazal ab2d4d8d29
Fix cl import in the copy_speed test and cifar example (#2586)
* fix CL import

* update test to only run on GPU

* update hlb_cifar too
2023-12-03 09:22:07 -08:00
chenyu 3226b3d96b
enable the jit random test (#2580) 2023-12-02 20:25:23 -05:00
chenyu 09c9794f3f
clean external_test_opt.py (#2578) 2023-12-02 19:51:08 -05:00
George Hotz 171543fc8d
cleanups to save lines and files (#2577)
* runtime/graph -> features/graph

* put all the cstyle renderers in cstyle

* same line for those

* how did that pass mypy
2023-12-02 16:29:56 -08:00
George Hotz a9a76639c8
that's not needed (#2574) 2023-12-02 16:01:29 -08:00
chenyu 875c34bfc4
minor lazy tweak before rewrite (#2573) 2023-12-02 18:23:33 -05:00
qazal fa1d4dd14b
implement MAX in other dtypes (#2572) 2023-12-02 15:21:59 -08:00
nimlgen 065495e0c9
save a few lines in ops_gpu (#2564)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-12-02 15:05:22 -08:00
Oleg Rybalko 5e87083783
Whisper + LLAMA + VITS (#2332)
* feat: working voice 2 text using whisper

* feat: added llama generation

* feat: vits init

* feat: more accurate voice conversion

* feat: support for tts and working pipeline for the first pass

* fix: linter checks

* refactored vits initialization and inference, added mmts-tts support

* fixed process sync and now we can have an infinite conversation

* reuse output stream to remove overhead of creating a new one each time

* added pre-prompt configuration with yaml files

* adjusted code to merge PR which changed whisper

* optimized whisper, now it's blazing fast and also reduced number of lines

* added better debug printing

* use jitted encode function for whisper, added timings and removed response delim to save speed on generating those tokens

* fixed hf convert and now it's working with tinyllama

* added tinyllama config

* refactored code and made it work with all llama models

* prettier order

* prettier order

* fixed suffix for tinyllama and refactored convert_from_hf

* added missing parameters

* fixed stream release and added missing params

* jitted dp and encoder

* jitted flow forward

* removed re-init of espeak on each call to save up time

* jitted generator forward for blazing fast tts

* added contextmanager for displaying a chat log

* removed whitespace for pylint

* updated code to support latest fetch func

* wait for llama eos token and pass params from cli to llama

* listen for not fixed amount of time

* refactored code a bit

* removed thresholding and now the output streams directly to whisper

* tokenize llama output for vits batch size to work and stream each sentence to a speaker

* changed speaker

* whisper is now printing on the same line

* don't trigger llama on whisper output in parens

* added tinyllama chat model

* adjusted code to work with tinyllama chat model

* removed unused cli arg

* autofetch tokenizer and tinyllama model. add 3 chat tokens to the tokenizer

* fixed issue with long sentences by chunking them

* support for multiline llama output

* prettified log output

* adjusted sentence length

* remove quote from response to avoid funny tts

* fixed prompts

* added missing parameter
2023-12-02 15:03:46 -08:00
qazal 47cec4caf3
int operations shouldn't have a fast math flag (#2571) 2023-12-02 14:53:36 -08:00
George Hotz d6b404ac11
No dtype alloc (#2570)
* fix all allocs

* improve docs

* ugh fix fake alloc
2023-12-02 13:29:40 -08:00
chenyu c8774713c5
lazy cleanup (#2567) 2023-12-02 13:21:43 -05:00
George Hotz 5068e99d18
refactor to remove extra kernel params (#2563)
* refactor to have compiled kernel

* bugfixes

* docs/beautiful.py

* revert that

* fix tests
2023-12-02 00:32:25 -08:00
George Hotz 27481b9206
Switch ops_gpu -> gpuctypes (#2532)
* ops_gpu is go

* fix size 0

* fix image, and add more tests

* nerf openpilot test, doesn't test thneed

* run the schedule

* better

* oops, new inputs

* delete pyopencl

* Update ops_gpu.py
2023-12-01 22:30:21 -08:00
qazal 99ee2ec37a
Refactor code_for_op to accept a dtype (#2555)
* update cstyle renderers to take a dtype in code_for_op

* implement NEG for bools in LLVM

* update triton

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-12-01 22:05:28 -08:00
George Hotz 82fd932921
Lower schedule 2 (#2561)
* ls2

* fix types

* simpler

* cleaner
2023-12-01 20:25:49 -08:00
George Hotz 217cda81ba hotfix: no metalgraph if there's weird ops 2023-12-01 19:32:55 -08:00
George Hotz 6733425095
lower schedule (#2559)
* lower schedule

* remove RAND, and don't put load in the JIT yet

* better fix for that test
2023-12-01 19:17:46 -08:00
Christopher Mauri Milan 077567f62d
Remove as_buffer for TORCH (#2554)
* remove as_buffer for torch

* enable torch zerocopy if on cpu

* remove as_buffer even on torch:cpu
2023-12-01 18:51:38 -08:00
chenyu 05a5357dd9
fix handcode_resnet50_opt.py (#2558) 2023-12-01 20:51:21 -05:00
chenyu 86fbd413f3
update test_real_world configs (#2557) 2023-12-01 20:03:52 -05:00
andresgit 00523d5656
New fix accessing elements created by padding (#2529)
* pad slice test cases, many failing

* fix failing test cases

check mask if we are outside the base buffer
also create a multi-view if in that case we reshape to an empty shape

* real_offset calculation more readable

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2023-12-01 19:08:10 -05:00
George Hotz bfdce1f0e7 hotfix: make openpilot test deterministic 2023-12-01 15:37:23 -08:00
George Hotz 9c306be282 better name for fast path 2023-12-01 15:32:47 -08:00
chenyu 67f4e03724
rewrite 0 size loadop into a CONST (#2556)
* rewrite 0 size loadop into a CONST

* check alloc size

* EMPTY is better

* Revert "EMPTY is better"

This reverts commit 574fe0f9ed28f1b97da5a81afdfd2cd5d9a94ff9.

* no ast is created

* fix test
2023-12-01 18:29:06 -05:00
George Hotz 4447188051 gate METAL_FAST_LOAD 2023-12-01 15:28:40 -08:00
chenyu e9426f4fe4
simpler get_contraction (#2552)
* simpler get_contraction

* and test
2023-12-01 18:02:52 -05:00
George Hotz eb595588bb device.py cleanups for -1 line 2023-12-01 14:59:33 -08:00
George Hotz f9b1de598f hotfix: metal fastpath on sonoma 2023-12-01 14:55:34 -08:00
George Hotz f5de21e753
fast path for copy (#2548)
* fast copy

* ruff first

* flat_mv on malloc

* order + webgpu test
2023-12-01 11:34:47 -08:00
wozeparrot 28183c7438
feat: reword (#2549) 2023-12-01 10:56:18 -08:00
George Hotz 4c984bba7e
bump version to 0.8.0, clean CI, remove requests (#2545)
* bump version to 0.8.0, clean CI, remove requests

* why was that even there
2023-12-01 10:42:50 -08:00
nimlgen ff47be3a01
ruff check whitespaces (#2547) 2023-12-01 10:42:20 -08:00
George Hotz 8fd8399437
remove flake8 (#2544) 2023-12-01 09:48:41 -08:00
George Hotz d8175a4380
simple fix (#2543) 2023-12-01 09:42:15 -08:00
qazal 04483f8187
refactor llvm consts (#2537) 2023-12-01 09:39:40 -08:00
nimlgen badc97f824
hip & cuda to gpuctypes (#2539)
* cuda with gpuctypes

* hip gpuctypes

* graphs

* rename + linter happy

* use cpu_time_execution

* no ji in build_kernel_node_params

* remove hip_wrapper

* hip fix

* no arc

* small changes

* no clean module in cudacpu
2023-12-01 09:25:27 -08:00
qazal 0fb4ff30c8
share duplicate renders with cstyle (#2538) 2023-12-01 08:10:36 -08:00
chenyu 7fec966b5e
bye bye NOOP (#2534)
* bye bye NOOP

* SIN

* NEG
2023-11-30 23:10:35 -08:00
Joe Donovan fa549d198d
Remove `type: ignore` comments (#2533)
* remove some type ignore comments and fix errors

* remove unnecessary get_args import

* revert triton changes

* remove changes not in tinygrad
2023-11-30 22:15:55 -08:00
George Hotz 12fa846122
zero copy (#2531)
* zero copy

* zero copy test

* loads coder in milliseconds

* zero copy for cpu and torch

* src_from_buffer is None

* SLOW_METAL_COPY there
2023-11-30 18:38:41 -08:00
Matthias Kronberg 5394a05b9d
Fix: Get item from ndarray before casting to int (#2525)
Directly casting is deprecated and will error in the future.
2023-11-30 18:34:31 -08:00
George Hotz 2c363b5f0b
new style device (#2530)
* cpu tests pass

* torch works

* works

* metal works

* fix ops_disk

* metal jit works

* fix openpilot

* llvm and clang work

* fix webgpu

* docs are rly broken

* LRU works on metal

* delete comment

* revert name to ._buf. LRU only on Compiled

* changes

* allocator

* allocator, getting closer

* lru alloc

* LRUAllocator

* all pass

* metal

* cuda

* test examples

* linearizer

* test fixes

* fix custom + clean realize

* fix hip

* skip tests

* fix tests

* fix size=0

* fix MOCKHIP

* fix thneed

* copy better

* simple

* old style metal copy

* fix thneed

* np reshape

* give cuda a device
2023-11-30 17:07:16 -08:00
chenyu e56511b59a
more type annotation for tensor and lazy (#2528)
* more type annotation for tensor and lazy

* don't need that
2023-11-30 17:50:22 -05:00
Davi Silva ddeec24fa8
Cleanup & fix llama.py (#2524)
* docs, cleanup crap

* comma AI

* fix 70B

* this is why lexical scope exists
2023-11-30 16:00:17 -05:00
chenyu 7d26452305
call ruff with --preview (#2522)
some checks are ignored without --preview
2023-11-30 13:59:00 -05:00
chenyu 5db0cdfbd3
support list of ints (or other Tensorable) in tensor indices (#2520)
* support list of ints (or other Tensorable) in tensor indices

* enable some index test cases
2023-11-30 12:46:33 -05:00
chenyu bd941a0df1
first version of test_indexing (#2515)
* first version of test_indexing

* move to test/imported
2023-11-30 00:03:59 -05:00
chenyu d210f6a786
minor device.py cleanups (#2510) 2023-11-29 18:16:25 -05:00
qazal 370cfbb957
Cleanup vectorized hip renders (#2497)
* add typedefs and make_dtypen functions

use ext_vector_type for half16 kernels

* remove the old test_render because we just use whatever cstyle has

* align vectors
2023-11-29 14:02:12 -08:00