* feat: working voice-to-text using whisper
* feat: added llama generation
* feat: vits init
* feat: more accurate voice conversion
* feat: support for tts and working pipeline for the first pass
* fix: linter checks
* refactored vits initialization and inference, added mmts-tts support
* fixed process sync and now we can have an infinite conversation
* reuse output stream to remove overhead of creating a new one each time
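A minimal sketch of the stream-reuse idea, assuming a sounddevice-style blocking output API; the library, sample rate, and `play` helper here are stand-ins, not the repo's actual code:

```python
import numpy as np
import sounddevice as sd

SAMPLE_RATE = 22050  # assumption: whatever rate the tts model emits

# open the stream once, before the conversation loop, instead of per utterance
stream = sd.OutputStream(samplerate=SAMPLE_RATE, channels=1, dtype="float32")
stream.start()

def play(audio: np.ndarray) -> None:
    # blocking write to the already-open stream: no open/close overhead per call
    stream.write(audio.astype(np.float32))
```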
* added pre-prompt configuration with yaml files
* adjusted code to merge PR which changed whisper
* optimized whisper, now it's blazing fast, and also reduced the number of lines
* added better debug printing
* use jitted encode function for whisper, added timings and removed response delim to save speed on generating those tokens
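The jitted-encode pattern, as a sketch: `TinyJit` traces the first call and replays the compiled kernels afterwards. The import path and the `whisper` object are assumptions that vary across tinygrad versions:

```python
from tinygrad.tensor import Tensor
from tinygrad.jit import TinyJit  # import path moves around between tinygrad versions

whisper = ...  # stand-in for the loaded whisper model

@TinyJit
def encode(mel: Tensor) -> Tensor:
    # first call traces and compiles the kernels; later calls replay them,
    # so the input shape must stay fixed across calls
    return whisper.encoder(mel).realize()
```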
* fixed hf convert and now it's working with tinyllama
* added tinyllama config
* refactored code and made it work with all llama models
* prettier order
* prettier order
* fixed suffix for tinyllama and refactored convert_from_hf
* added missing parameters
* fixed stream release and added missing params
* jitted dp and encoder
* jitted flow forward
* removed re-init of espeak on each call to save time
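Caching the phonemizer backend rather than re-creating it per call; a sketch assuming a phonemizer-style `EspeakBackend` (the actual espeak wrapper in the repo may differ):

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def get_backend():
    # construct espeak once; every later call returns the cached instance
    from phonemizer.backend import EspeakBackend  # assumed API
    return EspeakBackend(language="en-us")

def phonemize(text: str) -> str:
    return get_backend().phonemize([text])[0]
```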
* jitted generator forward for blazing fast tts
* added contextmanager for displaying a chat log
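The chat-log context manager is roughly this shape (a sketch; the real helper's name and formatting are the author's):

```python
from contextlib import contextmanager

@contextmanager
def chat_log(speaker: str):
    # print the speaker prefix, let the caller stream tokens inline, then end the line
    print(f"{speaker}: ", end="", flush=True)
    try:
        yield
    finally:
        print(flush=True)

# usage:
# with chat_log("llama"):
#     for tok in tokens: print(tok, end="", flush=True)
```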
* removed whitespace for pylint
* updated code to support latest fetch func
* wait for llama eos token and pass params from cli to llama
* listen for a variable amount of time instead of a fixed one
* refactored code a bit
* removed thresholding and now the output streams directly to whisper
* tokenize llama output into sentences so the vits batch size works, and stream each sentence to the speaker
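Splitting the llama output into sentences so each one can be synthesized and played as soon as it is complete; a sketch of the idea, not the exact tokenizer used:

```python
import re

def sentences(text: str):
    # split on sentence-final punctuation; each piece is one tts/vits batch item
    for sent in re.split(r"(?<=[.!?])\s+", text.strip()):
        if sent:
            yield sent
```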
* changed speaker
* whisper is now printing on the same line
* don't trigger llama on whisper output in parens
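Whisper emits non-speech annotations like `(music)` or `[BLANK_AUDIO]`; a sketch of a guard that keeps those from being fed to llama (the regex is an assumption, not the repo's exact filter):

```python
import re

def is_speech(transcript: str) -> bool:
    # drop bracketed/parenthesized annotations and see if anything remains
    stripped = re.sub(r"[\(\[].*?[\)\]]", "", transcript).strip()
    return bool(stripped)
```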
* added tinyllama chat model
* adjusted code to work with tinyllama chat model
* removed unused cli arg
* autofetch tokenizer and tinyllama model. add 3 chat tokens to the tokenizer
* fixed issue with long sentences by chunking them
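Chunking over-long sentences at word boundaries so tts input stays bounded; a sketch with an assumed length limit:

```python
def chunk(sentence: str, max_len: int = 150):
    # 150 chars is an assumed limit, not the repo's actual value
    cur = ""
    for word in sentence.split():
        if cur and len(cur) + 1 + len(word) > max_len:
            yield cur
            cur = word
        else:
            cur = f"{cur} {word}".strip()
    if cur:
        yield cur
```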
* support for multiline llama output
* prettified log output
* adjusted sentence length
* remove quote from response to avoid funny tts
* fixed prompts
* added missing parameter
* ops_gpu is go
* fix size 0
* fix image, and add more tests
* nerf openpilot test, doesn't test thneed
* run the schedule
* better
* oops, new inputs
* delete pyopencl
* Update ops_gpu.py
* update cstyle renderers to take a dtype in code_for_op
* implement NEG for bools in LLVM
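The dtype-aware `code_for_op` change boils down to ops rendering differently per dtype, e.g. negating a bool is logical not rather than arithmetic negation. Keys and signatures below are a simplification, not the real renderer table:

```python
code_for_op = {
    "NEG":  lambda x, dtype: f"(!{x})" if dtype == "bool" else f"(-{x})",
    "SQRT": lambda x, dtype: f"sqrt({x})",  # ops that don't care just ignore dtype
}

assert code_for_op["NEG"]("v0", "bool") == "(!v0)"
assert code_for_op["NEG"]("v0", "float") == "(-v0)"
```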
* update triton
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* pad slice test cases, many failing
* fix failing test cases
check the mask to see if we are outside the base buffer
also create a multi-view in that case if we reshape to an empty shape
* real_offset calculation more readable
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* rewrite 0 size loadop into a CONST
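The idea behind the 0-size rewrite: a buffer with zero elements can never be read, so it can become a CONST and skip allocation entirely. A sketch with illustrative names, not tinygrad's actual API:

```python
import math

def create_buffer(op: str, shape: tuple) -> dict:
    if math.prod(shape) == 0:
        op = "CONST"  # nothing to load; no allocation and no ast needed
    return {"op": op, "shape": shape}

assert create_buffer("EMPTY", (0, 16))["op"] == "CONST"
```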
* check alloc size
* EMPTY is better
* Revert "EMPTY is better"
This reverts commit 574fe0f9ed28f1b97da5a81afdfd2cd5d9a94ff9.
* no ast is created
* fix test
* cuda with gpuctypes
* hip gpuctypes
* graphs
* rename + linter happy
* use cpu_time_execution
* no ji in build_kernel_node_params
* remove hip_wrapper
* hip fix
* no arc
* small changes
* no clean module in cudacpu
* cpu tests pass
* torch works
* works
* metal works
* fix ops_disk
* metal jit works
* fix openpilot
* llvm and clang work
* fix webgpu
* docs are really broken
* LRU works on metal
* delete comment
* revert name to ._buf. LRU only on Compiled
* changes
* allocator
* allocator, getting closer
* lru alloc
* LRUAllocator
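The allocator commits converge on an LRU scheme: freed buffers are cached by size and handed back on the next same-size request, so hot sizes skip the driver. A simplified sketch, not tinygrad's exact class:

```python
from collections import defaultdict
from typing import Any, Callable

class LRUAllocator:
    def __init__(self, alloc: Callable[[int], Any], free: Callable[[Any], None]):
        self._alloc, self._free = alloc, free
        self.cache = defaultdict(list)  # size -> reusable buffers

    def alloc(self, size: int):
        # reuse a cached buffer of the same size if one exists
        if self.cache[size]:
            return self.cache[size].pop()
        return self._alloc(size)

    def free(self, buf, size: int):
        # don't return to the driver; keep it for the next same-size alloc
        self.cache[size].append(buf)

    def flush(self):
        # under memory pressure, actually release everything cached
        for bufs in self.cache.values():
            for b in bufs:
                self._free(b)
        self.cache.clear()
```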
* all pass
* metal
* cuda
* test examples
* linearizer
* test fixes
* fix custom + clean realize
* fix hip
* skip tests
* fix tests
* fix size=0
* fix MOCKHIP
* fix thneed
* copy better
* simple
* old style metal copy
* fix thneed
* np reshape
* give cuda a device
* add typedefs and make_dtypen functions
use ext_vector_type for half16 kernels
* remove the old test_render because we just use whatever cstyle has
* align vectors