* feat: working voice-to-text using whisper
* feat: added llama generation
* feat: vits init
* feat: more accurate voice conversion
* feat: support for tts and working pipeline for the first pass
* fix: linter checks
* refactored vits initialization and inference, added mmts-tts support
* fixed process sync and now we can have an infinite conversation
* reuse output stream to remove overhead of creating a new one each time
* added pre-prompt configuration with yaml files
* adjusted code to merge PR which changed whisper
* optimized whisper, now it's blazing fast, and also reduced the number of lines
* added better debug printing
* use jitted encode function for whisper, added timings and removed response delim to save speed on generating those tokens
* fixed hf convert and now it's working with tinyllama
* added tinyllama config
* refactored code and made it work with all llama models
* prettier order
* prettier order
* fixed suffix for tinyllama and refactored convert_from_hf
* added missing parameters
* fixed stream release and added missing params
* jitted dp and encoder
* jitted flow forward
* removed re-init of espeak on each call to save time
* jitted generator forward for blazing fast tts
* added contextmanager for displaying a chat log
* removed whitespace for pylint
* updated code to support latest fetch func
* wait for llama eos token and pass params from cli to llama
* listen for a variable amount of time instead of a fixed one
* refactored code a bit
* removed thresholding and now the output streams directly to whisper
* tokenize llama output for vits batch size to work and stream each sentence to a speaker
* changed speaker
* whisper is now printing on the same line
* don't trigger llama on whisper output in parens
* added tinyllama chat model
* adjusted code to work with tinyllama chat model
* removed unused cli arg
* autofetch tokenizer and tinyllama model. add 3 chat tokens to the tokenizer
* fixed issue with long sentences by chunking them
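The long-sentence fix above (splitting output into bounded chunks before handing it to TTS) can be sketched roughly like this; the word-based splitting and the `max_len` budget are illustrative assumptions, not the repo's exact logic:

```python
def chunk_sentence(sentence: str, max_len: int = 80) -> list[str]:
    """Split a sentence into whitespace-delimited chunks, each at most
    max_len characters, so the TTS batch never sees an overly long input."""
    chunks, current, length = [], [], 0
    for word in sentence.split():
        # +1 accounts for the joining space
        if current and length + len(word) + 1 > max_len:
            chunks.append(" ".join(current))
            current, length = [], 0
        current.append(word)
        length += len(word) + (1 if length else 0)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Joining the chunks back with spaces reproduces the original sentence, so no text is lost at chunk boundaries.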
* support for multiline llama output
* prettified log output
* adjusted sentence length
* remove quote from response to avoid funny tts
* fixed prompts
* added missing parameter
* [WIP]: implementation of SoftVC VITS SVC model
* fix typo
* fix whitespace
* Fully implement Generator & Synthesizer
- implement SineGen & SourceHnNSF to reconstruct source signal from F0
- source signal is added during Generator
- fix various typos
- start loading state dict for synthesizer
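The source-signal reconstruction described above (a SineGen-style excitation that follows the F0 track) can be sketched in plain numpy; the frame hop, sample rate, and unvoiced handling here are simplified assumptions for illustration, not the actual SineGen/SourceHnNSF code:

```python
import numpy as np

def sine_source_from_f0(f0: np.ndarray, sr: int = 16000, hop: int = 320) -> np.ndarray:
    """Build an excitation signal whose instantaneous frequency follows
    the frame-level F0 track (a sketch of the SineGen idea)."""
    f0_up = np.repeat(f0, hop)                  # frame-rate F0 -> sample rate
    phase = 2 * np.pi * np.cumsum(f0_up / sr)   # integrate frequency to phase
    sine = np.sin(phase)
    sine[f0_up == 0] = 0.0                      # unvoiced frames -> silence
    return sine
```

The real module additionally sums harmonics and adds noise for unvoiced regions before the result is mixed into the Generator.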
* Load Synthesizer weights
- Fix typos in Synthesizer
- Slightly modify vits::load_checkpoint to skip a specified layer
- Test with Saul Goodman model because Drake weights are on mega
* start work on ContentVec
- implement ConvFeatureExtractionModel for ContentVec
- start work on TransformerEncoder for ContentVec:
- this transformer probably needs its own MultiheadAttention implementation
- fix various typos in synthesizer
- add helpers to mimic the behavior of torch's ~ and % operators
* use normal and kaiming_normal
* Implement ContentVec
- load ContentVec weights and config from fairseq hyperparams
- use MultiHeadAttention from whisper.py
- TransformerSentenceEncoderLayer might still need some tweaking, will see during inference testing
- redid tilde()
- some cleanup
* rename the file so it can be imported
* forgot to lint
* use float() instead of cast()
* add contentvec256l9 and cleanup
* Implement SoVITS fully and run it
- Fully run sovits with .wav file
- Drake weights need to be manually downloaded for now
- Fix bugs
- Add examples/sovits_helpers
- Big TODO: INVALID Kernel for recordings > 4.5 secs
* temp fix for longer audio recordings
* Upsample no longer uses torch
* cleanup & detailed inference time measuring
* Completely remove torch(audio)
- Implement sinc resample in tinygrad
- Load audio via Soundfile
- Some cleanups
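The sinc resampling mentioned above can be sketched as a naive windowed-sinc interpolator in numpy; the kernel half-width and Hann taper are assumptions for illustration, not the tinygrad implementation from the commit:

```python
import numpy as np

def sinc_resample(x: np.ndarray, orig_sr: int, new_sr: int, width: int = 16) -> np.ndarray:
    """Resample x from orig_sr to new_sr by evaluating a Hann-windowed
    sinc kernel at each output sample time (naive O(n*width) sketch)."""
    ratio = new_sr / orig_sr
    n_out = int(round(len(x) * ratio))
    cutoff = min(1.0, ratio)  # low-pass below the new Nyquist when downsampling
    out = np.zeros(n_out)
    for i in range(n_out):
        ti = i / ratio                            # output time in input-sample units
        left = int(np.floor(ti)) - width + 1
        idx = np.arange(left, left + 2 * width)
        d = ti - idx
        # Hann-windowed sinc kernel, zero outside +/- width taps
        w = np.where(np.abs(d) < width, 0.5 * (1 + np.cos(np.pi * d / width)), 0.0)
        h = cutoff * np.sinc(cutoff * d) * w
        out[i] = np.dot(x[np.clip(idx, 0, len(x) - 1)], h)
    return out
```

At integer output times the kernel reduces to a unit impulse, so upsampling by an integer factor passes the original samples through unchanged.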
* move stuff to helper files
* Cleanup
* fix invalid kernel
* Cleanup & add more models
* Metal sounds good after master merge
- But Synthesizer pass became much slower
* drake weights now marked safe
* do load/store in numpy
* no commas needed here
* remove extra newline
* call Tensor::where on object
* use Tensor::cat instead of numpy
* pull out first iteration
* remove Sequential, Dropout, GELU, TransposeLast
* cast during loading
* clean up attention
* remove SamePad
* Major cleanup / line reduction
- Finish implementation of GroupNormMasked
- Simplify parts of TransformerEncoder
- Simplify parts of Generator
- Move all helpers to common section
- Only use repeat_expand_left for interp after SpeechEncoder
- Moved SVC-specific ContentVec impls up (canonically)
- Proper annotations for get_encoder
- Finished all TODOs
- Squashed some whitespaces
* clean up preprocess as well
* more straightforward bool expr
* add demo mode
* [WIP]: implementation of VITS TTS model
* Implemented VITS model, moved all code to examples/vits.py
* Added support for vctk model, auto download, and cleanups
* Invoke tensor.realize() before measuring inference time
* Added support for mmts-tts model, extracted TextMapper class, cleanups
* Removed IPY dep, added argument parser, cleanups
* Tiny fixes to wav writing
* Simplified the code in a few places, set diff log level for some prints
* Some refactoring, added support for uma_trilingual model (anime girls)
* Fixed bug where embeddings are loaded with same backing tensor, oops
* Added emotional embed support, added cjks + voistock models
- voistock is a multilingual model with over 2k anime characters
- cjks is a multilingual model with 24 speakers
Both are kinda bad for English though :c
* Removed `Tensor.training=False` (not needed and wrong, oops)
* Changed default model and speaker to vctk with speaker 6
* Ported rational_quadratic_spline function to fully use tinygrad ops, no numpy
* Removed accidentally pushed test/spline.py
* Some slight refactors
* Replaced masked_fill with tensor.where
* Added y_length estimation, installation instructions, and some cleanups
* Fix overestimation log message.
* Changed default value of `--estimate_max_y_length` to False
This is only useful for larger inputs.
* Removed printing of the phonemes
* Changed default value of `--text_to_synthesize`