tinygrad

Commit Graph

Author	SHA1	Message	Date
Francis Lata	7169de57e2	Update VITS to use fetch helper (#2422) * use fetch helper on vits * remove duplicate weight loading	2023-11-24 08:50:03 -08:00
Karan Handa	a8aa13dc91	[ready] Replacing os with pathlib (#1708) * replace os.path with pathlib * safe convert dirnames to pathlib * replace all os.path.join * fix cuda error * change main chunk * Reviewer fixes * fix vgg * Fixed everything * Final fixes * ensure consistency * Change all parent.parent... to parents	2023-08-30 10:41:08 -07:00
George Hotz	718ced296c	move state to nn/state (#1619)	2023-08-22 07:36:24 -07:00
JaSpa99	2fd7004980	Implementation of SoftVC VITS SVC model (#1371) * [WIP]: implementation of SoftVC VITS SVC model * fix typo * fix whitespace * Fully implement Generator & Synthesizer - implement SineGen & SourceHnNSF to reconstruct source signal from F0 - source signal is added during Generator - fix various typos - start loading state dict for synthesizer * Load Synthesizer weights - Fix typos in Synthesizer - Slightly modify vits::load_checkpoint to skip a specified layer - Test with Saul Goodman model because Drake weights are on mega * start work on ContentVec - implement ConvFeatureExtractionModel for ContentVec - start work on TransformerEncoder for ContentVec: - this transformer probably needs its own MultiheadAttention implementation - fix various typos in synthesizer - add helpers to mask behavior of ~ and % operator of torch * use normal and kaiming_normal * Implement ContentVec - load ContentVec weights and config from fairseq hyperparams - use MultiHeadAttention from whisper.py - TransformerSentenceEncoderLayer might still need some tweaking, will see during inference testing - redid tilde() - some cleanup * rename the file so it can be imported * forgot to lint * use float() instead of cast() * add contentvec256l9 and cleanup * Implement SoVITS fully and run it - Fully run sovits with .wav file - Drake weights need to be manually downloaded for now - Fix bugs - Add examples/sovits_helpers - Big TODO: INVALID Kernel for recordings > 4.5 secs * temp fix for longer audio recordings * Upsample no more torch * cleanup & detailed inference time measuring * Completely remove torch(audio) - Implement sinc resample in tinygrad - Load audio via Soundfile - Some cleanups * move stuff to helper files * Cleanup * fix invalid kernel * Cleanup & add more models * Metal sounds good after master merge - But Synthesizer pass became much slower * drake weights now marked save * do load/store in numpy * no commas needed here * remove extra newline * call Tensor::where on object * use Tensor::cat instead of numpy * pull out first iteration * remove Sequential, Dropout, GELU, TransposeLast * cast during loading * clean up attention * remove SamePad * Major cleanup / line reduction - Finish implementation of GroupNormMasked - Simplify parts of TransformerEncoder - Simplify parts of Generator - Move all helpers to common section - Only use repeat_expand_left for interp after SpeechEncoder - Moved SVC-specfic ContentVec impls up (canonically) - Proper annotations for get_encoder - Finished all TODOs - Squashed some whitespaces * clean up preprocess as well * more straightforward bool expr * add demo mode	2023-08-13 19:43:23 -07:00
Felix	97a6029cf7	Corrected a few misspelled words (#1435)	2023-08-04 16:51:08 -07:00
Stan	0a3d4f8103	Implementation of VITS TTS model (#1188) * [WIP]: implementation of VITS TTS model * Implemented VITS model, moved all code to examples/vits.py * Added support for vctk model, auto download, and cleanups * Invoke tensor.realize() before measuring inference time * Added support for mmts-tts model, extracted TextMapper class, cleanups * Removed IPY dep, added argument parser, cleanups * Tiny fixes to wav writing * Simplified the code in a few places, set diff log level for some prints * Some refactoring, added support for uma_trilingual model (anime girls) * Fixed bug where embeddings are loaded with same backing tensor, oops * Added emotional embed support, added cjks + voistock models - voistock is multilingual model with over 2k anime characters - cjks is multilingual model with 24 speakers both are kinda bad for english though :c * Removed `Tensor.Training=False` (not needed and wrong oop) * Changed default model and speaker to vctk with speaker 6 * Ported rational_quadratic_spline fun to fully use tinygrad ops, no numpy * Removed accidentally pushed test/spline.py * Some slight refactors * Replaced masked_fill with tensor.where * Added y_length estimating, plus installation instructions, plus some cleanups * Fix overestimation log message. * Changed default value of `--estimate_max_y_length` to False This is only useful for larger inputs. * Removed printing of the phonemes * Changed default value of `--text_to_synthesize`	2023-07-20 17:37:14 -07:00

6 Commits