mirror of https://github.com/commaai/tinygrad.git
2fd7004980
* [WIP]: implementation of SoftVC VITS SVC model
* fix typo
* fix whitespace
* Fully implement Generator & Synthesizer
  - implement SineGen & SourceHnNSF to reconstruct source signal from F0
  - source signal is added during Generator
  - fix various typos
  - start loading state dict for synthesizer
* Load Synthesizer weights
  - Fix typos in Synthesizer
  - Slightly modify vits::load_checkpoint to skip a specified layer
  - Test with Saul Goodman model because Drake weights are on mega
* start work on ContentVec
  - implement ConvFeatureExtractionModel for ContentVec
  - start work on TransformerEncoder for ContentVec:
    - this transformer probably needs its own MultiheadAttention implementation
  - fix various typos in synthesizer
  - add helpers to mask behavior of ~ and % operator of torch
* use normal and kaiming_normal
* Implement ContentVec
  - load ContentVec weights and config from fairseq hyperparams
  - use MultiHeadAttention from whisper.py
  - TransformerSentenceEncoderLayer might still need some tweaking, will see during inference testing
  - redid tilde()
  - some cleanup
* rename the file so it can be imported
* forgot to lint
* use float() instead of cast()
* add contentvec256l9 and cleanup
* Implement SoVITS fully and run it
  - Fully run sovits with .wav file
  - Drake weights need to be manually downloaded for now
  - Fix bugs
  - Add examples/sovits_helpers
  - Big TODO: INVALID Kernel for recordings > 4.5 secs
* temp fix for longer audio recordings
* Upsample no more torch
* cleanup & detailed inference time measuring
* Completely remove torch(audio)
  - Implement sinc resample in tinygrad
  - Load audio via Soundfile
  - Some cleanups
* move stuff to helper files
* Cleanup
* fix invalid kernel
* Cleanup & add more models
* Metal sounds good after master merge
  - But Synthesizer pass became much slower
* drake weights now marked save
* do load/store in numpy
* no commas needed here
* remove extra newline
* call Tensor::where on object
* use Tensor::cat instead of numpy
* pull out first iteration
* remove Sequential, Dropout, GELU, TransposeLast
* cast during loading
* clean up attention
* remove SamePad
* Major cleanup / line reduction
  - Finish implementation of GroupNormMasked
  - Simplify parts of TransformerEncoder
  - Simplify parts of Generator
  - Move all helpers to common section
  - Only use repeat_expand_left for interp after SpeechEncoder
  - Moved SVC-specific ContentVec impls up (canonically)
  - Proper annotations for get_encoder
  - Finished all TODOs
  - Squashed some whitespaces
* clean up preprocess as well
* more straightforward bool expr
* add demo mode
preprocess.py