Add MQTTS #24142
cc @sanchit-gandhi and @ArthurZucker
I think this is a cool model - whether it outperforms Bark (#24086) is up for debate. My only concerns are:
While the voice prompting feature would be cool, and inference would be much faster than with a hierarchical transformer model like Bark, I think the lack of visibility and excitement around the model means it would be a big effort to add, possibly with little usage as a result. cc @Vaibhavs10, who has more experience with MQTTS, @ylacombe, who's adding Bark, and @hollance, who's adding VITS MMS. What do you all think?
IMO, MQTTS doesn't make as much sense, purely from a licensing standpoint. It also uses a non-standard quantizer, which makes it difficult to maintain (primarily because that quantizer would be used only by MQTTS). A more ambitious idea would be to add tortoise-tts (https://github.com/neonbjb/tortoise-tts). It was released a while back but is still the king, and the original repo is not well optimised, so with the transformers bells and whistles we could make it work faster and better. Another idea would be to add StyleTTS (https://github.com/yl4579/StyleTTS). Its results are quite promising, and since training code is available, it opens up the opportunity to train a bigger model.
Tortoise TTS would probably go in the diffusers library. Would you like to open a feature request for Tortoise TTS on the diffusers repo and tag me and @Vaibhavs10? We can then discuss how feasible adding a new pipeline would be!
Thanks a lot for all the insights! I've also opened an issue for Tortoise TTS on the diffusers repo. It is here
Perfect, thanks @susnato! Going to close this then since we're in agreement that MQTTS is not a good addition for transformers. Tortoise TTS issue in diffusers: huggingface/diffusers#3891 |
Model description
MQTTS is a text-to-speech model introduced in the paper A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech. The work explores using more abundant real-world data to build speech synthesizers. Its architecture is designed for multiple code generation and monotonic alignment, and it uses a clean silence prompt to improve synthesis quality. The authors show that MQTTS outperforms existing TTS systems on several objective and subjective measures.
I would like to add this model to HF.
Open source status
Provide useful links for the implementation
Implementation - https://github.com/b04901014/MQTTS
Checkpoints -