Add MQTTS #24142

susnato · 2023-06-09T13:47:28Z

Model description

MQTTS is a text-to-speech model introduced in the paper A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech. The work explores the use of more abundant real-world data for building speech synthesizers. Its architecture is designed for multiple code generation and monotonic alignment, along with the use of a clean silence prompt to improve synthesis quality. The authors show that MQTTS outperforms existing TTS systems in several objective and subjective measures.

I would like to add this model to HF.

Open source status

The model implementation is available
The model weights are available

Provide useful links for the implementation

Implementation - https://github.com/b04901014/MQTTS
Checkpoints -

Config - https://cmu.box.com/s/hvv06w3yr8mob4csjjaigu5szq2qcjab
Quantize - https://cmu.box.com/s/966rcxkyjps80p7thu0r6lo2udk1ezdm
Transformer model - https://cmu.box.com/s/xuen9o8wxsmyaz32a65fu25cz92a2jei

susnato · 2023-06-09T13:49:48Z

cc: @sanchit-gandhi and @ArthurZucker

sanchit-gandhi · 2023-06-12T16:48:10Z

I think this is a cool model - whether it outperforms Bark (#24086) is up for debate. My only concerns are:

The NC license, which is not super permissive
The low visibility of the original repo: with only 130 GH stars, it seems the community is not super excited by the model (and is thus unlikely to use it in the library)

While the voice prompting feature would be cool and inference much faster than a hierarchical transformer model like Bark, I think the lack of visibility / excitement around the model means it would be a big effort to add with maybe little usage as a result

cc @Vaibhavs10 who has had more experience with MQTTS, @ylacombe who's adding Bark and @hollance who's adding VITS MMS

What do you all think?

Vaibhavs10 · 2023-06-28T10:10:34Z

IMO for MQTTS - doesn't make as much sense, purely from a licensing standpoint. Plus it uses a non-standard quantizer, which makes it difficult to maintain (primarily because it'll be used only for MQTTS).

I think a more ambitious idea would be to add tortoise-tts - https://github.com/neonbjb/tortoise-tts (Was released a while back but still is the king) - the original repo is not as optimised so with the transformers bells and whistles we can make sure that it works faster and better?

Another idea would be to add StyleTTS - https://github.com/yl4579/StyleTTS, the results are quite promising and given there is training code as well, it opens up the opportunity to train a bigger model.

sanchit-gandhi · 2023-06-28T10:24:51Z

Tortoise TTS would probably go in the diffusers repo (since we could build it as a diffusion pipeline with a transformer encoder). Since the purpose of diffusers is more pure performance (which is not the objective of transformers), it would be a good fit there.

Would you like to open a feature request for Tortoise TTS on the diffusers repo and tag myself and @Vaibhavs10? We can then discuss how feasible a new pipeline addition would be!

susnato · 2023-06-28T12:38:05Z

thanks a lot for all the insights!

Also I opened an issue for Tortoise TTS on the diffusers repo. It is here

sanchit-gandhi · 2023-06-30T17:15:03Z

Perfect, thanks @susnato! Going to close this then since we're in agreement that MQTTS is not a good addition for transformers. Tortoise TTS issue in diffusers: huggingface/diffusers#3891

susnato added the New model label Jun 9, 2023

sanchit-gandhi closed this as completed Jun 30, 2023
