Add option to interpolate mel between TTS and vocoder to increase vocoder compatibility #520
Comments
I think we can add interpolation to audio.py and call it from there as necessary. |
Ok thanks. I'm gonna look into it today |
Also, do you know a way to compute the shape of the target mel spectrogram without having to initialize two different audio processors? |
You could use the ratio of the two sample rates; would that be ok? |
Thing is, sample rate is not the only audio parameter that affects the shape of the mel spectrogram. I know fft_size and num_mels affect it as well, and I suspect others do too. Although I don't know if there is a need to support those other differences... |
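To make the shape question concrete, here is a rough sketch of how the mel shape and the resulting scale factors could be computed from the two audio configs alone, without instantiating any audio processor. It assumes a centered STFT (librosa's default), where the number of frames is 1 + n_samples // hop_length; under that assumption the time axis depends on the sample rate and the hop length, and the band axis on num_mels. The function names, the dict keys and the example values are hypothetical and not part of mozilla/TTS.

```python
def mel_shape(n_samples, num_mels, hop_length):
    """Return (num_mels, n_frames) for a centered STFT (assumption)."""
    n_frames = 1 + n_samples // hop_length
    return num_mels, n_frames


def scale_factors(tts_cfg, vocoder_cfg, seconds=1.0):
    """Ratio of vocoder mel shape to TTS mel shape for the same audio duration."""
    tts = mel_shape(int(seconds * tts_cfg["sample_rate"]),
                    tts_cfg["num_mels"], tts_cfg["hop_length"])
    voc = mel_shape(int(seconds * vocoder_cfg["sample_rate"]),
                    vocoder_cfg["num_mels"], vocoder_cfg["hop_length"])
    # (band-axis factor, time-axis factor)
    return voc[0] / tts[0], voc[1] / tts[1]


# Hypothetical example configs, chosen only for illustration.
TTS_CFG = {"sample_rate": 22050, "num_mels": 80, "hop_length": 256}
VOC_CFG = {"sample_rate": 16000, "num_mels": 80, "hop_length": 200}

if __name__ == "__main__":
    print(mel_shape(22050, 80, 256))
    print(scale_factors(TTS_CFG, VOC_CFG))
```

If the audio processor derives its hop length from a frame shift in milliseconds, that conversion would have to happen before calling these helpers.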
Also, the function that interpolate the mel will need both the audio config of the vocoder and of the TTS. I was thinking that if we implement it in audio.py, the audio processor would need to have both configs in the constructor or we could pass the TTS config every time we want to interpolate directly to the function. Do you see a better way to implement this ? |
I was thinking of just having a function in audio.py that takes the target sample rate and then interpolates the mel relative to its own sample rate;
then it is up to the user to call it wherever they like. I don't think there is a straightforward way to automate all of this without spreading more code around, which I'd prefer not to do for the sake of simplicity. What do you think? |
Yeah, I guess we can start with that. It was @george-roussos's comment that made me realise that it would be nice to support other kinds of audio processing tweaks (like the fft_size he uses). To support all of this, we could just have a function like this:
    def interpolate(self, target_sr=None, scale_factors=None):
        # fall back to a time-axis-only factor derived from the sample rates
        if not scale_factors:
            scale_factors = (1, target_sr / self.sample_rate)
        return interpolate(scale_factors....) |
Hello @WeberJulian, I've looked into interpolation for vocoder compatibility too, but never had to write the code because I've always trained my synthesizer with the intent to use a specific vocoder. The relevant parameters are the following: 1.
|
Hi @blue-fish, thanks for the info. Or we could do a separate function that does just that once:
    import torch
    import torch.nn.functional as F

    def compute_scale_factors(self, ap_tts):
        y_TTS = torch.rand(ap_tts.sample_rate)    # random wav of length 1 sec at the TTS sample rate
        y_vocoder = torch.rand(self.sample_rate)  # same, but at the vocoder sample rate
        mel_TTS = ap_tts.melspectrogram(y_TTS)
        mel_vocoder = self.melspectrogram(y_vocoder)
        # returns scale_factors as (band-axis ratio, time-axis ratio)
        return (mel_vocoder.shape[0] / mel_TTS.shape[0], mel_vocoder.shape[1] / mel_TTS.shape[1])
and this for the interpolation:
    def interpolate(self, mel, target_sr=None, scale_factors=None):
        if not scale_factors:
            scale_factors = (1, target_sr / self.sample_rate)
        # bilinear mode expects a 4D input: add batch and channel dims, then drop them again
        return F.interpolate(mel.unsqueeze(0).unsqueeze(0), scale_factor=scale_factors, mode='bilinear').squeeze(0).squeeze(0)
What do you think @erogol ? |
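For readers who want to try the idea outside of the audio processor, here is a minimal standalone sketch of resizing a mel spectrogram by a pair of scale factors. It is an illustration only: it uses NumPy's np.interp (separable linear interpolation with the end points aligned) as a stand-in for the F.interpolate call discussed above, so the values will not match PyTorch's bilinear mode exactly. The function name interpolate_mel and the example shapes are hypothetical.

```python
import numpy as np


def interpolate_mel(mel, scale_factors):
    """Resize a (num_mels, n_frames) array by (band_scale, time_scale).

    Linear interpolation is applied along the time axis first, then along
    the mel-band axis. End points of each axis are kept aligned.
    """
    band_scale, time_scale = scale_factors
    n_mels, n_frames = mel.shape
    new_mels = max(1, int(round(n_mels * band_scale)))
    new_frames = max(1, int(round(n_frames * time_scale)))

    # Time axis: interpolate every mel band onto the new frame grid.
    src_t = np.linspace(0.0, 1.0, n_frames)
    dst_t = np.linspace(0.0, 1.0, new_frames)
    out = np.stack([np.interp(dst_t, src_t, row) for row in mel])

    # Band axis: interpolate every frame onto the new band grid.
    src_b = np.linspace(0.0, 1.0, n_mels)
    dst_b = np.linspace(0.0, 1.0, new_mels)
    out = np.stack([np.interp(dst_b, src_b, col) for col in out.T], axis=1)
    return out


if __name__ == "__main__":
    mel = np.random.rand(80, 87)  # stand-in for a TTS output
    resized = interpolate_mel(mel, (1.0, 81 / 87))
    print(resized.shape)
```

Passing a band scale of 1.0 leaves the mel-band axis untouched, which corresponds to the sample-rate-only case from earlier in the thread.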
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our Discourse page for further help: https://discourse.mozilla.org/c/tts |
fix taco2 speaker-embeddings dimension during inference
Hello,
As of right now, if you train your TTS model with a certain sample rate, you can't use it with a pretrained vocoder trained on another sampling rate. (The same is true of other audio parameters, like hop_size.)
As shown by this colab notebook, simply interpolating the mel-spectrogram to the right size lets us use the pretrained vocoder regardless of the sample rate it was initially trained on.
Now we have to see what changes are necessary to make this work in mozilla/TTS.