Hi @carlfm01,
Thank you for your activity in the various LPCNet/Tacotron-2 discussions. I have been trying to integrate the two models with the steps outlined by @MlWoo and yourself, but the results are not great. I can pick out words but the voice is very hoarse/noisy.
In my latest experiment I tried to replicate your results with the 120h Spanish dataset, but the output is still noisy. One thing to note about my process: I used the entire dataset to train both models without making any adjustments for multiple speakers. Was this the correct approach?
I would also like to know whether this dataset needs any special preprocessing.
Below are my latest alignment and synthesis samples. Thank you and I look forward to your response!
samples.zip