v2 does not work well with cosyvoice TTS #311

Open
xipingL opened this issue Sep 19, 2024 · 1 comment
Labels
bug Something isn't working

Comments


xipingL commented Sep 19, 2024

  1. Generate audio with the CosyVoice Chinese female voice (first video), then run OpenVoice ToneColorConverter on it with the target_se from the second video. The converted audio (third video) has a severe electrical-sounding artifact.
  2. Generate audio with the CosyVoice Chinese male voice (fourth video), then run OpenVoice ToneColorConverter on it with the target_se from the fifth video. The converted audio (sixth video) does not take on the target timbre.
woman_source.mp4
man_target.mp4
man_gen.mp4
man_source.mp4
woman_target.mp4
woman_gen.mp4

Thanks.


dosubot bot commented Sep 19, 2024

@xipingL Hello there! I'm a bot designed to help you with bugs, questions, and contributions. I'm here to assist you while waiting for a human maintainer.

I found a similar open issue, "Converter doesn't work for Chinese?", which was last updated on August 03, 2024. The suggested solution includes:

  1. Accent and Emotion: OpenVoice only clones the tone color of the reference speaker and does not clone the accent or emotion. The base speaker TTS model controls these aspects.
  2. Quality of Reference Audio: Ensure the reference audio is clean, without background noise, and does not contain speech from more than one person.
  3. Base Speaker Model: For multi-lingual and cross-lingual usage, ensure you have a base speaker model in the desired language. If not, consider using the OpenAI TTS model as the base speaker.

Additionally, ensure you are using the correct base speaker model for Chinese (zh_base_speaker_tts) and that the style parameter is set to 'default' for Chinese text inputs [1].

The problem reported here, where audio generated with the CosyVoice Chinese female and male voices converts poorly through OpenVoice ToneColorConverter, is likely related to OpenVoice cloning only the tone color and not the accent or emotion. The MeloTTS base speaker models that OpenVoice relies on may also not align well with CosyVoice TTS output [2][3][4].
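
One thing that may be worth checking is which source embedding is passed to the converter. If `src_se` was loaded from one of the bundled base speaker files, it would describe a MeloTTS voice rather than the CosyVoice one. The following is a minimal sketch, not a confirmed fix, that extracts both embeddings from audio. The checkpoint paths follow the layout used in the OpenVoice v2 demo notebooks and are assumptions here, and `build_converter` and `convert_external_tts` are hypothetical helper names.

```python
def build_converter(ckpt_dir="checkpoints_v2/converter", device="cpu"):
    """Load the OpenVoice tone color converter (needs the openvoice package).

    The default ckpt_dir is an assumption based on the v2 demo layout.
    """
    # Imported lazily so the helper below can be used or tested without
    # OpenVoice installed.
    from openvoice.api import ToneColorConverter

    converter = ToneColorConverter(f"{ckpt_dir}/config.json", device=device)
    converter.load_ckpt(f"{ckpt_dir}/checkpoint.pth")
    return converter


def convert_external_tts(converter, get_se, src_audio, ref_audio, out_path):
    """Convert src_audio toward the timbre of ref_audio.

    get_se is expected to behave like openvoice.se_extractor.get_se:
    get_se(audio_path, converter) -> (embedding, audio_name).
    """
    # Source embedding is extracted from the external TTS audio itself,
    # not loaded from a bundled MeloTTS base speaker file.
    source_se, _ = get_se(src_audio, converter)
    # Target embedding is extracted from the reference speaker recording.
    target_se, _ = get_se(ref_audio, converter)
    converter.convert(
        audio_src_path=src_audio,
        src_se=source_se,
        tgt_se=target_se,
        output_path=out_path,
    )
    return out_path
```

With OpenVoice installed, a call might look like `convert_external_tts(build_converter(device="cuda:0"), se_extractor.get_se, "cosyvoice_out.wav", "reference.wav", "converted.wav")`, where `se_extractor` is imported from `openvoice` and the three file names are placeholders.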


@dosubot dosubot bot added the bug Something isn't working label Sep 19, 2024