Converter doesn't work for Chinese? #287

Open
Strive-for-excellence opened this issue Jul 31, 2024 · 2 comments
Labels
bug Something isn't working

Comments


Strive-for-excellence commented Jul 31, 2024

When I run inference with https://huggingface.co/spaces/myshell-ai/OpenVoiceV2, the converted audio does not sound consistent with the reference audio.

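Text: 该努力我也努力过了,该尝试的我都尝试过了! (roughly: "I have put in the effort I should have, and I have tried everything I should have tried!")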
Reference audio: https://github.com/Strive-for-excellence/show/blob/main/ref.mp3
Base speaker generated audio: https://github.com/Strive-for-excellence/show/blob/main/output_v2_3_ori.wav
Generated audio: https://github.com/Strive-for-excellence/show/blob/main/output.wav
The audio generated by the base speaker sounds normal, but the audio after tone color conversion does not.

dosubot bot added the bug label Jul 31, 2024

dosubot bot commented Jul 31, 2024

@Strive-for-excellence Hello! I'm Dosu, a bot that can help you solve problems, answer your questions, and become a contributor. I'll help you while you wait for a human maintainer.

The inconsistency in the generated audio compared to the reference audio could be due to several factors:

  1. Accent and Emotion: OpenVoice only clones the tone color of the reference speaker and does not clone the accent or emotion. The accent and emotion are controlled by the base speaker TTS model. If the base speaker model does not match the desired accent or emotion, the generated audio will not reflect these characteristics.

  2. Quality of Reference Audio: Ensure that the reference audio is clean, without background noise, and does not contain speech from more than one person. Also, check that the reference audio is not too short and does not contain long blank sections.

  3. Base Speaker Model: For multi-lingual and cross-lingual usage, ensure that you have a base speaker model in the desired language. If you do not have a suitable base speaker model, you can use the OpenAI TTS model as the base speaker.

To address these issues, you can:

  • Verify the quality of your reference audio.
  • Ensure that the base speaker model matches the desired accent and emotion.
  • Consider using or training a base speaker model in Chinese if the current one does not meet your requirements.
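As a rough way to act on the first point, the sketch below measures the duration and the amount of silence in a reference clip. It is not part of OpenVoice: the function name `reference_audio_report`, the 20 ms frame size, and the -40 dB silence threshold are all assumptions chosen for illustration, and the samples are assumed to be already decoded to mono floats in [-1, 1].

```python
import math


def reference_audio_report(samples, sample_rate, frame_ms=20, silence_db=-40.0):
    """Summarise duration and silence for a mono clip.

    samples: sequence of floats in [-1, 1] (assumed already decoded).
    frame_ms and silence_db are illustrative defaults, not OpenVoice settings.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    n_frames = len(samples) // frame_len

    silent_frames = 0
    longest_run = 0
    current_run = 0
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        # Root-mean-square level of the frame, converted to dB full scale.
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        level_db = 20 * math.log10(rms) if rms > 0 else float("-inf")
        if level_db < silence_db:
            silent_frames += 1
            current_run += 1
            longest_run = max(longest_run, current_run)
        else:
            current_run = 0

    return {
        "duration_s": len(samples) / sample_rate,
        "silent_fraction": silent_frames / n_frames if n_frames else 0.0,
        "longest_silence_s": longest_run * frame_len / sample_rate,
    }


if __name__ == "__main__":
    # Synthetic clip: 1 s of signal, 0.5 s of silence, 0.5 s of signal.
    rate = 1000
    clip = [0.5] * 1000 + [0.0] * 500 + [0.5] * 500
    print(reference_audio_report(clip, rate))
```

A very short duration or a long silent stretch would suggest trimming or re-recording the reference before retrying the conversion. This check says nothing about background noise or multiple speakers, which still need to be judged by ear.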

Additionally, ensure that you are using the correct base speaker model for Chinese (zh_base_speaker_tts) and that the style parameter is set to 'default' for Chinese text inputs. The model supports English and Chinese primarily, and other languages require a suitable base speaker model.

If the webpage for OpenVoiceV2 on Hugging Face is unresponsive, you might need to wait until it is accessible again to further assess the model's performance or check for updates. You can also refer to the QnA and multi-lingual & cross-lingual examples for additional guidance [1][2].

To continue talking to Dosu, mention @dosu.

Zengyi-Qin (Contributor) commented

Hi - The result you get is normal. The generated voice clones only the tone color, not other features such as intonation and accent.
