Converter doesn't work for Chinese? #287

Strive-for-excellence · 2024-07-31T07:18:32Z

When I use https://huggingface.co/spaces/myshell-ai/OpenVoiceV2 for inference, the generated audio does not match the reference audio.

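Text: 该努力我也努力过了，该尝试的我都尝试过了！ (roughly: "I have put in the effort I should have, and I have tried everything I should have tried!")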
Reference audio: https://github.com/Strive-for-excellence/show/blob/main/ref.mp3
Base speaker generated audio: https://github.com/Strive-for-excellence/show/blob/main/output_v2_3_ori.wav
Generated audio: https://github.com/Strive-for-excellence/show/blob/main/output.wav
The audio generated by the base speaker sounds normal, but the audio after conversion does not.

dosubot · 2024-07-31T07:22:32Z

@Strive-for-excellence Hello! I'm Dosu, a bot that can help you solve problems, answer your questions, and become a contributor. I'll help you while you wait for a human maintainer.

The inconsistency in the generated audio compared to the reference audio could be due to several factors:

Accent and Emotion: OpenVoice only clones the tone color of the reference speaker and does not clone the accent or emotion. The accent and emotion are controlled by the base speaker TTS model. If the base speaker model does not match the desired accent or emotion, the generated audio will not reflect these characteristics.
Quality of Reference Audio: Ensure that the reference audio is clean, without background noise, and does not contain speech from more than one person. Also, check that the reference audio is not too short and does not contain long blank sections.
Base Speaker Model: For multi-lingual and cross-lingual usage, ensure that you have a base speaker model in the desired language. If you do not have a suitable base speaker model, you can use the OpenAI TTS model as the base speaker.

To address these issues, you can:

Verify the quality of your reference audio.
Ensure that the base speaker model matches the desired accent and emotion.
Consider using or training a base speaker model in Chinese if the current one does not meet your requirements.
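The first check above (reference audio that is not too short and has no long blank sections) can be roughly automated. The following is a minimal sketch using only the Python standard library, not part of OpenVoice. It assumes the reference has first been converted to a 16-bit PCM WAV file (the `wave` module cannot read MP3). The function names and every threshold (`silence_threshold`, `frame_ms`, `min_seconds`, `max_silence_seconds`) are illustrative assumptions; the thread gives no numeric cutoffs.

```python
import array
import sys
import wave


def inspect_reference_wav(path, silence_threshold=500, frame_ms=20):
    """Return (duration_seconds, longest_silence_seconds) for a 16-bit PCM WAV.

    A window counts as silent when its peak absolute sample value is below
    silence_threshold (an assumed value, not an OpenVoice setting).
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        total_frames = wav.getnframes()
        samples = array.array("h")
        samples.frombytes(wav.readframes(total_frames))

    # WAV data is little-endian; swap on big-endian machines.
    if sys.byteorder == "big":
        samples.byteswap()

    # Number of interleaved samples per analysis window.
    window = max(1, rate * frame_ms // 1000) * channels
    longest_run = current_run = 0
    for start in range(0, len(samples), window):
        peak = max(abs(s) for s in samples[start:start + window])
        if peak < silence_threshold:
            current_run += 1
            longest_run = max(longest_run, current_run)
        else:
            current_run = 0

    duration = total_frames / rate
    longest_silence = longest_run * frame_ms / 1000
    return duration, longest_silence


def reference_warnings(path, min_seconds=3.0, max_silence_seconds=1.0):
    """Return human-readable warnings; both limits are assumed defaults."""
    duration, silence = inspect_reference_wav(path)
    warnings = []
    if duration < min_seconds:
        warnings.append(f"reference is short: {duration:.2f} s")
    if silence > max_silence_seconds:
        warnings.append(f"long blank section: {silence:.2f} s")
    return warnings
```

Calling `reference_warnings("ref.wav")` on a converted copy of the reference returns an empty list when neither limit is exceeded. It does not detect background noise or multiple speakers, which still need to be checked by ear.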

Additionally, ensure that you are using the correct base speaker model for Chinese (zh_base_speaker_tts) and that the style parameter is set to 'default' for Chinese text inputs. The model supports English and Chinese primarily, and other languages require a suitable base speaker model.

If the webpage for OpenVoiceV2 on Hugging Face is unresponsive, you might need to wait until it is accessible again to further assess the model's performance or check for updates. You can also refer to the QnA and multi-lingual & cross-lingual examples for additional guidance ^[1]^[2].

To continue talking to Dosu, mention @dosu.

Zengyi-Qin · 2024-08-03T02:03:00Z

Hi - The result you get is normal. The generated voice clones only the tone color, not other features such as intonation and accent.

dosubot bot added the bug label ("Something isn't working") on Jul 31, 2024

dosubot bot mentioned this issue on Sep 19, 2024 in "v2 does not work well with cosyvoice TTS" #311 (open)
