about the speech speaker condition selection #810

Open
JohnHerry opened this issue Aug 5, 2024 · 0 comments

I read the paper and noticed that when training the AR model, the speaker condition is a different clip of the same person speaking, whereas when training the diffusion model, the speaker condition appears to be a clip taken from the target speech itself. Why the different design? What would happen if we used the target speech itself as the speaker condition when training the AR model, or used another sample from the same speaker as the speaker condition when training the diffusion model? What is the reason for this choice? Thanks.
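To make the difference concrete, here is a minimal sketch of the two sampling strategies as I understand them (function and variable names here are hypothetical, not from the actual codebase):

```python
import random

def ar_speaker_cond(utts_by_speaker, speaker_id, target_idx):
    # AR training: condition on a *different* utterance from the same
    # speaker as the target (falls back to the target utterance if the
    # speaker has only one clip).
    others = [u for i, u in enumerate(utts_by_speaker[speaker_id])
              if i != target_idx]
    return random.choice(others) if others else utts_by_speaker[speaker_id][target_idx]

def diffusion_speaker_cond(target_wave, cond_len):
    # Diffusion training: condition on a random clip cut from the
    # target utterance itself.
    start = random.randint(0, max(0, len(target_wave) - cond_len))
    return target_wave[start:start + cond_len]
```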
