Why is semantic_tokens generation so incredibly slow? :( #6
Hey, first of all, thanks for doing and publishing great work!

But coming at it from the practical side, I am looking at rendering my favourite set of Matrix quotes:

The s2a stage is indeed quite fast; t2s, however, is absolutely horrible. With SpeechT5 I can get t2s done in 100-200 ms regardless of the prompt length, and with a batch size of 50-100. Am I missing something here?

P.S. Just for reference, I can render 50 of those prompts in a batch with SpeechT5 in about 15 seconds of wall time on the same HW (nothing too fancy, a consumer-grade Intel A770), with the first audio starting to come out in just 0.6 seconds for all 50 of them. So assuming semantic generation is batchable, by the time it completes I'd already be halfway through, and some of the clips would already have finished playing. Here is my code if you are curious: https://github.com/sippy/Infernos/blob/main/HelloSippyTTSRT/HelloSippyRTPipe.py
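For context, the stock Hugging Face API for SpeechT5 is single-utterance; my linked pipeline drives the same models with a custom batched decoder loop. A minimal sketch of the stock API (standard microsoft/speecht5_tts checkpoints; the x-vector here is a random placeholder, a real one would be loaded from a speaker-embedding dataset):

```python
# Minimal single-utterance SpeechT5 sketch (Hugging Face transformers).
# The batched/streaming numbers quoted above come from a custom pipeline
# (see the linked HelloSippyRTPipe.py); this only shows the stock API.
import torch
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan

processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

inputs = processor(text="There is no spoon.", return_tensors="pt")
# Placeholder 512-dim x-vector; real use would load a speaker embedding.
speaker_embeddings = torch.randn(1, 512)

with torch.no_grad():
    speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
print(speech.shape)  # 1-D waveform tensor at 16 kHz
```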
Hi, thanks for reaching out. Getting that inference speed requires a TensorRT-LLM deployment; we are working to open-source it in the coming weeks.
t2s is an AR (autoregressive) model, so it is slower than s2a.
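To illustrate the point: an autoregressive decoder has to run one forward pass per generated token, each depending on the previous output, so its latency grows with the length of the semantic-token sequence, while a non-autoregressive stage can compute every position in a single batched pass. A toy sketch of the difference (stand-in modules, not this project's actual code):

```python
# Toy illustration of why an autoregressive (AR) stage is latency-bound:
# it must run T sequential forward passes, while a non-AR stage runs one.
import torch
import torch.nn as nn

d, vocab, T = 256, 1024, 200
step = nn.Linear(d, vocab)         # stand-in for one decoder step
token_emb = nn.Embedding(vocab, d)

def ar_decode(prompt_state: torch.Tensor, steps: int = T) -> list[int]:
    """Sequential: each token depends on the previous one -> `steps` forward passes."""
    out, h = [], prompt_state
    for _ in range(steps):
        logits = step(h)                  # one forward pass per token
        tok = int(logits.argmax(-1))
        out.append(tok)
        h = token_emb(torch.tensor(tok))  # feed the prediction back in
    return out

def parallel_decode(states: torch.Tensor) -> torch.Tensor:
    """Non-AR: all positions computed in a single batched forward pass."""
    return step(states).argmax(-1)        # (T, d) -> (T,)

tokens = ar_decode(torch.randn(d))        # 200 dependent steps
fast = parallel_decode(torch.randn(T, d)) # 1 independent step
```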