Audio Streaming: large latency before first chunk is played #8185
Comments
Related to #8177. However, the MWE demonstrates that playback does not depend on the full audio being streamed; rather, there is a fixed lag after the first chunk is received.
Any luck with this @aliabd?
Hey @sanchit-gandhi - taking a look at this and our audio streaming approach in general. I think there are things we can improve on the Gradio side, but why is there a time.sleep in the audio processing loop of your demo? If you remove it, the first chunk starts playing after < 1 second. I think the browser won't play until a few chunks have been processed. Without the sleep, the entire audio is processed in 1-2 seconds.
Hi @freddyaboulton, thanks for taking a look into this! I think the
Hey @freddyaboulton, have you been able to take a look at the above message and the audio streaming latency?
Hi @ylacombe! Sorry I did not get back to you earlier, and thank you for providing more details. Yes, I figured out the issue. The html The solution is to use a different streaming implementation that gives us more control of when the browser starts playing video. Should have a PR for that open in the next day or two.
Great job! I have tried the latest branch from #8843, and the latency problem has been fixed. However, there now seems to be some noise in the streamed audio.
Please share the full demo and audio file so that we can take a look!
I have the same problem. However, even when I install Gradio from the #8906 source code, the problem is not solved. There is still a 3~4 second delay, and audio playback is not smooth (it has gaps, as if audio data were missing). This is my demo code:
I checked the audio data output speed via the log, and it is consistent with the sample rate. Demo usage: after about half a second, you will see 'yield audio chunk...', which means audio data has started being output.
Same issue on my side: the audio chunks still accumulate for a few seconds before starting to play.
Besides, the audio data seems to be consumed too quickly, which makes the audio playback keep pausing.
Just to confirm @ylacombe @steven8274 this is after installing
and this happens consistently, with all recorded audio (or does it have to be a particular length, etc.)? cc @freddyaboulton
Hey @abidlabs, it does happen after installing the right version. I've sent an example to @freddyaboulton: the first chunk is played almost right away, but there's a big latency before the next chunks are played, even though they're available.
Yes, taking a look - @ylacombe's issue has something to do with using very small chunk lengths.
@freddyaboulton Hi, thanks for paying attention to my problem! In my case, I use a microphone to record audio at 48 kHz, and I receive audio chunks of 24000 samples per stream callback, once every half second. Is this chunk length too small? Maybe you can try my demo code to check whether the audio component is working correctly.
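For reference, the chunk duration implied by these numbers can be checked with a quick calculation. This is only a sketch of the arithmetic; the helper name `chunk_duration_seconds` is made up for illustration and is not part of Gradio.

```python
def chunk_duration_seconds(num_samples: int, sample_rate_hz: int) -> float:
    """Return how many seconds of audio a chunk of `num_samples` holds."""
    if sample_rate_hz <= 0:
        raise ValueError("sample_rate_hz must be positive")
    return num_samples / sample_rate_hz


# Numbers from the comment above: 24000 samples per callback at 48 kHz.
duration = chunk_duration_seconds(24000, 48000)
print(duration)  # 0.5
```

Each callback therefore carries about as much audio as the interval between callbacks, which would leave little slack if a chunk arrives late.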
Hi @steven8274! I looked at your issue as well and I think it's a different cause. I'm still investigating, but I will be tweaking this over the next couple of weeks and will share a new wheel link for you to try soon. BTW we'll be making the stream callback frequency configurable in #8941.
Thank you very much! Waiting for your good news!
We typically stream audio outputs when latency is a major consideration. E.g. if we're generating 10 seconds of audio and want the perceived latency to be as low as possible, we can stream the outputs in 1-second chunks, so that the user can start playing the audio 10x sooner than if they waited for the full 10-second audio. Here's an example for Parler-TTS.
When using the Gradio streaming component, we typically have to wait 3-4 seconds after the first chunk is returned before the output starts playing. This fixed overhead negates the latency improvement we expect from streaming. The result is that it's very difficult to showcase streaming outputs using Gradio.
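To make the cost of that fixed overhead concrete, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that generating N seconds of audio takes N seconds, and it uses 3.5 seconds as the midpoint of the observed 3-4 second delay. The function `time_to_first_audio` is a hypothetical helper, not a Gradio API.

```python
def time_to_first_audio(generation_seconds: float,
                        playback_overhead_seconds: float = 0.0) -> float:
    """Seconds the user waits before hearing anything."""
    return generation_seconds + playback_overhead_seconds


# Assumption: generating N seconds of audio takes N seconds of compute.
full_wait = time_to_first_audio(10.0)       # no streaming: wait for all 10 s
ideal_stream = time_to_first_audio(1.0)     # streaming, first 1-second chunk
observed_stream = time_to_first_audio(1.0, playback_overhead_seconds=3.5)

print(full_wait / ideal_stream)               # 10.0
print(round(full_wait / observed_stream, 2))  # 2.22
```

Under these assumptions, the expected 10x improvement in perceived latency shrinks to roughly 2x once the fixed delay is added.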
This Space demonstrates the issue in a MWE: https://huggingface.co/spaces/sanchit-gandhi/audio-streaming
We have a 30-second audio clip, which we stream in 2-second chunks. It takes 1 second for the first chunk to be returned, but the audio only starts playing after an additional 3-4 seconds.
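The chunking itself can be sketched as a plain Python generator. This is not the code from the Space: the name `iter_audio_chunks`, the 16 kHz sample rate, and the silent placeholder signal are all assumptions for illustration, and the `(sample_rate, samples)` tuple shape is likewise assumed rather than taken from the demo.

```python
from typing import Iterator, List, Tuple


def iter_audio_chunks(samples: List[float], sample_rate: int,
                      chunk_seconds: float) -> Iterator[Tuple[int, List[float]]]:
    """Yield (sample_rate, chunk) pairs of at most `chunk_seconds` each."""
    chunk_len = int(sample_rate * chunk_seconds)
    if chunk_len <= 0:
        raise ValueError("chunk_seconds is too small for this sample rate")
    for start in range(0, len(samples), chunk_len):
        yield sample_rate, samples[start:start + chunk_len]


SAMPLE_RATE = 16_000                  # assumed; the real demo may differ
audio = [0.0] * (30 * SAMPLE_RATE)    # 30 seconds of silence as a stand-in

chunks = list(iter_audio_chunks(audio, SAMPLE_RATE, chunk_seconds=2.0))
print(len(chunks))        # 15
print(len(chunks[0][1]))  # 32000
```

A streaming function would yield these pairs one at a time instead of collecting them into a list, so the first 2-second chunk is available as soon as it is produced.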
If we could reduce this to near zero additional overhead, it would make showcasing streaming outputs in Gradio much more feasible.
cc @aliabd @abidlabs @hannahblair @ylacombe