Audio Streaming: large latency before first chunk is played #8185

Closed
sanchit-gandhi opened this issue May 1, 2024 · 18 comments
Comments

@sanchit-gandhi

We typically stream audio outputs when latency is a major consideration. For example, if we're generating 10 seconds of audio and want the perceived latency to be as low as possible, we can stream the outputs in 1-second chunks, so that the user can start playing the audio 10x sooner than if they waited for the full 10-second audio. Here's an example for Parler-TTS.

When using the Gradio streaming component, we typically have to wait 3-4 seconds after the first chunk is returned before the output starts playing. This fixed overhead negates the latency improvement we expect from streaming. The result is that it's very difficult to showcase streaming outputs using Gradio.

This Space demonstrates the issue in an MWE: https://huggingface.co/spaces/sanchit-gandhi/audio-streaming
We have a 30-second audio, which we stream in 2-second chunks. It takes 1 second for the first chunk to be returned, but the audio only starts playing after an additional 3-4 seconds.

If we could reduce this to near zero additional overhead, it would make showcasing streaming outputs in Gradio much more feasible.
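For reference, the chunking described above can be sketched roughly as follows. This uses NumPy only (no Gradio); the `stream_chunks` helper and the 16 kHz sampling rate are assumptions for illustration, not the Space's actual code.

```python
import numpy as np


def stream_chunks(audio, sampling_rate, chunk_seconds):
    """Yield consecutive fixed-length chunks of a 1-D audio array."""
    chunk_size = int(sampling_rate * chunk_seconds)
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


# Hypothetical input: 30 seconds of silence at an assumed 16 kHz sampling rate.
sampling_rate = 16_000
audio = np.zeros(30 * sampling_rate, dtype=np.float32)

# Split into 2-second chunks, as in the MWE description.
chunks = list(stream_chunks(audio, sampling_rate, chunk_seconds=2))
num_chunks = len(chunks)
```

With streaming, the ideal perceived latency is the time to produce the first chunk only, rather than the time to produce all of them.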

cc @aliabd @abidlabs @hannahblair @ylacombe

@sanchit-gandhi
Author

Related to #8177, but the MWE demonstrates that the delay is not a matter of the full audio needing to be streamed; rather, there is a fixed lag after the first chunk is received.

@sanchit-gandhi
Author

Any luck with this @aliabd?

@abidlabs abidlabs added bug Something isn't working Priority High priority issues labels Jun 11, 2024
@freddyaboulton freddyaboulton self-assigned this Jul 17, 2024
@freddyaboulton
Collaborator

Hey @sanchit-gandhi - taking a look at this and our audio streaming approach in general. I think there are things we can improve on the gradio side but why is there a time.sleep in the audio processing loop of your demo? If you remove it the first chunk starts playing after < 1 second. I think the browser won't play until a few chunks have been processed. Without the sleep the entire audio is processed in 1-2 seconds.

@freddyaboulton freddyaboulton added this to the Gradio 5️⃣ milestone Jul 17, 2024
@ylacombe

ylacombe commented Jul 17, 2024

Hi @freddyaboulton, thanks for taking a look into this!

I think the time.sleep was added to emulate processing time, e.g. a model generating audio. In that case, the processing time per chunk (the sleep time, i.e. half the chunk length) is shorter than the chunk's duration, so generation is faster than real time.
Ideally, we wouldn't have to wait for a few chunks to have been generated to start playing the audio, which is why @sanchit-gandhi opened the issue!
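A minimal sketch of that reasoning, assuming each chunk takes a fixed time to generate and playback begins as soon as the first chunk is ready. The function names are hypothetical and not part of Gradio.

```python
def real_time_factor(processing_seconds, chunk_seconds):
    """Ratio of generation time to audio duration; below 1.0 is faster than real time."""
    return processing_seconds / chunk_seconds


def playback_keeps_up(n_chunks, processing_seconds, chunk_seconds):
    """Check that every chunk is ready by the time playback reaches it.

    Assumes playback starts the moment the first chunk is ready and that
    chunks are generated back to back at a constant rate.
    """
    playback_start = processing_seconds
    for i in range(n_chunks):
        ready_at = (i + 1) * processing_seconds
        needed_at = playback_start + i * chunk_seconds
        if ready_at > needed_at:
            return False
    return True


# Values from the MWE: 2-second chunks, 1 second of simulated processing each.
rtf = real_time_factor(processing_seconds=1.0, chunk_seconds=2.0)
ok = playback_keeps_up(n_chunks=15, processing_seconds=1.0, chunk_seconds=2.0)
```

Under these assumptions, nothing on the generation side prevents playback from starting right after the first chunk.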

@ylacombe

Hey @freddyaboulton, have you been able to take a look at the above message and the audio streaming latency?

@freddyaboulton
Collaborator

Hi @ylacombe! Sorry I did not get back to you earlier, and thank you for providing more details. Yes, I figured out the issue. The HTML <audio> tag expects a minimum amount of audio before autoplaying (~5 seconds). If you set the chunk length to 6 seconds in your demo, the browser will start autoplaying as soon as the first chunk is processed.

The solution is to use a different streaming implementation that gives us more control over when the browser starts playing audio. Should have a PR for that open in the next day or two.
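The arithmetic behind the explanation above can be sketched as follows. The 5.0-second threshold is only the approximate figure quoted in this comment, and real browser behavior may differ; the helper names are hypothetical.

```python
import math


def chunks_before_autoplay(chunk_seconds, min_buffer_seconds=5.0):
    """Number of chunks needed before the buffered audio reaches the threshold."""
    return math.ceil(min_buffer_seconds / chunk_seconds)


def estimated_playback_start(chunk_seconds, processing_seconds, min_buffer_seconds=5.0):
    """Rough time until autoplay, assuming a constant processing time per chunk."""
    n = chunks_before_autoplay(chunk_seconds, min_buffer_seconds)
    return n * processing_seconds


# 2-second chunks: several chunks must arrive before playback can begin.
needed_short = chunks_before_autoplay(2.0)
# 6-second chunks: the first chunk alone exceeds the assumed threshold.
needed_long = chunks_before_autoplay(6.0)
# Estimated start for 2-second chunks with 1 second of processing each.
start_short = estimated_playback_start(2.0, 1.0)
```

This is only a model of the buffering effect, not a measurement of the demo.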

@abidlabs
Member

Closed via #8906. If you'd like to try it out, you can install gradio from this branch: #8843

@ZaymeShaw

Great job! I have tried the latest branch from #8843, and the latency problem has been fixed. However, there now seems to be some noise in the streaming audio.

@freddyaboulton
Collaborator

Please share the full demo and audio file so that we can take a look!

@steven8274

steven8274 commented Aug 7, 2024

I ran into the same problem. However, even when I install gradio from the #8906 source code, the problem is not solved. There is still a 3~4 s delay, and audio playback is not smooth (it has gaps, as if audio data were missing). This is my demo code:

import datetime
from time import sleep

import gradio as gr

# Shared buffer of microphone chunks; each chunk is a (sample_rate, samples) tuple.
audio_list = []


def add_to_stream(audio):
    # Called by Gradio for every chunk recorded from the microphone.
    sleep(0.05)
    global audio_list
    audio_list.append(audio)


with gr.Blocks() as demo:
    inp = gr.Audio(sources=["microphone"], streaming=True)
    inp.stream(add_to_stream, [inp], [])

    stream_as_file_btn = gr.Button("Stream as File")
    stream_as_file_output = gr.Audio(streaming=True)
    stream_as_file_output.autoplay = True

    def stream_file():
        # Generator that forwards buffered microphone chunks to the output component.
        global audio_list
        while True:
            # Poll until at least one chunk has been recorded.
            while len(audio_list) == 0:
                print('stream out pull data, but no data available now...')
                sleep(0.05)
            chunk = audio_list[0]
            audio_list = audio_list[1:]
            print('yield audio chunk, samples: {}, cached audio chunks: {}, at: {}'.format(
                len(chunk[1]), len(audio_list),
                datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f")))
            yield chunk

    stream_as_file_btn.click(
        stream_file, [], stream_as_file_output
    )


if __name__ == "__main__":
    demo.launch(server_name='0.0.0.0', server_port=8000)

Based on the logs, I measured the rate at which audio data is output, and it is consistent with the sample rate.

Demo usage:
1. Click 'stream_as_file_btn' to start fetching audio data.
2. Click the recording button of the 'inp' audio component to start generating audio data.

After about half a second, you will see 'yield audio chunk...', which means audio data has begun to be output.

@ylacombe

ylacombe commented Aug 7, 2024

Same issue on my side, the audio chunks still accumulate for a few seconds before starting to play

@steven8274

> Same issue on my side, the audio chunks still accumulate for a few seconds before starting to play

Besides, the audio data seems to be consumed too quickly, which makes the audio playback pause repeatedly.

@abidlabs
Member

abidlabs commented Aug 7, 2024

Just to confirm @ylacombe @steven8274 this is after installing gradio with:

pip install https://gradio-pypi-previews.s3.amazonaws.com/ea384210055da2b1e6a2919b9ee4f8f3e137fa81/gradio-4.40.0-py3-none-any.whl

and this happens consistently, with all recorded audio (or does it have to be a particular length, etc.)? cc @freddyaboulton

@ylacombe

ylacombe commented Aug 7, 2024

Hey @abidlabs, it does happen after installing the right version. I've sent an example to @freddyaboulton: the first chunk is played almost right away but there's a big latency before the next chunks are played, even though they're available.

@freddyaboulton
Collaborator

Yes taking a look - @ylacombe 's issue has something to do with using very small chunk lengths

@steven8274

> Yes taking a look - @ylacombe 's issue has something to do with using very small chunk lengths

@freddyaboulton Hi, thanks for paying attention to my problem! In my case, I use the microphone to generate recorded audio at 48 kHz, and I receive an audio chunk of 24000 samples per stream callback, every half second. Is this chunk length too small? Maybe you can try my demo code to check whether the audio component is working correctly.
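As a quick sanity check on those numbers, the chunk duration and effective data rate can be computed as below. The helper names are hypothetical; only the 48 kHz rate, the 24000-sample chunk size, and the half-second callback interval come from the comment above.

```python
def chunk_duration_seconds(num_samples, sampling_rate):
    """Duration of audio contained in one chunk."""
    return num_samples / sampling_rate


def samples_per_second(num_samples, callback_interval_seconds):
    """Effective rate at which samples arrive from the stream callback."""
    return num_samples / callback_interval_seconds


duration = chunk_duration_seconds(num_samples=24_000, sampling_rate=48_000)
rate = samples_per_second(num_samples=24_000, callback_interval_seconds=0.5)
```

Each chunk therefore holds half a second of audio, and the arrival rate matches the sample rate, so the input side is producing data in real time.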

@freddyaboulton
Collaborator

Hi @steven8274 ! I looked at your issue as well and I think it's a different cause. I'm still investigating but I will be tweaking this over the next couple of weeks and will share a new wheel link for you to try soon.

BTW we'll be making the stream callback frequency configurable in #8941

@steven8274

> Hi @steven8274 ! I looked at your issue as well and I think it's a different cause. I'm still investigating but I will be tweaking this over the next couple of weeks and will share a new wheel link for you to try soon.
>
> BTW we'll be making the stream callback frequency configurable in #8941

Thank you very much! Looking forward to your good news!

7 participants