Audio Streaming: large latency before first chunk is played #8185

sanchit-gandhi · 2024-05-01T13:05:30Z

We typically stream audio outputs when latency is a major consideration. For example, if we're generating 10 seconds of audio and want the perceived latency to be as low as possible, we can stream the outputs in 1-second chunks, so that the user can start playing the audio 10x sooner than if they waited for the full 10-second clip. Here's an example for Parler-TTS.
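To make the chunking idea concrete, here is a minimal sketch of a generator that splits a buffer into fixed-length chunks. The function name `iter_chunks`, the 16 kHz rate, and the use of plain Python lists are assumptions for illustration; the `(sampling_rate, samples)` pair simply mirrors the shape of the chunks in the demo code later in this thread, and nothing here is taken from the Parler-TTS example.

```python
from typing import Iterator, List, Sequence, Tuple


def iter_chunks(
    samples: Sequence[float], sampling_rate: int, chunk_seconds: float
) -> Iterator[Tuple[int, List[float]]]:
    """Yield (sampling_rate, chunk) pairs, each at most chunk_seconds long."""
    chunk_size = max(1, int(sampling_rate * chunk_seconds))
    for start in range(0, len(samples), chunk_size):
        yield sampling_rate, list(samples[start:start + chunk_size])


# Hypothetical input: 10 seconds of silence at 16 kHz, split into 1-second chunks.
sampling_rate = 16_000
audio = [0.0] * (10 * sampling_rate)
chunks = list(iter_chunks(audio, sampling_rate, chunk_seconds=1.0))
print(len(chunks), "chunks of", len(chunks[0][1]), "samples")
```

With 1-second chunks, the first chunk is ready after one tenth of the total work, which is where the expected 10x improvement in perceived latency comes from.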

When using the Gradio streaming component, we typically have to wait 3-4 seconds after the first chunk is returned before the output starts playing. This fixed overhead negates the latency improvement we expect from streaming. The result is that it's very difficult to showcase streaming outputs using Gradio.

This Space demonstrates the issue in an MWE: https://huggingface.co/spaces/sanchit-gandhi/audio-streaming
We have a 30-second audio clip, which we stream in 2-second chunks. It takes 1 second for the first chunk to be returned, but the audio only starts playing after an additional 3-4 seconds.

If we could reduce this to near zero additional overhead, it would make showcasing streaming outputs in Gradio much more feasible.

cc @aliabd @abidlabs @hannahblair @ylacombe

sanchit-gandhi · 2024-05-01T14:30:13Z

Related to #8177, but the MWE demonstrates that the problem is not that the full audio needs to be streamed before playback; rather, there's a fixed lag after the first chunk is received.

sanchit-gandhi · 2024-05-15T08:44:19Z

Any luck with this @aliabd?

freddyaboulton · 2024-07-17T08:41:41Z

Hey @sanchit-gandhi - taking a look at this and our audio streaming approach in general. I think there are things we can improve on the gradio side but why is there a time.sleep in the audio processing loop of your demo? If you remove it the first chunk starts playing after < 1 second. I think the browser won't play until a few chunks have been processed. Without the sleep the entire audio is processed in 1-2 seconds.

ylacombe · 2024-07-17T11:39:25Z

Hi @freddyaboulton, thanks for taking a look into this!

I think the time.sleep was added to emulate processing time, say a model generating audio. In that case, the processing time per chunk (the sleep time, i.e. half the chunk length) is faster than real time.
Ideally, we wouldn't have to wait for a few chunks to have been generated to start playing the audio, which is why @sanchit-gandhi opened the issue!
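As a rough illustration of the emulation described above, the sketch below sleeps for half of each chunk's duration before yielding it. This is not the code from the Space; `simulated_generation`, `real_time_factor`, and the injectable `sleep` argument are hypothetical names chosen for this example.

```python
import time
from typing import Callable, Iterator, List


def simulated_generation(
    total_seconds: float,
    chunk_seconds: float,
    real_time_factor: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[float]:
    """Yield each chunk's duration after a simulated processing delay.

    real_time_factor=0.5 means a chunk takes half its own duration to
    produce, i.e. generation runs faster than real time.
    """
    remaining = total_seconds
    while remaining > 1e-9:
        duration = min(chunk_seconds, remaining)
        sleep(duration * real_time_factor)  # stand-in for model inference
        yield duration
        remaining -= duration


# Record the delays instead of actually sleeping, so the example runs instantly.
delays: List[float] = []
durations = list(simulated_generation(30, 2, sleep=delays.append))
print(len(durations), "chunks, first chunk ready after", delays[0], "s")
```

Under these assumptions the first 2-second chunk is available after 1 second, matching the setup in the original report, and every later chunk arrives before the previous one would finish playing.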

ylacombe · 2024-07-25T09:00:03Z

Hey @freddyaboulton, have you been able to take a look at the above message and the audio streaming latency?

freddyaboulton · 2024-07-25T16:37:58Z

Hi @ylacombe ! Sorry I did not get back to you earlier and thank you for providing more details. Yes I figured out the issue. The html <audio> tag expects a minimum amount of audio before autoplaying (~5 seconds). If you set the chunk length to 6 seconds in your demo, the browser will start autoplaying as soon as the first chunk is processed.

The solution is to use a different streaming implementation that gives us more control of when the browser starts playing video. Should have a PR for that open in the next day or two.
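A back-of-the-envelope sketch of the buffering behaviour described in this comment: if the browser waits for a minimum amount of buffered audio before autoplaying, the wait depends on how many chunks are needed to reach that amount. The hard 5-second cutoff is an assumption taken from the approximate figure above, and the function name is hypothetical. The model ignores network and decoding overhead, so it is not expected to reproduce the measured 3-4 seconds exactly.

```python
import math


def seconds_until_autoplay(
    chunk_seconds: float,
    seconds_per_chunk: float,
    min_buffer_seconds: float = 5.0,  # assumed threshold, approximate
) -> float:
    """Estimate when enough audio has been buffered for autoplay to begin."""
    chunks_needed = math.ceil(min_buffer_seconds / chunk_seconds)
    return chunks_needed * seconds_per_chunk


# 2-second chunks produced in 1 second each: three chunks are needed.
small_chunks = seconds_until_autoplay(chunk_seconds=2, seconds_per_chunk=1)

# 6-second chunks produced in 3 seconds each: the first chunk is enough.
large_chunks = seconds_until_autoplay(chunk_seconds=6, seconds_per_chunk=3)

print(small_chunks, large_chunks)
```

In this simplified model, 6-second chunks start playing as soon as the first chunk is processed, which is consistent with the observation above, while 2-second chunks have to accumulate first.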

abidlabs · 2024-07-31T22:39:22Z

Closed via #8906. If you'd like to try it out, you can install gradio from this branch: #8843

ZaymeShaw · 2024-08-01T06:37:29Z

Great job! I have tried the latest branch from #8843, and the latency problem has been fixed. But there now seems to be some noise in the streaming audio.

freddyaboulton · 2024-08-01T14:38:27Z

Please share the full demo and audio file so that we can take a look!

steven8274 · 2024-08-07T06:18:43Z

I ran into the same problem. However, even when I install gradio from the #8906 source code, the problem is not solved. There is still a 3~4s delay, and audio playback is not smooth (it has gaps, as if audio data were missing). This is my demo code:

import gradio as gr
from pydub import AudioSegment
from time import sleep
import numpy as np
import datetime

audio_list = []
def add_to_stream(audio):
    sleep(0.05)
    global audio_list
    audio_list.append(audio)

with gr.Blocks() as demo:
    inp = gr.Audio(sources=["microphone"], streaming=True)
    inp.stream(add_to_stream, [inp], [])

    stream_as_file_btn = gr.Button("Stream as File")
    stream_as_file_output = gr.Audio(streaming=True)
    stream_as_file_output.autoplay = True

    def stream_file():
        global audio_list
        while True:
            while len(audio_list) == 0:
                print('stream out pull data, but no data available now...')
                sleep(0.05)
            chunk = audio_list[0]
            audio_list = audio_list[1:]
            print('yield audio chunk, samples: {}, cached audio chunks: {}, at: {}'.format(len(chunk[1]), len(audio_list), datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f")))
            yield chunk

    stream_as_file_btn.click(
        stream_file, [], stream_as_file_output
    )


if __name__ == "__main__":
    demo.launch(server_name='0.0.0.0', server_port=8000)

I checked the audio data output rate in the log, and it matches the sample rate.

Demo usage:
1. Click 'stream_as_file_btn' to start fetching audio data.
2. Click the 'inp' audio component's record button to start generating audio data.

After about half a second, you will see 'yield audio chunk...', which means audio data has started to be output.
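For readers adapting the demo, the hand-off between the two callbacks could also be sketched with the standard library's `queue.Queue` instead of a global list polled with `sleep`. This only changes how chunks are passed between the callbacks and is not claimed to fix the playback delay discussed in this thread. The `None` end-of-stream sentinel and the `timeout` parameter are assumptions made for this example, and the Gradio wiring is omitted.

```python
import queue
from typing import Iterator, List, Optional, Tuple

Chunk = Tuple[int, List[float]]  # (sampling_rate, samples)

# Thread-safe buffer between the input callback and the output generator.
audio_queue: "queue.Queue[Optional[Chunk]]" = queue.Queue()


def add_to_stream(audio: Optional[Chunk]) -> None:
    """Input callback: enqueue a chunk (None is a hypothetical end marker)."""
    audio_queue.put(audio)


def stream_file(timeout: float = 5.0) -> Iterator[Chunk]:
    """Output generator: block until a chunk arrives, then yield it."""
    while True:
        try:
            chunk = audio_queue.get(timeout=timeout)
        except queue.Empty:
            return  # no data for `timeout` seconds: stop streaming
        if chunk is None:
            return  # explicit end of stream
        yield chunk


# Stand-alone usage without Gradio: enqueue two chunks and an end marker.
add_to_stream((48_000, [0.0] * 4))
add_to_stream((48_000, [0.1] * 4))
add_to_stream(None)
received = list(stream_file(timeout=0.1))
print(len(received), "chunks received")
```

`Queue.get` blocks until data is available, so the consumer does not need its own polling loop, and removing items from the front does not copy the remaining list.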

ylacombe · 2024-08-07T08:42:14Z

Same issue on my side, the audio chunks still accumulate for a few seconds before starting to play

steven8274 · 2024-08-07T08:56:18Z

> Same issue on my side, the audio chunks still accumulate for a few seconds before starting to play

Besides, the audio data seems to be consumed too quickly, which makes the audio playback keep pausing.

abidlabs · 2024-08-07T16:00:32Z

Just to confirm @ylacombe @steven8274 this is after installing gradio with:

pip install https://gradio-pypi-previews.s3.amazonaws.com/ea384210055da2b1e6a2919b9ee4f8f3e137fa81/gradio-4.40.0-py3-none-any.whl

and this happens consistently, with all recorded audio (or does it have to be a particular length, etc.)? cc @freddyaboulton

ylacombe · 2024-08-07T19:13:43Z

Hey @abidlabs, it does happen after installing the right version. I've sent an example to @freddyaboulton: the first chunk is played almost right away but there's a big latency before the next chunks are played, even though they're available.

freddyaboulton · 2024-08-07T21:10:55Z

Yes taking a look - @ylacombe 's issue has something to do with using very small chunk lengths

steven8274 · 2024-08-08T01:35:44Z

> Yes taking a look - @ylacombe 's issue has something to do with using very small chunk lengths

@freddyaboulton Hi, thanks for paying attention to my problem! In my case, I use a microphone to record audio at 48 kHz, and I receive an audio chunk of 24000 samples per stream callback, every half second. Is this chunk length too small? Maybe you can try my demo code to check whether the audio component is working correctly.
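The figures in this comment are self-consistent, as a quick calculation shows. The helper name below is hypothetical; the numbers are the ones quoted above.

```python
def chunk_duration_seconds(num_samples: int, sampling_rate: int) -> float:
    """Duration of audio covered by one chunk."""
    return num_samples / sampling_rate


# 24000 samples per callback at 48 kHz.
duration = chunk_duration_seconds(24_000, 48_000)
callbacks_per_second = 1 / duration
print(duration, "s per chunk,", callbacks_per_second, "callbacks per second")
```

Each callback therefore carries half a second of audio, so receiving one every half second means the data arrives at exactly the real-time rate, with no spare margin to build up a buffer.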

freddyaboulton · 2024-08-08T16:40:47Z

Hi @steven8274 ! I looked at your issue as well and I think it's a different cause. I'm still investigating but I will be tweaking this over the next couple of weeks and will share a new wheel link for you to try soon.

BTW we'll be making the stream callback frequency configurable in #8941

steven8274 · 2024-08-09T01:01:13Z

> Hi @steven8274 ! I looked at your issue as well and I think it's a different cause. I'm still investigating but I will be tweaking this over the next couple of weeks and will share a new wheel link for you to try soon.
>
> BTW we'll be making the stream callback frequency configurable in #8941

Thank you very much! Waiting for your good news!

abidlabs assigned aliabid94 May 19, 2024

abidlabs added the bug (Something isn't working) and Priority (High priority issues) labels Jun 11, 2024

freddyaboulton self-assigned this Jul 17, 2024

freddyaboulton added this to the Gradio 5️⃣ milestone Jul 17, 2024

freddyaboulton mentioned this issue Jul 25, 2024

Use HTTP Livestreaming for audio/video streaming out #8906

Merged

ZaymeShaw mentioned this issue Jul 29, 2024

When using the webui for streaming inference, why is there a delay of about 5 seconds in audio playback? 2noise/ChatTTS#587

Open

abidlabs closed this as completed Jul 31, 2024

steven8274 mentioned this issue Aug 7, 2024

#8906 still not solve audio play out high delay #9044

Open
