Audio Component Streaming Behaviour is weird? #7742

Closed
s-kruschel opened this issue Mar 19, 2024 · 9 comments
Labels
bug Something isn't working

Comments


s-kruschel commented Mar 19, 2024

Describe the bug

Hey folks,

I've searched for similar issues; there are several about the Gradio Audio component, but I'm not sure whether they describe the same problem.

What I'm trying to do is stream the TTS OpenAI API response. The OpenAI part is working. However, I don't understand the Audio component's behaviour.

What I've tried:

  1. Returning only a single bytes-object chunk. This leads to a stuttering voice: the audio plays, stops, and then resumes once the next chunk arrives.
  2. Returning a concatenation of all byte chunks received so far (chunks += chunk). The audio plays for about a second until the next chunk is appended to the existing data, at which point autoplay restarts from the beginning. So the audio still stutters and never plays through.

Furthermore, only
out = gr.Audio(autoplay=True) seems to work.
out = gr.Audio(autoplay=True, streaming=True) does not work; it simply does nothing, for whatever reason.

In my opinion, the ideal behaviour would be: when streaming=True is set and incoming chunks are appended to the already received audio, the Audio component should keep playing instead of restarting from the beginning each time.
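
For illustration, here is a minimal sketch (untested, assuming the same OpenAI client as in the reproduction below) of the generator-based pattern I would expect to work: the handler yields each byte chunk, and the streaming Audio output appends it to the ongoing playback instead of restarting it.

import gradio as gr
from openai import OpenAI

client = OpenAI()

def speak(text):
    # Yield successive byte chunks of the TTS response; ideally each chunk
    # extends the currently playing stream instead of restarting playback.
    with client.audio.speech.with_streaming_response.create(
        model="tts-1-hd", voice="alloy", input=text
    ) as response:
        for chunk in response.iter_bytes(chunk_size=8192):
            yield chunk

with gr.Blocks() as demo:
    txt = gr.Textbox(label="Text to speak")
    out = gr.Audio(streaming=True, autoplay=True)
    txt.submit(speak, txt, out)

demo.launch()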

Have you searched existing issues? 🔎

  • I have searched and found no existing issues

Reproduction

import gradio as gr
from openai import OpenAI

client = OpenAI()
tts_generator = None  # holds the active TTS byte-chunk generator between stream events


def text_to_speech_streaming():
    # Stream the OpenAI TTS response as raw byte chunks.
    with client.audio.speech.with_streaming_response.create(
        model="tts-1-hd",
        voice="alloy",
        input="This is a special test text that I want to get generated to test streaming the generated voice directly from OpenAI into my gradio application."
    ) as response:
        for chunk in response.iter_bytes(chunk_size=8192):
            yield chunk


def add_to_stream(audio, instream):
    global tts_generator

    if audio is None:
        return gr.update(), instream

    if tts_generator is None:
        tts_generator = text_to_speech_streaming()

    try:
        chunk = next(tts_generator)
    except StopIteration:
        # Generator exhausted: reset it and leave the output unchanged.
        tts_generator = None
        return gr.update(), instream

    return chunk, chunk


with gr.Blocks() as demo:
    inp = gr.Audio(sources="microphone")
    out = gr.Audio(streaming=True)
    stream = gr.State()

    clear = gr.Button("Clear")

    inp.stream(add_to_stream, [inp, stream], [out, stream])
    clear.click(lambda: [None, None, None], None, [inp, out, stream])


if __name__ == "__main__":
    demo.launch()

Screenshot

No response

Logs

No response

System Info

Gradio Environment Information:
------------------------------
Operating System: Darwin
gradio version: 4.18.0
gradio_client version: 0.10.0

------------------------------------------------
gradio dependencies in your environment:

aiofiles: 23.2.1
altair: 5.2.0
fastapi: 0.109.2
ffmpy: 0.3.2
gradio-client==0.10.0 is not installed.
httpx: 0.26.0
huggingface-hub: 0.20.3
importlib-resources: 6.1.1
jinja2: 3.1.3
markupsafe: 2.1.5
matplotlib: 3.8.2
numpy: 1.26.4
orjson: 3.9.13
packaging: 23.2
pandas: 2.2.1
pillow: 10.2.0
pydantic: 2.6.1
pydub: 0.25.1
python-multipart: 0.0.9
pyyaml: 6.0.1
ruff: 0.2.1
semantic-version: 2.10.0
tomlkit==0.12.0 is not installed.
typer: 0.9.0
typing-extensions: 4.9.0
uvicorn: 0.27.1
authlib; extra == 'oauth' is not installed.
itsdangerous; extra == 'oauth' is not installed.


gradio_client dependencies in your environment:

fsspec: 2024.2.0
httpx: 0.26.0
huggingface-hub: 0.20.3
packaging: 23.2
typing-extensions: 4.9.0
websockets: 11.0.3

Severity

I can work around it

s-kruschel added the bug label Mar 19, 2024
@ajayarora1235

Did you end up finding a solution to this?

@s-kruschel
Author

Unfortunately not…

@abidlabs
Member

Should be fixed via #8906. If you'd like to try it out, you can install gradio from this branch: #8843

@pablovela5620

@s-kruschel what workaround did you find in the meantime? @abidlabs it would be awesome if there was an audio-streaming example similar to this: https://www.gradio.app/guides/streaming-outputs
Right now it's not super clear exactly how audio streaming outputs work (in particular for TTS).

@freddyaboulton
Collaborator

@pablovela5620 - we have a draft guide for audio streaming that will be published in 5.0.

Feedback welcome as we're still tweaking the implementation: https://gradio-d8zf06g8v-hugging-face.vercel.app/main/guides/streaming-outputs#streaming-media

@pablovela5620

Beautiful! Y'all were already thinking about this; I'll take a read.

@pablovela5620

@freddyaboulton
One thing that would help with this documentation would be a more comprehensive example, like using the Bark inference API or maybe https://github.com/huggingface/parler-tts. It's not super clear to me whether what is being returned is an incrementally updated file or raw bytes.

Having more verbose, explicit examples helps a lot with getting things right from the get-go (even if they may be overly verbose at times).

@freddyaboulton
Collaborator

Hi @pablovela5620 - You just need to return the next chunk of bytes (or a file containing the next chunks).

I've prepared this example using Parler TTS: https://huggingface.co/spaces/gradio/magic-8-ball

It's added in this PR which adds more guides for streaming (#9173)
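
Roughly, the pattern looks like this; a minimal, self-contained sketch (a generated sine tone stands in for a real TTS model here, and I'm assuming (sample_rate, numpy chunk) tuples stream the same way raw byte chunks do):

import numpy as np
import gradio as gr

SAMPLE_RATE = 24_000  # assumed sample rate for this sketch

def stream_tone(freq):
    # Stand-in for a TTS model: produce 2 seconds of a sine tone and yield it
    # in 0.25-second pieces, returning only the *next* chunk each time.
    t = np.linspace(0, 2, 2 * SAMPLE_RATE, endpoint=False)
    samples = (0.3 * np.sin(2 * np.pi * float(freq) * t) * 32767).astype(np.int16)
    chunk_size = SAMPLE_RATE // 4
    for start in range(0, len(samples), chunk_size):
        # With streaming=True, each yielded chunk is appended to the ongoing
        # playback rather than replacing it.
        yield (SAMPLE_RATE, samples[start:start + chunk_size])

with gr.Blocks() as demo:
    freq = gr.Number(value=440, label="Frequency (Hz)")
    audio = gr.Audio(streaming=True, autoplay=True)
    freq.submit(stream_tone, freq, audio)

demo.launch()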

@pablovela5620

@freddyaboulton you are awesome, this is exactly what I was looking for. Thank you
