Transcript: Support distribute crowded words in timeline #163
First of all, SRS Stack will write LF (line breaks) when a subtitle is too long. For example, if the OpenAI Whisper response is:

```
0
00:00:00,550 --> 00:00:15,839
For today's tech check. So tell us about the details of this report. I know Huawei is obviously a very big competitor. Yeah. And that's small but growing. Let's get it that way. But the headline here is counterpart research looked at the first six weeks of smartphone sales in China compared it to a
```

SRS Stack will convert it to:

```
0
00:00:00,550 --> 00:00:15,839
For today's tech check. So tell us about the
details of this report. I know Huawei is
obviously a very big competitor. Yeah. And
that's small but growing. Let's get it that
way. But the headline here is counterpart
research looked at the first six weeks of
smartphone sales in China compared it to a
```

This makes the rendered subtitle very tall; below is the result: output-LF-by-SRS-Stack.mp4

Actually, FFmpeg libass will do the line wrapping for us, so we only need to use the Whisper output as-is; below is the example: output-1subtitle.mp4

I think this should fix almost all common cases.
Input file: rapid-speech.mp4

By FFmpeg:

```bash
ffmpeg -i input.mp4 -vf "subtitles=input.srt:force_style='Alignment=2,MarginV=20'" \
  -vcodec libx264 -profile:v main -preset:v medium -tune zerolatency -bf 0 \
  -acodec aac -copyts -y output.mp4
```

Sometimes, the OpenAI Whisper response is:

```
0
00:00:00,550 --> 00:00:15,839
For today's tech check. So tell us about the details of this report. I know Huawei is obviously a very big competitor. Yeah. And that's small but growing. Let's get it that way. But the headline here is counterpart research looked at the first six weeks of smartphone sales in China compared it to a
```

The result is below: output-1subtitle.mp4

Sometimes, it responds with:

```
0
00:00:00,550 --> 00:00:06,629
For today's tech check. So tell us about the details of this report. I know Huawei is obviously a very big competitor. Yeah. And that's

1
00:00:07,350 --> 00:00:13,829
small but growing. Let's get it that way. But the headline here is counterpart research looked at the first six weeks of smartphone

2
00:00:13,829 --> 00:00:15,789
sales in China compared it to.
```

The result is below: output-3subtitles.mp4

In most situations, OpenAI Whisper will generate multiple subtitles. If it doesn't, we might have to create them ourselves, which could be risky due to the potential for introducing bugs. Therefore, I would avoid doing this unless absolutely necessary.
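If splitting a long cue ourselves ever became necessary, one naive approach is to chunk the words and allocate time to each chunk in proportion to its word count. The sketch below is hypothetical (function name, `max_words` threshold, and millisecond representation are all illustrative), and it shows why this is risky: the timing is a guess, since we have no per-word timestamps.

```python
def split_cue(start_ms: int, end_ms: int, text: str, max_words: int = 24):
    """Split one long SRT cue into smaller cues, allocating time to each
    piece in proportion to its word count (naive, illustrative sketch)."""
    words = text.split()
    chunks = [words[i:i + max_words] for i in range(0, len(words), max_words)]
    total = len(words)
    span = end_ms - start_ms
    cues, cursor = [], start_ms
    for chunk in chunks:
        dur = span * len(chunk) // total
        cues.append((cursor, cursor + dur, " ".join(chunk)))
        cursor += dur
    # Pin the last cue to the original end time to absorb rounding drift.
    if cues:
        cues[-1] = (cues[-1][0], end_ms, cues[-1][2])
    return cues
```

Because the per-chunk timing only approximates the actual speech rate, the on-screen text can drift out of sync with the audio, which is exactly the kind of bug the paragraph above warns about.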
Dwayne:
Winlin:
This is not a bug in FFmpeg; rather, the issue arises because Whisper recognized too many words and did not distribute them evenly across the timeline, causing them to accumulate in a single cue all at once.
Reproduce this issue with this video: https://youtu.be/NONRDS7Rpjg
A 15-second segment that reproduces this issue:
rapid-speech.mp4
This type of interview program is quite common: several people speaking without pauses can lead the AI to recognize the audio as one continuous utterance lasting over ten seconds.
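A lightweight way to detect this failure mode is to flag cues whose duration looks implausible for one subtitle. The sketch below is an illustrative heuristic, not SRS Stack code; the ten-second threshold comes from the symptom described above.

```python
def is_crowded(start_ms: int, end_ms: int, text: str,
               max_duration_ms: int = 10_000) -> bool:
    """Flag a cue that runs longer than ~10 seconds, the symptom seen
    when speakers talk continuously without pauses."""
    return (end_ms - start_ms) > max_duration_ms

# The first Whisper cue in this issue spans 00:00:00,550 --> 00:00:15,839,
# about 15.3 seconds, so it is flagged.
print(is_crowded(550, 15_839, "For today's tech check..."))  # prints True
```

Flagged cues could then be logged or handled specially, rather than silently producing a wall of text on screen.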