Transcript: Support distribute crowded words in timeline #163
First of all, SRS Stack will write LF (line breaks) when a subtitle is too long. For example, if the OpenAI Whisper response is:

```
0
00:00:00,550 --> 00:00:15,839
For today's tech check. So tell us about the details of this report. I know Huawei is obviously a very big competitor. Yeah. And that's small but growing. Let's get it that way. But the headline here is counterpart research looked at the first six weeks of smartphone sales in China compared it to a
```

SRS Stack will convert it to:

```
0
00:00:00,550 --> 00:00:15,839
For today's tech check. So tell us about the
details of this report. I know Huawei is
obviously a very big competitor. Yeah. And
that's small but growing. Let's get it that
way. But the headline here is counterpart
research looked at the first six weeks of
smartphone sales in China compared it to a
```

This makes the rendered subtitle very tall; below is the result: output-LF-by-SRS-Stack.mp4

Actually, FFmpeg libass will do the line wrapping for us, so we only need to use the Whisper output as-is; below is the example: output-1subtitle.mp4

I think this should fix almost all common cases.
Input file: rapid-speech.mp4

By FFmpeg:

```bash
ffmpeg -i input.mp4 -vf "subtitles=input.srt:force_style='Alignment=2,MarginV=20'" \
  -vcodec libx264 -profile:v main -preset:v medium -tune zerolatency -bf 0 \
  -acodec aac -copyts -y output.mp4
```

Sometimes, the OpenAI Whisper response is:

```
0
00:00:00,550 --> 00:00:15,839
For today's tech check. So tell us about the details of this report. I know Huawei is obviously a very big competitor. Yeah. And that's small but growing. Let's get it that way. But the headline here is counterpart research looked at the first six weeks of smartphone sales in China compared it to a
```

The result is below: output-1subtitle.mp4

Sometimes, it responds with:

```
0
00:00:00,550 --> 00:00:06,629
For today's tech check. So tell us about the details of this report. I know Huawei is obviously a very big competitor. Yeah. And that's

1
00:00:07,350 --> 00:00:13,829
small but growing. Let's get it that way. But the headline here is counterpart research looked at the first six weeks of smartphone

2
00:00:13,829 --> 00:00:15,789
sales in China compared it to.
```

The result is below: output-3subtitles.mp4

In most situations, OpenAI Whisper will generate multiple subtitles. If it doesn't, we might have to create them ourselves, which could be risky due to the potential for introducing bugs. Therefore, I would avoid doing this unless absolutely necessary.
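If splitting a long cue ourselves ever became necessary, one naive approach is to chunk the words and allocate time to each chunk in proportion to its word count. The sketch below is hypothetical (function name, `max_words` threshold, and millisecond representation are all illustrative), and it shows why this is risky: the timing is a guess, since we have no per-word timestamps.

```python
def split_cue(start_ms: int, end_ms: int, text: str, max_words: int = 24):
    """Split one long SRT cue into smaller cues, allocating time to each
    piece in proportion to its word count (naive, illustrative sketch)."""
    words = text.split()
    chunks = [words[i:i + max_words] for i in range(0, len(words), max_words)]
    total = len(words)
    span = end_ms - start_ms
    cues, cursor = [], start_ms
    for chunk in chunks:
        dur = span * len(chunk) // total
        cues.append((cursor, cursor + dur, " ".join(chunk)))
        cursor += dur
    # Pin the last cue to the original end time to absorb rounding drift.
    if cues:
        cues[-1] = (cues[-1][0], end_ms, cues[-1][2])
    return cues
```

Because the per-chunk timing only approximates the actual speech rate, the on-screen text can drift out of sync with the audio, which is exactly the kind of bug the paragraph above warns about.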
Dwayne:
Winlin:
This is not a bug in FFmpeg; rather, the issue arises because Whisper recognized too many words and did not distribute them evenly across the timeline, causing them to accumulate in a single cue all at once.
Reproduce this issue with this video: https://youtu.be/NONRDS7Rpjg
A 15-second segment that reproduces this issue:
rapid-speech.mp4
This type of interview program is quite common: several people speaking without pauses can lead the AI to recognize the audio as one continuous utterance lasting over ten seconds.
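A lightweight way to detect this failure mode is to flag cues whose duration looks implausible for one subtitle. The sketch below is an illustrative heuristic, not SRS Stack code; the ten-second threshold comes from the symptom described above.

```python
def is_crowded(start_ms: int, end_ms: int, text: str,
               max_duration_ms: int = 10_000) -> bool:
    """Flag a cue that runs longer than ~10 seconds, the symptom seen
    when speakers talk continuously without pauses."""
    return (end_ms - start_ms) > max_duration_ms

# The first Whisper cue in this issue spans 00:00:00,550 --> 00:00:15,839,
# about 15.3 seconds, so it is flagged.
print(is_crowded(550, 15_839, "For today's tech check..."))  # prints True
```

Flagged cues could then be logged or handled specially, rather than silently producing a wall of text on screen.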