I need subtitles like Enhanced LRC (A2 extension: word time tag, https://en.wikipedia.org/wiki/LRC_(file_format)#A2_extension:_word_time_tag), or JSON as in the example at https://whisperapi.com/vtt-srt-for-videos-using-python#:~:text=Speech%2Dto%2DText%20API%20Output%20Format, so that the time intervals of both the entire subtitle line and each word within it are given.
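For illustration, here is a minimal Python sketch of the requested output. It assumes the word-level data is shaped like openai-whisper's `transcribe(..., word_timestamps=True)` result: a `segments` list whose items have `start`, `end` and a `words` list of `word`/`start`/`end` entries, all in seconds. It renders each segment as one Enhanced LRC line: a line tag, each word preceded by its start tag, and a closing tag for the end of the last word. The function names are hypothetical and not part of SE or Whisper.

```python
from typing import Any


def lrc_time(seconds: float) -> str:
    """Format seconds as mm:ss.xx (hundredths), the precision used by LRC tags."""
    centis = round(seconds * 100)
    minutes, centis = divmod(centis, 6000)
    secs, centis = divmod(centis, 100)
    return f"{minutes:02d}:{secs:02d}.{centis:02d}"


def segment_to_enhanced_lrc(segment: dict[str, Any]) -> str:
    """Render one segment as an Enhanced LRC (A2) line with per-word time tags."""
    parts = [f"[{lrc_time(segment['start'])}]"]
    for w in segment["words"]:
        # Whisper-style words usually carry a leading space, so strip it
        parts.append(f"<{lrc_time(w['start'])}> {w['word'].strip()}")
    if segment["words"]:
        # Closing tag marks when the last word ends
        parts.append(f"<{lrc_time(segment['words'][-1]['end'])}>")
    return " ".join(parts)


def to_enhanced_lrc(result: dict[str, Any]) -> str:
    """Render every segment of a Whisper-style result, one LRC line per segment."""
    return "\n".join(segment_to_enhanced_lrc(s) for s in result["segments"])


if __name__ == "__main__":
    # Hypothetical input, not real model output
    demo = {"segments": [{"start": 0.0, "end": 1.0, "text": " Hello world",
                          "words": [{"word": " Hello", "start": 0.0, "end": 0.5},
                                    {"word": " world", "start": 0.6, "end": 1.0}]}]}
    print(to_enhanced_lrc(demo))
```

With the demo input this prints `[00:00.00] <00:00.00> Hello <00:00.60> world <00:01.00>`.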
I tried everything I could in SE (the switches --highlight_words True, -f=lrc, --word_timestamps True) and read the manuals, but nothing works.
Please explain how to do this, or consider adding this useful feature.
The "JSON Type 6" subtitle format does store a timestamp for each word, but those word start times are not the actual ones: they are calculated from the number of characters from the beginning of the line, so they are inaccurate. What is needed are the actual moments when words begin, which some models, for example Whisper, can provide in their raw JSON output.
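To show why a character-based estimate goes wrong, here is a minimal Python sketch of a linear character-offset estimate. It assumes the estimate works as described above (word start = line start + line duration × character offset / line length); this is an illustration of that idea, not Subtitle Edit's actual code, and `estimate_word_starts` is a hypothetical name.

```python
def estimate_word_starts(start: float, end: float, text: str) -> list[tuple[str, float]]:
    """Estimate each word's start time linearly from its character offset in the line.

    Illustrative assumption only: real speech pauses and speaking rate are ignored,
    which is exactly why such estimates drift from the true word timings.
    """
    duration = end - start
    n = len(text)
    estimates = []
    offset = 0
    for word in text.split():
        # Find this word's position, searching past the previous word
        offset = text.index(word, offset)
        estimates.append((word, start + duration * offset / n))
        offset += len(word)
    return estimates


if __name__ == "__main__":
    for word, t in estimate_word_starts(0.0, 2.0, "So what is it"):
        print(f"{word}: {t:.2f}s")
```

If the speaker pauses after "So", the estimate still places "what" at about 0.46 s, because it only looks at character positions; timestamps taken from the model's own word alignment would reflect the pause.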