Better handling of label splitting #2139

The3IC · 2024-05-10T13:15:40Z

I've been enjoying using OpenVINO/whisper to generate captions for videos whose spoken language is not supported by DaVinci Resolve, or when I also need to create an English translation.

While the speech-to-text works very nicely in terms of correctly interpreting the spoken words, I think the handling of the resulting labels could be improved a bit. I was told on the OpenVINO GitHub that the issues below are related to whisper and not OpenVINO.

I'm seeing two things that could do with some innovation love:

Orphan words at the end of a sentence, resulting in single-word labels. This happens when a sentence is (say) 70 characters long and the max segment size has been set to 65. See example below:

It would be good if, in this case, the sentence were split into "and does it actually have then a significant impact" and "on the quality". Orphan words tend to be quite hard to read, as the word "whizzes by" in the video quite quickly.
This could be handled with a parameter like:
end-of-sentence length (words): 3

There will of course still be edge cases where one-word labels are created (such as after long pauses).
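
As a rough sketch of the idea (not taken from the whisper code), the splitting could wrap greedily at the max segment size and then pull words back from the previous label until the last label has enough words. The function name `split_caption` and the parameters `max_len` and `min_tail_words` are made up for illustration; `min_tail_words` plays the role of the "end-of-sentence length (words)" setting suggested above.

```python
def split_caption(text: str, max_len: int = 65, min_tail_words: int = 3) -> list[str]:
    """Wrap text into labels of at most max_len characters, then rebalance
    so the final label is not an orphan. All names here are hypothetical."""
    lines: list[list[str]] = []
    current: list[str] = []

    # Step 1: plain greedy wrapping on word boundaries.
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and len(candidate) > max_len:
            lines.append(current)
            current = [word]
        else:
            current.append(word)
    if current:
        lines.append(current)

    # Step 2: if the last label is too short, move words over from the
    # previous label, as long as that label keeps enough words itself
    # and the last label still fits within max_len.
    if len(lines) >= 2:
        prev, last = lines[-2], lines[-1]
        while len(last) < min_tail_words and len(prev) > min_tail_words:
            if len(" ".join([prev[-1]] + last)) > max_len:
                break
            last.insert(0, prev.pop())

    return [" ".join(line) for line in lines]


sentence = "and does it actually have then a significant impact on the quality"
labels = split_caption(sentence, max_len=65, min_tail_words=3)
```

With a max segment size of 65 and an end-of-sentence length of 3, the example sentence ends up as "and does it actually have then a significant impact" and "on the quality". Setting `min_tail_words=1` gives the plain greedy result with "quality" on its own. Labels separated by long pauses would still need timing information, which this sketch ignores.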

New line based on "in-word" punctuation. See one example below:

Other examples are "it's" split on the apostrophe and numbers like "2.2" split on the full stop. In all these cases the full word should be kept intact and not split between two labels.

(The screenshot also illustrates a situation where it would make sense to split off the last two words of the label ("So I") and move them to the next line/label. This is a variation of case 1, but less important.)
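
One possible way to avoid the in-word splits, sketched under the assumption that the text arrives as token pieces where a piece that starts a new word carries a leading space: merge the pieces into whole words first, and only allow label breaks between words. The function name `group_tokens_into_words` and the sample token list are made up for illustration and do not come from the whisper code.

```python
def group_tokens_into_words(tokens: list[str]) -> list[str]:
    """Merge token pieces into whole words so that "it's" or "2.2" is
    never split across labels.

    Assumption: a piece that begins a new word starts with a space;
    any other piece continues the previous word.
    """
    words: list[str] = []
    for tok in tokens:
        if tok.startswith(" ") or not words:
            # New word: drop the marker space and start a fresh entry.
            words.append(tok.strip())
        else:
            # Continuation (apostrophe, full stop, digits, suffix): glue on.
            words[-1] += tok
    return words


# Hypothetical token pieces for "it's about 2.2 GB".
pieces = [" it", "'s", " about", " 2", ".", "2", " GB"]
words = group_tokens_into_words(pieces)
```

The resulting word list can then be wrapped into labels on word boundaries only, so a break can never fall on the apostrophe in "it's" or the full stop in "2.2".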

Replies: 0 comments
