Subtitles & captions #62
Comments
There are several existing formats for subtitles that could be supported via adapters. FCP XML supports Looking at how those formats represent titles, we should be able to model them as a subclass of Marker. Formatting information will likely be hard to normalize across all the formats, but in the spirit of OTIO, we can focus on the content and timing of the titles, as that is likely to be the most useful part in practice. The use cases for subtitles in OTIO seem to be swapping localized subtitles in and out, and passing subtitles upstream or downstream in a production pipeline. |
Here is some info about how AAF encodes video titles: https://www.amwa.tv/downloads/specifications/AS-05_AAF_Effects_protocol_v1.pdf I'm not sure if this is the same as a subtitle track. This still needs more research. |
Here are some more links for reference, though the license for some of these may not be compatible. |
I'd like to attempt this. But I'm still trying to understand how different subtitle formats work. http://www.nikse.dk/SubtitleEdit/ this seems helpful. It has a common representation for subtitles: https://github.com/SubtitleEdit/subtitleedit/blob/master/libse/Subtitle.cs Will the license be a problem if we refer to this implementation (because we won't be lifting code directly)? |
Here is a great article referencing how subtitles are used. |
What are annotations? Are the positions as coordinates or something like |
Annotations can be drawings, text, etc. overlaid on the video. There have been discussions on how to implement this, and that got tied to the position coordinate system. |
@KarthikRIyer Since the license you reference is GPL, one should not reference that code, nor transliterate it. |
Hey @KarthikRIyer, check out TTML2 to get a feeling for what can be expressed in timed text: https://www.w3.org/TR/ttml2/ The Netflix Tech blog has some articles about the complexities of timed text (captions and subtitles). Here are a couple of them:
Hopefully that gives a good jumping off point. |
Thanks @reinecke ! I'll go through these links. |
Here's what I thought we could start with, based on what I understood from TTML2 and with how I've used captions in Premiere Pro. Please let me know any suggestions/changes/improvements.
|
Referring to styles is pretty conventional across the board, think CSS. If style is stored per caption, to accommodate the way real systems work, a consumer of a TimedText object would have to perform an initial pass to gather all the styles, and de-duplicate them into a dictionary. It seems to me that an internal referencing scheme for OTIO would be a prerequisite to implementing TimedText in a useful way. As a suggestion, such a referencing scheme could be as simple as saying that there is a concept of named dictionaries on the root object, and the style field in a TimedText refers to an object at e.g. |
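The initial pass described above (gathering per-caption styles and de-duplicating them into a dictionary) could look roughly like the sketch below. It uses plain dicts rather than any OTIO class, and the keys `text`, `style`, and `style_id`, as well as the generated names `style1`, `style2`, ..., are assumptions made for illustration only.

```python
import json


def dedupe_styles(captions):
    """Collect per-caption style dicts into one table of named styles.

    `captions` is a list of dicts with the hypothetical keys "text" and
    "style". Returns (styles, captions_with_refs), where each returned
    caption refers to its style by id instead of embedding it.
    """
    styles = {}  # style id -> style dict
    seen = {}    # canonical JSON form of a style -> style id
    out = []
    for cap in captions:
        # Sorting keys makes equal styles serialize to the same string.
        key = json.dumps(cap["style"], sort_keys=True)
        if key not in seen:
            seen[key] = f"style{len(seen) + 1}"
            styles[seen[key]] = cap["style"]
        out.append({"text": cap["text"], "style_id": seen[key]})
    return styles, out
```

With styles referenced by id from the start, a consumer would not need this pass at all, which is the argument for an internal referencing scheme.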
The root dictionary suggestion seems similar to : I agree that style per TimedText isn't workable. What could be the issues with a root dictionary? If we have something like |
Perhaps it would make sense to store a map from style names to style templates as a metadata entry on the Timeline object, and to specify that such a dictionary stored in metadata anywhere else is ignored. |
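A minimal sketch of that lookup, using a plain dict to stand in for the Timeline's metadata. The key name `timed_text_styles` and the function name are hypothetical; nothing here is an existing OTIO API.

```python
STYLES_KEY = "timed_text_styles"  # hypothetical metadata key


def resolve_style(timeline_metadata, style_id, default=None):
    """Look up a named style in the Timeline-level metadata only.

    Style tables stored in the metadata of clips, tracks, or other
    objects are never consulted, so they are effectively ignored.
    """
    styles = timeline_metadata.get(STYLES_KEY, {})
    return styles.get(style_id, default)
```

A caption would then carry only a style name, for example `resolve_style(timeline_metadata, "ts2")`.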
Yeah, I can try this out |
Note: Sample SRT files with formatting/styles for testing here |
I was looking into parsing styles from SRT files. Specifically this sample. SRT text is formatted using HTML. For one TimedText, it could be like this: This should be an E with an accent: È
日本語
<font size=30><b><i><u>This text should be bold, italics and underline</u></i></b></font>
<font size=9 color="00ff00">This text should be small and green</font>
<font color=#ff0000 size=9>This text should be small and red</font>
<font color=brown size=24>This text should be big and brown</font> So I think I'll need to make some changes to the classes I defined earlier. The current For starters I think handling well defined HTML tags should be ok? There are many cases in the above linked sample file that can be handled, but would require some effort. Like, >
It would be a good thing to
<invalid_tag>hide invalid html tags that are closed and show the text in them</invalid_tag>
<invalid_tag_unclosed>but show un-closed invalid html tags
Show not opened tags</invalid_tag_not_opened>
< <font color="#00FF00" size="6">This could be the <font size="35">m<font color="#000000">o</font>st</font> difficult thing to implement</font> |
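As a starting point for the well-defined tags, here is a rough sketch that uses Python's standard `html.parser` to turn SRT markup into runs of text with a merged style. The class name, the `runs` structure, and the style keys (`bold`, `italic`, `underline`, `color`, `size`) are assumptions for illustration. Unknown tags are simply dropped while their text is kept, so this does not implement the "show un-closed invalid tags" behaviour from the sample file.

```python
from html.parser import HTMLParser


class SrtMarkupParser(HTMLParser):
    """Split SRT caption markup into (text, style) runs."""

    FLAGS = {"b": "bold", "i": "italic", "u": "underline"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []  # open tags as (tag, style dict), outermost first
        self.runs = []   # collected (text, merged style dict) pairs

    def handle_starttag(self, tag, attrs):
        if tag in self.FLAGS:
            self.stack.append((tag, {self.FLAGS[tag]: True}))
        elif tag == "font":
            style = {k: v for k, v in attrs
                     if k in ("color", "size") and v is not None}
            self.stack.append((tag, style))
        # Any other tag is ignored; its text still reaches handle_data.

    def handle_endtag(self, tag):
        # Close the innermost open tag with this name, if there is one.
        for idx in range(len(self.stack) - 1, -1, -1):
            if self.stack[idx][0] == tag:
                del self.stack[idx]
                break

    def handle_data(self, data):
        merged = {}
        for _, style in self.stack:
            merged.update(style)  # inner tags override outer ones
        self.runs.append((data, merged))


def parse_srt_markup(text):
    parser = SrtMarkupParser()
    parser.feed(text)
    parser.close()
    return parser.runs
```

Because inner styles override outer ones, the nested `<font>` case in the last sample line resolves to the innermost size and color for each run of characters.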
I'm interested in reading titles and captions out of FCPX (and maybe simple effects like generators), I could try adding them to the adapter but I don't know how they should be represented in OTIO? |
Cool, I'll use your |
I'm curious what the other metadata from FCP contains? |
@Laurian yes, the general guidance is that an adapter should translate to/from the OTIO schema (in this case @KarthikRIyer 's proposed TimedText schema) and then to put anything else interesting into metadata. Specifically nested into a sub-dictionary within metadata that is clearly labelled. This makes that metadata visible and invites discussion about what else could/should be promoted into the official schema. |
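One way an adapter might follow that guidance is sketched below with plain dicts. The function name, the `schema_fields` argument, and the `fcpx` label are hypothetical and only illustrate the "clearly labelled sub-dictionary" idea.

```python
def split_caption_attributes(attrs, schema_fields, adapter_name):
    """Separate schema fields from adapter-specific leftovers.

    Returns (core, metadata): `core` holds the attributes that map onto
    the schema, and `metadata` nests everything else under a
    sub-dictionary labelled with the adapter's name.
    """
    core = {k: v for k, v in attrs.items() if k in schema_fields}
    extra = {k: v for k, v in attrs.items() if k not in schema_fields}
    metadata = {adapter_name: extra} if extra else {}
    return core, metadata
```

For example, `split_caption_attributes({"text": "Hi", "lane": "1"}, {"text"}, "fcpx")` keeps `text` as a schema field and files `lane` under `metadata["fcpx"]`.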
This project https://github.com/naomiaro/waveform-playlist led me to this - https://github.com/readbeyond/aeneas aeneas looks like a gold mine of reference material. |
@meshula For subtitles, FCPX includes placement and styling metadata:
<caption name="People assume that time is a strict progression of cause to effect," lane="1" offset="15024/300s" duration="8600/2500s" start="3600s" role="iTT?captionFormat=ITT.en-GB">
<text placement="top">
<text-style ref="ts2">People assume that time is a strict progression of cause to effect,</text-style>
</text>
<text-style-def id="ts2">
<text-style font=".AppleSystemUIFont" fontSize="13.01" fontFace="Regular" fontColor="1 0.999974 0.999991 1" backgroundColor="0 0 0 1"/>
</text-style-def>
</caption> But other titles you can add can be similar <title name="Continuous" lane="2" offset="123500/2500s" ref="r6" duration="20100/2500s" start="3600s">
<text>
<text-style ref="ts1">Title</text-style>
</text>
<text-style-def id="ts1">
<text-style font="Helvetica" fontSize="72" fontFace="Regular" fontColor="1 0.999974 0.999991 1" strokeColor="0.985948 0 0.0269506 0" strokeWidth="1" alignment="center"/>
</text-style-def>
</title> where Similarly custom titles can have a lot of parameters (here's a BBC News caption one): <title name="People assume that time is a strict progression of cause to effect, - 02 Subtitle" lane="1" offset="125200/2500s" ref="r7" duration="13209600/3840000s" start="3600s">
<param name="Layout Method" key="9999/10201/3000298778/10202/2/314" value="1 (Paragraph)"/>
<param name="Left Margin" key="9999/10201/3000298778/10202/2/323" value="0"/>
<param name="Right Margin" key="9999/10201/3000298778/10202/2/324" value="0"/>
<param name="Top Margin" key="9999/10201/3000298778/10202/2/325" value="0"/>
<param name="Bottom Margin" key="9999/10201/3000298778/10202/2/326" value="-540"/>
<param name="Alignment" key="9999/10201/3000298778/10202/2/354/10038/401" value="1 (Center)"/>
<param name="Line Spacing" key="9999/10201/3000298778/10202/2/354/10038/404" value="-14"/>
<param name="Alignment" key="9999/10201/3000298778/10202/2/373" value="0 (Left) 2 (Bottom)"/>
<param name="Source Object" key="9999/10201/3000298778/10202/4/3000449521/201" value="3000449347"/>
<param name="Scale" key="9999/10201/3000298778/10202/4/3000449521/204" value="-0.0925926"/>
<param name="Apply Mode" key="9999/10201/3000298778/10202/4/3000450074/200" value="1 (Multiply by source)"/>
<param name="Source Object" key="9999/10201/3000298778/10202/4/3000450074/201" value="3000450130"/>
<param name="Scale" key="9999/10201/3000298778/10202/4/3000450074/204" value="10"/>
<param name="Opacity" key="9999/10201/3000298778/10202/4/3001050732/1000/1044" value="0"/>
<param name="Speed" key="9999/10201/3000298778/10202/4/3001050732/201/208" value="6 (Custom)"/>
<param name="Custom Speed" key="9999/10201/3000298778/10202/4/3001050732/201/209">
<keyframeAnimation>
<keyframe time="0s" value="0"/>
<keyframe time="454656/153600s" value="0"/>
</keyframeAnimation>
</param>
<param name="Range" key="9999/10201/3000298778/10202/4/3001050732/201/229/230" value="6 (Line)"/>
<param name="End Index" key="9999/10201/3000298778/10202/4/3001050732/201/229/232" value="3"/>
<param name="Invert" key="9999/10201/3000298778/10202/4/3001050732/201/229/233" value="1"/>
<text>
<text-style ref="ts2">People assume that time is a strict progression of cause to effect,</text-style>
</text>
<text-style-def id="ts2">
<text-style font="BBC Reith Sans" fontSize="58" fontFace="Regular" fontColor="1 0.999974 0.999991 1" alignment="center" lineSpacing="-14"/>
</text-style-def>
</title> again the So I would try to preserve that in metadata just in case I need it. |
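To show how the content and timing could be pulled out of a `<caption>` element like the first one above, here is a small sketch using only the standard library. It assumes times are written as a rational number of seconds with a trailing `s` (as in `15024/300s` and `3600s`); the function names and the returned dict keys are made up for this example.

```python
from fractions import Fraction
import xml.etree.ElementTree as ET


def parse_fcpx_time(value):
    """Convert a time such as "15024/300s" or "3600s" to exact seconds."""
    if not value.endswith("s"):
        raise ValueError(f"unexpected time value: {value!r}")
    # Fraction accepts both "numerator/denominator" and plain integers.
    return Fraction(value[:-1])


def read_caption(xml_text):
    """Extract text, timing, and placement from one <caption> element."""
    elem = ET.fromstring(xml_text)
    text_elem = elem.find("text")
    pieces = elem.findall("./text/text-style")
    return {
        "text": "".join(p.text or "" for p in pieces),
        "offset": parse_fcpx_time(elem.get("offset")),
        "duration": parse_fcpx_time(elem.get("duration")),
        "placement": None if text_elem is None else text_elem.get("placement"),
    }
```

Keeping the times as `Fraction` values avoids rounding until they are converted to whatever time representation the schema ends up using; the remaining attributes and `<param>` elements could be preserved in metadata.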
Thanks ~ what prompted my question was wondering whether the extra data fell in the category of "extra non-portable stuff that should be preserved" or the category of "falls into the styling category". The examples seem to show styling/layout plus some animation parameters. |
We should support subtitles and captions.