Subtitles & captions #62

Open
jminor opened this issue Jan 13, 2017 · 26 comments · May be fixed by #805
Labels
help wanted We're looking for help from the community - you're welcome to volunteer! roadmap
Milestone

Comments

@jminor
Collaborator

jminor commented Jan 13, 2017

We should support subtitles and captions.

@jminor jminor added the help wanted We're looking for help from the community - you're welcome to volunteer! label Jun 13, 2017
@jminor
Collaborator Author

jminor commented Aug 8, 2017

There are several existing formats for subtitles that could be supported via adapters.
WebVTT: https://w3c.github.io/webvtt/
SSA: https://wiki.multimedia.cx/index.php?title=SubStation_Alpha
SRT, etc.

FCP XML supports <title> elements.
AAF probably supports titles too. Someone will need to research this...

Looking at how those formats represent titles, we should be able to model them as a subclass of Marker. Formatting information will likely be hard to normalize across all the formats, but in the spirit of OTIO, we can focus on the content and timing of the titles as that is likely to be the most useful part in practice.

The use cases for subtitles in OTIO seem to relate to swapping localized subtitles in and out, and/or passing subtitles upstream or downstream in a production pipeline.
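
To make the "content and timing" idea concrete, here is a minimal sketch that reads SRT cues into start time, end time, and text, ignoring formatting. The `Cue` class and `parse_srt` function are hypothetical names for illustration only; they are not part of OTIO or any proposed schema.

```python
import re
from dataclasses import dataclass

# SRT timestamps look like 00:01:02,500 (hours:minutes:seconds,milliseconds).
_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


@dataclass
class Cue:
    """Hypothetical container: only timing and content, no styling."""
    start: float  # seconds
    end: float    # seconds
    text: str


def _seconds(stamp):
    match = _TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(text):
    """Split SRT text into cues; blocks are separated by blank lines."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        # Skip the optional numeric index line that precedes the timing line.
        if lines and "-->" not in lines[0]:
            lines = lines[1:]
        if not lines or "-->" not in lines[0]:
            continue  # not a cue block
        start, end = lines[0].split("-->", 1)
        # Anything after the end timestamp on the timing line is ignored here.
        end = end.strip().split()[0]
        cues.append(Cue(_seconds(start), _seconds(end), "\n".join(lines[1:])))
    return cues
```

An adapter for another format (WebVTT, SSA) could produce the same kind of cue list, which is what would make swapping localized subtitle tracks straightforward.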

@jminor jminor added this to the 1.0 Release milestone Sep 2, 2017
@jminor
Collaborator Author

jminor commented Sep 27, 2017

Here is some info about how AAF encodes video titles: https://www.amwa.tv/downloads/specifications/AS-05_AAF_Effects_protocol_v1.pdf

I'm not sure if this is the same as a subtitle track. This still needs more research.

@jminor
Collaborator Author

jminor commented Apr 20, 2018

Here are some more links for reference, though the license for some of these may not be compatible.
http://www.nikse.dk/SubtitleEdit/
https://github.com/ubershmekel/pytitle
https://github.com/pyahmed/sub2xml

@KarthikRIyer
Contributor

KarthikRIyer commented Sep 17, 2020

I'd like to attempt this. But I'm still trying to understand how different subtitle formats work.

http://www.nikse.dk/SubtitleEdit/ this seems helpful. It has a common representation for subtitles: https://github.com/SubtitleEdit/subtitleedit/blob/master/libse/Subtitle.cs

Will the license be a problem if we refer to this implementation (because we won't be lifting code directly)?

@Jameclarke

Here is a great article referencing how subtitles are used.
https://jonnyelwyn.co.uk/film-and-video-editing/adding-captions-in-premiere-pro-fix-common-problems/

@apetrynet
Contributor

Some of the subtitle formats include positioning so I think this ties into the coordinate system and annotations as well.
#773 #771

@KarthikRIyer
Contributor

What are annotations?

Are the positions given as coordinates, or as something like align: left / align: center? I think I saw the alignment approach in WebVTT. If we use that approach, we could leave it up to the adapter or the final consumer of the subtitles to decide the actual position, couldn't we?

@apetrynet
Contributor

Annotations can be drawings, text, etc. overlaid on the video. There have been discussions on how to implement this, and that got tied to the position coordinate system.
Yes, those aligns are the ones I was thinking about. It could be up to the adapters, but if we want a common-ground schema for several subtitle formats, it's worth considering a way of storing positions.
I don't think waiting for a coordinate system should hold you back from working on this. We can always revisit positioning at a later stage.
Just my two cents (since I looked at this issue myself a few weeks ago :)
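
As a rough illustration of the alignment approach, the sketch below reads WebVTT-style cue settings such as `align:left position:10%` into named values and leaves the actual on-screen placement to whoever consumes them. The function name `parse_cue_settings` is made up for this example and is not part of OTIO or any WebVTT library.

```python
def parse_cue_settings(settings):
    """Turn a WebVTT-style settings string into a dict of name -> value.

    Values are kept as strings (e.g. "left", "10%"); no coordinate system
    is assumed, so the consumer decides what they mean on screen.
    """
    result = {}
    for token in settings.split():
        name, separator, value = token.partition(":")
        # Ignore malformed tokens such as "align" or ":left".
        if separator and name and value:
            result[name] = value
    return result
```

Storing the settings as named values like this would let positioning be revisited later, once a coordinate system exists, without changing the cue content or timing.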

@meshula
Collaborator

meshula commented Sep 17, 2020

@KarthikRIyer Since the license you reference is GPL, one should not reference that code, nor transliterate it.

@reinecke
Collaborator

Hey @KarthikRIyer, check out TTML2 to get a feeling for what can be expressed in timed text: https://www.w3.org/TR/ttml2/

The Netflix Tech blog has some articles about the complexities of timed text (captions and subtitles). Here are a couple of them:

Hopefully that gives a good jumping off point.

@KarthikRIyer
Contributor

Thanks @reinecke ! I'll go through these links.

@KarthikRIyer
Contributor

Here's what I thought we could start with, based on what I understood from TTML2 and on how I've used captions in Premiere Pro. Please let me know of any suggestions/changes/improvements.

subs

  • I think storing a Style object with each TimedText is wasteful, but I went with this approach because we don't have an ID system in OTIO yet. In TTML2 there's a section that specifies all styles, and each TimedText then refers to them. With the current approach, conversion from OTIO to, say, TTML2 won't be so direct because we don't have a single set of styles. One other thing I thought of was to have a list of styles inside Subtitle and a string ID in each TimedText. But then a TimedText object would have no style meaning on its own, outside the Subtitle. Thoughts?

  • What are Markers used for? Each marker has a markedRange. When we trim an Item or change its sourceRange, shouldn't the markedRange of the markers at the start or end of the Item also change? I couldn't find anything that does this. Or is it handled in another way? I'm asking because if we trim the Subtitle or change its sourceRange, the ranges of the TimedTexts at the start or end might change.

@KarthikRIyer KarthikRIyer linked a pull request Sep 24, 2020 that will close this issue
@meshula
Collaborator

meshula commented Mar 26, 2021

Referring to styles is pretty conventional across the board, think CSS. If style is stored per caption, to accommodate the way real systems work, a consumer of a TimedText object would have to perform an initial pass to gather all the styles, and de-duplicate them into a dictionary. It seems to me that an internal referencing scheme for OTIO would be a prerequisite to implementing TimedText in a useful way. As a suggestion, such a referencing scheme could be as simple as saying that there is a concept of named dictionaries on the root object, and the style field in a TimedText refers to an object at e.g. @dict/styles/Japanese_bold. Introducing a root dictionary would open a can of worms of questionable usages, so I think we'd have to be very cautious about unintended consequences. At the same time, style per TimedText seems unworkable IMO.
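
The "initial pass" described above might look something like the sketch below. It assumes each caption is a plain `(text, style_dict)` pair with hashable style values, and the generated names (`style_0`, `style_1`, ...) are arbitrary; none of this is an existing OTIO API.

```python
def deduplicate_styles(captions):
    """Collect per-caption styles into one table and replace them with IDs.

    captions: iterable of (text, style_dict) pairs.
    Returns (style_table, referenced_captions), where style_table maps a
    generated style ID to a style dict and referenced_captions is a list
    of (text, style_id) pairs.
    """
    style_table = {}
    ids_by_key = {}
    referenced = []
    for text, style in captions:
        # Sort the items so that equal styles compare equal regardless of
        # the order in which their keys were written.
        key = tuple(sorted(style.items()))
        if key not in ids_by_key:
            style_id = f"style_{len(ids_by_key)}"
            ids_by_key[key] = style_id
            style_table[style_id] = dict(style)
        referenced.append((text, ids_by_key[key]))
    return style_table, referenced
```

Every consumer that targets a style-sheet based format would need a pass like this if styles were stored per caption, which is the cost being pointed out.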

@KarthikRIyer
Contributor

The root dictionary suggestion seems similar to this idea from my earlier comment:
"One other thing I thought of was to have a list of styles inside Subtitle and then have a string id in each TimedText", right?

I agree that style per TimedText isn't workable. What could be the issues with a root dictionary? If we have something like map<string/long, TimedTextStyle>, would that still have questionable usages?

@meshula
Collaborator

meshula commented May 16, 2021

Perhaps it would make sense to store a map of named styles to style templates as a metadata entry on the Timeline object, and to specify that such a dictionary stored as metadata anywhere else would be ignored.
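
A small sketch of that lookup, using plain dictionaries in place of real OTIO objects. The metadata key `timed_text_styles` and the function `resolve_style` are assumptions made for illustration; no such key is defined by OTIO.

```python
# Hypothetical key under which the Timeline's metadata holds the style map.
STYLES_KEY = "timed_text_styles"


def resolve_style(timeline_metadata, style_name):
    """Look up a named style in the Timeline-level metadata only.

    Style maps stored in the metadata of any other object are never
    consulted, so they are effectively ignored. Returns None when the
    name is unknown or no style map is present.
    """
    styles = timeline_metadata.get(STYLES_KEY, {})
    return styles.get(style_name)
```

Keeping a single lookup location avoids having to define precedence rules between style maps at different levels of the timeline.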

@KarthikRIyer
Contributor

Yeah, I can try this out

@KarthikRIyer
Contributor

Note: Sample SRT files with formatting/styles for testing here

@KarthikRIyer
Contributor

I was looking into parsing styles from SRT files. Specifically this sample.

SRT text is formatted using HTML. For one TimedText, it could be like this:

This should be an E with an accent: È
日本語
<font size=30><b><i><u>This text should be bold, italics and underline</u></i></b></font>
<font size=9 color="00ff00">This text should be small and green</font>
<font color=#ff0000 size=9>This text should be small and red</font>
<font color=brown size=24>This text should be big and brown</font>

So I think I'll need to make some changes to the classes I defined earlier. The current TimedText class has one content string and one linked style object. I was thinking it could instead have an array of content strings, each linked to an optional UID. Each UID would correspond to a style in a map stored as a metadata entry on the Timeline object.

For starters, I think handling well-defined HTML tags should be OK? There are many cases in the sample file linked above that can be handled, but would require some effort. For example:

>
It would be a good thing to
<invalid_tag>hide invalid html tags that are closed and show the text in them</invalid_tag>
<invalid_tag_unclosed>but show un-closed invalid html tags
Show not opened tags</invalid_tag_not_opened>
<
<font color="#00FF00" size="6">This could be the <font size="35">m<font color="#000000">o</font>st</font> difficult thing to implement</font>
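
One possible way to get the "array of content strings, each with its own style" out of SRT text is to use Python's standard `html.parser`, as in the sketch below. It only handles `<b>`, `<i>`, `<u>`, and `<font color=... size=...>`. Unknown tags are dropped while their text is kept, and end tags with no matching open tag are ignored, which only partly matches the behaviour the sample file asks for. The names `SrtRunParser` and `parse_srt_text` are hypothetical.

```python
from html.parser import HTMLParser


class SrtRunParser(HTMLParser):
    """Collect (text, style_dict) runs from HTML-like SRT text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.runs = []
        # Stack of (tag, effective style); the bottom entry is the default.
        self._stack = [("", {})]

    def handle_starttag(self, tag, attrs):
        # A nested tag inherits the enclosing style and overrides parts of it.
        style = dict(self._stack[-1][1])
        if tag == "b":
            style["bold"] = True
        elif tag == "i":
            style["italic"] = True
        elif tag == "u":
            style["underline"] = True
        elif tag == "font":
            for name, value in attrs:
                if name in ("color", "size") and value is not None:
                    style[name] = value
        # Unknown tags are pushed too, so their end tags pop correctly.
        self._stack.append((tag, style))

    def handle_endtag(self, tag):
        # Close the nearest matching open tag; ignore stray end tags.
        for index in range(len(self._stack) - 1, 0, -1):
            if self._stack[index][0] == tag:
                del self._stack[index:]
                return

    def handle_data(self, data):
        if data:
            self.runs.append((data, dict(self._stack[-1][1])))


def parse_srt_text(text):
    parser = SrtRunParser()
    parser.feed(text)
    parser.close()  # flush any text after the last tag
    return parser.runs
```

Each distinct style dict produced here could then be given a UID and moved into a shared style map, so that the runs only carry references.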

@Laurian

Laurian commented Jul 22, 2021

I'm interested in reading titles and captions out of FCPX (and maybe simple effects like generators), I could try adding them to the adapter but I don't know how they should be represented in OTIO?

@KarthikRIyer
Contributor

@Laurian There's a WIP PR (#805) that adds a representation in OTIO. I was working on an SRT adapter, but there's work left to be done on the OTIO representation to support styles.

@Laurian

Laurian commented Jul 22, 2021

Cool, I'll use your TimedText.1 schema to represent things and look into how to do the import/export in the fcp_xml.py adapter, as I'm quite familiar with the FCPX format. I can map both <title> and <caption> elements to it; even when a <title> points to a generator, the text data is in there, and I guess I can "hide" all the other extra bits in the metadata field for now. (Is there a policy/format for passing adapter-specific metadata into the timeline?)

@meshula
Collaborator

meshula commented Jul 23, 2021

I'm curious what the other metadata from FCP contains?

@jminor
Collaborator Author

jminor commented Jul 23, 2021

@Laurian yes, the general guidance is that an adapter should translate to/from the OTIO schema (in this case @KarthikRIyer's proposed TimedText schema) and then put anything else interesting into metadata, specifically nested in a clearly labelled sub-dictionary within metadata. This makes that metadata visible and invites discussion about what else could/should be promoted into the official schema.
You can see more guidance here: https://opentimelineio.readthedocs.io/en/latest/tutorials/otio-file-format-specification.html?highlight=metadata#metadata
and here: https://opentimelineio.readthedocs.io/en/latest/tutorials/write-an-adapter.html?highlight=metadata#metadata
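
As a sketch of that guidance, the helper below splits a source element's attributes into fields the schema understands and everything else, nesting the leftovers under a namespace key. The field set `SCHEMA_FIELDS`, the function name, and the `"fcpx"` namespace in the usage are all assumptions for illustration; plain dictionaries stand in for OTIO objects.

```python
# Hypothetical set of attributes that map directly onto schema fields.
SCHEMA_FIELDS = {"name", "offset", "duration"}


def split_adapter_fields(attrs, namespace):
    """Separate schema fields from adapter-specific extras.

    Returns (fields, metadata), where metadata nests the extras under a
    single clearly labelled key so they stay visible but do not collide
    with metadata written by other adapters.
    """
    fields = {k: v for k, v in attrs.items() if k in SCHEMA_FIELDS}
    extras = {k: v for k, v in attrs.items() if k not in SCHEMA_FIELDS}
    metadata = {namespace: extras} if extras else {}
    return fields, metadata
```

For example, `split_adapter_fields({"name": "Title", "lane": "2"}, "fcpx")` keeps `name` as a schema field and places `lane` under `metadata["fcpx"]`.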

@meshula
Collaborator

meshula commented Jul 25, 2021

This project

https://github.com/naomiaro/waveform-playlist

led me to this -

https://github.com/readbeyond/aeneas

aeneas looks like a gold mine of reference material.

@Laurian

Laurian commented Jul 29, 2021

@meshula For subtitles, FCPX includes placement and styling metadata:

<caption name="People assume that time is a strict progression of cause to effect," lane="1" offset="15024/300s" duration="8600/2500s" start="3600s" role="iTT?captionFormat=ITT.en-GB">
  <text placement="top">
    <text-style ref="ts2">People assume that time is a strict progression of cause to effect,</text-style>
  </text>
  <text-style-def id="ts2">
    <text-style font=".AppleSystemUIFont" fontSize="13.01" fontFace="Regular" fontColor="1 0.999974 0.999991 1" backgroundColor="0 0 0 1"/>
  </text-style-def>
</caption>

But other titles you can add can be similar:

<title name="Continuous" lane="2" offset="123500/2500s" ref="r6" duration="20100/2500s" start="3600s">
  <text>
    <text-style ref="ts1">Title</text-style>
  </text>
  <text-style-def id="ts1">
    <text-style font="Helvetica" fontSize="72" fontFace="Regular" fontColor="1 0.999974 0.999991 1" strokeColor="0.985948 0 0.0269506 0" strokeWidth="1" alignment="center"/>
  </text-style-def>
</title>

where ref="r6" points to the Apple Motion effect <effect id="r6" name="Continuous" uid=".../Titles.localized/Build In:Out.localized/Continuous.localized/Continuous.moti"/>

Similarly custom titles can have a lot of parameters (here's a BBC News caption one):

<title name="People assume that time is a strict progression of cause to effect, - 02 Subtitle" lane="1" offset="125200/2500s" ref="r7" duration="13209600/3840000s" start="3600s">
    <param name="Layout Method" key="9999/10201/3000298778/10202/2/314" value="1 (Paragraph)"/>
    <param name="Left Margin" key="9999/10201/3000298778/10202/2/323" value="0"/>
    <param name="Right Margin" key="9999/10201/3000298778/10202/2/324" value="0"/>
    <param name="Top Margin" key="9999/10201/3000298778/10202/2/325" value="0"/>
    <param name="Bottom Margin" key="9999/10201/3000298778/10202/2/326" value="-540"/>
    <param name="Alignment" key="9999/10201/3000298778/10202/2/354/10038/401" value="1 (Center)"/>
    <param name="Line Spacing" key="9999/10201/3000298778/10202/2/354/10038/404" value="-14"/>
    <param name="Alignment" key="9999/10201/3000298778/10202/2/373" value="0 (Left) 2 (Bottom)"/>
    <param name="Source Object" key="9999/10201/3000298778/10202/4/3000449521/201" value="3000449347"/>
    <param name="Scale" key="9999/10201/3000298778/10202/4/3000449521/204" value="-0.0925926"/>
    <param name="Apply Mode" key="9999/10201/3000298778/10202/4/3000450074/200" value="1 (Multiply by source)"/>
    <param name="Source Object" key="9999/10201/3000298778/10202/4/3000450074/201" value="3000450130"/>
    <param name="Scale" key="9999/10201/3000298778/10202/4/3000450074/204" value="10"/>
    <param name="Opacity" key="9999/10201/3000298778/10202/4/3001050732/1000/1044" value="0"/>
    <param name="Speed" key="9999/10201/3000298778/10202/4/3001050732/201/208" value="6 (Custom)"/>
    <param name="Custom Speed" key="9999/10201/3000298778/10202/4/3001050732/201/209">
        <keyframeAnimation>
            <keyframe time="0s" value="0"/>
            <keyframe time="454656/153600s" value="0"/>
        </keyframeAnimation>
    </param>
    <param name="Range" key="9999/10201/3000298778/10202/4/3001050732/201/229/230" value="6 (Line)"/>
    <param name="End Index" key="9999/10201/3000298778/10202/4/3001050732/201/229/232" value="3"/>
    <param name="Invert" key="9999/10201/3000298778/10202/4/3001050732/201/229/233" value="1"/>
    <text>
        <text-style ref="ts2">People assume that time is a strict progression of cause to effect,</text-style>
    </text>
    <text-style-def id="ts2">
        <text-style font="BBC Reith Sans" fontSize="58" fontFace="Regular" fontColor="1 0.999974 0.999991 1" alignment="center" lineSpacing="-14"/>
    </text-style-def>
</title>

again the ref="r7" will point to the actual Apple Motion file <effect id="r7" name="02 Subtitle" uid="~/Titles.localized/BBC News/B Info/02 Subtitle/02 Subtitle.moti" src="file:///Users/laurian/Movies/Motion%20Templates.localized/Titles.localized/BBC%20News/B%20%20Info/02%20Subtitle/02%20Subtitle.moti"/>

So I would try to preserve that in metadata just in case I need it.
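
For reference, a minimal sketch of reading one such `<caption>` element with the standard library. It assumes time values are either a whole number or a fraction followed by `s` (as in `15024/300s` and `3600s` above) and uses an abridged element with placeholder text. `parse_fcpx_time` and `read_caption` are hypothetical names, not functions of the existing fcp_xml.py adapter.

```python
import xml.etree.ElementTree as ET
from fractions import Fraction

# Abridged caption modelled on the example above; the text is a placeholder.
SAMPLE = """<caption name="Hello" lane="1" offset="15024/300s" duration="8600/2500s" start="3600s">
  <text placement="top">
    <text-style ref="ts2">Hello</text-style>
  </text>
  <text-style-def id="ts2">
    <text-style font="Helvetica" fontSize="13"/>
  </text-style-def>
</caption>"""


def parse_fcpx_time(value):
    """Convert '15024/300s' or '3600s' into an exact Fraction of seconds."""
    if not value.endswith("s"):
        raise ValueError(f"unexpected time value: {value!r}")
    return Fraction(value[:-1])


def read_caption(xml_text):
    """Extract timing, text runs, and style definitions from one element."""
    elem = ET.fromstring(xml_text)
    styles = {}
    for style_def in elem.findall("text-style-def"):
        style = style_def.find("text-style")
        styles[style_def.get("id")] = dict(style.attrib) if style is not None else {}
    text_elem = elem.find("text")
    return {
        "offset": parse_fcpx_time(elem.get("offset")),
        "duration": parse_fcpx_time(elem.get("duration")),
        "placement": text_elem.get("placement") if text_elem is not None else None,
        # Each run is (text, style reference); the reference indexes `styles`.
        "runs": [(ts.text or "", ts.get("ref")) for ts in elem.findall("text/text-style")],
        "styles": styles,
    }
```

Anything not read here, such as the `<param>` elements of a custom title, would be a candidate for the adapter-specific metadata.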

@meshula
Collaborator

meshula commented Jul 29, 2021

Thanks ~ what prompted my question was wondering whether the extra data fell in the category of "extra non-portable stuff that should be preserved" or the category of "falls into the styling category". The examples seem to show styling/layout plus some animation parameters.

@jminor jminor added the roadmap label Jun 9, 2023

7 participants