Replace mux.js for CEA parsing #2648

joeyparrish · 2020-06-11T19:02:45Z

Have you read the FAQ and checked for duplicate open issues?
Yes

Is your feature request related to a problem? Please describe.
Using mux.js for transmuxing makes sense, as we do not want to support TS streams long-term. Using mux.js for CEA captions is different, since broadcast TV caption standards are not going away any time soon.

Describe the solution you'd like
We should maintain our own CEA parser so that apps do not need an additional dependency for caption parsing. mux.js can remain an optional dependency for transmuxing TS content.

Describe alternatives you've considered

- Continue using mux.js for captions (no change from today)
- Use another existing parser (still a third-party dep, but now multiple)

During video playback, if the user switches the caption stream (e.g. from CC1 to CC3, which changes the language), the first caption in the next fragment goes missing.

In fragmented MP4s, the end time of a caption is determined by the start time of the next caption. For the last caption in a fragment, the end time therefore cannot be determined until the next fragment is parsed.

Before this fix, the caption stream was cleared through a chain of function calls originating from clearBuffer_() on the media source engine, even though clearing a buffer and resetting a caption stream are two independent operations. As a result, the caption parser was reset (its buffer cleared) both during video seeks and during stream switches. Resetting makes sense for seeks, because the end time of the last caption in the fragment can't be determined when the presentation timestamp changes entirely. For stream switches it does not: clearing the parser discards the last caption of the fragment (which has not been emitted yet, since its end time is still unknown), and that data is lost, causing the problem.

The fix is to reset (and hence clear) the caption parser during seeks, but not during stream switches.

Issue #2648
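The behavior described in this commit message can be illustrated with a small sketch. Everything here is hypothetical (`PendingCaptionBuffer`, `onSeek`, and `onStreamSwitch` are illustrative names, not Shaka Player's API); it only shows why a reset on stream switch loses the caption that is still waiting for its end time.

```javascript
// Hypothetical sketch; names are illustrative and not Shaka Player's API.
class PendingCaptionBuffer {
  constructor() {
    // Caption whose end time is still unknown, or null.
    this.pending_ = null;
  }

  // Takes the captions of one fragment as {startTime, text} objects sorted
  // by start time. Returns the captions whose end time is now known.
  parseFragment(captions) {
    const emitted = [];
    for (const caption of captions) {
      if (this.pending_) {
        // The start of the next caption closes the pending caption.
        emitted.push({
          startTime: this.pending_.startTime,
          endTime: caption.startTime,
          text: this.pending_.text,
        });
      }
      this.pending_ = caption;
    }
    return emitted;
  }

  // Drops the pending caption.
  reset() {
    this.pending_ = null;
  }
}

// A seek invalidates the pending caption, so the buffer is reset.
function onSeek(buffer) {
  buffer.reset();
}

// A stream switch keeps the pending caption: intentionally no reset.
function onStreamSwitch(buffer) {}

const demo = new PendingCaptionBuffer();
demo.parseFragment([{startTime: 0, text: 'A'}, {startTime: 2, text: 'B'}]);
onStreamSwitch(demo);
// 'B' survives the switch and is closed by the next fragment's first caption.
console.log(demo.parseFragment([{startTime: 5, text: 'C'}]));
```

If `onStreamSwitch` called `reset()` as well, caption 'B' in this sketch would never be emitted, which mirrors the bug described above.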

This is an MP4 parser that extracts CEA-708 packets from fragmented MP4 streams. The closed caption parser (shaka.media.ClosedCaptionParser) owns this MP4 parser, initializes it, and calls it as shown in this commit. As data comes in, the parser parses it, and the caption packet data is then returned in a callback (on708Data). A theoretical decoder (a future pull request, noted in a TODO comment) will then decode these packets and extract the captions from them. Issue #2648
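Extracting caption packets from fragmented MP4 starts with walking the box structure. The following is a minimal sketch of that first step only, not Shaka's parser: `listTopLevelBoxes` is a hypothetical helper that assumes 32-bit box sizes and reports each top-level box's type, offset, and size.

```javascript
// Hypothetical helper, not part of Shaka Player.
// `data` is a Uint8Array holding MP4 bytes.
function listTopLevelBoxes(data) {
  const view = new DataView(data.buffer, data.byteOffset, data.byteLength);
  const boxes = [];
  let offset = 0;
  // Each box header is a 4-byte big-endian size followed by a 4-byte type.
  // Fewer than 8 trailing bytes are ignored in this sketch.
  while (offset + 8 <= data.byteLength) {
    let size = view.getUint32(offset);  // DataView reads big-endian by default.
    const type = String.fromCharCode(
        data[offset + 4], data[offset + 5], data[offset + 6], data[offset + 7]);
    if (size === 0) {
      // A size of 0 means the box extends to the end of the data.
      size = data.byteLength - offset;
    } else if (size === 1) {
      // A size of 1 signals a 64-bit size field, which this sketch skips.
      throw new Error('64-bit box sizes are not handled in this sketch');
    }
    if (size < 8 || offset + size > data.byteLength) {
      throw new Error('Invalid MP4: bad size for box ' + type);
    }
    boxes.push({type, offset, size});
    offset += size;
  }
  return boxes;
}
```

A real caption parser would descend into the boxes that carry sample data and inspect the video samples for caption payloads; that part is deliberately left out here.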

…ject#2648)
- added tkhd box parsing
- unit testing for mp4 box parsers
- moved mp4 box structs to lib/util
- inlined returns for mp4 boxes
- created mp4 parser to parse cea708 packets from mp4 streams (shaka-project#2648)
- tightened array out of bound checks
- added error code for invalid mp4 for cea packets
- refactored name of mp4 cea parser interface
- fixed a bug with increment ordering that affected time
- linting
- addressed mp4 cea parsing comments
- stylistic changes to mp4 cea parser
- improved mp4 parser comments and addressed pr review
- return caption packets as array instead of in a callback
- removed caption packets from class state to avoid clearing it in the middle of a parse
- Implemented CEA-608 decoder
- fix the diff
- fix build and style fixing
- removed new parser
- bug fix for fields

- removed useless logs
- Added color support
- fixed up background attribute logic
- trim new lines on row ends and styling
- fix logic on clearing rollup captions when moving window
- hotfixes on decoder and tests
- fixed logic and added support for all charsets
- minor syntax fix
- renaming constants
- improve commenting, remove redundant comments
- move important hex into constants
- broke hex values in unit tests into constants
- comment fixing
- constant suffixes
- cleaned impl for streams/channel

avelad · 2020-07-15T07:08:03Z

@muhammadharis Do you plan to implement this for TS too? I ask this to know if the forceTransmuxTS configuration can be removed

https://shaka-player-demo.appspot.com/docs/api/shaka.extern.html#.StreamingConfiguration

If this is true, we will transmux TS content even if not strictly necessary for the assets to be played. Shaka Player currently only supports CEA 708 captions by transmuxing, so this value is necessary for enabling them on platforms with native TS support like Edge or Chromecast. This value defaults to false.
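For reference, the option quoted above lives under the `streaming` section of the player configuration at the time of this thread. A minimal sketch, assuming an already constructed `shaka.Player` instance named `player` (the instance itself is not created here):

```javascript
// Configuration fragment that enables forced TS transmuxing so that
// CEA captions can be extracted on platforms with native TS support.
const config = {
  streaming: {
    forceTransmuxTS: true,  // Defaults to false per the documentation above.
  },
};

// Assumes `player` is an existing shaka.Player instance:
// player.configure(config);
```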

muhammadharis · 2020-07-15T16:36:40Z

> @muhammadharis Do you plan to implement this for TS too? I ask this to know if the forceTransmuxTS configuration can be removed

@avelad Currently, we do not plan to implement this for Transport Streams, only for MP4. As Joey mentioned in this issue, we don't plan to support TS in the long run, so we will continue using mux.js for transmuxing. However, the cea/i_cea_parser.js interface is modular and would allow us to drop in a TS parser if we change our minds in the future.
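To illustrate what "modular" could mean in practice, here is a rough sketch of a parser interface with per-container implementations. All names (`registerCeaParser`, `createCeaParser`, `StubMp4CeaParser`) are hypothetical; the `init`/`parse` shape is only an assumption based on the commit notes in this thread, not the contents of cea/i_cea_parser.js.

```javascript
// Hypothetical registry keyed by container MIME type.
const parserFactories = new Map();

function registerCeaParser(mimeType, factory) {
  parserFactories.set(mimeType, factory);
}

// Returns a new parser for the MIME type, or null if none is registered.
function createCeaParser(mimeType) {
  const factory = parserFactories.get(mimeType);
  return factory ? factory() : null;
}

// Stub standing in for an MP4 implementation of the assumed interface.
class StubMp4CeaParser {
  // Would read track information from the init segment.
  init(initSegment) {}

  // Would return the caption packets found in the media segment.
  parse(mediaSegment) {
    return [];
  }
}

registerCeaParser('video/mp4', () => new StubMp4CeaParser());
```

Under this arrangement, supporting TS later would mean registering another implementation for 'video/mp2t' without touching the callers.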

Related to #2648, #2357, and #2776

This pertains to #2648 (although this is a new feature, not a replacement) and #1404.

A CEA-708 decoder that follows the CEA-708-E standard, decodes closed caption data from User Data Registered by Rec. ITU-T T.35 SEI messages, and returns them as cues in Shaka's internal cue format. Furthermore, this pull request fixes and cements some of the logic surrounding CEA-608 and CEA-708 tag parsing in the DASH manifest parser.

Format: Similar to the CEA-608 decoder, cues are emitted in Shaka's internal format (lib/text/cue.js). This decoder makes use of nested cues. The top-level cue is always a blank cue with no text; each nested cue inside it either contains text with a specific style or is a linebreak cue that facilitates line breaks. This also allows for inline style changes (color, italics, underline).

Details:
- ASCII (G0), Latin-1 (G1), and CEA-708-specific charsets (G2 and G3) are all supported.
- Underlines, colors, and italics are supported, set as a property on each nested cue.
- Positioning of text is supported. (Exception: in CEA-708 the default positioning is left; in this decoder it is centered.)
- Positioning of windows is not supported, but the relevant fields that could be used to support it are extracted and left as a TODO.
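The nested cue layout described above can be sketched with plain objects. This is an illustration only: the field names (`text`, `nestedCues`, `lineBreak`, `italic`) and helpers are assumptions for the example and do not claim to match the fields in lib/text/cue.js.

```javascript
// Hypothetical cue shape; not Shaka's Cue class.
function makeCue(startTime, endTime, text, style = {}) {
  return {startTime, endTime, text, nestedCues: [], lineBreak: false, ...style};
}

// A cue with no text that only marks a line break.
function makeLineBreak(startTime, endTime) {
  const cue = makeCue(startTime, endTime, '');
  cue.lineBreak = true;
  return cue;
}

// `rows` is an array of rows; each row is an array of {text, style} spans.
// The top-level cue stays blank and all text lives in nested cues.
function buildTopLevelCue(startTime, endTime, rows) {
  const top = makeCue(startTime, endTime, '');
  rows.forEach((row, index) => {
    if (index > 0) {
      top.nestedCues.push(makeLineBreak(startTime, endTime));
    }
    for (const span of row) {
      top.nestedCues.push(makeCue(startTime, endTime, span.text, span.style));
    }
  });
  return top;
}

// Joins the nested cues back into plain text, one line per row.
function flattenText(cue) {
  return cue.nestedCues.map((c) => c.lineBreak ? '\n' : c.text).join('');
}
```

Because each span is its own nested cue, a style change in the middle of a row (for example, one italic word) only needs a new nested cue rather than a new top-level cue.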

joeyparrish added the type: enhancement New feature or request label Jun 11, 2020

shaka-bot added this to the Backlog milestone Jun 11, 2020

muhammadharis added a commit to muhammadharis/shaka-player that referenced this issue Jun 11, 2020

Created MP4 box parsers to parse data for common box types (shaka-pro…

ffaed64

…ject#2648)

muhammadharis mentioned this issue Jun 11, 2020

Created MP4 Box parsers to parse data for common box types #2649

Merged

joeyparrish pushed a commit that referenced this issue Jun 16, 2020

Created MP4 Box parsers to parse data for common box types (#2649)

e8f24ec

Issue #2648

muhammadharis added a commit to muhammadharis/shaka-player that referenced this issue Jun 17, 2020

created mp4 parser to parse cea708 packets from mp4 streams (shaka-pr…

8e8c51c

…oject#2648)

muhammadharis mentioned this issue Jun 17, 2020

Parse CEA-708 Packets from Fragmented MP4 streams. #2660

Merged

muhammadharis added a commit to muhammadharis/shaka-player that referenced this issue Jun 21, 2020

fixed captions disappearing at certain times when streams are switched (

19eb40e

shaka-project#2648)

muhammadharis mentioned this issue Jun 22, 2020

Fix captions going missing when switching streams bug #2672

Merged

muhammadharis added a commit to muhammadharis/shaka-player that referenced this issue Jul 9, 2020

Created MP4 box parsers to parse data for common box types (shaka-pro…

955ce09

…ject#2648)

muhammadharis added a commit to muhammadharis/shaka-player that referenced this issue Jul 9, 2020

created mp4 parser to parse cea708 packets from mp4 streams (shaka-pr…

4ae9dbe

…oject#2648)

muhammadharis added a commit to muhammadharis/shaka-player that referenced this issue Jul 9, 2020

Implemented CEA608 decoder (shaka-project#2648)

a791a53

muhammadharis mentioned this issue Jul 15, 2020

CEA-608 Decoder #2731

Merged

joeyparrish closed this as completed in 1c00b4c Aug 7, 2020

muhammadharis added a commit to muhammadharis/shaka-player that referenced this issue Aug 10, 2020

CEA-708 Decoder (shaka-project#2648)

92da874

muhammadharis added a commit to muhammadharis/shaka-player that referenced this issue Aug 11, 2020

add support and unit tests for styling on SimpleTextDisplayer (shaka-…

2407734

…project#2648)

muhammadharis mentioned this issue Aug 11, 2020

Render bold/italics/underline on SimpleTextDisplayer #2779

Merged

joeyparrish pushed a commit that referenced this issue Aug 11, 2020

feat(text): Render bold/italics/underline on SimpleTextDisplayer (#2779)

91a284f

Related to #2648, #2357, and #2776

muhammadharis mentioned this issue Aug 18, 2020

CEA-708 Decoder (#2648) #2807

Merged

joeyparrish modified the milestones: Backlog, v3.1 Aug 21, 2020

shaka-bot added the archived label Oct 6, 2020

shaka-project locked and limited conversation to collaborators Oct 6, 2020

joeyparrish assigned muhammadharis Jan 14, 2021

shaka-bot added the status: archived Archived and locked; will not be updated label Apr 15, 2021
