Reference W3C Synchronized Narration #109

HadrienGardeur · 2019-12-23T10:50:47Z · Maintainer

Instead of rolling our own format for a media overlay equivalent, we should consider adopting the work being done within a W3C Community Group (CG) on Synchronized Narration: https://w3c.github.io/sync-media-pub/synchronized-narration.html

A few notes regarding that document:

- the approach taken by the W3C CG is influenced by our design
- we don't really need a media-overlay property anymore; we can simply use alternate instead
- the W3C approach is limited to a single HTML file per Synchronized Narration document

Any thoughts on this? cc @danielweck @llemeurfr

danielweck · 2019-12-23T11:41:38Z · Maintainer

Given that Readium's "draft" proposal for its internal JSON representation of EPUB3 Media Overlays has not really been used in practice yet (i.e. just a parser, no playback engine): yes, I think it is timely to leverage the outcome of the W3C sync-media community group.

llemeurfr · 2019-12-23T15:03:37Z · Maintainer

I agree that this work is good and it seems we don't need to make anything "better" here. I'll take a look at the details again before committing to this.

qnga · 2020-01-23T20:48:00Z

Personally, I think it is better to clearly identify media overlays, rather than using alternate as suggested by @HadrienGardeur and in Incorporating Synchronized Narration. Aren't alternate links more like fallback resources, as defined in EPUB?
In the audiobook example, there would be no difference between media overlays and fallback links to audio in an alternate format.
Maybe the rel attribute could make the difference explicit...

qnga · 2020-01-24T23:31:52Z

The Pub Manifest specification definitely prevents using alternate links for media overlays, as it would break the algorithm for selecting an alternate resource.
I can't understand why the Audiobooks specification suggests using this dirty trick. Even in this case, media overlays aren't just a fallback, but a feature that provides the ability to move through the content, for example paragraph by paragraph.

HadrienGardeur · Jan 25, 2020 · Maintainer, Author

> In the audiobook example, there would be no difference between media overlays and fallback links to audio in an alternate format.

There would be a difference, since they wouldn't use the same media type. They could potentially use a different rel as well.

> Aren't alternate links more like fallback resources, as defined in EPUB?

They're not a fallback in the EPUB sense. A User Agent would compare the primary resource with its alternates and decide which one it should use.
A fallback would behave differently: the UA would only consider the fallback if it can't support the primary resource.
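
To make the difference concrete, here is a minimal Python sketch contrasting the two selection models. It assumes a simplified, hypothetical `Link` type (not the actual RWPM or Pub Manifest data model), a UA-supplied `supports` check and a UA-supplied `prefer` scoring function; the hrefs are made up.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Link:
    """Hypothetical, simplified Link Object: just enough for this sketch."""
    href: str
    type: str
    alternate: List["Link"] = field(default_factory=list)


def select_with_fallback(primary: Link, fallbacks: List[Link],
                         supports: Callable[[str], bool]) -> Optional[Link]:
    """EPUB-style fallback: fallbacks are only considered if the primary is unsupported."""
    if supports(primary.type):
        return primary
    for candidate in fallbacks:
        if supports(candidate.type):
            return candidate
    return None


def select_with_alternates(primary: Link, supports: Callable[[str], bool],
                           prefer: Callable[[Link], int]) -> Optional[Link]:
    """Alternate-style selection: the primary and its alternates are compared and the UA
    keeps the supported resource it scores highest (max() keeps the first on ties)."""
    supported = [link for link in [primary, *primary.alternate] if supports(link.type)]
    return max(supported, key=prefer) if supported else None


def supports(media_type: str) -> bool:
    return media_type in {"text/html", "audio/mpeg"}


def prefer_audio(link: Link) -> int:
    return 1 if link.type.startswith("audio/") else 0


# Hypothetical chapter: HTML primary resource with an MP3 alternate.
chapter = Link("chapter1.html", "text/html", [Link("chapter1.mp3", "audio/mpeg")])

by_fallback = select_with_fallback(chapter, chapter.alternate, supports)  # HTML is supported, so the MP3 is never looked at
by_alternate = select_with_alternates(chapter, supports, prefer_audio)    # both supported, the UA preference picks the MP3
```

With fallbacks the MP3 only matters when HTML cannot be rendered; with alternates it competes with the HTML on equal footing.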

llemeurfr · 2020-01-25T17:38:28Z · Maintainer

I agree that a Pub Manifest "alternate" is broader than what EPUB fallbacks allow. It will be used to map EPUB fallbacks ("here is an Opus audio, and here is an alternate MP3 if the reading system cannot handle Opus"), but it can also be used to select a sound quality ("here is a 320 kbps audio, and here is an alternate 128 kbps one if bandwidth is an issue"). The alternate feature has been stretched quite a bit to handle the sync-narration feature, with statements like "here is an HTML page, and here is its alternate sync-narration". The sync narration is more "complementary" or "additional" than "alternate" indeed.

Which makes me wonder whether replacing the current media-overlay property with alternate + rel=sync-narration is the right solution for Readium.

> The Pub Manifest specification definitely prevents using alternate links for media overlays, as it would break the algorithm for selecting an alternate resource.

I don't think so. "if alternate["encodingFormat"] is set and the user agent supports the specified media type" + "return a resource from possibleAlternates as determined by the user agent." covers the case where sync narration is supported, IMO.

llemeurfr · 2020-01-25T17:44:11Z · Maintainer

By the way, if we decide to keep a specific property, `syncNarration` may be better than `media-overlay`/`mediaOverlay`, to reflect the W3C format we'll use.

llemeurfr · 2020-01-25T17:55:07Z · Maintainer

Another aspect: reading the sample in the sync narration spec and looking at sync narration in the pub manifest, I see that the text property is defined as "Value is a URL 'fragment' which is typically a unique identifier that references a document element" => it is relative to the HTML page the sync narration is an alternative of.

1/ this is not how we should have an "alternate" resource specified.
2/ It does not fit with this comment from Hadrien.

@danielweck, I'd like to discuss this with you and Marisa, and maybe open an issue in the sync-media space.

danielweck · 2020-01-25T18:56:07Z · Maintainer

Indeed!

    {
      "text": "#id1",
      "audio": "audio.mp3#t=0.0,1.2"
    },

Thanks for bringing this to my attention, Laurent. The proposed processing model seems to rely on the base URL of the JSON resource being identical to that of the associated HTML document, for the purpose of resolving 'text' URLs. I think this is an incorrect approach (instead, 'text' URLs should be just like 'audio' URLs). If I remember correctly, I wasn't involved in the discussions that led to this design decision, but I can only blame myself for not participating more actively recently. Marisa actually did the bulk of the re-work of the original draft specification. Credit to her (and many thanks) for editing/advancing the sync-media "specification" (Community Group), but I would indeed like to raise the issue of what is, in my opinion, a misuse of the JSON's base URL.
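
A small Python illustration of the base URL problem, using `urllib.parse.urljoin` and hypothetical URLs (the draft does not say where the JSON and HTML live). Resolving the fragment-only `text` value against the JSON document's own URL points back into the JSON file; it only reaches the HTML element if the HTML document's URL is used as the base instead, which is the assumption the draft's processing model relies on.

```python
from urllib.parse import urljoin

# Hypothetical locations, for illustration only.
HTML_URL = "https://example.org/book/text/chapter1.html"
JSON_URL = "https://example.org/book/syncnarr1.json"

entry = {"text": "#id1", "audio": "audio.mp3#t=0.0,1.2"}

# Usual rule: relative URLs in a JSON resource resolve against that resource's own URL.
text_from_json = urljoin(JSON_URL, entry["text"])    # https://example.org/book/syncnarr1.json#id1 (not the HTML element)
audio_from_json = urljoin(JSON_URL, entry["audio"])  # https://example.org/book/audio.mp3#t=0.0,1.2

# The draft's model only works if 'text' is resolved against the HTML document instead.
text_from_html = urljoin(HTML_URL, entry["text"])    # https://example.org/book/text/chapter1.html#id1
```

If `text` held a full relative URL such as `text/chapter1.html#id1`, both properties could be resolved the same way, against the JSON document.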

danielweck · 2020-01-25T19:05:26Z · Maintainer

I posted an issue: w3c/sync-media-pub#28
Related issue: w3c/sync-media-pub#26

llemeurfr · 2020-01-25T19:09:41Z · Maintainer

To be considered a proper "alternate" rendition of the content, the JSON narration must be independent of the resource it is an alternate for. If the spec wording is modified to mandate valid URL strings, I think the situation will be good.

HadrienGardeur · 2020-01-25T23:21:42Z · Maintainer, Author

> The sync narration is more "complementary" or "additional" than "alternate" indeed.

I don't agree with that statement.

I think that this is consistent with other use cases, where alternate contains the primary resource that needs to be fetched to instantiate your navigator(s) properly.

Let's imagine a publication where each resource in readingOrder is HTML, but also provides a Synchronized Narration document and an MP3 in alternate:

- if an app supports Synchronized Narration, it can fetch the Synchronized Narration document as its preferred resource and use two navigators (HTML and audio) to handle things properly
- if an app only supports audio, it can fetch the MP3 and provide an audio-only experience
- if an app only supports HTML, it would simply ignore the resources listed in alternate and go through the readingOrder as usual

This could also be a user preference, of course.
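
The three cases above can be sketched as a selection that only looks at Link Objects. This is a minimal Python sketch, not Readium's implementation: the readingOrder item and its hrefs are hypothetical, the sync narration media type is borrowed from a sample later in this thread, and matching on `type` alone (ignoring `rel`) is a simplification.

```python
from typing import Dict, List, Optional

SYNC_NARRATION = "application/vnd.syncnarr+json"  # media type used in a sample later in this thread

# Hypothetical readingOrder item: HTML, with a Synchronized Narration document and an MP3 in alternate.
chapter1 = {
    "href": "chapter1.html",
    "type": "text/html",
    "alternate": [
        {"href": "chapter1.json", "type": SYNC_NARRATION},
        {"href": "chapter1.mp3", "type": "audio/mpeg"},
    ],
}


def choose_resource(item: Dict, preference: List[str]) -> Optional[Dict]:
    """Return the first Link Object (the item or one of its alternates) whose type
    matches the app's preferred media types, tried in order. `preference` can reflect
    the app's capabilities or a user setting."""
    candidates = [item, *item.get("alternate", [])]
    for media_type in preference:
        for link in candidates:
            if link.get("type") == media_type:
                return link
    return None


# App with Synchronized Narration support (HTML + audio navigators).
sync_app = choose_resource(chapter1, [SYNC_NARRATION, "text/html", "audio/mpeg"])
# Audio-only app.
audio_app = choose_resource(chapter1, ["audio/mpeg"])
# HTML-only app: the alternates are effectively ignored.
html_app = choose_resource(chapter1, ["text/html"])
```

Nothing here requires fetching the Synchronized Narration document; the decision is made from `type` (and, in a fuller version, `rel`) on the links.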

I don't see much of a difference between fetching an HTML document as a primary resource and a Synchronized Narration document as a primary resource.

When rendering HTML, you need to fetch secondary resources: images, CSS, JS, fonts, audio and video.
This is no different with a Synchronized Narration document, but this time the secondary resources are slightly different: HTML and audio.

> Which makes me wonder whether replacing the current media-overlay property with alternate + rel=sync-narration is the right solution for Readium.

We created the media-overlay property at the very beginning of the project, when we didn't have alternate.
As explained above, I don't think that Media Overlays/Synchronized Narration are any different from the other use cases for alternate. In general, I'm against the proliferation of properties and roles if we can cover their use cases with a more abstract/generic term.
I definitely think that's the case here, and having type + rel in alternate is more than enough to identify that a given resource is a Synchronized Narration document (HATEOAS FTW).

qnga · 2020-01-26T13:02:18Z

OK, that's clearer for me now. With your approach, one can imagine a case where there are alternate audio links with various bitrates or codecs, and multiple sync narrations each pointing to a specific audio file. However, keeping the selection algorithm simple and avoiding the need to inspect each sync narration resource would require combining, in the sync narration links, properties of both the text file and the audio file, for example the bitrate and codec of the audio.

llemeurfr · 2020-01-26T14:42:47Z · Maintainer

My comment was related to the way sync narrations are defined today by the W3C CG, i.e. with textual parts specified as URL fragments, dependent on the "main" HTML resource. Since this will certainly be corrected in the W3C spec and the textual part will become a URL string (absolute, or relative to the location of the sync narr JSON), I'm OK to consider sync narrations as proper alternate renditions, and therefore to remove the media-overlay property from our model.

So, let's try a sample, with html pages and a sync narration based on the same html + audio, stored in an LPF file (-> relative URLs).

In the manifest we would find:

{
  "href": "text/chapter1.html",
  "type": "text/html",
  "alternate": [
    {
      "href": "syncnarr1.json",
      "type": "application/vnd.syncnarr+json",
      "rel": "sync-narration"
    }
  ]
}

and syncnarr1.json containing:

{
  "role": "chapter1",
  "narration": [
    {
      "text": "text/chapter1.html#id1",
      "audio": "audio/voice1.mp3#t=0.0,1.2"
    },
    {
      "text": "text/chapter1.html#id2",
      "audio": "audio/voice1.mp3#t=1.2,3.4"
    }
  ]
}

With such a model, we can also have a slightly different html page in the sync narration, e.g. with a simplified structure, no footnotes, simpler tables ... anything that could make the narration smoother.
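
With `text` and `audio` both being ordinary relative URLs, a reading system can resolve them against the location of the JSON document. Below is a minimal Python sketch doing this for the sample above; the package URL is hypothetical, the `SyncPoint` type is made up for illustration, and the clip parser only understands the `t=start,end` form the sample uses.

```python
import json
from typing import List, NamedTuple, Tuple
from urllib.parse import urldefrag, urljoin


class SyncPoint(NamedTuple):
    html: str        # absolute URL of the HTML resource
    element_id: str  # fragment identifying the element in that resource
    audio: str       # absolute URL of the audio resource (fragment removed)
    start: float     # clip start, in seconds
    end: float       # clip end, in seconds


def parse_clip(fragment: str) -> Tuple[float, float]:
    """Parse the 't=start,end' form used in the sample (plain seconds only)."""
    if not fragment.startswith("t=") or "," not in fragment:
        raise ValueError(f"unsupported audio fragment: {fragment!r}")
    start, end = fragment[len("t="):].split(",", 1)
    return float(start), float(end)


def load_sync_narration(doc_url: str, raw: str) -> List[SyncPoint]:
    """Resolve every 'text' and 'audio' URL against the sync narration document's own URL."""
    points = []
    for entry in json.loads(raw)["narration"]:
        html, element_id = urldefrag(urljoin(doc_url, entry["text"]))
        audio, clip = urldefrag(urljoin(doc_url, entry["audio"]))
        points.append(SyncPoint(html, element_id, audio, *parse_clip(clip)))
    return points


SAMPLE = """{
  "role": "chapter1",
  "narration": [
    {"text": "text/chapter1.html#id1", "audio": "audio/voice1.mp3#t=0.0,1.2"},
    {"text": "text/chapter1.html#id2", "audio": "audio/voice1.mp3#t=1.2,3.4"}
  ]
}"""

# Hypothetical location of syncnarr1.json at the root of the package.
points = load_sync_narration("https://example.org/pub/syncnarr1.json", SAMPLE)
```

Because nothing depends on the HTML document's URL, the same code works whether the narration points at the original chapter or at a simplified copy of it.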

HadrienGardeur · 2020-01-27T11:35:26Z · Maintainer, Author

> However, keeping the selection algorithm simple and avoiding the need to inspect each sync narration resource would require combining, in the sync narration links, properties of both the text file and the audio file, for example the bitrate and codec of the audio.

IMO the selection should be purely based on the Link Objects and shouldn't require fetching the Synchronized Narration document or related resources.

Right now, this means that we could select between multiple Synchronized Narration documents based on language, but not based on audio format (this info isn't available in the document either).

marisademeglio · 2020-01-29T18:51:34Z

> Indeed!
>
>     {
>       "text": "#id1",
>       "audio": "audio.mp3#t=0.0,1.2"
>     },
>
> Thanks for bringing this to my attention, Laurent. The proposed processing model seems to rely on the base URL of the JSON resource being identical to that of the associated HTML document, for the purpose of resolving 'text' URLs. I think this is an incorrect approach (instead, 'text' URLs should be just like 'audio' URLs). If I remember correctly, I wasn't involved in the discussions that led to this design decision, but I can only blame myself for not participating more actively in the recent times. Marisa actually did the bulk of the re-work of the original draft specification. Credits to her (and many thanks) for editing/advancing the sync-media "specification" (Community Group), but I would indeed like to raise the issue about misuse of the JSON's base URL (in my opinion).

I have just a few comments:

The original text fragment syntax came from this draft, contributed by @danielweck :
https://github.com/w3c/sync-media-pub/blob/267ef4b44ddb49789196755a08f71ba87ed88751/web-proposal.md#the-sync-media-json-format
I am in the process of refining this. There are two open issues:

As well as a proposed solution:

Add root-level properties w3c/sync-media-pub#29

Nothing is set in stone yet. As implementers, your comments are most welcome.

The conceptual difference between text and audio that relates to how they are referenced is that a sync narration document goes with exactly one HTML file, as @HadrienGardeur pointed out above. So, repeating the filename in each sync point reference gets verbose. There is not a similar restriction on audio.
As an aside, regarding UA selection of alternates, you can see our discussion here:
choosing alternatives w3c/pub-manifest#133

HadrienGardeur · 2020-01-30T11:06:44Z · Maintainer, Author

> The conceptual difference between text and audio that relates to how they are referenced is that a sync narration document goes with exactly one HTML file, as @HadrienGardeur pointed out above. So, repeating the filename in each sync point reference gets verbose. There is not a similar restriction on audio.

@marisademeglio Is there any specific reason for that?

Historically (from an EPUB perspective) I can understand that position, but in the case of audiobooks where audio files are the primary resource, it would also make sense that a single audio resource could reference multiple HTML resources.

Audiobooks are often produced with a single track for the whole publication, or with audio resources that cover multiple chapters.

marisademeglio · 2020-01-30T19:52:25Z

> > The conceptual difference between text and audio that relates to how they are referenced is that a sync narration document goes with exactly one HTML file, as @HadrienGardeur pointed out above. So, repeating the filename in each sync point reference gets verbose. There is not a similar restriction on audio.
>
> @marisademeglio Is there any specific reason for that?
>
> Historically (from an EPUB perspective) I can understand that position, but in the case of audiobooks where audio files are the primary resource, it would also make sense that a single audio resource could reference multiple HTML resources.
>
> Audiobooks are often produced with a single track for the whole publication, or with audio resources that cover multiple chapters.

Two reasons come to mind:

- We inherited EPUB's restriction on an HTML file being referred to by no more than one media overlay, which was intended to ease the burden on implementers (e.g. so you don't have to open every SMIL file to try to locate an element reference).
- Sync narration is intended to be usable with generic HTML, not necessarily tied to an audiobook publication (although realistically this will be the first real-world use case).
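
The one-to-one restriction described above can be expressed as a simple check. This minimal Python sketch assumes the HTML URLs each sync narration document references are already known (for example, collected from its `text` entries); the function name and inputs are hypothetical. When both halves of the restriction hold, finding the narration for an HTML file is a single dictionary lookup rather than a scan through every narration document.

```python
from typing import Dict, Iterable, Tuple


def index_by_html(docs: Iterable[Tuple[str, Iterable[str]]]) -> Dict[str, str]:
    """Map each HTML resource to the single sync narration document covering it.

    `docs` yields (sync_narration_url, html_urls_referenced_by_that_document) pairs.
    Raises ValueError if a document spans several HTML files, or if an HTML file
    is claimed by more than one document.
    """
    index: Dict[str, str] = {}
    for doc_url, html_urls in docs:
        distinct = set(html_urls)
        if len(distinct) != 1:
            raise ValueError(f"{doc_url} references {len(distinct)} HTML files, expected exactly one")
        (html,) = distinct
        if html in index:
            raise ValueError(f"{html} is already covered by {index[html]}")
        index[html] = doc_url
    return index


# Hypothetical documents: each narration references exactly one HTML file.
index = index_by_html([
    ("chapter1.json", ["chapter1.html", "chapter1.html"]),
    ("chapter2.json", ["chapter2.html"]),
])
narration_for_chapter2 = index["chapter2.html"]  # one lookup, no need to open every narration document
```

Dropping the first check (one HTML file per document) is what the audiobook use case discussed above would require; the second check is the part inherited from EPUB.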

Could you do something like this to accomplish what you describe? (pardon the quick n dirty syntax):

readingOrder: [
    {
        url: "audio.mp3#t=0,120",
        alternate: "chapter1.json"
    },
    {
        url: "audio.mp3#t=121,340",
        alternate: "chapter2.json"
    }
    ...
]

danielweck · 2020-01-30T22:15:01Z · Maintainer

> 1. The original text fragment syntax came from this draft, contributed by @danielweck:
>    https://github.com/w3c/sync-media-pub/blob/267ef4b44ddb49789196755a08f71ba87ed88751/web-proposal.md#the-sync-media-json-format

Thanks for unearthing this Marisa, it's helpful (I stand corrected, I indeed wrote this proposal at the time). Note: this Markdown document's new location is https://github.com/w3c/sync-media-pub/blob/master/drafts/web-proposal.md

So, the thought process behind this particular spec. "tweak" in my initial draft was to explore the processing model specifically for when a JSON resource is directly referenced (via linking, or embedding) from an HTML document (i.e. without the WebPubManifest level of indirection), in which case the location of the JSON document itself does not necessarily have to be used as its "base" URL/URI/IRI, as this could instead be inferred from the embedding context.

Around the same time, there were discussions in the Web Publications group about "base" in JSON / JSON-LD, notably regarding the impact of "opaque" origin and null base:
w3c/pub-manifest#12
w3c/json-ld-syntax#103
w3c/json-ld-syntax#23 (comment)
See how in JSON-LD, the context @base can be used:
https://json-ld.org/spec/latest/json-ld/#base-iri

...so, to wrap-up, I personally feel very uneasy about my initial draft (use short URL fragment syntax, and assume "base" URL is the associated HTML document), but I also feel uncomfortable about creating an ad-hoc JSON syntax that allows overriding the "base" of the JSON resource for specific properties (i.e. text, ...and maybe even audio). I can of course totally see the benefits from an authoring perspective (i.e. less repetition), so I am keeping an open mind.

What about prior art? The Web App Manifest immediately comes to mind; for example, see the scope and start_url properties:
https://www.w3.org/TR/appmanifest/#scope-member
https://www.w3.org/TR/appmanifest/#start_url-member
("using manifest URL as the base URL")

danielweck · 2020-01-30T23:29:04Z · Maintainer

Side note: see old issue #88

HadrienGardeur · 2020-01-31T09:56:18Z · Maintainer, Author

> Could you do something like this to accomplish what you describe? (pardon the quick n dirty syntax):
>
>     readingOrder: [
>         {
>             url: "audio.mp3#t=0,120",
>             alternate: "chapter1.json"
>         },
>         {
>             url: "audio.mp3#t=121,340",
>             alternate: "chapter2.json"
>         }
>         ...
>     ]

@marisademeglio No, we can't do that, since readingOrder must reference full resources and not fragments.
That's different from the W3C take on this matter:

> The URLs expressed in the reading order MAY include fragment identifiers, although profiles of this specification MAY restrict both their use as well as what schemes and features are supported. Fragment identifiers are to be interpreted as defined by their respective specifications (e.g., the start location to move the user to, or the range of content to render before moving to the next item in the reading order).

I personally find this potentially very confusing for authors and UAs. There's an example to illustrate that in the W3C Audiobooks spec:

{
	"@context" : ["https://schema.org", "https://www.w3.org/ns/pub-context"],
	"conformsTo" : "https://www.w3.org/TR/audiobooks/",
	"url" : "https://publisher.example.org/janeeyre",
	"name" : "Jane Eyre",
	"readingOrder" : [{
		"type": "LinkedResource",
		"url" : "audio/part001.wav#0",
		"encodingFormat" : "audio/vnd-wav",
		"name" : "Chapter 1",
		"duration" : "PT457.931S"
	}, {
		"type" : "LinkedResource",
		"url" : "audio/part001.wav#457.932",
		"encodingFormat" : "audio/vnd-wav",
		"name" : "Chapter 2",
		"duration" : "PT234.245S"
	}]
}

audio/part001.wav#0 is the equivalent of audio/part001.wav and it means: from the start of the resource until the end of the resource.
audio/part001.wav#457.932 means from 457.932 seconds into the resource to the end of the resource.

This means that a UA would have to play the portion from 457.932 seconds to the end of the resource twice.
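
The double playback can be checked mechanically. This minimal Python sketch treats a start-only fragment as "play until the end of the resource", as described above. The total length of part001.wav is not in the manifest, so it is assumed here, purely for illustration, to be 692.177 s (457.932 + 234.245); the function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

# (resource, start, end) in seconds; end=None means "until the end of the resource".
Segment = Tuple[str, float, Optional[float]]


def replayed_audio(segments: List[Segment], durations: Dict[str, float]) -> List[Tuple[int, int, float]]:
    """Return (i, j, seconds) for each pair of reading order items that play the same audio."""
    resolved = [(res, start, durations[res] if end is None else end)
                for res, start, end in segments]
    overlaps = []
    for (i, (res_i, s_i, e_i)), (j, (res_j, s_j, e_j)) in combinations(enumerate(resolved), 2):
        shared = min(e_i, e_j) - max(s_i, s_j)
        if res_i == res_j and shared > 0:
            overlaps.append((i, j, shared))
    return overlaps


DURATIONS = {"audio/part001.wav": 692.177}  # assumed length, see above

# The example as written: "#0" (whole file) followed by "#457.932" (to the end).
open_ended = [("audio/part001.wav", 0.0, None), ("audio/part001.wav", 457.932, None)]
# The same chapters with an explicit end on the first item.
ranged = [("audio/part001.wav", 0.0, 457.931), ("audio/part001.wav", 457.932, None)]

replayed = replayed_audio(open_ended, DURATIONS)  # one overlap of roughly 234.245 s: chapter 2 is heard twice
no_replay = replayed_audio(ranged, DURATIONS)     # []
```

A validator along these lines could flag such manifests, but it only helps if the UA knows (or fetches) each resource's duration.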

danielweck · 2020-01-31T11:32:06Z · Maintainer

> I personally find this potentially very confusing

Indeed. That is in fact the typical TOC processing model (i.e. start playback at the given timestamp, and play until the end of the resource is reached).

https://w3c.github.io/audiobooks/#toc-mediafragments

For example:

https://github.com/w3c/wpub/blob/948442a71610abcc757513dc3313f6ed0e8fd22f/experiments/audiobook/toc-as-json.json#L32

https://github.com/w3c/wpub/blob/948442a71610abcc757513dc3313f6ed0e8fd22f/experiments/audiobook/toc.html#L7

Should this issue be raised in the W3C repository? https://github.com/w3c/wpub/issues

(same construct in the sync-media example https://w3c.github.io/audiobooks/#example-13-audiobook-with-synchronized-narration )

UPDATE: I filed an issue https://github.com/w3c/wpub/issues/464

danielweck · 2020-01-31T11:33:41Z · Maintainer

...also, shouldn't audio/part001.wav#0 be audio/part001.wav#t=0?
https://www.w3.org/TR/media-frags/#naming-time

UPDATE: I filed this issue https://github.com/w3c/wpub/issues/463
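
For reference, a minimal Python sketch of reading the temporal dimension of a Media Fragments URI. It is deliberately simplified (an assumption of this sketch, not the full W3C syntax): only Normal Play Time in plain seconds is handled (`t=10`, `t=10,20`, `t=,20`, optionally with an `npt:` prefix), not clock time or `hh:mm:ss` values. Under this syntax `#0` is not a temporal fragment at all, while `#t=0` is.

```python
from typing import Optional, Tuple
from urllib.parse import urldefrag


def temporal_fragment(url: str) -> Optional[Tuple[float, Optional[float]]]:
    """Return (start, end) in seconds for a '#t=' media fragment, or None if there is none.

    An omitted start means 0; an omitted end (None) means the end of the media.
    Simplified: plain-seconds NPT values only, with an optional "npt:" prefix.
    """
    fragment = urldefrag(url).fragment
    for name_value in fragment.split("&"):
        name, _, value = name_value.partition("=")
        if name != "t":
            continue
        if value.startswith("npt:"):
            value = value[len("npt:"):]
        start_text, _, end_text = value.partition(",")
        start = float(start_text) if start_text else 0.0
        end = float(end_text) if end_text else None
        return start, end
    return None


not_temporal = temporal_fragment("audio/part001.wav#0")      # None
from_start = temporal_fragment("audio/part001.wav#t=0")      # (0.0, None)
chapter_2 = temporal_fragment("audio/part001.wav#t=457.932")  # (457.932, None)
clip = temporal_fragment("audio.mp3#t=0.0,1.2")               # (0.0, 1.2)
```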

danielweck · 2020-01-31T11:59:33Z · Maintainer

UPDATE: I filed issues

HadrienGardeur · 2020-01-31T13:26:04Z · Maintainer, Author

Thanks for opening these issues @danielweck.

Even with ranges, it would be very easy to author a W3C Audiobook incorrectly and partially repeat content. This makes me even more confident in our decision not to go in that direction for RWPM.
