Reference W3C Synchronized Narration #109
Replies: 25 comments
-
Given that Readium's "draft" proposal for its internal JSON representation of EPUB3 Media Overlays has not really been used in practice yet (i.e. just a parser, no playback engine): yes, I think it is timely to leverage the outcome of the W3C sync-media community group.
-
I agree that this work is good and it seems we don't need to make anything "better" here. I'll take a look at the details again before committing to this.
-
Personally, I think it is better to clearly identify media overlays, rather than using alternate as suggested by @HadrienGardeur and in Incorporating Synchronized Narration. Aren't alternate links more like fallback resources, as defined in EPUB?
-
The Pub Manifest specification definitely prevents using alternate links for media overlays, as it would break the algorithm for selecting an alternate resource. See also w3c/pub-manifest#133.
-
There would be a difference since they wouldn't use the same media type. They could also potentially use a different
They're not a fallback in the EPUB sense. A User Agent would compare the primary resource with its alternates and decide which one it should use.
-
I agree that a pub manifest "alternate" is broader than what EPUB fallbacks allow. It will be used to map EPUB fallbacks ("here is an Opus audio, and here is an alternate mp3 if the reading system cannot handle Opus") but it can also be used to select a sound quality ("here is a 320kbps audio, and here is an alternate 128kbps if bandwidth is an issue"). The alternate feature has been stretched quite a bit to handle the sync-narration feature, with statements like "here is an HTML page, and here is its alternate sync-narration". The sync narration is indeed more "complementary" or "additional" than "alternate". Which makes me wonder if replacing the current
-
By the way, if we decide to keep a specific property,
-
Another aspect: reading the sample in the sync narration spec and looking at sync narration in pub manifest, I see that the text property is defined as "Value is a URL "fragment" which is typically a unique identifier that references a document element", i.e. it is relative to the HTML page the sync narration is an alternative of. This is not how an "alternate" resource should be specified. @danielweck, I'd like to discuss this with you and Marisa, and maybe open an issue in the sync-media space.
-
Indeed!
Thanks for bringing this to my attention, Laurent. The proposed processing model seems to rely on the base URL of the JSON resource being identical to that of the associated HTML document, for the purpose of resolving 'text' URLs. I think this is an incorrect approach (instead, 'text' URLs should be resolved just like 'audio' URLs). If I remember correctly, I wasn't involved in the discussions that led to this design decision, but I can only blame myself for not participating more actively recently. Marisa actually did the bulk of the re-work of the original draft specification. Credits to her (and many thanks) for editing/advancing the sync-media "specification" (Community Group), but I would indeed like to raise the issue of what is, in my opinion, a misuse of the JSON's base URL.
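To make the two interpretations of 'text' URLs concrete, here is a minimal sketch using Python's standard `urllib.parse.urljoin`. The `example.org` URLs and the file layout are assumptions made up for illustration; they are not taken from the spec.

```python
from urllib.parse import urljoin

# Hypothetical locations, for illustration only.
JSON_URL = "https://example.org/pub/syncnarr1.json"
HTML_URL = "https://example.org/pub/text/chapter1.html"

# Draft behaviour: a bare fragment in "text" only reaches the HTML
# document if the HTML document is (implicitly) used as the base.
fragment_vs_html = urljoin(HTML_URL, "#id1")
# Resolved against the JSON resource, the same fragment points
# into the JSON document itself.
fragment_vs_json = urljoin(JSON_URL, "#id1")

# Alternative: "text" is a regular URL, resolved against the JSON
# resource exactly like "audio".
text_url = urljoin(JSON_URL, "text/chapter1.html#id1")
audio_url = urljoin(JSON_URL, "audio/voice1.mp3#t=0.0,1.2")

print(fragment_vs_html)
print(fragment_vs_json)
print(text_url)
print(audio_url)
```

With the second approach, no knowledge of the associated HTML document is needed to resolve either property.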
-
I posted an issue: w3c/sync-media-pub#28
-
To be considered a proper "alternate" rendition of the content, the JSON narration must be independent of the resource it is an alternate for. If the spec wording is modified to mandate valid URL strings, I think the situation will be good.
-
I don't agree with that statement. I think that this is consistent with other use cases, where alternate contains the primary resource that needs to be fetched to instantiate your navigator(s) properly. Let's imagine a publication where each resource in
This could also be a user preference of course. I don't see much of a difference between fetching an HTML document as a primary resource and a Synchronized Narration document as a primary resource. When rendering HTML, you need to fetch secondary resources: images, CSS, JS, fonts, audio and video.
We created the
-
OK, that's clearer to me now. With your approach, one can imagine a case where there are alternate audio links with various bitrates or codecs, and multiple sync narrs each pointing to a specific audio file. However, keeping the selection algorithm simple, and avoiding the need to inspect each sync narr resource, would require the sync narr links to carry properties of both the text file and the audio file, for example the bitrate and codec of the audio.
-
My comment was related to the way sync narrations are defined today by the W3C WG, i.e. with textual parts specified as URL fragments, dependent on the "main" HTML resource. Since this will certainly be corrected in the W3C spec and the textual part will become a URL string (absolute URL or relative to the sync narr JSON origin), I'm ok to consider sync narr as proper alternate renditions, and therefore remove the
So, let's try a sample, with HTML pages and a sync narration based on the same HTML + audio, stored in an LPF file (-> relative URLs). In the manifest we would find:
{
  "href": "text/chapter1.html",
  "type": "text/html",
  "alternate": [
    {
      "href": "syncnarr1.json",
      "type": "application/vnd.syncnarr+json",
      "rel": "sync-narration"
    }
  ]
}
and syncnarr1.json containing:
{
  "role": "chapter1",
  "narration": [
    {
      "text": "text/chapter1.html#id1",
      "audio": "audio/voice1.mp3#t=0.0,1.2"
    },
    {
      "text": "text/chapter1.html#id2",
      "audio": "audio/voice1.mp3#t=1.2,3.4"
    }
  ]
}
With such a model, we can also have a slightly different HTML page in the sync narration, e.g. with a simplified structure, no footnotes, simpler tables ... anything that could make the narration smoother.
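As a rough sketch of how a reading system might consume such a document, the snippet below resolves each text/audio pair against the location of the JSON file and splits off the element id and the `#t=start,end` clip. The `BASE` URL and the helper names `parse_clip` and `resolve` are hypothetical, and only the plain-seconds form of the temporal fragment used in the sample is handled.

```python
import json
from urllib.parse import urljoin, urldefrag

# Hypothetical location of the narration document inside the package.
BASE = "https://example.org/pub/syncnarr1.json"

SYNC_NARRATION = json.loads("""
{
  "role": "chapter1",
  "narration": [
    {"text": "text/chapter1.html#id1", "audio": "audio/voice1.mp3#t=0.0,1.2"},
    {"text": "text/chapter1.html#id2", "audio": "audio/voice1.mp3#t=1.2,3.4"}
  ]
}
""")


def parse_clip(fragment):
    """Parse a 't=start,end' temporal fragment given in plain seconds."""
    if not fragment.startswith("t="):
        raise ValueError(f"unsupported fragment: {fragment!r}")
    start, _, end = fragment[2:].partition(",")
    return float(start or 0), (float(end) if end else None)


def resolve(narration, base):
    """Resolve every text/audio pair against the JSON document's URL."""
    pairs = []
    for item in narration["narration"]:
        text_url, element_id = urldefrag(urljoin(base, item["text"]))
        audio_url, clip = urldefrag(urljoin(base, item["audio"]))
        start, end = parse_clip(clip)
        pairs.append((text_url, element_id, audio_url, start, end))
    return pairs


for pair in resolve(SYNC_NARRATION, BASE):
    print(pair)
```

Because both properties are ordinary URLs here, the narration could just as well point at a different (e.g. simplified) HTML file without any change to the resolution logic.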
-
IMO the selection should be purely based on the Link Objects and shouldn't require fetching the Synchronized Narration document or related resources. Right now, this means that we could select between multiple Synchronized Narration documents based on language, but not based on audio format (this info isn't available in the document either).
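A minimal sketch of what such a selection could look like, assuming each alternate carries `type` and `language` properties on its Link Object. The function name `select_alternate`, the file names, and the first-match policy are illustrative assumptions, not something defined by RWPM or the W3C documents.

```python
def select_alternate(link, media_type, language=None):
    """Pick the first alternate matching a media type (and optional language).

    Only the Link Object properties are inspected; nothing is fetched.
    """
    for candidate in link.get("alternate", []):
        if candidate.get("type") != media_type:
            continue
        if language is not None and candidate.get("language") != language:
            continue
        return candidate
    return None


# Hypothetical manifest entry with two narrations in different languages.
link = {
    "href": "text/chapter1.html",
    "type": "text/html",
    "alternate": [
        {"href": "syncnarr1-en.json", "type": "application/vnd.syncnarr+json", "language": "en"},
        {"href": "syncnarr1-fr.json", "type": "application/vnd.syncnarr+json", "language": "fr"},
    ],
}

chosen = select_alternate(link, "application/vnd.syncnarr+json", language="fr")
print(chosen["href"])
```

Selecting by audio format would need an extra property on the Link Object, since nothing in this sketch looks inside the narration document.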
-
I have just a few comments:
As well as a proposed solution: Nothing is set in stone yet. As implementers, your comments are most welcome.
-
@marisademeglio Is there any specific reason for that? Historically (from an EPUB perspective) I can understand that position, but in the case of audiobooks where audio files are the primary resource, it would also make sense that a single audio resource could reference multiple HTML resources. Audiobooks are often produced with a single track for the whole publication, or with audio resources that cover multiple chapters.
-
Two reasons that come to mind -
Could you do something like this to accomplish what you describe? (pardon the quick n dirty syntax):
-
Thanks for unearthing this Marisa, it's helpful (I stand corrected, I indeed wrote this proposal at the time). Note: this Markdown document's new location is https://github.com/w3c/sync-media-pub/blob/master/drafts/web-proposal.md
So, the thought process behind this particular spec "tweak" in my initial draft was to explore the processing model specifically for when a JSON resource is directly referenced (via linking, or embedding) from an HTML document (i.e. without the WebPubManifest level of indirection), in which case the location of the JSON document itself does not necessarily have to be used as its "base" URL/URI/IRI, as this could instead be inferred from the embedding context.
Around the same time, there were discussions in the Web Publications group about "base" in JSON / JSON-LD, notably regarding the impact of "opaque" origin and
...so, to wrap up, I personally feel very uneasy about my initial draft (use short URL fragment syntax, and assume "base" URL is the associated HTML document), but I also feel uncomfortable about creating an ad-hoc JSON syntax that allows overriding the "base" of the JSON resource for specific properties (i.e. I wonder about prior art? Web App Manifest immediately comes to mind, for example see the
-
Side note: see old issue #88
-
@marisademeglio no we can't do that since
I personally find this potentially very confusing for authors and UAs. There's an example to illustrate that in the W3C Audiobooks spec:
{
  "@context": ["https://schema.org", "https://www.w3.org/ns/pub-context"],
  "conformsTo": "https://www.w3.org/TR/audiobooks/",
  "url": "https://publisher.example.org/janeeyre",
  "name": "Jane Eyre",
  "readingOrder": [
    {
      "type": "LinkedResource",
      "url": "audio/part001.wav#0",
      "encodingFormat": "audio/vnd-wav",
      "name": "Chapter 1",
      "duration": "PT457.931S"
    },
    {
      "type": "LinkedResource",
      "url": "audio/part001.wav#457.932",
      "encodingFormat": "audio/vnd-wav",
      "name": "Chapter 2",
      "duration": "PT234.245S"
    }
  ]
}
This means that a UA would have to play the portion from 457.932 seconds to the end of the resource twice.
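The double playback can be checked with a little arithmetic. The sketch below is only an illustration under two assumptions: the fragment is read as a start offset in seconds (as the example is written), and the audio file ends when the last chapter ends. The helper names `seconds` and `start_offset` are hypothetical.

```python
import re

# Reading order entries copied from the example above (url + duration only).
READING_ORDER = [
    {"url": "audio/part001.wav#0", "duration": "PT457.931S"},
    {"url": "audio/part001.wav#457.932", "duration": "PT234.245S"},
]


def seconds(iso_duration):
    """Parse the simple 'PT<seconds>S' form used in the example."""
    match = re.fullmatch(r"PT(\d+(?:\.\d+)?)S", iso_duration)
    if match is None:
        raise ValueError(f"unsupported duration: {iso_duration!r}")
    return float(match.group(1))


def start_offset(url):
    """Read the start time from the fragment, as written in the example."""
    return float(url.partition("#")[2] or 0)


starts = [start_offset(item["url"]) for item in READING_ORDER]
# Assumption: the file ends when the last chapter ends.
resource_end = starts[-1] + seconds(READING_ORDER[-1]["duration"])

# With no end time, each entry plays from its start to the end of the file.
ranges = [(start, resource_end) for start in starts]
total_played = sum(end - start for start, end in ranges)
# Anything played beyond the length of the file has been heard twice.
replayed = total_played - (resource_end - starts[0])

print(ranges)
print(round(replayed, 3))
```

Under these assumptions, the replayed portion is exactly the second chapter, which is the overlap described above.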
-
Indeed. That is in fact the typical TOC processing model (i.e. start playback at the given timestamp, and play until the end of the resource is reached). https://w3c.github.io/audiobooks/#toc-mediafragments
For example:
Should this issue be raised in the W3C repository? https://github.com/w3c/wpub/issues (same construct in the sync-media example https://w3c.github.io/audiobooks/#example-13-audiobook-with-synchronized-narration )
UPDATE: I filed an issue https://github.com/w3c/wpub/issues/464
-
...also, shouldn't
UPDATE: I filed this issue https://github.com/w3c/wpub/issues/463
-
UPDATE: I filed issues
-
Thanks for opening these issues @danielweck. Even with ranges, it would be very easy to author a W3C Audiobook incorrectly and partially repeat the content. This makes me even more confident in our decision not to go in that direction for RWPM.
-
Instead of rolling out our own format for a media overlay equivalent, we should consider adopting the work being done within a CG at W3C around Synchronized Narration: https://w3c.github.io/sync-media-pub/synchronized-narration.html
A few notes regarding that document:
media-overlay property anymore, we can simply use alternate instead
Any thoughts on this? cc @danielweck @llemeurfr