cleanup of infoset requirements and fallbacks for title, language, toc and default reading order #51

mattgarrish · 2017-08-26T00:57:31Z

Looks like the changes got merged and the old PR got re-issued in my last attempt. I've reverted and this one's showing the right changes. Apologies for the extra email.

This PR is a consolidation of the changes in PRs #46, #47 and #49.

Also includes a few additional reversions/changes to the terminology that arose, stemming from issue #16:

reverting the definition of primary resource to a resource in the default reading order
changing secondary resource from required for a primary resource to required for the publication
moving default reading order definition inline into its section, otherwise it creates duplication/redundancy

Please give this a good look over to make sure I didn't mistranslate the discussions.

iherman · 2017-08-26T05:09:08Z

@mattgarrish bravo:-)

I have some purely editorial comments. I list them here, but they are independent of my "review" comment...

"1.3 Terminology": this section should be explicitly set to normative (the enclosing section is set to be non-normative...)
First paragraph in 1.3, sentence "In particular, for the following terms: user, user agent, browser, and address.": somehow the English does not sound right. Maybe simply make it part of the previous sentence?
"3.1 Overview", second para, second sentence: "It is primarily compiled from a Web Publication's manifest, whose serialization requirements are defined in Manifest." It reads funny with the word 'manifest' repeated. Maybe the second occurrence should be spelled out as "in a separate section", or something similar (or the relevant section heading should be changed to allow for respec to do its work)
At some point it is probably better to reorder the subsections in section 3 to follow the list of required items of 3.2
"3.4 Language", end of first para: Maybe it is worth emphasizing that the language tag is also used as the default language tag for other information items or metadata where appropriate, like title, DC descriptions, etc. (Unless overwritten like, for example, if the title is extracted from a resource with its own language setting)
"3.5 Canonical identifier", second paragraph, the canonical can also be used, as far as I understand, for an HTTP response header; worth mentioning. Also, the link element is an HTML element; just to be precise we may want to add that.

iherman

I am a little bit unsure of the usage of "und" for language. I am tempted to say that if no language tag is defined and the user agent does not use its own algorithm or extraction method, then the value is set to "und", ie, undefined. This seems to be the 'spirit' of the "und" in BCP47...

mattgarrish · 2017-08-26T13:40:10Z

Thanks, Ivan, I'll see what I can do about your notes.

Note that 1.3 is normative now after the last PR (informative is now applied to subsections instead of the introduction itself). I took a look at HTML and some other specs and that's the approach they've used for these preliminary mixes of informative and normative.

I keep thinking about reshuffling, but we keep tweaking the lists so I've held off. As that's purely editorial, I'll do it after we get this PR out of the way.

Also, would it make sense to clarify under resources that the primary resources are compiled from the default reading order, since that already is the list of primary resources? Or too early?

mattgarrish · 2017-08-26T14:01:58Z

Maybe it is worth emphasizing that the language tag is also used as the default language tag for other information items or metadata where appropriate

This is what confused me in the original issue, as EPUB differentiates the xml:lang of the package document from the dc:language declaration(s) of the content, for example.

A resource can only have one active language declaration for its text content, so what do you do in the case of a multilingual publication if this declaration is both the content and the infoset info?

mattgarrish · 2017-08-26T14:25:04Z

And one last comment...

Why is the html link element brought into the canonical identifier section at all? Isn't that serialization-specific? I've changed to the following, but as I missed the calls where this was discussed feel free to point out if I'm missing the point:

If the canonical identifier is a URL, it may be used as the href value of a "canonical" link [rfc6596] for the Web Publication in its manifest or in the HTTP response header.
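To make the proposed wording concrete, here is a minimal sketch of how a URL identifier might be turned into a `Link` response header value using the "canonical" relation from RFC 6596. The helper name `canonical_link_header` is hypothetical, and restricting the check to http(s) URLs is an assumption of this sketch, not something the draft text states.

```python
from typing import Optional
from urllib.parse import urlsplit


def canonical_link_header(identifier: str) -> Optional[str]:
    """Return a Link header value for a URL identifier, or None otherwise."""
    parts = urlsplit(identifier)
    # Assumption: only http(s) URLs with a host are usable as a link target.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    return f'<{identifier}>; rel="canonical"'
```

A server would send the returned string as the value of a `Link:` header; for identifiers that are not URLs the helper returns `None` and no header is emitted.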

iherman · 2017-08-26T17:32:51Z

@mattgarrish : on the normative vs. non-normative: if the overall section is normative by default, that is fine.

I would not touch the primary resource issue in this PR. Let us see if it is accepted and come back to it.

iherman · 2017-08-26T17:34:30Z

@mattgarrish : the link element is an HTML element; in other syntaxes it may not exist or, worse, can have a different meaning. Hence it is better, imho, to make it precise...

iherman · 2017-08-26T17:40:16Z

@mattgarrish : my feeling is that the language tag is to be meant only for the manifest and the metadata. Ie, it does not have any effect on the individual resources. If we do it otherwise then, although in rare cases, the behaviour of the browser with the resource may be different than when the same resource is 'accessed' via the WP/manifest.

The metadata is affected by the language tag only if it is in the manifest or in a file referred from the manifest. DC entries used as metadata in the individual resources should not; they are treated as if they were used by the User Agent, independently of the WP.

At least this is my current feeling...

mattgarrish · 2017-08-26T19:49:35Z

the link element is an HTML element

Yes, but wasn't it proposed that the canonical/self link be in a JSON expression, too?

We say that there must be a canonical identifier, and that this must be part of the infoset, but we don't say how it's expressed or retrieved.

If it can be used as an html link, where is it coming from and where does this html link belong?

Is it the case that it must be expressed as a canonical link element/property/header so that it can be determined?

mattgarrish · 2017-08-26T20:04:20Z

my feeling is that the language tag is to be meant only for the manifest and the metadata. Ie, it does not have any effect on the individual resources.

Yes, I agree to an extent. The extent being that it has no bearing on any resources, including the manifest.

That's the impression I got when I asked here: #29 (comment)

This language is not used for processing/rendering, only to describe the publication. It's like the content-language header, where you're just specifying the intended audience.

The language of the manifest file (or any primary resource) will often be the language of the publication, but falls apart with multilingual publications. I can't say I have a bilingual document by putting lang="en fr" in an HTML document, for example.

iherman · 2017-08-27T04:50:20Z

the link element is an HTML element

Yes, but wasn't it proposed that the canonical/self link be in a JSON expression, too?

I do not recall about that.

We say that there must be a canonical identifier, and that this must be part of the infoset, but we don't say how it's expressed or retrieved.

If it can be used as an html link, where is it coming from and where does this html link belong?

Is it the case that it must be expressed as a canonical link element/property/header so that it can be determined?

I believe the whole remark about being able to use the identifier in such a link is a note, rather than part of the core text, actually. We do not talk about how the identifier is expressed or retrieved, just as we do not say anything about the way the TOC is expressed (in the manifest).

iherman · 2017-08-27T05:04:38Z

@mattgarrish,

my feeling is that the language tag is to be meant only for the manifest and the metadata. Ie, it does not have any effect on the individual resources.

Yes, I agree to an extent. The extent being that it has no bearing on any resources, including the manifest.

Hm. You made me realize that we have three different roles that we MUST somehow cover.

1. General information on the language(s) of the publication, used, e.g., to install or access dictionaries
2. Language of the manifest/metadata, ie, the language of textual information like the title (in the manifest) or Dublin Core or schema.org items in the attached metadata
3. Language(s) of the individual resources

I guess we agreed that the language information item has no bearing on No. 3; handling that falls back on how the resources do that per the HTML/SVG/etc. specifications. Maybe we should have, actually, two different language information items for Nos. 1 and 2, with the following extra rules:

if no publication language is explicitly stated, it has the single "und" value
if the manifest/metadata language is not stated separately, and there is a single publication language, that one is used; otherwise it has the "und" value

In the absence of a publication language, User Agents MAY reuse the language information of the first primary resource.
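The fallback rules proposed above could be sketched roughly as follows. This is only an illustration: the function `resolve_languages` and all of its parameter names are hypothetical, and letting a language borrowed from the first primary resource also feed the manifest/metadata fallback is an assumption of the sketch.

```python
from typing import List, Optional, Tuple

UND = "und"  # BCP 47 subtag for an undetermined language


def resolve_languages(
    publication_langs: Optional[List[str]] = None,
    manifest_lang: Optional[str] = None,
    first_resource_lang: Optional[str] = None,
    reuse_first_resource: bool = False,
) -> Tuple[List[str], str]:
    """Return (publication languages, manifest/metadata language)."""
    langs = list(publication_langs or [])
    if not langs:
        # Optional user-agent behaviour (the MAY above): borrow the language
        # of the first primary resource before falling back to "und".
        if reuse_first_resource and first_resource_lang:
            langs = [first_resource_lang]
        else:
            langs = [UND]
    if manifest_lang:
        meta = manifest_lang      # stated separately, so use it as given
    elif len(langs) == 1:
        meta = langs[0]           # single publication language is reused
    else:
        meta = UND                # several languages and no explicit choice
    return langs, meta
```

With these rules a bilingual publication that declares `["en", "fr"]` but no separate manifest language ends up with `"und"` for its metadata, which is the case the multilingual discussion above is concerned with.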

Administratively, maybe we should move this thread to issue #29, though, and merge/close the PR (unless there are other objections)

mattgarrish · 2017-08-27T12:30:46Z

I believe the whole remark about being able to use the identifier in such a link is a note

Oh, okay, that makes a little more sense. I was trying to figure out where this normatively comes into play. I'll re-adjust.

mattgarrish · 2017-08-27T15:44:13Z

Administratively, maybe we should move this thread to issue #29, though, and merge/close the PR

Sounds like a plan.

I'll merge later tonight if nothing else comes up. We're not striving to be complete at this stage so there's still plenty of time for debate on everything we've done in this PR.

llemeurfr · 2017-08-28T08:37:04Z

Thanks for the work, @mattgarrish.

Some remaining typos, to be treated in the next PR.

On 2.1, sentence about the manifest, the last word is now missing. Was "Manifest" before. Ivan proposed ... in "a separate section".

3.3 Title, the title is now optional, but the note about issue 20 states it is required, which is a contradiction.

3.4 language: I'm surprised to find mention of "BCP47 or its successors". BCP47 is a version independent identifier, the current RFC being 5646. And we also find the contradiction between the optional language in the infoset and the required language in the note about issue 29.

mattgarrish · 2017-08-28T11:49:39Z

Thanks, @llemeurfr

On 2.1, sentence about the manifest, the last word is now missing. Was "Manifest" before. Ivan proposed ... in "a separate section".

Yes, I discovered that this morning while checking for more bad links. Respec should insert the section number/name.

3.3 Title, the title is now optional, but the note about issue 20 states it is required, which is a contradiction.

Yes, good catch.

3.4 language: I'm surprised to find mention of "BCP47 or its successors". BCP47 is a version independent identifier, the current RFC being 5646. And we also find the contradiction between the optional language in the infoset and the required language in the note about issue 29.

Yes, "or its successors" is definitely unnecessary.

I'll have these updated shortly.

HadrienGardeur · 2017-08-28T13:10:10Z

Yes, but wasn't it proposed that the canonical/self link be in a JSON expression, too?

That's what we have in Readium: https://github.com/readium/webpub-manifest

Even if it's just a note, I don't think that we should recommend using "canonical" for what we call the canonical identifier. This is used quite differently on the Web, and the "identifier" proposal would be a better fit.

HadrienGardeur · 2017-08-28T13:14:35Z

Going back to the reading order:

let's imagine a manifest with only a TOC specified in its list of primary resources
since it has something in the manifest, it's not entirely clear what the UA should do with the current spec language

mattgarrish · 2017-08-28T13:29:58Z

Even if it's just a note, I don't think that we should recommend using "canonical" for what we call the canonical identifier.

Yes, I'm still kind of confused by this. It's not completely wrong, so long as the "canonical" identifier is the URL of the manifest or the resource it is included in, but the note isn't saying that clearly.

llemeurfr · 2017-08-28T13:42:19Z

the "canonical" identifier is the URL of the manifest or the resource it is included in

Therefore what is the difference btw this canonical id and the address of the Web Publication?

iherman · 2017-08-28T13:46:58Z

Even if it's just a note, I don't think that we should recommend using "canonical" for what we call the canonical identifier.

Yes, I'm still kind of confused by this. It's not completely wrong, so long as the "canonical" identifier is the URL of the manifest or the resource it is included in, but the note isn't saying that clearly.

I am not sure about this at all; the only problem I see with the Note is that it restricts the usage of the "canonical" link to the case when the identifier is a URL. The definition of an identifier says that if it is not a URL per se, it must be possible to make a one-to-one mapping to an address, and, I would think, that address should also be acceptable to be used in a link element (or LINK header).

We do not say whether or not the address would map onto the manifest; as far as I can see this is still an open issue, related to #5 (except that the comments in #5 went all over the place).

HadrienGardeur · 2017-08-28T13:47:34Z

I thought that the canonical identifier was meant for other identifiers, such as DOIs or ISBNs for example.

iherman · 2017-08-28T13:50:25Z

@llemeurfr

the "canonical" identifier is the URL of the manifest or the resource it is included in

Therefore what is the difference btw this canonical id and the address of the Web Publication?

@HadrienGardeur:

I thought that the canonical identifier was meant for other identifiers, such as DOIs or ISBNs for example.

Exactly. The identifier is indeed the DOIs and friends, and the URL representations thereof (when necessary) are some sort of a canonical "address". That is what should go into the link element, imho, and that is different from the address, which might change.

The definition of the identifier also says that the ID must provide a way to get to the manifest. This is not the same as saying it is the address of the manifest.

HadrienGardeur · 2017-08-28T13:53:26Z

@iherman

This is true for DOIs, not so much for ISBNs which are expressed as URNs. You definitely don't want to use link@rel="canonical" with a URN.
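The distinction being drawn here can be sketched as a small mapping from identifier to address. Everything in it is an assumption for illustration: the helper name `address_for`, the `doi:` prefix notation, the example identifiers, and the choice to treat URNs as having no address. The only outside fact it relies on is that DOIs can be resolved through `https://doi.org/`.

```python
from typing import Optional

DOI_RESOLVER = "https://doi.org/"


def address_for(identifier: str) -> Optional[str]:
    """Map an identifier to a dereferenceable address, if one is known."""
    lowered = identifier.lower()
    if lowered.startswith(("http://", "https://")):
        return identifier  # already an address
    if lowered.startswith("doi:"):
        # A DOI has a URL form via the resolver.
        return DOI_RESOLVER + identifier[len("doi:"):]
    # URNs such as urn:isbn:... have no address in this sketch.
    return None
```

Only a non-`None` result would be a candidate for a "canonical" link; an ISBN expressed as a URN yields `None`, so no such link would be produced for it.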

mattgarrish · 2017-08-28T13:55:37Z

This is true for DOIs, not so much for ISBNs which are expressed as URNs.

This is where I'm lost. A canonical link provides the preferred address for a resource. It can overlap with a canonical identifier, but does it always?

iherman · 2017-08-28T14:04:53Z

This is true for DOIs, not so much for ISBNs which are expressed as URNs. You definitely don't want to use link@rel="canonical" with a URN.

True. In which case the spec is silent about the canonical link (which is only a note anyway, not a normative statement).

BigBlueHat · 2017-08-28T14:22:00Z

Question. What's the time/space/tech continuum for "canonical identifier" (re: this PR)?

It's currently stated as:

If assigned, this canonical identifier MUST be unique to the Web Publication.

Given the following publication, what would its "canonical identifier" be?
https://www.w3.org/TR/html/

iherman · 2017-08-28T15:11:14Z

@BigBlueHat

Question. What's the time/space/tech continuum for "canonical identifier" (re: this PR)?

It's currently stated as:

If assigned, this canonical identifier MUST be unique to the Web Publication.

Given the following publication, what would its "canonical identifier" be? https://www.w3.org/TR/html/

The W3C considers https://www.w3.org/TR/html/ as THE identifier for the HTML standard, and this approach seems to be fine with its constituent community. However, we have to recognize that other communities may not agree, because the W3C short name refers to the latest HTML standard; this is currently HTML5.2, but it may refer, one day, to HTML6 (if ever there is such thing). Policies on other identifiers may decide that such a major new version should receive a different identifier instead of sharing the same one.

What this tells me is that, in my view, how identifiers are used by various communities is not for this Working Group to define. It goes way beyond our scope. The information set should provide the right slots to store and use identifiers based on the specification we give, but that is where we should stop, and let other organizations and/or communities establish their own rules.

mattgarrish · 2017-08-28T15:17:10Z

As a formality, can I ask that we stop using this closed PR to discuss issues? It's confusing to follow at this point.

Please open new issues for any clarifications/changes you think are necessary.

mattgarrish and others added 2 commits August 25, 2017 20:50

Merge pull request #1 from w3c/master

a0916b9

Revert "consolidates previous PRs #46, #47 and #49"

consolidates previous PRs and modifies terminology

9af4b6b

mattgarrish requested review from BigBlueHat, HadrienGardeur, iherman, TzviyaSiegman, dauwhe, GarthConboy and llemeurfr August 26, 2017 00:57

iherman approved these changes Aug 26, 2017

This was referenced Aug 26, 2017

default reading order fallbacks via TOC #46

Closed

Retrieving a TOC from HTML files #47

Closed

rewording of title and language #49

Closed

iherman added the topic:manifest label Aug 26, 2017

iherman added this to the Abstract Manifest milestone Aug 26, 2017

updated to address Ivan's comments in PR #51

3cb9fe3

iherman mentioned this pull request Aug 27, 2017

For manifest in FPWD: Should Natural Language be Required per WCAG 2 #29

Closed

additional updates per discussions in PR #51

5dbb9c9

mattgarrish merged commit e4bea6c into w3c:master Aug 27, 2017

mattgarrish added a commit that referenced this pull request Aug 28, 2017

fix broken links, plus additional editorial changes noted in PR #51

cd6674f

mattgarrish mentioned this pull request Aug 28, 2017

separate implicit information from failure handling #48

Closed

This was referenced Aug 28, 2017

Minimum Viable Manifest #15

Closed

manifest: title #20

Closed

For manifest in FPWD: Should manifest TITLE be Required per WCAG 2? #30

Closed

Should the manifest be an implicit TOC? #26

Closed

Is the ToC sufficient to provide reading order? #36

Closed

This was referenced Aug 28, 2017

Picking a language #42

Closed

Language of web publication v. language of manifest/resources #53

Closed

BigBlueHat mentioned this pull request Aug 28, 2017

The canonical-ness of identification needs clarification #56

Closed
