cleanup of infoset requirements and fallbacks for title, language, toc and default reading order #51
@mattgarrish bravo:-) I have some purely editorial comments. I list them here, but they are independent of my "review" comment...
I am a little bit unsure of the usage of "und" for language. I am tempted to say that if no language tag is defined and the user agent does not use its own algorithm or extraction method, then the value is set to "und", i.e., undetermined. This seems to be the 'spirit' of the "und" in BCP47...
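A minimal sketch of the fallback suggested here, assuming a user agent that may or may not have its own detection method. The names `resolve_language` and `detect` are hypothetical and do not come from the draft; the only fixed value is the BCP47 tag "und".

```python
from typing import Callable, Optional

UNDETERMINED = "und"  # BCP47 primary language subtag for "undetermined"


def resolve_language(
    declared: Optional[str],
    detect: Optional[Callable[[], Optional[str]]] = None,
) -> str:
    """Pick a language tag for the infoset (hypothetical helper)."""
    # 1. A language tag declared by the author wins.
    if declared and declared.strip():
        return declared.strip()
    # 2. Otherwise the user agent may apply its own algorithm, if it has one.
    if detect is not None:
        guessed = detect()
        if guessed:
            return guessed
    # 3. Nothing declared and nothing detected: fall back to "und".
    return UNDETERMINED
```

Whether the user agent is allowed or required to run step 2 is exactly what is under discussion; the sketch only shows where "und" would sit in the order of fallbacks.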
Thanks, Ivan, I'll see what I can do about your notes. Note that 1.3 is normative now after the last PR (informative is now applied to subsections instead of the introduction itself). I took a look at HTML and some other specs and that's the approach they've used for these preliminary mixes of informative and normative. I keep thinking about reshuffling, but we keep tweaking the lists so I've held off. As that's purely editorial, I'll do it after we get this PR out of the way.
Would it also make sense to clarify under resources that the primary resources are compiled from the default reading order, since that already is the list of primary resources? Or too early?
This is what confused me in the original issue, as EPUB differentiates the xml:lang of the package document from the dc:language declaration(s) of the content, for example. A resource can only have one active language declaration for its text content, so what do you do in the case of a multilingual publication if this declaration is both the content and the infoset info?
And one last comment... Why is the html link element brought into the canonical identifier section at all? Isn't that serialization-specific? I've changed to the following, but as I missed the calls where this was discussed feel free to point out if I'm missing the point:
@mattgarrish : on the normative vs. non-normative: if the overall section is normative by default, that is fine. I would not touch the primary resource issue in this PR. Let us see if it is accepted and come back to it.
@mattgarrish : the link element is an HTML element; in other syntaxes it may not exist or, worse, can have a different meaning. Hence it is better, imho, to make it precise...
@mattgarrish : my feeling is that the language tag is meant only for the manifest and the metadata, i.e., it does not have any effect on the individual resources. If we do it otherwise then, although in rare cases, the behaviour of the browser with the resource may be different than when the same resource is 'accessed' via the WP/manifest. The metadata is affected by the language tag only if it is in the manifest or in a file referred from the manifest. DC entries used as metadata in the individual resources should not be; they are treated as if they were used by the User Agent, independently of the WP. At least this is my current feeling...
Yes, but wasn't it proposed that the canonical/self link be in a JSON expression, too? We say that there must be a canonical identifier, and that this must be part of the infoset, but we don't say how it's expressed or retrieved. If it can be used as an html link, where is it coming from and where does this html link belong? Is it the case that it must be expressed as a canonical link element/property/header so that it can be determined?
Yes, I agree to an extent. The extent being that it has no bearing on any resources, including the manifest. That's the impression I got when I asked here: #29 (comment). This language is not used for processing/rendering, only to describe the publication. It's like the Content-Language header, where you're just specifying the intended audience. The language of the manifest file (or any primary resource) will often be the language of the publication, but that falls apart with multilingual publications. I can't say I have a bilingual document by putting
I do not recall about that.
I believe the whole remark about being able to use the identifier in such a link is a note, rather than part of the core text, actually. We do not talk about how the identifier is expressed or retrieved, just as we do not say anything about the way the TOC is expressed (in the manifest).
Hm. You made me realize that we have three different roles that we MUST somehow cover.
I guess we agreed that the language information item has no bearing on No. 3; handling that falls back on how the resources do that per the HTML/SVG/etc. specifications. Maybe we should have, actually, two different language information items for Nos. 1 and 2, with the following extra rules:
In the absence of a publication language, User Agents MAY reuse the language information of the first primary resource.
Administratively, maybe we should move this thread to issue #29, though, and merge/close the PR (unless there are other objections).
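The fallback rule proposed above (reusing the language of the first primary resource when no publication language is given) could be sketched as follows. This is only an illustration under assumptions: the function name and the idea of passing the primary resources' languages as a list in default reading order are hypothetical, not part of the draft.

```python
from typing import Optional, Sequence


def publication_language(
    declared: Optional[str],
    primary_resource_languages: Sequence[Optional[str]],
) -> Optional[str]:
    """Return the publication language, or None if it cannot be determined."""
    # An explicit publication language always takes precedence.
    if declared:
        return declared
    # Fallback a user agent MAY apply: look only at the FIRST primary
    # resource (in default reading order), not at later ones.
    if primary_resource_languages:
        return primary_resource_languages[0]
    return None
```

Note that the sketch deliberately does not scan past the first primary resource; a multilingual publication whose first resource carries no language would still come out as undetermined.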
Oh, okay, that makes a little more sense. I was trying to figure out where this normatively comes into play. I'll re-adjust.
Sounds like a plan. I'll merge later tonight if nothing else comes up. We're not striving to be complete at this stage so there's still plenty of time for debate on everything we've done in this PR.
Thanks for the work, @mattgarrish. Some remaining typos, to be treated in the next PR:
2.1: in the sentence about the manifest, the last word is now missing. It was "Manifest" before. Ivan proposed ... in "a separate section".
3.3 Title: the title is now optional, but the note about issue 20 states it is required, which is a contradiction.
3.4 Language: I'm surprised to find mention of "BCP47 or its successors". BCP47 is a version-independent identifier, the current RFC being 5646. We also find a contradiction between the optional language in the infoset and the required language in the note about issue 29.
Thanks, @llemeurfr
Yes, I discovered that this morning while checking for more bad links. Respec should insert the section number/name.
Yes, good catch.
Yes, "or its successors" is definitely unnecessary. I'll have these updated shortly.
That's what we have in Readium: https://github.com/readium/webpub-manifest Even if it's just a note, I don't think that we should recommend using "canonical" for what we call the canonical identifier. This is used quite differently on the Web, and the "identifier" proposal would be a better fit.
Going back to the reading order:
Yes, I'm still kind of confused by this. It's not completely wrong, so long as the "canonical" identifier is the URL of the manifest or the resource it is included in, but the note isn't saying that clearly.
Therefore, what is the difference between this canonical id and the address of the Web Publication?
I am not sure about this at all; the only problem I see with the Note is that it restricts the usage of the "canonical" link to the case when the identifier is a URL. The definition of an identifier says that if it is not a URL per se, it must be possible to make a one-to-one mapping to an address, and, I would think, that address should also be acceptable to be used in a We do not say whether or not the address would map onto the manifest; as far as I can see this is still an open issue, related to #5 (except that the comments in #5 went all over the place).
I thought that the canonical identifier was meant for other identifiers, such as DOIs or ISBNs for example.
Exactly. The identifier is indeed the DOIs and friends, and the URL representation thereof (when necessary) is some sort of canonical "address". That is what should go into the The definition of the identifier also says that the ID must provide a way to get to the manifest. This is not the same as saying it is the address of the manifest.
This is true for DOIs, not so much for ISBNs which are expressed as URNs. You definitely don't want to use
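To illustrate the distinction being drawn between an identifier and an address, here is a rough sketch, assuming identifiers arrive as strings with a scheme prefix. The function name `address_for` and the `doi:` prefix convention are assumptions made for the example; the only external fact relied on is that DOIs resolve through https://doi.org/.

```python
from typing import Optional

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver


def address_for(identifier: str) -> Optional[str]:
    """Map an identifier to a Web address, if one obviously exists."""
    scheme, _, rest = identifier.partition(":")
    scheme = scheme.lower()
    # The identifier is already a URL: it is its own address.
    if scheme in ("http", "https"):
        return identifier
    # A DOI has a well-known URL representation via the resolver.
    if scheme == "doi":
        return DOI_RESOLVER + rest
    # URNs such as urn:isbn:... have no address in this sketch.
    return None
```

With these assumptions, `address_for("doi:10.1000/182")` yields a URL that could serve as an address, while an ISBN expressed as a URN yields nothing, which is the case where the note about the link would simply not apply.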
This is where I'm lost. A canonical link provides the preferred address for a resource. It can overlap with a canonical identifier, but does it always?
@iherman <https://github.com/iherman>
This is true for DOIs, not so much for ISBNs which are expressed as URNs. You definitely don't want to use ***@***.***="canonical" with a URN.
True. In which case the spec is silent about the canonical link (which is only a note anyway, not a normative statement).
Question. What's the time/space/tech continuum for "canonical identifier" (re: this PR)? It's currently stated as:
Given the following publication, what would its "canonical identifier" be?
The W3C considers What this tells me, in my view, is that how identifiers are used by various communities is not to be defined by this Working Group. It goes way beyond our scope. The information set should provide the right slots to store and use identifiers based on the specification we give, but that is where we should stop, and let other organizations and/or communities establish their own rules.
As a formality, can I ask that we stop using this closed PR to discuss issues? It's confusing to follow at this point. Please open new issues for any clarifications/changes you think are necessary.
Looks like the changes got merged and the old PR re-issued in my last attempt. I've reverted and this one's showing the right changes. Apologies for the extra email.
This PR is a consolidation of the changes in PRs #46, #47 and #49.
Also includes a few additional reversions/changes to the terminology that arose, stemming from issue #16:
Please give this a good look over to make sure I didn't mistranslate the discussions.