default reading order fallbacks via TOC #46
I'm against the idea of an HTML TOC being a fallback for the primary reading order. These are two completely separate concepts, and while I feel that the primary reading order may be a fallback for a TOC, it's not true the other way around without additional requirements from the TOC. |
Adding additional requirements on a TOC for a fallback is perfectly fine, @HadrienGardeur. Let us formulate those and get them documented. The goal remains to allow very simple WP-s to be created easily, while letting the manifest control the complex cases. |
@iherman but these additional requirements on a TOC only make sense when the TOC is a fallback. I'm not a big fan of that, it's better to have must/should/may for the TOC that are unrelated to its fallback status. |
And these fallbacks seem to get more brittle as we try to construct ever more complex requirements. It's garbage in, garbage out processing. I'm beginning to think these don't belong with the infoset but are guidance purely for a user agent to construct for an invalid publication. Legitimizing this for lazy authoring is a bad idea. Anyway, some offered changes and thoughts. As Hadrien has pointed out before, I believe, the same primary resource can occur multiple times legitimately, so you only want to ignore consecutive references to the same resource.
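The point about consecutive references can be sketched in a few lines of Python. This is only an illustration, not text from the draft: the function name `collapse_consecutive` and the sample file names are hypothetical, and stripping fragments before comparing is an assumption about how a user agent would identify "the same resource".

```python
from urllib.parse import urldefrag


def collapse_consecutive(hrefs):
    """Drop a reference only when it repeats the resource immediately before it."""
    order = []
    for href in hrefs:
        resource = urldefrag(href).url  # assumption: ignore "#fragment" when comparing
        if not order or order[-1] != resource:
            order.append(resource)
    return order


# Hypothetical TOC links: notes.html is legitimately visited twice.
links = ["ch1.html#s1", "ch1.html#s2", "notes.html", "ch2.html", "notes.html"]
print(collapse_consecutive(links))
# → ['ch1.html', 'notes.html', 'ch2.html', 'notes.html']
```

Removing every duplicate instead would silently drop the second visit to `notes.html`, which is the failure mode described above.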
The table of contents generation sounds like a bit of a fishing expedition in the hopes that it is there but the user forgot to link to it. It seems unlikely the UA will find a nice doc-toc marked nav element, though. (If only the first one found is used, then you can skip saying to disregard other ones.) In terms of a fallback that is reasonably assured to be there, though, I thought the current thinking was to take the titles of the primary resources in the reading order? You'll at least get some kind of table of contents for purely image-based works, since you can fall back to the url or file name. |
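A minimal sketch of that title-based fallback, assuming each reading-order entry is a dictionary with a `url` and an optional `title`. Those key names, and the helpers `toc_label` and `fallback_toc`, are hypothetical and not taken from the infoset.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def toc_label(resource):
    """Prefer the resource title; otherwise fall back to the file name, then the URL."""
    title = (resource.get("title") or "").strip()
    if title:
        return title
    name = PurePosixPath(urlparse(resource["url"]).path).name
    return name or resource["url"]


def fallback_toc(reading_order):
    """Build (label, url) pairs, one per primary resource, in reading order."""
    return [(toc_label(r), r["url"]) for r in reading_order]


# Hypothetical image-based work: the second entry has no title to use.
pages = [
    {"url": "https://example.org/book/cover.html", "title": "Cover"},
    {"url": "https://example.org/book/page-001.jpg"},
]
print(fallback_toc(pages))
# → [('Cover', 'https://example.org/book/cover.html'), ('page-001.jpg', 'https://example.org/book/page-001.jpg')]
```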
And just to be clear, what I mean by lazy authoring is not that there may be legitimate cases where this information could be reliably constructed. Rather, once we start assuming things will fall nicely into place because of certain scenarios we can envisage we end up with a very permissive specification that enables bad experiences for users because authors are not aware of why the affordances are there. We should have very tight rules and call this implicit information gathering (e.g., you don't have to list the default reading order, but you must set a "fromToc" flag) and/or push the processing into failure handling. |
It does sound like there is general agreement that the HTML-supplied TOC could be a fallback for the case where the manifest doesn't include a specific TOC. So, maybe this PR should be split in half, getting that part in? I waffle on HTML TOC as fallback for manifest-resident primary reading order. Two competing arguments: -- Plus: makes authoring simple WPs easy, perhaps obviating the need for a JSON manifest for very simple WPs. This morning I was more on this plus side, now I'm more on the minus side. |
I think @mattgarrish is making a good point that some of these fallbacks should be candidates for failure handling rather than the normal way we process a manifest. Right now we have a very reasonable list of requirements for our infoset and I'm not comfortable with fallbacks that can barely handle them, especially for the primary reading order. The lack of a list of primary resources in a manifest also raises the following question: if I discover the first chapter of a publication, and through it the manifest, how do I locate the TOC if the manifest does not contain any primary resource? Do I need to rely on a well-known location or the presence of a specific link in the content to figure out where the TOC lives? This doesn't sound like "easy authoring" to me, and I'm not even talking about all the issues around processing a TOC to extract a reading order from it. |
… toc as fallback issue
Admin: I agree with the #46 (comment) of @GarthConboy : it was a mistake to lump two issues into one PR. So I split it:
|
@GarthConboy summarized it as:
I would not dismiss the second point. Instead of being very general, I take a document that, in my view, must be considered as a candidate for a WP, namely the HTML51 spec[1]. Although there is a single file version of the spec, the basic one is cut into a moderately large number of HTML files (35, to be precise). It also has a fairly detailed and long TOC which is, actually, repeated in all HTML files. The fact that it is repeated over all HTML files is a matter of the current Rec styling; a previous version of HTML[2] had its full TOC in a single file, namely Overview.html[3]. My goal is: I would like to avoid requiring the producers of the HTML spec to maintain a separate reading order in a manifest file. Or are we saying that, no matter what, authors must produce this? I sense significant push-back coming from the editors of a document like HTML. Looking at the TOC right now[4] the rough algorithm described in the draft works fairly well: extract the URL-s in order from the TOC, and remove duplicates. What you get is the default reading order… |
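The rough algorithm mentioned above (extract the URLs in order, remove duplicates) might look like the following sketch, using only the Python standard library. The class and function names are hypothetical, the sample markup is illustrative rather than copied from the HTML spec, and treating any `nav` element as the TOC is an assumption; the draft's actual wording may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class TocLinkCollector(HTMLParser):
    """Collect href values of <a> elements that appear inside a <nav> element."""

    def __init__(self):
        super().__init__()
        self.nav_depth = 0
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "nav":
            self.nav_depth += 1
        elif tag == "a" and self.nav_depth > 0:
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag == "nav" and self.nav_depth > 0:
            self.nav_depth -= 1


def reading_order_from_toc(html):
    """Return the distinct resources linked from the TOC, in first-seen order."""
    collector = TocLinkCollector()
    collector.feed(html)
    order, seen = [], set()
    for href in collector.hrefs:
        resource = urldefrag(href).url  # section links collapse to their file
        if resource and resource not in seen:
            seen.add(resource)
            order.append(resource)
    return order


sample = """
<nav role="doc-toc"><ol>
  <li><a href="introduction.html#introduction">1 Introduction</a></li>
  <li><a href="introduction.html#background">1.1 Background</a></li>
  <li><a href="infrastructure.html#infrastructure">2 Common infrastructure</a></li>
</ol></nav>
"""
print(reading_order_from_toc(sample))
# → ['introduction.html', 'infrastructure.html']
```

This version removes all duplicates, as the draft's rough algorithm does, so a TOC that deliberately revisits a resource would lose the later visit.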
@iherman this is only one example, not all table of contents look the same. Since we haven't defined at all what a TOC must/should/may contain, this feels extremely premature to have a TOC as an implicit fallback. |
If it is a matter of being premature: I am fine with that insofar as we can have a separate discussion on the structure of the TOC. I think the FPWD should, nevertheless, make clear that this is indeed an issue that will be addressed at some point (whether it is with the wording of the current PR or not is a detail). I still believe we should make efforts to make it easy for documents like the HTML spec to be a bona fide WP with very low extra costs…
|
It's not the only issue that I have with it, but the fact that we haven't even defined:
... is enough to at least postpone this discussion. If a manifest doesn't contain a list of primary resources or a title, this means that we could have a WP without a manifest, which brings additional issues (How do we establish WP-ness without a manifest? How do we locate a TOC without a manifest?) that must also be resolved before any of this. |
@HadrienGardeur as I said, I am fine postponing the discussion but not closing it. Maybe I am naïve, but I do not see such major issues, because I am also fine restricting the usable versions of TOC to the least complex ones. Also: I would be fine saying that WP-ness requires the presence of a manifest. If its only role, in a specific case, is to ensure the declaration of being a WP, that is fine with me. What I want to avoid is that, in simple cases, the author would have to repeat the same information several times: I do not think that would catch on. But again: postponement is fine. |
I'm not sure that a link to an empty document is a great idea though. Same thing for an empty script element. |
@HadrienGardeur this has been mentioned vaguely a few times: "this is only one example, not all table of contents look the same." from #46 (comment) Would you be able to find examples (print, digital, or whatever) that would present something unique or (more importantly) would prevent deducing a reading order? The more concrete the examples, the more concrete the spec. |
I've seen such publications before but publishers like @laudrain might have an easier time than me providing such examples quickly. |
I'll also cc @llemeurfr and @JayPanoz in here, they might be able to contact content producers to obtain examples for:
It's not hard to find publications that skip primary resources; there are a lot of examples out there in EPUB. I know that @baldurbjarnason has provided examples for non-linear publications before, these might also be relevant here. |
Thanks, @TzviyaSiegman. 😃 Thanks everyone for the examples! Sorry I requested them here... Let's do move them (and add more!) to the ToC-Samples wiki page. ...we now return this issue to its actual intended use. 😁 |
If the TOC is "non-linear", then we do need the reading order explicitly in the manifest. What this tells me, and I believe we can agree on that, is that the manifest MUST have a slot for a default reading order, and we cannot rely exclusively on the TOC in one of the resources. This also means that the authors of, say, cookbooks or travelbooks as WP-s should be aware of that, and they MUST provide an explicit reading order in the manifest. However, the question is what percentage of all Web Publications these represent. Although we do not necessarily have empirical evidence, I believe that most WP-s would be much more "regular", by which I mean that TOC entries follow the regular reading order, although they would refine it greatly, providing, e.g., links to individual sections within some of the resources. I would even go as far as saying that the vast majority of WP-s would be like that. Taking the vast majority into account, a fallback on those continues to make sense to me, I must say. After all, fallback means "use this if the authoritative mechanism is not provided", where the authoritative mechanism is to have the default order in the manifest. |
Thanks everyone for the examples! Sorry I requested them here... Let's do move them (and add more!) to the ToC-Samples wiki page <https://github.com/w3c/wpub/wiki/ToC-Samples>.
I am sure this collection will become extremely useful later, so if we have more, let us have them here indeed!
|
To repeat myself, I'm sure, but can't we make a distinction between authoring intent and fallback processing? For example: The reading order must be included in the manifest, which is either an ordered list of primary resources or a link to an html nav element that contains such a list of links (needing these to be clearly different links, of course). These are the accepted ways to provide a reading order, but a reading system could piece one together by searching for a toc nav, inspecting the list of resources for certain media types (all html documents), etc. |
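One way to picture the split between authoring intent and fallback processing is the sketch below. Everything in it is hypothetical: the manifest keys `readingOrder`, `resources`, and `type`, the convention that a string value means "link to a TOC nav", and the `load_toc_links` callback that a user agent would supply to fetch and parse that document.

```python
def resolve_reading_order(manifest, load_toc_links):
    """Return (reading_order, how_it_was_obtained) for a parsed manifest."""
    value = manifest.get("readingOrder")

    # Accepted form 1: the author lists the resources explicitly.
    if isinstance(value, list) and value:
        return list(value), "explicit list"

    # Accepted form 2: the author points at a document holding the TOC nav.
    if isinstance(value, str) and value:
        return load_toc_links(value), "linked toc"

    # Neither form given: the publication is invalid, and whatever follows is
    # the user agent's own recovery strategy, not something the author declared.
    html_docs = [
        r["url"] for r in manifest.get("resources", [])
        if r.get("type") == "text/html"
    ]
    return html_docs, "ua fallback"


manifest = {"readingOrder": "toc.html"}
order, source = resolve_reading_order(manifest, lambda url: ["c1.html", "c2.html"])
print(order, source)
# → ['c1.html', 'c2.html'] linked toc
```

Returning the source alongside the order lets a reading system report when it had to guess, which keeps the fallback visibly separate from the two authored forms.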
I may have missed this before, but I actually like this approach. In other words, at least for reading order but also for the TOC (and maybe for other things) we do something more explicit like this. It is also more efficient I suppose. Ie,
And, for each case the fallback is not normatively defined but left to the UA (using the text you have already here and there). The extra load on the author may then become minimal. To use the HTML5 Spec example, it would involve a small JSON file with 2-3 links. No big deal (as opposed to repeating the TOC, for example). My first reaction is: I think I like that. I am not sure about the language, though... |
The TOC could also be a secondary resource IMO, it doesn't have to be in the reading order of the publication. |
Primary resource is decoupled from reading order (at least for now). Primary is one that is not nested within another (top-level), so it would account for a toc outside the reading order. |
@mattgarrish well I missed the part when primary resources became decoupled from the reading order... I'm not a fan of talking about nesting, I would much rather have:
|
I'm still unhappy about that. Aside from TOCs that do not follow the reading order, reference fragments and repeat references to primary resources, there are other issues that are not addressed here:
There are too many situations where a TOC does not contain the reading order, I really don't think it can be trusted as a fallback. |
It was in the last PR. We had two problems with the definitions: one is that primary being reading order and reading order being primary is circular. The other, as we discussed in another thread, is that secondary is tied to being part of the rendering of a primary. If primary are only resources in the reading order, there's almost no point in having a distinction. There are just resources of which some are in the reading order. The purpose of the rest is indeterminate until encountered. I just don't like this as far as any instruction about what needs to be in the manifest v. what is optional. Eventually it has to come around to some interpretation of standalone resource v. helper resource. But this PR probably isn't the best place to discuss. |
We also need to remember to qualify this as being about WPs that consist of more than one primary resource. As Benjamin reminded us, scholarly journal articles will surely be WPs and most consist of a single primary document, for which reading order and TOC are meaningless. There are millions of those out there. . . . |
Such a publication would have a single resource in the list of primary resources: problem solved. |
Exactly. List of primary resources, always needed. Reading order or TOC, not always. |
@mattgarrish probably not the best place, but this is completely tied to our discussion here... If you follow my suggestion for primary vs secondary vs external, then secondary resources are useful for:
I really dislike the notion of separating primary resources from the reading order, I don't see the point aside from making things more complex than they should be. For secondary resources, treating them as "sub" or "nested" resources is fairly useless without knowing which resources reference them. We'd be better off just listing all resources that are not in the reading order. |
This fork of the initial issue topic should IMO be moved to #16 (non-linear resources - primary, secondary or something else), where it seems it belongs. |
I must say I am at a loss, @HadrienGardeur, because I do not see the problem. If, for whatever reason, the TOC in a resource is not appropriate for that purpose, then the author can (and should) decouple it from the reading order. In the new scheme (thanks to @mattgarrish) the author would use the manifest in its full beauty to define the reading order. The only extra feature provided is that if the TOC is fine, then a pointer to the TOC suffices. In other words, for it to be valid, the manifest MUST include an explicit default reading order. The only thing is that the value of that default reading order may be a link to a TOC rather than a list of resources. It is (it must be) a conscious decision of the author (no automatic fallback). |
@HadrienGardeur Yes, I can still live with where we ended up in #16. Nothing in the world is going to make someone list a major content resource omitted from the reading order, whatever we call it. The distinction I was trying to eke out is that some resources are still more important than others, even if not listed. |
If the author explicitly links to a toc, then they've made the decision it is okay. That's how I can live with that. As far as fallbacks, I don't think we should ever mandate an algorithm. Finding a toc nav in a primary resource and using it is just one option of what a UA might choose to do. Same with how title/language are worded. |
Fully agree with @mattgarrish. |
It does sound like we're now on roughly the same, er, page here -- excellent. And, just to provide an explicit thumbs up to a comment from @HadrienGardeur above:
If an author explicitly references a |
Okay by me as long as we remember to say default reading order. |
Closing in order to re-open consolidated PR. See #51. |
This is my attempt to extract from a number of issues (#35, #36, #39) some aspects of TOC vs. reading orders that may represent a level of consensus. I was mostly inspired by the comment of @baldurbjarnason (#36 (comment)) which described that there may be a fallback both to a TOC if a default reading order is not present, and on default reading orders if a TOC is not present, although both are listed as separate information in the WP Infoset.
Obviously, the trigger for all this was also #35