Processing our manifest into WebIDL #268

HadrienGardeur · 2018-07-10T10:56:37Z

There's a pretty large gap in our current draft: we don't describe at all how the manifest should be parsed and processed.

for some items (url, dateModified, datePublished and readingProgression) this is a straightforward conversion from JSON
for id/@id we need to define which term has the priority and what happens when both are present
for name, we need to check whether a string or an object (@value + @language) is used and parse it accordingly in LocalizableString
the default language also requires some additional processing since we need to extract it from the context definition
all creators require that we:
- check if we have a single object/string or an array
- parse both the string and object (with name) forms
- handle name as well (string or @value + @language)
- store the role somehow (TBD)
for all structural properties that use links (readingOrder, resources and links):
- support both strings (URL) or objects
- define a default encodingFormat when using a string? (TBD)
- support both string and array of strings for rel
- support both string and object (@value + @language) form for name and description
the table of contents requires its own section since it's even more complex than the rest:
- identify where the TOC is located by searching for contents in readingOrder and resources
- fall back to the entry page if no reference is found
- fetch and parse the HTML document
- extract the TOC from HTML and populate the WebIDL
accessibility is TBD since we haven't defined WebIDL for it yet
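
A minimal sketch of the polymorphic parsing listed above, in Python, may help show how "heavy" it gets. The class names (LocalizableString, Creator, Link), the helper names, and the "type" key defaulting to "Person" are assumptions made for illustration, not part of the draft; the role and the default encodingFormat are left open because the list marks them as TBD.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Union


@dataclass
class LocalizableString:
    value: str
    language: Optional[str] = None


@dataclass
class Creator:
    name: LocalizableString
    type: str = "Person"  # assumed default, not defined by the draft


@dataclass
class Link:
    url: str
    rel: List[str] = field(default_factory=list)
    encoding_format: Optional[str] = None  # no default applied: still TBD


def as_list(value: Any) -> List[Any]:
    """Wrap a single value in a list; pass lists through; None becomes []."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def parse_localizable(value: Union[str, dict],
                      default_language: Optional[str] = None) -> LocalizableString:
    """Accept a plain string or an {"@value", "@language"} object."""
    if isinstance(value, str):
        return LocalizableString(value, default_language)
    return LocalizableString(value["@value"],
                             value.get("@language", default_language))


def parse_creators(value: Any,
                   default_language: Optional[str] = None) -> List[Creator]:
    """Accept a single string/object or an array mixing both forms."""
    creators = []
    for item in as_list(value):
        if isinstance(item, str):
            creators.append(Creator(parse_localizable(item, default_language)))
        else:
            name = parse_localizable(item["name"], default_language)
            creators.append(Creator(name, item.get("type", "Person")))
    return creators


def parse_links(value: Any) -> List[Link]:
    """Accept URL strings or link objects; rel may be a string or an array."""
    links = []
    for item in as_list(value):
        if isinstance(item, str):
            links.append(Link(url=item))
        else:
            links.append(Link(item["url"], as_list(item.get("rel")),
                              item.get("encodingFormat")))
    return links
```

For instance, parse_links(["c1.html", {"url": "toc.html", "rel": "contents"}]) yields two Link objects, the second with rel == ["contents"].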

We also need to decide how this will be organized in the spec. Do we express processing requirements along the manifest expression? Under the lifecycle section? In the appendix for the WebIDL?


HadrienGardeur · 2018-07-10T11:51:53Z

Overall this will be fairly "heavy", but this is the price we have to pay for the flexibility when authoring JSON-LD + schema.org.

iherman · 2018-07-18T09:11:33Z

@HadrienGardeur yes, the IDL and, mainly, the processing has to be updated along the lines of what you describe.

I would, however, propose not to do it right now. I think the draft should be published with an editorial note making it clear that the IDL + processing is out of date and that updating it is one of the next steps. Once it is published, and after giving the schema.org part some rest (and waiting for external comments), we can make the changes. @mattgarrish wdyt?

iherman · 2018-07-18T09:20:41Z

@HadrienGardeur on the technical aspect of your comments: I wonder whether, in the processing description (and, actually, the implementation), we should rely on the expansion API of the JSON-LD standard. In other words, the first step (after getting hold of the manifest) is to run the JSON-LD through the expansion and pick up after that. As the standard says, the expansion will make the context files disappear...

I am not saying it will solve all our issues, but:

all values will be stored as objects; if they were originally strings, they will be objects of the form { "@value" : "whatever" } with, possibly, the language tag added
every value will be an array, so we do not have to check that; if it is a list, it will have a "@list" property
overall, that form can be considered as some sort of a "canonical" form for processing, which may make the description (and, I guess, the implementation) of the manifest much clearer.

WDYT?

iherman · 2018-07-18T09:24:04Z

As an example, this is the expanded form of one of the examples in the document:

[
  {
    "@type": [
      "http://schema.org/CreativeWork"
    ],
    "http://schema.org/copyrightHolder": [
      {
        "@value": "World Wide Web Consortium"
      }
    ],
    "http://schema.org/copyrightYear": [
      {
        "@value": "2015"
      }
    ],
    "https://schema.org/creator": [
      {
        "@list": [
          {
            "@type": [
              "http://schema.org/Person"
            ],
            "http://schema.org/name": [
              {
                "@value": "Jeni Tennison"
              }
            ]
          },
          {
            "@type": [
              "http://schema.org/Person"
            ],
            "http://schema.org/name": [
              {
                "@value": "Gregg Kellogg"
              }
            ]
          },
          {
            "@type": [
              "http://schema.org/Person"
            ],
            "http://schema.org/name": [
              {
                "@value": "Ivan Herman"
              }
            ]
          }
        ]
      }
    ],
    "http://schema.org/datePublished": [
      {
        "@type": "http://schema.org/Date",
        "@value": "2015-12-17"
      }
    ],
    "@id": "http://www.w3.org/TR/tabular-data-model/",
    "https://www.w3.org/ns/wp#resources": [
      {
        "@value": "datatypes.html"
      },
      {
        "@value": "datatypes.svg"
      },
      {
        "@value": "datatypes.png"
      },
      {
        "@value": "diff.html"
      },
      {
        "@value": "test-utf8.csv"
      },
      {
        "@value": "test-utf8-bom.csv"
      },
      {
        "@value": "test-utf16.csv"
      },
      {
        "@value": "test-utf16-bom.csv"
      },
      {
        "@value": "test.xls"
      },
      {
        "@value": "test.xlsx"
      }
    ],
    "http://schema.org/url": [
      {
        "@id": "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/"
      }
    ]
  }
]
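
To illustrate the "no need to check" point about the expanded form, here is a small Python sketch that reads values back out of a trimmed excerpt of the document above. The helper names (literal_values, creator_names) are hypothetical, and the sketch only handles the shapes visible in this example; it is not a general JSON-LD reader.

```python
from typing import Any, Dict, List

SCHEMA = "http://schema.org/"


def literal_values(node: Dict[str, Any], prop: str) -> List[str]:
    """Collect the "@value" of every value object under prop, flattening "@list"."""
    result = []
    for entry in node.get(prop, []):
        items = entry["@list"] if "@list" in entry else [entry]
        result.extend(item["@value"] for item in items if "@value" in item)
    return result


def creator_names(node: Dict[str, Any], prop: str) -> List[str]:
    """Collect the schema.org names of all creator nodes stored under prop."""
    names = []
    for entry in node.get(prop, []):
        people = entry["@list"] if "@list" in entry else [entry]
        for person in people:
            names.extend(literal_values(person, SCHEMA + "name"))
    return names


# Trimmed excerpt of the expanded example; property IRIs copied as they appear.
expanded = [
    {
        "http://schema.org/copyrightYear": [{"@value": "2015"}],
        "https://schema.org/creator": [
            {
                "@list": [
                    {"http://schema.org/name": [{"@value": "Jeni Tennison"}]},
                    {"http://schema.org/name": [{"@value": "Gregg Kellogg"}]},
                ]
            }
        ],
    }
]

print(creator_names(expanded[0], "https://schema.org/creator"))
```

Every property is an array of objects, so the code never branches on string versus object; the only special case left is the "@list" wrapper.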

HadrienGardeur · 2018-07-18T09:26:06Z

@iherman purely from a specification perspective, this would make things easier when writing our section about processing (except for url, dateModified, datePublished and readingProgression which wouldn't work as-is anymore).

From a real implementation point of view though, I doubt that most of our UAs will be JSON-LD aware at all, which might actually make things even more complex for them to implement.

iherman · 2018-07-18T09:31:02Z

@HadrienGardeur I am not sure. UAs may, in some sense, ignore JSON-LD and just use (available) JS libraries to perform expansion. I would certainly not implement expansion myself.

HadrienGardeur · 2018-07-18T09:32:59Z

@iherman I'm worried about the overhead. It's not ideal to implement the processing manually, but there might be a real overhead (and/or a lack of libraries available in a given language) with the JSON-LD expansion API.

iherman · 2018-07-18T09:36:08Z

I understand... let us see what other implementers may say!

llemeurfr · 2018-07-18T10:03:15Z

As a developer looking at parsing a JSON structure to create an in-memory model, I would only expand the JSON-LD serialization if I could find a great/well-maintained OSS lib to do it in my favorite language (and here we could list more than 20 languages). It would be great, but I doubt it will be the case. And after that, I would still have to map this structure to my in-memory model.

Currently, we parse the structure "by hand", as the simple JSON decoding feature of our favorite languages doesn't work with the polymorphic structure of the Webpub manifest. It's boring but direct. And the native JSON decoding feature of my language would not make a clean object from the expanded JSON-LD structure either, so this unmarshalling would still be boring.

iherman · 2018-07-18T10:15:39Z

Yeah... I have to accept all these arguments; it is still a pity, though. Relying on the expansion would make the manifest more future proof, too. For example, as of today, we cannot use the handy language map feature, because (afaik) no schema.org processor recognizes it. However, if they do tomorrow, we would then have to explicitly extend the processing of the manifest to include language maps to make it (standard-wise) usable, whereas the expansion algorithm would handle that feature automatically, too.

Anyway... it sounded like a good idea:-)

iherman · 2018-08-14T05:25:17Z

I believe that the specific questions raised in this have been covered by the latest draft, the updates on the JSON-Schema, the conversion p.o.c implementation and some explicit issues. @HadrienGardeur @llemeurfr would it be o.k. to close this issue and, if I missed any open problem, raise it as a specific issue?

(Knowing that Section 5 of the draft, on the Lifecycle, still needs an update, of course.)

llemeurfr · 2018-08-14T06:54:21Z

Good for me


HadrienGardeur · 2018-08-14T13:24:27Z

I don't think that this issue has been truly covered.

We need to take a decision between:

simply documenting a WebIDL and having a few examples of how it can be used (such as your p.o.c. @iherman)
or specifying exactly how the manifest should be processed into our WebIDL

iherman · 2018-08-14T13:31:20Z

@HadrienGardeur (admin) I would prefer to push this into a separate issue, it is hard to handle an issue that has, in fact, several topics.

Could you open this separately and then close the present issue?

HadrienGardeur · 2018-08-14T13:32:14Z

... but this is exactly what this issue is all about, why should we open the same one elsewhere?

iherman · 2018-08-14T13:59:23Z

As you said: there is the high-level question (#268 (comment)) which is really at the base of all this. But if you prefer to keep this one issue open, that is fine with me...

iherman · 2018-08-14T14:02:25Z

As for the question in #268 (comment): in my view, a refresh of the lifecycle section is the best option. The algorithms need not be spelled out in 100% detail, i.e., it is not a re-engineering of code into English; we will have to find the right style.

I was considering having a go at it based on the p.o.c. work I did, unless somebody beats me to it...

mattgarrish · 2018-08-14T14:20:09Z

Processing JSON into WebIDL seems to be an open question others are facing, too: whatwg/infra#159 and w3c/manifest#611

iherman · 2018-08-14T15:36:52Z

Yep... it is indeed not that simple. I was a bit too fast:-(

The problem is how to describe things in a language-independent manner. If it were a matter of describing things in, say, JavaScript, then the p.o.c. implementation works and takes care of many things behind the scenes.

My current thoughts are (comments welcome...)

We define a (conceptual!) pre-processing in terms of a JSON-to-JSON transformation. That describes things like transforming all values into arrays (when appropriate), strings into @value+@language structures (for now, with #299 pending), names into Persons for creators, etc.
As part of the same pre-processing steps, we describe issues like
- using the <title> element as a name in case the latter is not available
- adding an inLanguage property if the manifest is in a <script> and the language is set in the HTML
- some other default actions I may forget…
Much like it is done in the current draft, but also in the WebApp Manifest by:
- "Let manifest be the result of converting json to a WebPublicationManifest dictionary"
- "set manifest["url"] to be the result of converting manifest["url"] into an absolute URL using the base URL" (or something like that)
- etc.
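
The pre-processing steps above can be sketched as a JSON-to-JSON transformation in Python. This is only an illustration under stated assumptions: the function and parameter names (canonicalize, html_title, html_lang) are hypothetical, inLanguage is assumed to be a single string, and only name, creator and url are handled.

```python
from typing import Any, Dict, List, Optional
from urllib.parse import urljoin


def to_array(value: Any) -> List[Any]:
    """Turn a single value into a one-element array; None becomes []."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def to_localizable(value: Any, language: Optional[str]) -> Any:
    """Turn a plain string into an @value (+ @language) structure."""
    if isinstance(value, str):
        obj = {"@value": value}
        if language:
            obj["@language"] = language
        return obj
    return value


def canonicalize(manifest: Dict[str, Any], base_url: str,
                 html_title: Optional[str] = None,
                 html_lang: Optional[str] = None) -> Dict[str, Any]:
    result = dict(manifest)  # shallow copy; the input is left untouched

    # Default: take the language from the embedding HTML if none is set.
    if "inLanguage" not in result and html_lang:
        result["inLanguage"] = html_lang
    language = result.get("inLanguage")  # assumed to be a single string

    # Default: use the <title> element when the manifest has no name.
    if "name" not in result and html_title:
        result["name"] = html_title
    if "name" in result:
        result["name"] = [to_localizable(v, language)
                          for v in to_array(result["name"])]

    # Creators: always an array of objects; bare names become Persons.
    if "creator" in result:
        people = []
        for creator in to_array(result["creator"]):
            if isinstance(creator, str):
                creator = {"type": "Person", "name": creator}
            else:
                creator = dict(creator)
            creator["name"] = [to_localizable(v, language)
                               for v in to_array(creator.get("name"))]
            people.append(creator)
        result["creator"] = people

    # Convert url into an absolute URL using the base URL.
    if "url" in result:
        result["url"] = urljoin(base_url, result["url"])
    return result
```

A consumer of the result never has to test for string versus object or single value versus array, which is the point of doing the normalization once up front.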

How does that sound?

HadrienGardeur · 2018-08-14T16:07:37Z

I think this sounds good: it's a canonical version of our manifest. We'll need to be careful though about how everything is named in our WebIDL (including @value and @language).

iherman · 2018-08-15T14:11:38Z

@HadrienGardeur @mattgarrish I have created #306, which now contains a section on a canonical manifest (thanks for the term, @HadrienGardeur, it is perfect!)

I would propose to follow the discussion on the PR, we can make use of the PR facilities of direct comment and automatically generated diffs (that is why I already created a PR, although much work is still to be done).

TzviyaSiegman · 2018-10-10T15:21:33Z

@HadrienGardeur @mattgarrish @iherman OK to close?

iherman · 2018-11-08T05:44:51Z

The canonicalization is now an inherent part of the draft. Closing...

HadrienGardeur assigned HadrienGardeur, iherman and mattgarrish Jul 10, 2018

HadrienGardeur assigned llemeurfr Jul 10, 2018

iherman added the propose closing label Aug 14, 2018

iherman mentioned this issue Aug 15, 2018

Introduction of a canonical manifest #306

Merged

iherman closed this as completed Nov 8, 2018
