Processing our manifest into WebIDL #268

Closed
HadrienGardeur opened this issue Jul 10, 2018 · 23 comments

Comments

@HadrienGardeur

HadrienGardeur commented Jul 10, 2018

There's a pretty large gap in our current draft: we don't describe at all how the manifest should be parsed and processed.

  • for some items (url, dateModified, datePublished and readingProgression) this is a straightforward conversion from JSON
  • for id/@id we need to define which term has the priority and what happens when both are present
  • for name, we need to check whether a string or an object (@value + @language) is used and parse it accordingly into a LocalizableString
  • the default language also requires some additional processing since we need to extract it from the context definition
  • all creators require that we:
    • check if we have a single object/string or an array
    • parse both the string and object (with name) forms
    • handle name as well (string or @value + @language)
    • store the role somehow (TBD)
  • for all structural properties that use links (readingOrder, resources and links):
    • support both strings (URLs) and objects
    • define a default encodingFormat when using a string? (TBD)
    • support both string and array of strings for rel
    • support both the string and the object (@value + @language) forms for name and description
  • the table of contents requires its own section since it's even more complex than the rest:
    • identify where the TOC is located by searching for contents in readingOrder and resources
    • fall back to the entry page if no reference is found
    • fetch and parse the HTML document
    • extract the TOC from HTML and populate the WebIDL
  • accessibility is TBD since we haven't defined WebIDL for it yet

We also need to decide how this will be organized in the spec. Do we express processing requirements alongside the manifest expression? Under the lifecycle section? In the appendix for the WebIDL?
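A minimal Python sketch of the name and creator bullets above, assuming the manifest has already been decoded into plain Python values (e.g. with `json.loads`). The `LocalizableString` and `Creator` classes, and passing the default language in as a parameter, are hypothetical choices for illustration; the role is left out because it is still TBD.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class LocalizableString:
    value: str
    language: Optional[str] = None


@dataclass
class Creator:
    # Hypothetical in-memory shape; the role is omitted because it is still TBD.
    names: List[LocalizableString]


def as_list(value: Any) -> List[Any]:
    """Wrap a single value in a list, keep arrays as-is, map None to []."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def to_localizable(value: Any, default_language: Optional[str] = None) -> LocalizableString:
    """Accept "text" or {"@value": "text", "@language": "en"}."""
    if isinstance(value, str):
        return LocalizableString(value, default_language)
    if isinstance(value, dict) and "@value" in value:
        return LocalizableString(str(value["@value"]), value.get("@language", default_language))
    raise ValueError(f"not a localizable string: {value!r}")


def to_creators(value: Any, default_language: Optional[str] = None) -> List[Creator]:
    """Accept a string, an object with a name, or an array mixing both."""
    creators = []
    for item in as_list(value):
        if isinstance(item, str):
            # The bare string form is taken to be the creator's name.
            creators.append(Creator([to_localizable(item, default_language)]))
        elif isinstance(item, dict) and "name" in item:
            names = [to_localizable(n, default_language) for n in as_list(item["name"])]
            creators.append(Creator(names))
        else:
            raise ValueError(f"unsupported creator form: {item!r}")
    return creators
```

Raising on unknown forms, rather than silently skipping them, is only one possible choice; the spec would have to say which error handling UAs should follow.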

@HadrienGardeur
Author

Overall this will be fairly "heavy", but this is the price we have to pay for the flexibility when authoring JSON-LD + schema.org.

@iherman
Member

iherman commented Jul 18, 2018

@HadrienGardeur yes, the IDL and, mainly, the processing has to be updated along the lines of what you describe.

I would, however, propose not to do it right now. I think the draft should be published with an editorial note making it clear that the IDL + processing is out of date and that updating it is one of the next steps. Once it is published, and once the schema.org part has had some time to settle (and we have received external comments), we can make the changes. @mattgarrish wdyt?

@iherman
Member

iherman commented Jul 18, 2018

@HadrienGardeur on the technical aspect of your comments: I wonder whether, in the processing description (and, actually, the implementation), we should rely on the expansion API of the JSON-LD standard. In other words, the first step (after getting hold of the manifest) is to run the JSON-LD through the expansion and pick up after that. As the standard says, the expansion will make the context files disappear...

I am not saying it will solve all our issues, but:

  • all values will be stored as objects; if they were originally strings, they will be objects of the form { "@value" : "whatever" } with, possibly, the language tag added
  • every value will be an array, so we do not have to check that; if it is a list, it will have a "@list" property
  • overall, that form can be considered as some sort of a "canonical" form for processing, which may make the description (and, I guess, the implementation) of the manifest much clearer.

WDYT?
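To illustrate the "canonical form" argument: after expansion, a single accessor can read any property. A minimal Python sketch, assuming `node` is one object from the expanded output of a JSON-LD library (the expansion call itself is not shown); `values_of` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def values_of(node: Dict[str, Any], prop: str) -> List[Any]:
    """Return the values of `prop` in one node of expanded JSON-LD.

    In expanded form every property value is an array; ordered values are
    wrapped in a single {"@list": [...]} object; literals look like
    {"@value": ...} and plain references like {"@id": ...}.
    """
    result: List[Any] = []
    for entry in node.get(prop, []):
        items = entry["@list"] if "@list" in entry else [entry]
        for item in items:
            if "@value" in item:
                result.append(item["@value"])
            elif set(item) == {"@id"}:
                result.append(item["@id"])
            else:
                # Nested nodes (e.g. a Person) are returned unchanged.
                result.append(item)
    return result
```

The same three cases (`@list`, `@value`, `@id`) cover every property, which is the appeal of processing the expanded form rather than each authoring variant.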

@iherman
Member

iherman commented Jul 18, 2018

As an example, this is the expanded form of one of the examples in the document:

[
  {
    "@type": [
      "http://schema.org/CreativeWork"
    ],
    "http://schema.org/copyrightHolder": [
      {
        "@value": "World Wide Web Consortium"
      }
    ],
    "http://schema.org/copyrightYear": [
      {
        "@value": "2015"
      }
    ],
    "https://schema.org/creator": [
      {
        "@list": [
          {
            "@type": [
              "http://schema.org/Person"
            ],
            "http://schema.org/name": [
              {
                "@value": "Jeni Tennison"
              }
            ]
          },
          {
            "@type": [
              "http://schema.org/Person"
            ],
            "http://schema.org/name": [
              {
                "@value": "Gregg Kellogg"
              }
            ]
          },
          {
            "@type": [
              "http://schema.org/Person"
            ],
            "http://schema.org/name": [
              {
                "@value": "Ivan Herman"
              }
            ]
          }
        ]
      }
    ],
    "http://schema.org/datePublished": [
      {
        "@type": "http://schema.org/Date",
        "@value": "2015-12-17"
      }
    ],
    "@id": "http://www.w3.org/TR/tabular-data-model/",
    "https://www.w3.org/ns/wp#resources": [
      {
        "@value": "datatypes.html"
      },
      {
        "@value": "datatypes.svg"
      },
      {
        "@value": "datatypes.png"
      },
      {
        "@value": "diff.html"
      },
      {
        "@value": "test-utf8.csv"
      },
      {
        "@value": "test-utf8-bom.csv"
      },
      {
        "@value": "test-utf16.csv"
      },
      {
        "@value": "test-utf16-bom.csv"
      },
      {
        "@value": "test.xls"
      },
      {
        "@value": "test.xlsx"
      }
    ],
    "http://schema.org/url": [
      {
        "@id": "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/"
      }
    ]
  }
]
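As a sketch of how a consumer might read this expanded output, the Python below pulls the ordered creator names and the resource list out of a trimmed copy of the example. The helper names are hypothetical; the property IRIs are copied from the output above.

```python
from typing import Any, Dict, List

# Trimmed copy of the expanded example above (only the properties used here).
EXPANDED: List[Dict[str, Any]] = [
    {
        "@id": "http://www.w3.org/TR/tabular-data-model/",
        "https://schema.org/creator": [
            {
                "@list": [
                    {"@type": ["http://schema.org/Person"],
                     "http://schema.org/name": [{"@value": "Jeni Tennison"}]},
                    {"@type": ["http://schema.org/Person"],
                     "http://schema.org/name": [{"@value": "Gregg Kellogg"}]},
                    {"@type": ["http://schema.org/Person"],
                     "http://schema.org/name": [{"@value": "Ivan Herman"}]},
                ]
            }
        ],
        "https://www.w3.org/ns/wp#resources": [
            {"@value": "datatypes.html"},
            {"@value": "test.xlsx"},
        ],
    }
]

NAME = "http://schema.org/name"
RESOURCES = "https://www.w3.org/ns/wp#resources"


def creator_names(node: Dict[str, Any], creator_prop: str) -> List[str]:
    """Collect creator names in document order."""
    names: List[str] = []
    for entry in node.get(creator_prop, []):
        # Ordered creators sit inside a single "@list" wrapper.
        for person in entry.get("@list", [entry]):
            names.extend(v["@value"] for v in person.get(NAME, []))
    return names


def resources(node: Dict[str, Any]) -> List[str]:
    return [v["@value"] for v in node.get(RESOURCES, [])]
```

Note that in the output above the creator property came out as `https://schema.org/creator` while the other schema.org properties use `http://`, so code keyed on only one of the two prefixes would silently find no creators; the sketch takes the creator key as a parameter for that reason.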

@HadrienGardeur
Author

@iherman purely from a specification perspective, this would make things easier when writing our section about processing (except for url, dateModified, datePublished and readingProgression which wouldn't work as-is anymore).

From a real implementation point of view though, I doubt that most of our UAs will be JSON-LD aware at all, which might actually make things even more complex for them to implement.

@iherman
Member

iherman commented Jul 18, 2018

@HadrienGardeur I am not sure. UAs may, in some sense, ignore JSON-LD and just use (available) JS libraries to perform the expansion. I would certainly not implement expansion myself.

@HadrienGardeur
Author

@iherman I'm worried about the overhead. It's not ideal to implement the processing manually, but there might be a real overhead (and/or a lack of libraries available in a given language) with the JSON-LD expansion API.

@iherman
Member

iherman commented Jul 18, 2018

I understand... let us see what other implementers may say!

@llemeurfr
Contributor

As a developer looking at parsing a JSON structure to create an in-memory model, I would only expand the JSON-LD serialization if I could find a great, well-maintained OSS library for my favorite language (and here we could list more than 20 languages). That would be great, but I doubt it will be the case. And after that, I would still have to map this structure to my in-memory model.

Currently, we parse the structure "by hand", as the simple JSON decoding features of our favorite languages don't work with the polymorphic structure of the Webpub manifest. It's boring but direct. And the native JSON decoding feature of my language would not make a clean object from the expanded JSON-LD structure either, so this unmarshalling would still be boring.
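To make parsing "by hand" concrete, a minimal Python sketch that unmarshals entries of readingOrder, resources, or links into a typed in-memory object. The `Link` class and its field names are hypothetical; leaving `encoding_format` unset for the bare-string form is an assumption, since that default is still TBD, and this sketch keeps only the text of each name, dropping its language.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Link:
    url: str
    encoding_format: Optional[str] = None  # default for bare strings is TBD
    rel: List[str] = field(default_factory=list)
    name: List[str] = field(default_factory=list)


def _as_list(value: Any) -> List[Any]:
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def parse_link(raw: Any) -> Link:
    if isinstance(raw, str):
        # String form: only a URL, no further metadata.
        return Link(url=raw)
    if not isinstance(raw, dict) or "url" not in raw:
        raise ValueError(f"unsupported link form: {raw!r}")
    return Link(
        url=raw["url"],
        encoding_format=raw.get("encodingFormat"),
        # rel may be a single string or an array of strings.
        rel=[str(r) for r in _as_list(raw.get("rel"))],
        # name may be a string or {"@value": ..., "@language": ...}; keep the text only.
        name=[n["@value"] if isinstance(n, dict) else n for n in _as_list(raw.get("name"))],
    )


def parse_links(raw: Any) -> List[Link]:
    """Parse readingOrder, resources or links, whether one entry or an array."""
    return [parse_link(item) for item in _as_list(raw)]
```

Every polymorphic field needs its own small normalization step, which is the "boring but direct" cost described above.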

@iherman
Member

iherman commented Jul 18, 2018

Yeah... I have to accept all these arguments; it is still a pity, though. Relying on the expansion would make the manifest more future-proof, too. For example, as of today, we cannot use the handy language map feature, because (afaik) no schema.org processor recognizes it. However, if they do tomorrow, we would then have to explicitly extend the processing of the manifest to include language maps to make them (standard-wise) usable, whereas the expansion algorithm would handle that feature automatically.

Anyway... it sounded like a good idea:-)
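For reference, a JSON-LD language map (a term declared with `"@container": "@language"` in the context) lets an author write `{"en": "...", "fr": "..."}`, and expansion turns it into the same `@value`/`@language` objects used elsewhere. A minimal Python sketch of that single conversion, i.e. the extra step a hand-written processor would have to add; `expand_language_map` is a hypothetical name, and this ignores the rest of the expansion algorithm.

```python
from typing import Any, Dict, List


def expand_language_map(lang_map: Dict[str, Any]) -> List[Dict[str, str]]:
    """Turn {"en": "Title", "fr": ["Titre"]} into @value/@language objects."""
    expanded: List[Dict[str, str]] = []
    for language, texts in lang_map.items():
        # An entry may hold one string or an array of strings.
        for text in texts if isinstance(texts, list) else [texts]:
            if text is None:
                continue  # null entries are skipped in this sketch
            # Language tags are kept exactly as written.
            expanded.append({"@value": text, "@language": language})
    return expanded
```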

@iherman
Member

iherman commented Aug 14, 2018

I believe that the specific questions raised in this issue have been covered by the latest draft, the updates to the JSON Schema, the p.o.c. conversion implementation, and some explicit issues. @HadrienGardeur @llemeurfr would it be o.k. to close this issue and, if I missed any open problem, raise it as a specific issue?

(Knowing that Section 5 of the draft, on the Lifecycle, still needs updating, of course.)

@llemeurfr
Contributor

llemeurfr commented Aug 14, 2018 via email

@HadrienGardeur
Author

I don't think that this issue has been truly covered.

We need to make a decision between:

  • simply documenting a WebIDL and giving a few examples of how it can be used (such as your p.o.c. @iherman)
  • or specifying exactly how the manifest should be processed into our WebIDL

@iherman
Member

iherman commented Aug 14, 2018

@HadrienGardeur (admin) I would prefer to push this into a separate issue, it is hard to handle an issue that has, in fact, several topics.

Could you open this separately and then close the present issue?

@HadrienGardeur
Author

... but this is exactly what this issue is all about, why should we open the same one elsewhere?

@iherman
Member

iherman commented Aug 14, 2018

As you said, there is the high-level question (#268 (comment)), which is really at the base of all this. But if you prefer to keep this issue open, that is fine with me...

@iherman
Member

iherman commented Aug 14, 2018

As for the question in #268 (comment): in my view, a refresh of the lifecycle section is the best option. The algorithms need not be spelled out in 100% detail, i.e., this is not a matter of re-engineering code into English; we will have to find the right style.

I was considering giving it a go based on the p.o.c. work I did, unless somebody beats me to it...

@mattgarrish
Member

Processing JSON into WebIDL seems to be an open question others are facing, too: whatwg/infra#159 and w3c/manifest#611

@iherman
Member

iherman commented Aug 14, 2018

Yep... it is indeed not that simple. I was a bit too fast:-(

The problem is how to describe things in a language-independent manner. If it were a matter of describing things in, say, JavaScript, then the p.o.c. implementation works and takes care of many things behind the scenes.

My current thoughts are (comments welcome...)

  1. We define a (conceptual!) pre-processing in terms of a JSON-to-JSON transformation. That describes things like transforming all values into arrays (when appropriate), strings into @value+@language structures (for now, with #299 pending), names into Persons for creators, etc.
  2. As part of the same pre-processing steps, we describe issues like
    • using the <title> element as a name in case the latter is not available
    • adding an inLanguage property if the manifest is in a <script> and the language is set in the HTML
    • some other default actions I may forget…
  3. Much as it is done in the current draft, and also in the Web App Manifest, by steps like:
    • "Let manifest be the result of converting json to a WebPublicationManifest dictionary"
    • "set manifest["url"] to the result of converting manifest["url"] into an absolute URL using the base URL" (or something like that)
    • etc.

How does that sound?
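A rough Python sketch of steps 1 to 3, assuming the manifest is already a decoded dict and that the HTML `<title>`, the HTML language, and the base URL have been obtained elsewhere (they are plain parameters here). Which properties count as arrays, localizable strings, or creators is an illustrative subset, not a decision, and the `"type": "Person"` shape for creators is likewise an assumption.

```python
from typing import Any, Dict, Optional
from urllib.parse import urljoin

# Illustrative subsets only; the real lists would come from the spec.
ARRAY_PROPS = {"name", "creator", "author", "readingOrder", "resources", "links"}
LOCALIZABLE_PROPS = {"name"}
CREATOR_PROPS = {"creator", "author"}


def localize(value: Any, language: Optional[str]) -> Any:
    """Turn a plain string into an @value (+ @language) object."""
    if isinstance(value, str):
        return {"@value": value, "@language": language} if language else {"@value": value}
    return value


def canonicalize(manifest: Dict[str, Any], title: Optional[str],
                 html_lang: Optional[str], base_url: str) -> Dict[str, Any]:
    out = dict(manifest)
    # Step 2: defaults taken from the enclosing HTML document.
    if "inLanguage" not in out and html_lang:
        out["inLanguage"] = html_lang
    if "name" not in out and title:
        out["name"] = title
    language = out.get("inLanguage")  # assumes a single language tag
    # Step 1: array-valued properties always become arrays.
    for prop in out.keys() & ARRAY_PROPS:
        if not isinstance(out[prop], list):
            out[prop] = [out[prop]]
    # Step 1: plain strings become @value (+ @language) objects.
    for prop in out.keys() & LOCALIZABLE_PROPS:
        out[prop] = [localize(v, language) for v in out[prop]]
    # Step 1: bare creator names become Person objects (objects are left as-is).
    for prop in out.keys() & CREATOR_PROPS:
        out[prop] = [
            {"type": "Person", "name": [localize(v, language)]} if isinstance(v, str) else v
            for v in out[prop]
        ]
    # Step 3: resolve the publication URL against the base URL.
    if "url" in out:
        out["url"] = urljoin(base_url, out["url"])
    return out
```

Keeping the pre-processing as a dict-to-dict function mirrors the "conceptual JSON-to-JSON transformation" idea: the conversion into the IDL dictionary afterwards only ever sees one shape per property.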

@HadrienGardeur
Author

I think this sounds good: it's a canonical version of our manifest. We'll need to be careful though about how everything is named in our WebIDL (including @value and @language).
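One reason the naming needs care: WebIDL dictionary member names must be WebIDL identifiers, which cannot start with `@`, so `@value` and `@language` cannot be used verbatim. A minimal Python sketch of a recursive rename over the canonical manifest, assuming (hypothetically) that the IDL uses `value` and `language`; the actual names are for the group to decide.

```python
from typing import Any, Dict

# Hypothetical mapping from JSON-LD keywords to WebIDL-friendly member names.
IDL_NAMES: Dict[str, str] = {"@value": "value", "@language": "language"}


def to_idl_keys(node: Any) -> Any:
    """Return a copy of the canonical manifest with keyword keys renamed."""
    if isinstance(node, dict):
        return {IDL_NAMES.get(k, k): to_idl_keys(v) for k, v in node.items()}
    if isinstance(node, list):
        return [to_idl_keys(item) for item in node]
    return node
```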

@iherman
Member

iherman commented Aug 15, 2018

@HadrienGardeur @mattgarrish I have created #306 that contains now a section on a canonical manifest (thanks for the term, @HadrienGardeur, it is perfect!)

I would propose to follow the discussion on the PR, where we can make use of the PR facilities for direct comments and automatically generated diffs (that is why I already created a PR, although much work is still to be done).

@TzviyaSiegman
Contributor

@HadrienGardeur @mattgarrish @iherman OK to close?

@iherman
Member

iherman commented Nov 8, 2018

The canonicalization is now an inherent part of the draft. Closing...

iherman closed this as completed Nov 8, 2018

5 participants