Restructuring Output Formats #63

gregsdennis · 2021-10-07T04:46:06Z

The below is a summary of json-schema-org/json-schema-spec#973.

The topic has been moved here so that we can use threads to keep the discussions more targeted.

The working branch is draft-next-output.

Hey everyone. I've been reading through all of the comments in this and linked issues, compiling a list of topics.

It seems that we have consensus on these things:

Change errors to nested/subschemas/inner/details (other suggestions?)
Output format should follow that of the draft specified by the root schema
The node for a boolean schema need only specify the location up to that boolean (no need to include, e.g., $false in the location)
valid is required for each node.
error is not required but is declared as the designated place to put error messaging. Tooling that consumes this output should expect an error message here.
- wording is implementation-specific (preferably configurable to allow for custom messages or internationalization; token replacement as a possible suggested mechanism?)
Add verbiage to clarify that each keyword only produces a single node (import from #1099)
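
To make the token-replacement idea a bit more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `{name}` token syntax, the `MESSAGES` table, and the `render_error` helper are illustrations only, not something the spec or any implementation defines.

```python
# Hypothetical message templates keyed by (language, keyword).
# Neither the token syntax nor this table comes from the spec.
MESSAGES = {
    ("en", "minItems"): "Expected at least {limit} items but found {actual}",
    ("es", "minItems"): "Se esperaban al menos {limit} elementos pero se encontraron {actual}",
}


def render_error(keyword, tokens, lang="en"):
    """Fill the template for `keyword`, falling back to English if `lang` is missing."""
    template = MESSAGES.get((lang, keyword)) or MESSAGES[("en", keyword)]
    return template.format(**tokens)


message = render_error("minItems", {"limit": 3, "actual": 2})
```

An implementation that exposes something like the table above would let users override wording or add languages without touching validation logic.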

These things still require decisions (but I've proposed some things):

Annotations need to be rendered differently than they are now. I expand on this in a section below.
Require absoluteKeywordLocation instead of keywordLocation
- Possibility: keywordLocation could be validationPath and optional (default off) to the end user; opens the door for absoluteKeywordLocation to change to schemaLocation (for terseness); implementations SHOULD support an option mechanism which determines the inclusion of validationPath.
Add keyword to support renaming ☝️ (and maybe dialect - import from #1065). Only makes sense when a keyword is present. For instance at the root or #/allOf/3 there is no keyword, so these would be absent.
validationPath and instanceLocation can be plain JSON Pointers and do not need to be URI-encoded
For passing validation, do we include failed nodes (and converse) for basic and detailed formats? My users have said no. (verbose should show everything.)
Change detailed and verbose to *-by-schema and add *-by-instance (or some other way to indicate both structure and quantity of info) to support instance-based structures (see below for examples)
Update ruleset for determining which nodes belong in each format. (I'm also rethinking my implementation which builds verbose during validation and eliminates nodes at the end.)
- basic is a collection of terminal nodes
- detailed-by-schema
  - condenses single-node paths to the terminal node
  - applicators discard nodes which do not match the local result
  - keywords can require specific behavior (e.g. not)
- detailed-by-instance I'll discuss later.
flag to be specified as "MUST" (is currently "SHOULD"); supporting at least one other format to be specified as "SHOULD".
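
As a rough sketch of the "basic is a collection of terminal nodes" rule, the Python below derives a basic result from a verbose, schema-structured tree. The function name is made up, and dropping terminal nodes whose result differs from the overall result is an assumption based on the still-open question above, not a settled rule.

```python
def basic_from_verbose(root):
    """Flatten a verbose tree into a root node plus its terminal nodes.

    Assumption: terminal nodes whose `valid` differs from the overall
    result are dropped (passing nodes are hidden on failure, and vice versa).
    """
    terminals = []

    def walk(node):
        children = node.get("nested", [])
        if not children:
            if node["valid"] == root["valid"]:
                terminals.append(node)
            return
        for child in children:
            walk(child)

    for child in root.get("nested", []):
        walk(child)

    # Copy the root's own fields, then attach the flat list of terminals.
    summary = {key: value for key, value in root.items() if key != "nested"}
    summary["nested"] = terminals
    return summary
```

This mirrors the "build verbose, then eliminate nodes at the end" approach; an implementation could just as well collect terminal nodes during validation.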

And this isn't directly about output, but supports it:

Implementations should provide a base URI if none is declared in the schema. This ensures the output always has something to use in absoluteKeywordLocation/schemaLocation. This may be in the spec already, but I figured it's worth mentioning.
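
A minimal sketch of that fallback, in Python. The default URI and the `schema_location` helper are hypothetical, and the sketch only looks at a root-level `$id` (no embedded `$id` handling, no percent-encoding of the pointer).

```python
from urllib.parse import urljoin

# Hypothetical default; an implementation would pick (or let users configure) its own.
DEFAULT_BASE_URI = "https://example.org/schemas/anonymous"


def schema_location(schema, pointer, default_base=DEFAULT_BASE_URI):
    """Build an absolute location for `pointer` within `schema`.

    A missing `$id` falls back to the default base, and a relative `$id`
    is resolved against it.
    """
    base = urljoin(default_base, schema.get("$id", ""))
    return f"{base}#{pointer}"
```

With the polygon schema from the spec scenario, `schema_location(schema, "/minItems")` gives `https://example.com/polygon#/minItems`; with no `$id`, the same call falls back to the default base.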

Annotations

Annotations need to be part of the node, not rendered as child validation nodes. I don't think there's any argument here.

I've been thinking about this for a couple of hours now, and it seems we have three orthogonal attributes that we need to consider.

1. Does the keyword provide validation?

Not all keywords provide validation. For example, the spec says nothing about validation for title and description.

My implementation handles this by always passing validation for these keywords, which gives the correct pass/fail outcome, but it also means that they produce an output node. So I get something like this as a node:

{
  "valid": true,
  "keywordLocation": "#/description",
  "schemaLocation": "https://example.com/mySchema#/description",
  "instanceLocation": "",
  "annotation": "description"
}

Question: should pure-annotation keywords produce an output node or simply be listed in parent nodes?

I can see arguments for both sides, and I think I have a pretty easy tweak to my implementation that would hide these, so I'm not invested either way.

2./3. Does the keyword produce and/or collect annotations?

These are actually two different ideas, but I think I often conflate them, so I'm going to consider them distinct here.

A keyword can produce annotations, and it can propagate annotations from its children. Most keywords do one or the other, but some do both.

Keywords like title and description only produce annotations.
Keywords like allOf only propagate child annotations.
Keywords like properties both produce (which of the listed properties are present) and propagate annotations.

Regardless of how an implementation manages annotations internally, I believe it's worthwhile keeping these distinctions separate when reporting them to the user. It can be useful to know whether a keyword produced an annotation or is just passing a message.

I'd like to propose two more output node properties to cover each of these ideas:

Annotations generated by a keyword are listed as an annotation property. This is just the raw value of the annotation (e.g. a string for title or an array of strings for properties).
Annotations which are collected from child nodes are listed as a collectedAnnotations property. This is an object where the key is the schema location (i.e. the value that would be in schemaLocation) of the keyword that produced it, and the value is the annotation itself.

The downside to listing collected annotations is duplication of annotation values. But on the upside, you can:

trace how annotations are carried through the validation
see at any level what annotations are carried through
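
Here is one way the collectedAnnotations value could be computed from a node's children, as a Python sketch. The function name is made up, and skipping children that failed validation is my reading of how annotations are dropped for failed subschemas; it only gathers from descendants.

```python
def collect_annotations(node):
    """Gather annotations from valid descendants, keyed by their schemaLocation.

    Assumption: a child that failed validation contributes nothing,
    including anything beneath it.
    """
    collected = {}
    for child in node.get("nested", []):
        if not child["valid"]:
            continue  # annotations from failed subschemas are dropped
        if "annotation" in child:
            collected[child["schemaLocation"]] = child["annotation"]
        # Carry forward whatever the child itself collected from below.
        collected.update(collect_annotations(child))
    return collected
```

The node's own `annotation` stays separate from this dictionary, which is what keeps "produced here" distinguishable from "passed through".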

Examples for existing output formats

These examples take the spec scenario and show what the output would look like if we changed all of this, including my suggestions on the open discussion topics.

For reference, here's the spec scenario:

// schema
{
  "$id": "https://example.com/polygon",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$defs": {
    "point": {
      "type": "object",
      "properties": {
        "x": { "type": "number", "description": "x coordinate" },
        "y": { "type": "number", "description": "y coordinate" }
      },
      "additionalProperties": false,
      "required": [ "x", "y" ]
    }
  },
  "type": "array",
  "items": { "$ref": "#/$defs/point" },
  "minItems": 3
}

// instance
[
  {
    "x": 2.5,
    "y": 1.3
  },
  {
    "x": 1,
    "z": 6.7
  }
]

Flag (no change) 🎉

{ "valid": false }

Basic

{
  "valid": false,
  "validationPath": "",
  "schemaLocation": "https://example.com/polygon#",
  "instanceLocation": "",
  "nested": [
    {
      "valid": false,
      "keyword": "required",
      "dialect": "https://json-schema.org/draft/2020-12/vocab/validation",
      "validationPath": "/items/$ref/required",
      "schemaLocation": "https://example.com/polygon#/$defs/point/required",
      "instanceLocation": "/1",
      "error": "Required property 'y' not found."
    },
    {
      "valid": false,
      "keyword": "additionalProperties",
      "dialect": "https://json-schema.org/draft/2020-12/vocab/applicator",
      "validationPath": "/items/$ref/additionalProperties",
      "schemaLocation": "https://example.com/polygon#/$defs/point/additionalProperties",
      "instanceLocation": "/1/z",
      "error": "Additional property 'z' found but was invalid."
    },
    {
      "valid": false,
      "keyword": "minItems",
      "dialect": "https://json-schema.org/draft/2020-12/vocab/validation",
      "validationPath": "/minItems",
      "schemaLocation": "https://example.com/polygon#/minItems",
      "instanceLocation": "",
      "error": "Expected at least 3 items but found 2"
    }
  ]
}

(I never really liked having a top-level node here, but I don't know how else to collect the nodes. I do like consistently having an object at the root, as opposed to an array.)

Detailed (schema-based)

{
  "valid": false,
  "validationPath": "",
  "schemaLocation": "https://example.com/polygon#",
  "instanceLocation": "",
  "nested": [
    {
      "valid": false,
      "keyword": "$ref",
      "dialect": "https://json-schema.org/draft/2020-12/vocab/core",
      "validationPath": "/items/$ref",
      "schemaLocation": "https://example.com/polygon#/$defs/point",
      "instanceLocation": "/1",
      "nested": [
        {
          "valid": false,
          "keyword": "required",
          "dialect": "https://json-schema.org/draft/2020-12/vocab/validation",
          "validationPath": "/items/$ref/required",
          "schemaLocation": "https://example.com/polygon#/$defs/point/required",
          "instanceLocation": "/1",
          "error": "Required property 'y' not found."
        },
        {
          "valid": false,
          "keyword": "additionalProperties",
          "dialect": "https://json-schema.org/draft/2020-12/vocab/applicator",
          "validationPath": "/items/$ref/additionalProperties",
          "schemaLocation": "https://example.com/polygon#/$defs/point/additionalProperties",
          "instanceLocation": "/1/z",
          "error": "Additional property 'z' found but was invalid."
        }
      ]
    },
    {
      "valid": false,
      "keyword": "minItems",
      "dialect": "https://json-schema.org/draft/2020-12/vocab/validation",
      "validationPath": "/minItems",
      "schemaLocation": "https://example.com/polygon#/minItems",
      "instanceLocation": "",
      "error": "Expected at least 3 items but found 2"
    }
  ]
}

Verbose (schema-structured)

(This one's pretty big.)

{
  "result": {
    "valid": false,
    "validationPath": "",
    "schemaLocation": "https://example.com/polygon#",
    "instanceLocation": "",
    "nested": [
      {
        "valid": true,
        "keyword": "type",
        "dialect": "https://json-schema.org/draft/2020-12/vocab/validation",
        "validationPath": "/type",
        "schemaLocation": "https://example.com/polygon#/type",
        "instanceLocation": ""
      },
      {
        "valid": false,
        "keyword": "minItems",
        "dialect": "https://json-schema.org/draft/2020-12/vocab/validation",
        "validationPath": "/minItems",
        "schemaLocation": "https://example.com/polygon#/minItems",
        "instanceLocation": "",
        "error": "Value has fewer than 3 items"
      },
      {
        "valid": false,
        "keyword": "items",
        "dialect": "https://json-schema.org/draft/2020-12/vocab/applicator",
        "validationPath": "/items",
        "schemaLocation": "https://example.com/polygon#/items",
        "instanceLocation": "",
        "nested": [
          {
            "valid": true,
            "keyword": "items",
            "dialect": "https://json-schema.org/draft/2020-12/vocab/applicator",
            "validationPath": "/items",
            "schemaLocation": "https://example.com/polygon#/items",
            "instanceLocation": "/0",
            "nested": [
              {
                "valid": true,
                "keyword": "$ref",
                "dialect": "https://json-schema.org/draft/2020-12/vocab/core",
                "validationPath": "/items/$ref",
                "schemaLocation": "https://example.com/polygon#/items/$ref",
                "instanceLocation": "/0",
                "nested": [
                  {
                    "valid": true,
                    "validationPath": "/items/$ref",
                    "schemaLocation": "https://example.com/polygon#/$defs/point",
                    "instanceLocation": "/0",
                    "nested": [
                      {
                        "valid": true,
                        "keyword": "type",
                        "dialect": "https://json-schema.org/draft/2020-12/vocab/validation",
                        "validationPath": "/items/$ref/type",
                        "schemaLocation": "https://example.com/polygon#/$defs/point/type",
                        "instanceLocation": "/0"
                      },
                      {
                        "valid": true,
                        "keyword": "properties",
                        "dialect": "https://json-schema.org/draft/2020-12/vocab/applicator",
                        "validationPath": "/items/$ref/properties",
                        "schemaLocation": "https://example.com/polygon#/$defs/point/properties",
                        "instanceLocation": "/0",
                        "annotation": [
                          "x",
                          "y"
                        ],
                        "collectedAnnotations": {
                          "https://example.com/polygon#/$defs/point/properties/x/description": "x coordinate",
                          "https://example.com/polygon#/$defs/point/properties/y/description": "y coordinate"
                        },
                        "nested": [
                          {
                            "valid": true,
                            "validationPath": "/items/$ref/properties/x",
                            "schemaLocation": "https://example.com/polygon#/$defs/point/properties/x",
                            "instanceLocation": "/0/x",
                            "nested": [
                              {
                                "valid": true,
                                "keyword": "type",
                                "dialect": "https://json-schema.org/draft/2020-12/vocab/validation",
                                "validationPath": "/items/$ref/properties/x/type",
                                "schemaLocation": "https://example.com/polygon#/$defs/point/properties/x/type",
                                "instanceLocation": "/0/x"
                              },
                              {
                                "valid": true,
                                "keyword": "description",
                                "dialect": "https://json-schema.org/draft/2020-12/vocab/meta-data",
                                "validationPath": "/items/$ref/properties/x/description",
                                "schemaLocation": "https://example.com/polygon#/$defs/point/properties/x/description",
                                "instanceLocation": "/0/x",
                                "annotation": "x coordinate"
                              }
                            ]
                          },
                          {
                            "valid": true,
                            "validationPath": "/items/$ref/properties/y",
                            "schemaLocation": "https://example.com/polygon#/$defs/point/properties/y",
                            "instanceLocation": "/0/y",
                            "nested": [
                              {
                                "valid": true,
                                "keyword": "type",
                                "dialect": "https://json-schema.org/draft/2020-12/vocab/validation",
                                "validationPath": "/items/$ref/properties/y/type",
                                "schemaLocation": "https://example.com/polygon#/$defs/point/properties/y/type",
                                "instanceLocation": "/0/y"
                              },
                              {
                                "valid": true,
                                "keyword": "description",
                                "dialect": "https://json-schema.org/draft/2020-12/vocab/meta-data",
                                "validationPath": "/items/$ref/properties/y/description",
                                "schemaLocation": "https://example.com/polygon#/$defs/point/properties/y/description",
                                "instanceLocation": "/0/y",
                                "annotation": "y coordinate"
                              }
                            ]
                          }
                        ]
                      },
                      {
                        "valid": true,
                        "keyword": "additionalProperties",
                        "dialect": "https://json-schema.org/draft/2020-12/vocab/applicator",
                        "validationPath": "/items/$ref/additionalProperties",
                        "schemaLocation": "https://example.com/polygon#/$defs/point/additionalProperties",
                        "instanceLocation": "/0",
                        "annotation": [],
                        "collectedAnnotations": {
                          "https://example.com/polygon#/$defs/point/properties": [
                            "x",
                            "y"
                          ]
                        }
                      },
                      {
                        "valid": true,
                        "keyword": "required",
                        "dialect": "https://json-schema.org/draft/2020-12/vocab/validation",
                        "validationPath": "/items/$ref/required",
                        "schemaLocation": "https://example.com/polygon#/$defs/point/required",
                        "instanceLocation": "/0",
                        "collectedAnnotations": {
                          "https://example.com/polygon#/$defs/point/properties": [
                            "x",
                            "y"
                          ],
                          "https://example.com/polygon#/$defs/point/additionalProperties": []
                        }
                      }
                    ]
                  }
                ]
              }
            ]
          },
          {
            "valid": false,
            "keyword": "items",
            "dialect": "https://json-schema.org/draft/2020-12/vocab/applicator",
            "validationPath": "/items",
            "schemaLocation": "https://example.com/polygon#/items",
            "instanceLocation": "/1",
            "nested": [
              {
                "valid": false,
                "keyword": "$ref",
                "dialect": "https://json-schema.org/draft/2020-12/vocab/core",
                "validationPath": "/items/$ref",
                "schemaLocation": "https://example.com/polygon#/items/$ref",
                "instanceLocation": "/1",
                "nested": [
                  {
                    "valid": false,
                    "validationPath": "/items/$ref",
                    "schemaLocation": "https://example.com/polygon#/$defs/point",
                    "instanceLocation": "/1",
                    "nested": [
                      {
                        "valid": true,
                        "keyword": "type",
                        "dialect": "https://json-schema.org/draft/2020-12/vocab/validation",
                        "validationPath": "/items/$ref/type",
                        "schemaLocation": "https://example.com/polygon#/$defs/point/type",
                        "instanceLocation": "/1"
                      },
                      {
                        "valid": true,
                        "keyword": "properties",
                        "dialect": "https://json-schema.org/draft/2020-12/vocab/applicator",
                        "validationPath": "/items/$ref/properties",
                        "schemaLocation": "https://example.com/polygon#/$defs/point/properties",
                        "instanceLocation": "/1",
                        "annotation": [
                          "x"
                        ],
                        "collectedAnnotations": {
                          "https://example.com/polygon#/$defs/point/properties/x/description": "x coordinate"
                        },
                        "nested": [
                          {
                            "valid": true,
                            "validationPath": "/items/$ref/properties/x",
                            "schemaLocation": "https://example.com/polygon#/$defs/point/properties/x",
                            "instanceLocation": "/1/x",
                            "nested": [
                              {
                                "valid": true,
                                "keyword": "type",
                                "dialect": "https://json-schema.org/draft/2020-12/vocab/validation",
                                "validationPath": "/items/$ref/properties/x/type",
                                "schemaLocation": "https://example.com/polygon#/$defs/point/properties/x/type",
                                "instanceLocation": "/1/x"
                              },
                              {
                                "valid": true,
                                "keyword": "description",
                                "dialect": "https://json-schema.org/draft/2020-12/vocab/meta-data",
                                "validationPath": "/items/$ref/properties/x/description",
                                "schemaLocation": "https://example.com/polygon#/$defs/point/properties/x/description",
                                "instanceLocation": "/1/x",
                                "annotation": "x coordinate"
                              }
                            ]
                          }
                        ]
                      },
                      {
                        "valid": false,
                        "keyword": "additionalProperties",
                        "dialect": "https://json-schema.org/draft/2020-12/vocab/applicator",
                        "validationPath": "/items/$ref/additionalProperties",
                        "schemaLocation": "https://example.com/polygon#/$defs/point/additionalProperties",
                        "instanceLocation": "/1",
                        "nested": [
                          {
                            "valid": false,
                            "validationPath": "/items/$ref/additionalProperties/$false",
                            "schemaLocation": "https://example.com/polygon#/$defs/point/additionalProperties/$false",
                            "instanceLocation": "/1/z",
                            "error": "All values fail against the false schema"
                          }
                        ]
                      },
                      {
                        "valid": false,
                        "keyword": "required",
                        "dialect": "https://json-schema.org/draft/2020-12/vocab/validation",
                        "validationPath": "/items/$ref/required",
                        "schemaLocation": "https://example.com/polygon#/$defs/point/required",
                        "instanceLocation": "/1",
                        "error": "Required properties [y] were not present"
                      }
                    ]
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
  }
}

Note the inclusion of the annotations from the subschemas which passed validation and how they're no longer included once we move up through a subschema that failed validation.

Instance-based structure examples

I think I've been able to marry our two approaches. Considerations:

From the discussions around putting output together originally, it was apparent to me that the request was for an unfiltered, hierarchical structure that mimicked the instance.
I want to keep the "output unit" as consistent as possible between the structures.

I did need to add a results property to hold the validation results that pertain to each location, and making nested an object instead of an array made more sense here.
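
To illustrate the shape, here is a small Python sketch that groups a flat list of output units into the results/nested structure keyed by instance location. The `by_instance` name is made up, and deriving each location's `valid` from only its own results is an assumption on my part. Unlike the examples, this sketch leaves an empty nested object on leaf locations.

```python
def by_instance(units):
    """Group flat output units into a tree that mirrors the instance."""

    def new_node():
        return {"valid": True, "results": [], "nested": {}}

    root = new_node()
    for unit in units:
        node = root
        prefix = ""
        # Walk down one JSON Pointer segment at a time, creating nodes as needed.
        # ("/" inside a property name is escaped as "~1", so splitting is safe.)
        for segment in unit["instanceLocation"].split("/")[1:]:
            prefix += "/" + segment
            node = node["nested"].setdefault(prefix, new_node())
        # The location is implied by the tree position, so drop it from the unit.
        result = {k: v for k, v in unit.items() if k != "instanceLocation"}
        node["results"].append(result)
        # Assumption: a location is valid only if all of its own results are.
        node["valid"] = node["valid"] and unit["valid"]
    return root
```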

Verbose (instance-based)

{
  "valid": false,
  "validationPath": "",
  "schemaLocation": "https://example.com/polygon#",
  "instanceLocation": "",
  "results": [
    {
      "valid": true,
      "keyword": "type",
      "dialect": "https://json-schema.org/draft/2020-12/vocab/validation",
      "validationPath": "/type",
      "schemaLocation": "https://example.com/polygon#/type"
    },
    {
      "valid": false,
      "keyword": "minItems",
      "dialect": "https://json-schema.org/draft/2020-12/vocab/validation",
      "validationPath": "/minItems",
      "schemaLocation": "https://example.com/polygon#/minItems",
      "error": "Value has fewer than 3 items"
    },
    {
      "valid": false,
      "keyword": "items",
      "dialect": "https://json-schema.org/draft/2020-12/vocab/applicator",
      "validationPath": "/items",
      "schemaLocation": "https://example.com/polygon#/items"
    }
  ],
  "nested": {
    "/0": {
      "valid": true,
      "results": [
        {
          "valid": true,
          "keyword": "items",
          "dialect": "https://json-schema.org/draft/2020-12/vocab/applicator",
          "validationPath": "/items",
          "schemaLocation": "https://example.com/polygon#/items"
        },
        {
          "valid": true,
          "keyword": "$ref",
          "dialect": "https://json-schema.org/draft/2020-12/vocab/core",
          "validationPath": "/items/$ref",
          "schemaLocation": "https://example.com/polygon#/items/$ref"
        },
        {
          "valid": true,
          "keyword": "type",
          "dialect": "https://json-schema.org/draft/2020-12/vocab/validation",
          "validationPath": "/items/$ref/type",
          "schemaLocation": "https://example.com/polygon#/$defs/point/type"
        },
        {
          "valid": true,
          "keyword": "properties",
          "dialect": "https://json-schema.org/draft/2020-12/vocab/applicator",
          "validationPath": "/items/$ref/properties",
          "schemaLocation": "https://example.com/polygon#/$defs/point/properties",
          "annotation": [
            "x",
            "y"
          ],
          "collectedAnnotations": {
            "https://example.com/polygon#/$defs/point/properties/x/description": "x coordinate",
            "https://example.com/polygon#/$defs/point/properties/y/description": "y coordinate"
          }
        },
        {
          "valid": true,
          "keyword": "additionalProperties",
          "dialect": "https://json-schema.org/draft/2020-12/vocab/applicator",
          "validationPath": "/items/$ref/additionalProperties",
          "schemaLocation": "https://example.com/polygon#/$defs/point/additionalProperties",
          "annotation": [],
          "collectedAnnotations": {
            "https://example.com/polygon#/$defs/point/properties": [
              "x",
              "y"
            ]
          }
        },
        {
          "valid": true,
          "keyword": "required",
          "dialect": "https://json-schema.org/draft/2020-12/vocab/validation",
          "validationPath": "/items/$ref/required",
          "schemaLocation": "https://example.com/polygon#/$defs/point/required",
          "collectedAnnotations": {
            "https://example.com/polygon#/$defs/point/properties": [
              "x",
              "y"
            ],
            "https://example.com/polygon#/$defs/point/additionalProperties": []
          }
        }
      ],
      "nested": {
        "/0/x": {
          "valid": true,
          "results": [
            {
              "valid": true,
              "validationPath": "/items/$ref/properties/x",
              "schemaLocation": "https://example.com/polygon#/$defs/point/properties/x"
            },
            {
              "valid": true,
              "keyword": "type",
              "dialect": "https://json-schema.org/draft/2020-12/vocab/validation",
              "validationPath": "/items/$ref/properties/x/type",
              "schemaLocation": "https://example.com/polygon#/$defs/point/properties/x/type"
            },
            {
              "valid": true,
              "keyword": "description",
              "dialect": "https://json-schema.org/draft/2020-12/vocab/meta-data",
              "validationPath": "/items/$ref/properties/x/description",
              "schemaLocation": "https://example.com/polygon#/$defs/point/properties/x/description",
              "annotation": "x coordinate"
            }
          ]
        },
        "/0/y": {
          "valid": true,
          "results": [
            {
              "valid": true,
              "validationPath": "/items/$ref/properties/y",
              "schemaLocation": "https://example.com/polygon#/$defs/point/properties/y"
            },
            {
              "valid": true,
              "keyword": "type",
              "dialect": "https://json-schema.org/draft/2020-12/vocab/validation",
              "validationPath": "/items/$ref/properties/y/type",
              "schemaLocation": "https://example.com/polygon#/$defs/point/properties/y/type"
            },
            {
              "valid": true,
              "keyword": "description",
              "dialect": "https://json-schema.org/draft/2020-12/vocab/meta-data",
              "validationPath": "/items/$ref/properties/y/description",
              "schemaLocation": "https://example.com/polygon#/$defs/point/properties/y/description",
              "annotation": "y coordinate"
            }
          ]
        }
      }
    },
    "/1": {
      "valid": false,
      "results": [
        {
          "valid": false,
          "keyword": "items",
          "dialect": "https://json-schema.org/draft/2020-12/vocab/applicator",
          "validationPath": "/items",
          "schemaLocation": "https://example.com/polygon#/items"
        },
        {
          "valid": false,
          "keyword": "$ref",
          "dialect": "https://json-schema.org/draft/2020-12/vocab/core",
          "validationPath": "/items/$ref",
          "schemaLocation": "https://example.com/polygon#/items/$ref"
        },
        {
          "valid": false,
          "validationPath": "/items/$ref",
          "schemaLocation": "https://example.com/polygon#/$defs/point"
        },
        {
          "valid": true,
          "keyword": "type",
          "dialect": "https://json-schema.org/draft/2020-12/vocab/validation",
          "validationPath": "/items/$ref/type",
          "schemaLocation": "https://example.com/polygon#/$defs/point/type"
        },
        {
          "valid": true,
          "keyword": "properties",
          "dialect": "https://json-schema.org/draft/2020-12/vocab/applicator",
          "validationPath": "/items/$ref/properties",
          "schemaLocation": "https://example.com/polygon#/$defs/point/properties",
          "annotation": [
            "x"
          ],
          "collectedAnnotations": {
            "https://example.com/polygon#/$defs/point/properties/x/description": "x coordinate"
          }
        },
        {
          "valid": false,
          "keyword": "additionalProperties",
          "dialect": "https://json-schema.org/draft/2020-12/vocab/applicator",
          "validationPath": "/items/$ref/additionalProperties",
          "schemaLocation": "https://example.com/polygon#/$defs/point/additionalProperties"
        },
        {
          "valid": false,
          "keyword": "required",
          "dialect": "https://json-schema.org/draft/2020-12/vocab/validation",
          "validationPath": "/items/$ref/required",
          "schemaLocation": "https://example.com/polygon#/$defs/point/required",
          "error": "Required properties [y] were not present"
        }
      ],
      "nested": {
        "/1/x": {
          "valid": true,
          "results": [
            {
              "valid": true,
              "validationPath": "/items/$ref/properties/x",
              "schemaLocation": "https://example.com/polygon#/$defs/point/properties/x"
            },
            {
              "valid": true,
              "keyword": "type",
              "dialect": "https://json-schema.org/draft/2020-12/vocab/validation",
              "validationPath": "/items/$ref/properties/x/type",
              "schemaLocation": "https://example.com/polygon#/$defs/point/properties/x/type"
            },
            {
              "valid": true,
              "keyword": "description",
              "dialect": "https://json-schema.org/draft/2020-12/vocab/meta-data",
              "validationPath": "/items/$ref/properties/x/description",
              "schemaLocation": "https://example.com/polygon#/$defs/point/properties/x/description",
              "annotation": "x coordinate"
            }
          ]
        },
        "/1/z": {
          "valid": false,
          "results": [
            {
              "valid": false,
              "validationPath": "/items/$ref/additionalProperties/$false",
              "schemaLocation": "https://example.com/polygon#/$defs/point/additionalProperties/$false",
              "error": "All values fail against the false schema"
            }
          ]
        }
      }
    }
  }
}

Detailed (instance-based)

Initially I was only considering the verbose format for instance-based, but in building that example, I realized how to pare down the fluff.

The rules I'm following for this are:

Nodes that don't match the parent's validation result are removed. (The thought is that if you have a failed validation, you want to see where it failed, and if you have a successful validation, you don't care where it failed.)
If an output unit's validationPath is contained within another unit's validationPath, it (the first one) can be removed.

This second rule removes pass-through nodes like the kind you'd get having just resolved a $ref and arriving at a new subschema. These nodes are just a summary of what's underneath and don't add much to the result.
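The two rules can be sketched in Python as follows. This is only an illustration under some assumptions: the function names `is_ancestor` and `prune` are hypothetical, "contained within" is read as "is a proper JSON Pointer ancestor of", and the comparison for the second rule is made only among units in the same results array.

```python
def is_ancestor(path, other):
    """True if JSON Pointer `path` is a proper ancestor of `other`."""
    return other.startswith(path + "/")


def prune(node):
    """Reduce a verbose instance-based node to the pared-down form."""
    valid = node["valid"]
    # Rule 1: keep only output units that agree with this node's result.
    units = [u for u in node.get("results", []) if u["valid"] == valid]
    # Rule 2: drop pass-through units whose path is contained in a sibling's.
    paths = [u["validationPath"] for u in units if "validationPath" in u]
    units = [
        u for u in units
        if "validationPath" not in u
        or not any(is_ancestor(u["validationPath"], p) for p in paths)
    ]
    # Copy everything except the two containers, which are rebuilt below.
    pruned = {k: v for k, v in node.items() if k not in ("results", "nested")}
    if units:
        pruned["results"] = units
    # Rule 1 again for nested instance locations, then recurse into survivors.
    nested = {
        loc: prune(child)
        for loc, child in node.get("nested", {}).items()
        if child["valid"] == valid
    }
    if nested:
        pruned["nested"] = nested
    return pruned
```

Empty results and nested containers are dropped here purely to keep the output short; that choice is not part of the rules above.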

{
  "valid": false,
  "validationPath": "",
  "schemaLocation": "https://example.com/polygon#",
  "instanceLocation": "",
  "results": [
    {
      "valid": false,
      "keyword": "minItems",
      "dialect": "https://json-schema.org/draft/2020-12/vocab/validation",
      "validationPath": "/minItems",
      "error": "Value has fewer than 3 items"
    }
  ],
  "nested": {
    "/1": {
      "valid": false,
      "results": [
        {
          "valid": false,
          "keyword": "required",
          "dialect": "https://json-schema.org/draft/2020-12/vocab/validation",
          "validationPath": "/items/$ref/required",
          "schemaLocation": "https://example.com/polygon#/$defs/point/required",
          "error": "Required properties [y] were not present"
        }
      ],
      "nested": {
        "/1/z": {
          "valid": false,
          "results": [
            {
              "valid": false,
              "keyword": "additionalProperties",
              "dialect": "https://json-schema.org/draft/2020-12/vocab/applicator",
              "validationPath": "/items/$ref/additionalProperties",
              "schemaLocation": "https://example.com/polygon#/$defs/point/additionalProperties/$false",
              "error": "All values fail against the false schema"
            }
          ]
        }
      }
    }
  }
}

jdesrosiers · 2021-10-08T17:53:10Z

jdesrosiers
Oct 8, 2021
Maintainer

I started writing up some comments on some of these issues and it occurred to me that with a bunch of different threads going on at once, nothing is going to get done. So, I propose that we start with the things that don't seem controversial, make sure we have consensus, get it written up, and then move on to the next most important thing. I'll add a thread with the first set of proposals and a consensus measuring mechanism. (Feel free to tell me you don't like this approach. I won't be offended.)

1 reply

gregsdennis Oct 8, 2021
Maintainer Author

The branch I've started takes into consideration everything, just so we can see what it all looks like together. Some of these things don't really do well in the spec by themselves, but we'll see where this takes us.

jdesrosiers · 2021-10-08T18:05:59Z

jdesrosiers
Oct 8, 2021
Maintainer

For this first proposal, we're bundling several changes at once because they appear to be relatively uncontroversial. If changes to this proposal are necessary based on discussion, this comment will be edited with a change log included.

Change errors to nested

Output format should follow that of the draft specified by the root schema

The node for a boolean schema need only specify the location up to that boolean (no need to include, e.g., $false in the location)

valid is required for each node.

error is not required but is declared as the designated place to put error messaging. Tooling that consumes this output should expect an error message here.

wording is implementation-specific (preferably configurable to allow for custom messages or internationalization; token replacement as a possible suggested mechanism?)

Add verbiage to clarify that each keyword only produces a single node (import from json-schema-spec#1099, "Regarding uniqueItems error interpretation")

Change Log

10/12: Change proposed new name for errors from subschemas to nested.

Use the following legend to show your support for this proposal. Remember this isn't a vote, it's a general consensus measuring device.

👍 I'm in. Let's do it!
👎 There is at least one significant blocker for me. Some changes are necessary to have my support.
😕 I intend to express support or not, but I need more time to catch up, think about it, or ask questions. Please wait for me before declaring consensus.

6 replies

gregsdennis Oct 8, 2021
Maintainer Author

I think nested is more general. subschemas doesn't necessarily apply in the instance-based structure. I'd like them to be the same.

gregsdennis Oct 8, 2021
Maintainer Author

Output format should follow that of the draft specified by the root schema

I also added language that an implementation that supports the latest version of the spec is only required to support output from that version. It might be really hard to support multiple output formats.

karenetheridge Oct 9, 2021
Maintainer

All of these are straightforward except renaming "errors". I feel like we may need to try out a few alternatives and mull on them a bit. Naming is hard! :)

gregsdennis Oct 10, 2021
Maintainer Author

So some history (I put it in the opening comment of #973 but it's worth repeating here):

The errors/annotations property as the name for the nested results was me trying to be clever and efficient. In the end it's a bit confusing. The only conclusion I could draw was that nested results should have a consistently named property.

Given the current schema-based structure, I'd go with subschemas, but, as I mentioned, that doesn't make sense for the instance-based structure. We need something more generic.

jdesrosiers Oct 12, 2021
Maintainer

Maybe we defer deciding on the new name for errors until there is consensus on the instance-based structure? Or, just call it nested for now and rename it again later if we need to. Changing our minds is cheap as long as we settle on something before the draft is released.

karenetheridge · 2021-10-09T20:32:00Z

karenetheridge
Oct 9, 2021
Maintainer

Require absoluteKeywordLocation instead of keywordLocation

Possibility: keywordLocation could be validationPath and optional (default off) to the end user; opens the door for absoluteKeywordLocation to change to schemaLocation (for terseness); implementations SHOULD support an option mechanism which determines the inclusion of validationPath.

I would change absoluteKeywordLocation to absoluteSchemaLocation -- as we have seen that sometimes the keyword is not at the very end of the location. e.g. .../properties/<propertyname> or .../allOf/0, and sometimes there isn't even a keyword at all, as in boolean (sub)schemas.

keywordLocation likewise should switch keyword to schema, and location also isn't quite right as this isn't describing a real location in the schema (if a $ref or similar keyword is involved). path feels a little closer to the truth, as in "the path taken to get to this location".

Regarding which of these should be required, perhaps we can just specify that at least one or the other should be included, and both MUST be included if a $ref (or similar) keyword is involved, or alternatively "..if the fragment portion of the absolute URI is identical to the traversed path". Only including the absolute URI can impair readability if there are a lot of punctuation characters in the path -- remember that URIs undergo an extra set of encoding (URL escaping, e.g. { -> %7B) on top of the encoding that json pointers do (~ -> ~0 and / -> ~1).
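To make the two layers of encoding concrete, here is a small Python sketch. The helper names (`escape_token`, `json_pointer`, `uri_fragment`) and the property name in the usage note are made up for illustration; the set of characters left unescaped is my reading of what RFC 3986 permits in a fragment.

```python
from urllib.parse import quote

# Characters RFC 3986 allows unescaped in a fragment, besides letters and digits.
FRAGMENT_SAFE = "/?:@!$&'()*+,;=-._~"


def escape_token(token):
    """JSON Pointer escaping: '~' must be handled before '/'."""
    return token.replace("~", "~0").replace("/", "~1")


def json_pointer(tokens):
    """Build a plain JSON Pointer string from unescaped reference tokens."""
    return "".join("/" + escape_token(t) for t in tokens)


def uri_fragment(pointer):
    """Apply the second layer: percent-encode the pointer for use after '#'."""
    return "#" + quote(pointer, safe=FRAGMENT_SAFE)
```

For a hypothetical property named `a/b{c}` under `$defs`, the plain pointer is `/$defs/a~1b{c}`, while the URI fragment form becomes `#/$defs/a~1b%7Bc%7D`, which is the readability cost described above.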

We also shouldn't imply that only $ref and $dynamicRef can alter the path from being a direct json pointer into the schema -- third-party vocabularies could possibly define their own keywords that have similar properties, and I have just found, while implementing custom metaschemas with the $schema and $vocabulary keywords, that errors can be generated with the keywordLocation of /$schema/$vocabulary (because some parts of the $vocabulary definition can only be validated at runtime when the vocabulary is referenced via a $schema keyword - example https://github.com/karenetheridge/JSON-Schema-Modern/blob/v0.521/t/dialects.t#L1140-L1175).

validationPath and instanceLocation can be plain JSON Pointers and do not need to be URI-encoded

This has been the case since draft2019-09, although some of the examples there mistakenly added a # which implied that they were URI fragments. That error was fixed in draft2020-12 so I think this point is fairly uncontroversial now.

18 replies

gregsdennis Jun 23, 2022
Maintainer Author

results may better suit the basic format, but it's not great for the structural ones, and I'd like to have consistency.

I don't know. I'm open to alternatives. It seemed we landed on nested, so that's what's in the PR for now.

gregsdennis Jul 27, 2022
Maintainer Author

@jdesrosiers regarding schemaLocation (the absolute one), I ran into a small edge case when implementing the output structure changes (output unit per subschema) in my implementation.

My implementation subs in a default base URI for when the schema doesn't provide one. It's mostly to facilitate this property in the output. However, when the root subschema is either true or false (as I have in at least one of my tests), there is no base URI that can be valid here, and my implementation throws an exception (NullReferenceException) when trying to access this property to add it to the JSON serialization.

As such, I have to check this property for null in my JSON converter so that the property isn't serialized.

Are you okay with schemaLocation being omitted in this very-edge case? Do you think it's something that needs mentioning in the spec?

Essentially the output for a false (root) schema will always be

{
  "valid": false,
  "evaluationPath": "",
  "instanceLocation": "",
  "errors": {
    "": "All values are invalid against the false schema"
  }
}

If the false schema appears as a subschema, it will have schemaLocation.

handrews Jul 27, 2022

We could just use something like about:boolean-json-schema as the URI in this case (analogous to about:blank)

jdesrosiers Jul 28, 2022
Maintainer

@gregsdennis

when the root subschema is either true or false [...], there is no base URI that can be valid

Why not? Isn't that the point of the default base URI? The identifier for the boolean schema would just be the default base URI.

gregsdennis Jul 28, 2022
Maintainer Author

I see what you're saying. In the context of a single validation, I can apply the default base URI to false.

In my implementation, I have a static (global) false schema defined, so I can't realistically put a base URI on it, but I could use the default, which is configurable.

Yeah, I think this may just be particular to my implementation.

karenetheridge · 2021-10-09T20:43:59Z

karenetheridge
Oct 9, 2021
Maintainer

Implementation should provide a base URI if none is declared in the schema. This ensures the output always has something to use in absoluteSchemaLocation/schemaLocation. This may be in the spec already, but I figured it's worth mentioning.

I've always disliked this being a hard requirement, because some schemas are very small and used inlined in code, and therefore have no "location" to speak of. e.g. one might want to use a small json schema to validate the structure of nested data passed in as a function argument, so the base URI of '' really is the most sensible thing there.

The spec is clear enough -- the initial base URI is defined according to its context, and implementations should document anything they assume.

27 replies

handrews Jul 30, 2022

I think you answered your own question there, Henry. 😉

If occasions exist where you need a base URI, and you don't have one, in general it makes sense to error.

No, it makes sense to use a consistent default, because that is encouraged by the relevant specifications and no one has given me a clear reason why it ever needs to be an error instead. Why make things not-work when it is trivial to make them work?

gregsdennis Jul 31, 2022
Maintainer Author

I suppose one place where this would have an immediate impact is the test suite and inclusion of output tests.

An output test would need example output to use as a comparison. That example would need to have something in schemaLocation, and it needs to be implementation agnostic. For example (from one of the suite tests):

{
  "valid": false,
  "evaluationPath": "",
  "schemaLocation": "https://json-everything/base#",
  "instanceLocation": "",
  "nested": [
    {
      "valid": false,
      "evaluationPath": "/properties/foo",
      "schemaLocation": "https://json-everything/base#/properties/foo",
      "instanceLocation": "/foo",
      "nested": [
        {
          "valid": true,
          "evaluationPath": "/properties/foo/not",
          "schemaLocation": "https://json-everything/base#/properties/foo/not",
          "instanceLocation": "/foo"
        }
      ]
    }
  ]
}

I use https://json-everything/base as my default, and applications can set it to something else if they need. For output tests, we'd need to have something, like https://schema.test/output, and implementations would need to be able to match it.

The two options I see are:

Require that implementations have a configuration point that allows customization of the default base URI. Otherwise they can't run the output tests without some post-processing of the output.
Have an absolute URI $id for every output test. This improperly avoids cases where a base URI isn't defined by the schema.

I don't particularly like either, but I dislike option 1 less.
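For reference, the "post-processing of the output" that option 1 is meant to avoid could look roughly like the Python sketch below. The function name `rebase_schema_locations` is hypothetical, and it assumes the list-style `nested` shown in the example above.

```python
def rebase_schema_locations(unit, old_base, new_base):
    """Return a copy of an output unit with the default base URI swapped."""
    out = dict(unit)
    location = unit.get("schemaLocation")
    # Only rewrite locations that sit on the old base (bare, or followed by '#').
    if location is not None and (
        location == old_base or location.startswith(old_base + "#")
    ):
        out["schemaLocation"] = new_base + location[len(old_base):]
    if "nested" in unit:
        out["nested"] = [
            rebase_schema_locations(child, old_base, new_base)
            for child in unit["nested"]
        ]
    return out
```

A test harness would call this with the implementation's own default and the suite's expected base before comparing outputs. A configurable default makes this step unnecessary.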

jdesrosiers Aug 1, 2022
Maintainer

@gregsdennis

The two options I see are:

The other option is the spec defining a required value you must use when using a default base URI which would be an about: or tag: type of URI. However, it just occurred to me that an about: or tag: wouldn't be able to resolve against a relative URI. I could be wrong, but I don't think either of these schemes support a path component. Example: "$id": "foo" with no retrieval URI can't resolve against about:whatever. So, this solution would only work if there is no identifier given at all.

@handrews

What are these cases where it makes sense to throw an error?

Sorry, I didn't read carefully enough before and didn't realize this was still an open question before piling on as if it was agreed upon. When users add a schema to my internal schema store, I expect there to be a URI that they will use to retrieve or reference the schema later. If they don't provide a URI somehow, it should be an error. Assuming a base URI of something like about:whatever would mean they would then have to reference the schema as about:whatever which is not how it should be used. This kind of URI is not intended to be retrieved. It's a placeholder meaning there is no URI. Therefore, it only makes sense to be used for the root schema.

It also occurs to me that values in the output format could be used by an application trying to extract information. The original value of keywords is not included in the output format, so if an application needs this information, it would need to retrieve that value using the "schemaLocation", but about:whatever is not retrievable, it's just a placeholder. There are ways around this problem, but it makes implementation more complicated than if we just knew that every schema had a retrievable identifier.

handrews Aug 1, 2022

an about: or tag: wouldn't be able to resolve against a relative URI. I could be wrong, but I don't think either of these schemes support a path component. Example: "$id": "foo" with no retrieval URI can't resolve against about:whatever. So, this solution would only work if there is no identifier given at all.

A URI-reference of "foo" resolved against a base URI that uses colon-separated rather than slash-separated path delimiters (the path in "tag:json-schema.org,2022:base" has two components, "json-schema.org,2022" and "base") would just resolve to "foo". Per RFC 3986 §5.2.3, if the base path does not include a /, the base path is discarded and entirely replaced by the reference path.

In other words, "foo" would resolve to "foo" and behave exactly the same as if no resolution against base URI was done. In other words, it will behave exactly as it does now. This is why a URI scheme that uses a colon-delimited path component is preferable as a default base, because it has very simple and predictable behavior.
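The path-merge step from RFC 3986 §5.2.3 can be written out as a short Python sketch. The function name `merge_paths` and its parameter names are illustrative only, and this covers just the path merge, not the full reference resolution and recomposition described in §5.2.2.

```python
def merge_paths(base_path, ref_path, base_has_authority):
    """Merge a relative-path reference with a base path (RFC 3986 section 5.2.3)."""
    # A base with an authority but an empty path yields "/" + reference path.
    if base_has_authority and base_path == "":
        return "/" + ref_path
    # Otherwise keep the base path up to and including its last "/".
    # rfind returns -1 when there is no "/", so the slice below is empty
    # and the whole base path is discarded.
    last_slash = base_path.rfind("/")
    return base_path[: last_slash + 1] + ref_path
```

With a base path of `json-schema.org,2022:base` (no slash, no authority), merging `foo` gives the path `foo`; with a base path of `/schemas/polygon`, it gives `/schemas/foo`.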

Assuming a base URI of something like about:whatever would mean they would then have to reference the schema as about:whatever which is not how it should be used.

While $id both sets a base URI and assigns an identifier to a schema resource, as would a base URI derived from the retrieval URI, the default base URI need not assign an identifier. It could simply be a base URI. Fragment-only URI-references can be resolved internally without needing the base URI to identify the resource, and URI-references that are not fragment-only resolve as noted above, which make the unidentified resource irrelevant (although somehow you do need to find "foo", which works fine if it's embedded I suppose).

A schema resource without an identifier is not an error, it simply cannot be referenced except via a fragment-only reference of "#" (or possibly a plain name fragment if $anchor is present).

I have still not heard a reason why we can't do this [EDIT: "this" meaning require a default base URI that we specified, such as tag:json-schema.org,2022:base], just assertions that this or that solution won't work. Can we examine the underlying rationale rather than trying to poke specific holes in various solutions? Because this is a known problem with known solutions, and even if some detail of some option I propose doesn't work, overall RFC 3986 is comprehensive and stable. These things work.

jdesrosiers Aug 2, 2022
Maintainer

In other words, "foo" would resolve to "foo" and behave exactly the same as if no resolution against base URI was done.

That doesn't make sense. The URI resolution process should always produce a non-relative URI (or fragment-only). Even if it does result in a relative-reference, we still don't have something we can use as an identifier so we end up with an error case either way. However, I looked this up and the spec says that there is no authoritative resolution mechanism for tags. Either way, we don't have a non-relative URI to use as an identifier. I looked into about: and urn: specs as well and although they don't explicitly discuss resolution like tag: does, I don't see how it can work because neither supports a path component.

While $id both sets a base URI and assigns an identifier to a schema resource, as would a base URI derived from the retrieval URI, the default base URI need not assign an identifier.

I think we may be talking about slightly different things and it's my fault because I was using the wrong term. What we're really talking about is a default retrieval URI, which I agree is different than a default base URI.

If we're just talking about a default base URI, it would only apply if there were no $id or retrieval URI and therefore, there would be no resolution problem as described above. If there was an $id, the base URI would be $id resolved against the retrieval URI. If there were no retrieval URI and $id is relative, we have an error case because we don't have a non-relative URI for identification. Therefore, I agree that we can set a default base URI and still allow for an error to be thrown when we require an identifier, such as when adding to a schema store.
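Spelled out as a Python sketch, the cases above look like this. The function name `effective_base_uri` is hypothetical, the default value is just the tag: URI floated earlier in this thread, and raising `ValueError` stands in for whatever error an implementation would report.

```python
from urllib.parse import urljoin, urlsplit

DEFAULT_BASE_URI = "tag:json-schema.org,2022:base"  # placeholder, not settled


def effective_base_uri(schema_id, retrieval_uri, default_base=DEFAULT_BASE_URI):
    """Pick the base URI for a root schema following the cases described above."""
    if schema_id is None:
        # No $id: use the retrieval URI, or the default if there is none.
        return retrieval_uri if retrieval_uri is not None else default_base
    if retrieval_uri is not None:
        # $id present: resolve it against the retrieval URI.
        return urljoin(retrieval_uri, schema_id)
    if urlsplit(schema_id).scheme:
        # $id has a scheme, so it is usable without a retrieval URI.
        return schema_id
    # Relative $id with nothing to resolve it against: the error case.
    raise ValueError("relative $id with no retrieval URI")
```

Note that the default is never used to resolve a relative $id here, which is exactly why the relative case remains an error.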

However, wasn't the point of a default base URI to ensure that all schemas have an identifier? Wouldn't this result in the root schema not having an identifier? Is there another problem we're trying to solve here that I missed? I don't see what problem having a default base URI solves.

At this point, the only options I see that solve the problem as I understand it are to define a default retrieval URI at the spec level which needs to be compatible with http: resolution or allow implementations to fail when they can't determine an identifier for the schema.

gregsdennis · 2022-05-17T22:14:31Z

gregsdennis
May 17, 2022
Maintainer Author

Output format should follow that of the draft specified by the root schema

Thinking about this, supporting multiple output formats sounds like a real pain as an implementor. I'd like to consider this alternative:

supporting the output format at all is optional, but strongly recommended
which format is supported is up to the implementation, but the format associated with the latest validation version supported by the implementation is strongly recommended (e.g. if 2020-12 is the latest validation supported, then the 2020-12 format should be supported)

18 replies

handrews Jun 6, 2022

If that is the case, then I think it all belongs in Core.

It sounds like there will be no further debate on that topic, but I feel that the discussion of the differing use cases is relevant to possible format restructuring. Should that be a new thread on this discussion?

gregsdennis Jul 27, 2022
Maintainer Author

I think we should revisit splitting output into its own spec once we're done with changes. At that point, we can ask the question of whether it's big enough that it bloats core and warrants its own spec. As it stands, I think it's a toss-up.

jdesrosiers Jul 28, 2022
Maintainer

I don't think the primary argument for splitting it has anything to do with size (although that is a legitimate factor). The point is to decouple the output format from any specific version of JSON Schema. Assume I have a 2020-12 schema that references a 2019-09 schema and a draft-07 schema. 2019-09 and 2020-12 have slight differences in output format and draft-07 has no defined output format. Which format should my implementation use? Should it use the one for the root schema for everything? Should it switch between output formats when it switches schemas? What is expected for draft-07? Splitting into a separate spec removes all of these ambiguities and allows implementations to simply say, "I support this version (or versions) of the output format", and it's clear what should be implemented and what behavior is expected no matter what version of JSON Schema the schema is written in, even if it's an older version that didn't define an output format. I'm very strongly in favor of this approach.

gregsdennis Aug 4, 2022
Maintainer Author

Splitting the output into its own spec also supports versioning the output. While it doesn't necessarily need to be versioned independently of the core spec, allowing it to do so is nice. Secondly, it permits implementations to update to the next core spec without requiring them to also update their output.

With json-schema-org/json-schema-spec#1249 (based on this thread), we're making a substantial change to the structure of the output. While we don't expect this to be a continual practice, such a change would be easier to handle if the output update was separate from updating schema behaviors.

jdesrosiers Aug 4, 2022
Maintainer

[The output format] doesn't necessarily need to be versioned independently of the core spec, allowing it to do so is nice.

I think allowing the output format to be versioned independently is more than just nice, it's essential. The point is to decouple the output format from any specific version of JSON Schema. Using the same version names for the output format would signal a relationship to a JSON Schema version that doesn't exist. Even if we don't intend for there to be a relationship, it would confuse the community.

TeCHiScy · 2022-05-30T08:29:05Z

TeCHiScy
May 30, 2022

As described in #Proposal, I propose standardizing and structuring the errors in the output, so users can format their own messages, like:

{
  "valid": false,
  "keyword": "minLength",
  "keywordLocation": "#/description",
  "xxx": {
    "received": 1,
    "limit": 3
  },
  "schemaLocation": "https://example.com/mySchema#/description",
  "instanceLocation": "",
  "annotation": "description"
}

Here xxx should have a better name. The keys in xxx are related to the keyword and should be well-defined, both in meaning and typing.

4 replies

gregsdennis May 30, 2022
Maintainer Author

I think what we might do is encourage implementors to support localization and customization of error messages. Then we can add the things like keyword and whatever your xxx ends up being to the output unit.

I'm not sure whether augmenting the schema with the messages is the right approach, but I'm happy to be convinced. Mainly, we don't have a precedent for keywords that exist solely to be used by the implementation. It's annotative but not something that would be returned to the app.

I suggest we start this requirement as a SHOULD and upgrade it to a MUST in later iterations.

karenetheridge May 31, 2022
Maintainer

If we were to do this, it would make sense to create a separate section at the top level of the schema for the messages, like $defs. But this is something I think a separate application can do (it doesn't need to be tied to the evaluation operation itself), as long as we add just a few more pieces of information in the error objects, as suggested above - "keyword" (although it can be determined by examining the keywordLocation, it's simpler to make it explicit), and keyword-specific arguments (e.g. "actualLength" for the minLength and maxLength keywords) which can be used to fill in a custom error message with sprintf-style formatting.

gregsdennis May 31, 2022
Maintainer Author

it would make sense to create a separate section at the top level of the schema for the messages, like $defs

I wonder how this would work when $ref-ing into an external schema resource if both declare messages. I expect we'd need to only use the one at the validation root.

I'm on the fence about including them in the schema itself. I'd need a really good reason to add that.

I'm a fan of using {{token}}s or similar to identify replaceable values rather than just keywords, e.g. actualLength.
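As a rough idea of what token replacement could look like, here is a Python sketch. The function name `render_message`, the template text, and the token names `received` and `limit` (borrowed from the example earlier in this thread) are all illustrative, not proposed spec wording.

```python
import re

# Matches {{name}} where name is made of letters, digits, or underscores.
TOKEN = re.compile(r"\{\{(\w+)\}\}")


def render_message(template, values):
    """Replace each {{token}} with its value; unknown tokens are left as-is."""
    def substitute(match):
        name = match.group(1)
        return str(values[name]) if name in values else match.group(0)

    return TOKEN.sub(substitute, template)
```

A user-supplied or localized template such as `"Length {{received}} is below the minimum of {{limit}}"` could then be filled from the keyword-specific values in the output unit.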

If the spec declares error messages, it should only do so as defaults. They should be overridable by the user.

Lastly, for localization, I think it's probably sufficient that

the spec declares messages in English only, with some wording about equivalent translations.
the spec encourages implementations to provide some mechanism to support it, but that mechanism can be implementation-defined.

TeCHiScy May 31, 2022

it would make sense to create a separate section at the top level of the schema for the messages

I'm a little doubtful about putting messages at the top level of the schema. For a complex schema (nested, with many keywords, etc.), this could weaken the relation between messages and validation keywords, which makes the schema harder to read and maintain.

gregsdennis · 2022-06-16T03:02:00Z

gregsdennis
Jun 16, 2022
Maintainer Author

Maybe I've been thinking about this wrong from the start.

What if we only have an output unit per schema/subschema instead of per keyword? The output unit would need to be adjusted to house errors & annotations from keywords.

The detailed output would go from

{
  "valid": false,
  "validationPath": "",
  "schemaLocation": "https://example.com/polygon#",
  "instanceLocation": "",
  "nested": [
    {
      "valid": false,
      "keyword": "$ref",
      "dialect": "https://json-schema.org/draft/2020-12/vocab/core",
      "validationPath": "/items/$ref",
      "schemaLocation": "https://example.com/polygon#/$defs/point",
      "instanceLocation": "/1",
      "nested": [
        {
          "valid": false,
          "keyword": "required",
          "dialect": "https://json-schema.org/draft/2020-12/vocab/validation",
          "validationPath": "/items/$ref/required",
          "schemaLocation": "https://example.com/polygon#/$defs/point/required",
          "instanceLocation": "/1",
          "error": "Required property 'y' not found."
        },
        {
          "valid": false,
          "keyword": "additionalProperties",
          "dialect": "https://json-schema.org/draft/2020-12/vocab/applicator",
          "validationPath": "/items/$ref/additionalProperties",
          "schemaLocation": "https://example.com/polygon#/$defs/point/additionalProperties",
          "instanceLocation": "/1/z",
          "error": "Additional property 'z' found but was invalid."
        }
      ]
    },
    {
      "valid": false,
      "keyword": "minItems",
      "dialect": "https://json-schema.org/draft/2020-12/vocab/validation",
      "validationPath": "/minItems",
      "schemaLocation": "https://example.com/polygon#/minItems",
      "instanceLocation": "",
      "error": "Expected at least 3 items but found 2"
    }
  ]
}

to just

{
  "valid": false,
  "evaluationPath": "",
  "schemaLocation": "https://example.com/polygon#",
  "instanceLocation": "",
  "errors": {
    "minItems": "Expected at least 3 items but found 2"
  },
  "nested": [
    {
      "valid": false,
      "evaluationPath": "/items/$ref",
      "schemaLocation": "https://example.com/polygon#/$defs/point",
      "instanceLocation": "/1",
      "errors": {
        "required": "Required property 'y' not found.",
        "additionalProperties": "Additional property 'z' found but was invalid."
      }
    }
  ]
}

This seems a LOT simpler, and all the information you need is still there.

I'd still suggest that specific error wording is left to implementations.
I suppose if we wanted to include dialect, it would need to prefix the keyword name in the errors object.
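To show the mapping between the two shapes, here is a Python sketch that folds per-keyword units into their parent subschema unit. The function name `collapse` is hypothetical, and it assumes that any child carrying an error and no nested results of its own is a keyword leaf. As in the example above, the leaf's own instanceLocation is not carried over.

```python
def collapse(unit):
    """Fold leaf keyword units into an `errors` object on the parent unit."""
    out = {
        "valid": unit["valid"],
        "evaluationPath": unit["validationPath"],
        "schemaLocation": unit["schemaLocation"],
        "instanceLocation": unit["instanceLocation"],
    }
    errors, nested = {}, []
    for child in unit.get("nested", []):
        if "error" in child and not child.get("nested"):
            # Keyword leaf: its message moves into the parent's errors object.
            errors[child["keyword"]] = child["error"]
        else:
            # Anything else is treated as a subschema and keeps its own unit.
            nested.append(collapse(child))
    if errors:
        out["errors"] = errors
    if nested:
        out["nested"] = nested
    return out
```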

For completeness, here's something of what a valid result would look like with annotations:

{
  "valid": true,
  "evaluationPath": "",
  "schemaLocation": "https://example.com/polygon#",
  "instanceLocation": "",
  "annotations": {
    "items": true
  },
  "nested": [
    {
      "valid": true,
      "evaluationPath": "/items/$ref",
      "schemaLocation": "https://example.com/polygon#/$defs/point",
      "instanceLocation": "/1",
      "annotations": {
        "properties": [ "x", "y" ]
      }
    }
  ]
}

1 reply

handrews Jun 17, 2022

[NOTE: I'm going to put a brief comment here, but sometime in the next few days I will start a separate discussion about annotations and their uses that will go into a lot more context. For now, I just want to put this down as a marker for annotation-oriented requirements. I don't expect it to necessarily be convincing on its own.]

This has good readability for errors, but for me with annotations, nested structures are not ideal.

You have to make a choice regarding what structure you are following — instance, schema (lexical scope), or evaluation (dynamic scope) — and that choice will be right for some applications but wrong for others. This evaluation path-based structure is good for "how did I attach annotations", but much more complex to work with for "what annotations are attached to this instance location" or "what instance locations were annotated by this keyword." You can get all of that information, of course, but you have to walk a tree structure and process it into a different structure, and tree-walking is not the most straightforward way to process data.

Per section 7.7.1 of the spec:

A collected annotation MUST include the following information:

The name of the keyword that produces the annotation

The instance location to which it is attached, as a JSON Pointer

The schema location path, indicating how reference keywords such as "$ref" were followed to reach the absolute schema location.

The absolute schema location of the attaching keyword, as a URI. This MAY be omitted if it is the same as the schema location path from above.

The attached value(s)

The easiest thing for me is a flat list of these 5-tuples (whether the 5-tuples are literal lists/tuples or objects, either is fine).

I can sort the list by any of the fields to mimic any of the structures. I can filter the list by keyword. Given an instance or schema location, I can convert it to a JSON Pointer or URI and find it in the list.

Of course, a flat list of 5-tuples with the annotation value replaced by errors is not good for human readability, which is important for errors.
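
A minimal sketch of the flat-list idea in Python, assuming the nested output shape shown earlier in this thread (`annotations`, `nested`, `evaluationPath`, `schemaLocation`, `instanceLocation`). The `Annotation` type and `collect` function are hypothetical names used only for illustration; nothing here is prescribed by the spec.

```python
from typing import Any, Iterator, NamedTuple, Optional


class Annotation(NamedTuple):
    """One collected annotation as a 5-tuple (hypothetical record type)."""
    keyword: str
    instance_location: str
    evaluation_path: str
    schema_location: Optional[str]  # assumed optional; may be absent from a unit
    value: Any


def collect(unit: dict) -> Iterator[Annotation]:
    """Walk a nested output unit and yield one record per annotation."""
    for keyword, value in unit.get("annotations", {}).items():
        yield Annotation(
            keyword,
            unit["instanceLocation"],
            unit["evaluationPath"],
            unit.get("schemaLocation"),
            value,
        )
    for child in unit.get("nested", []):
        yield from collect(child)


# The valid-result example from the previous comment.
output = {
    "valid": True,
    "evaluationPath": "",
    "schemaLocation": "https://example.com/polygon#",
    "instanceLocation": "",
    "annotations": {"items": True},
    "nested": [
        {
            "valid": True,
            "evaluationPath": "/items/$ref",
            "schemaLocation": "https://example.com/polygon#/$defs/point",
            "instanceLocation": "/1",
            "annotations": {"properties": ["x", "y"]},
        }
    ],
}

records = list(collect(output))

# Mimic an instance-based view by sorting on instance location.
by_instance = sorted(records, key=lambda r: r.instance_location)
# "What instance locations were annotated by this keyword?"
only_properties = [r for r in records if r.keyword == "properties"]
# "What annotations are attached to this instance location?"
at_item_1 = [r for r in records if r.instance_location == "/1"]
```

Once the records are flat, each of the three questions is a one-line sort or filter rather than a tree walk.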

gregsdennis · 2022-06-17T01:43:11Z

gregsdennis
Jun 17, 2022
Maintainer Author

(Ugh... this was supposed to be part of the previous thread...)

For the above example, because the nesting is only one deep, the basic and detailed formats are the same. Here's one that shows a difference.

Schema

{
  "type": "object",
  "title": "root",
  "properties": {
    "foo": {
      "type": "object",
      "title": "foo-title",
      "properties": {
        "foo-prop": {
          "const": 1,
          "title": "foo-prop-title"
        }
      }
    },
    "bar": {
      "type": "object",
      "title": "bar-title",
      "properties": {
        "bar-prop": {
          "const": 2,
          "title": "bar-prop-title"
        }
      }
    }
  }
}

Instance

{
  "foo": {"foo-prop": 1},
  "bar": {"bar-prop": 2}
}

Detailed output

{
  "valid": true,
  "evaluationPath": "",
  "instanceLocation": "",
  "annotations": {
    "title": "root",
    "properties": ["foo", "bar"]
  },
  "nested": [
    {
      "valid": true,
      "evaluationPath": "/properties/foo",
      "instanceLocation": "/foo",
      "annotations": {
        "title": "foo-title",
        "properties": ["foo-prop"]
      },
      "nested": [
        {
          "valid": true,
          "evaluationPath": "/properties/foo/properties/foo-prop",
          "instanceLocation": "/foo/foo-prop",
          "annotations": {
            "title": "foo-prop-title"
          }
        }
      ]
    },
    {
      "valid": true,
      "evaluationPath": "/properties/bar",
      "instanceLocation": "/bar",
      "annotations": {
        "title": "bar-title",
        "properties": ["bar-prop"]
      },
      "nested": [
        {
          "valid": true,
          "evaluationPath": "/properties/bar/properties/bar-prop",
          "instanceLocation": "/bar/bar-prop",
          "annotations": {
            "title": "bar-prop-title"
          }
        }
      ]
    }
  ]
}

Basic output (same, but flattened, wrapped into the nested of a single node, and location properties of the root are superfluous)

{
  "valid": true,
  "nested": [
    {
      "valid": true,
      "evaluationPath": "",
      "instanceLocation": "",
      "annotations": {
        "title": "root",
        "properties": ["foo", "bar"]
      }
    },
    {
      "valid": true,
      "evaluationPath": "/properties/foo",
      "instanceLocation": "/foo",
      "annotations": {
        "title": "foo-title",
        "properties": ["foo-prop"]
      }
    },
    {
      "valid": true,
      "evaluationPath": "/properties/foo/properties/foo-prop",
      "instanceLocation": "/foo/foo-prop",
      "annotations": {
        "title": "foo-prop-title"
      }
    },
    {
      "valid": true,
      "evaluationPath": "/properties/bar",
      "instanceLocation": "/bar",
      "annotations": {
        "title": "bar-title",
        "properties": ["bar-prop"]
      }
    },
    {
      "valid": true,
      "evaluationPath": "/properties/bar/properties/bar-prop",
      "instanceLocation": "/bar/bar-prop",
      "annotations": {
        "title": "bar-prop-title"
      }
    }
  ]
}

You can see that with the basic output, you can still do all of your sorting as needed. The only difference between this and the current output is that we don't have output units for keywords; the info is consolidated at the subschema level. I think this makes more sense.

It also aligns with the idea that, upon validation failure, annotations are only discarded at the subschema boundary rather than by individual keywords, which is another discussion we've had recently (also prompted by @handrews).
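
The flattening described above can be sketched in a few lines of Python. This is only an illustration under the assumption that output units look like the examples in this comment; `to_basic` is a hypothetical helper name, not something an implementation is required to provide.

```python
def to_basic(detailed: dict) -> dict:
    """Flatten a hierarchical output unit into a single-level list."""
    flat = []

    def visit(unit: dict) -> None:
        # Copy everything except the children, then recurse depth-first.
        flat.append({k: v for k, v in unit.items() if k != "nested"})
        for child in unit.get("nested", []):
            visit(child)

    visit(detailed)
    # The wrapper keeps only the overall result; the root's own location
    # properties live on the first entry of the flat list.
    return {"valid": detailed["valid"], "nested": flat}


# The "foo" branch of the detailed output above.
detailed = {
    "valid": True,
    "evaluationPath": "",
    "instanceLocation": "",
    "annotations": {"title": "root", "properties": ["foo", "bar"]},
    "nested": [
        {
            "valid": True,
            "evaluationPath": "/properties/foo",
            "instanceLocation": "/foo",
            "annotations": {"title": "foo-title", "properties": ["foo-prop"]},
            "nested": [
                {
                    "valid": True,
                    "evaluationPath": "/properties/foo/properties/foo-prop",
                    "instanceLocation": "/foo/foo-prop",
                    "annotations": {"title": "foo-prop-title"},
                }
            ],
        }
    ],
}

basic = to_basic(detailed)
# Any ordering can be recovered afterwards, for example by instance location.
by_instance = sorted(basic["nested"], key=lambda u: u["instanceLocation"])
```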

9 replies

gregsdennis Jun 27, 2022
Maintainer Author

It seems I missed the various properties annotations. 😅

gregsdennis Jun 27, 2022
Maintainer Author

In writing this up and including passing examples with annotations, I'm discovering that it's probably going to be pretty rare that nodes could be collapsed to a condensed hierarchical format.

I'm thinking of dropping the detailed format and just having the flat list (basic) and verbose (hierarchical). I also think this will reduce cognitive load on implementors to try and get the reduction logic right.

handrews Jun 27, 2022

@gregsdennis that sounds like a good idea. I admit it took me a bit to really "get" the distinction between detailed and verbose. The different descriptions made sense, but I had to kind of work through it.

It also occurs to me that while it makes sense to offer both for either type of outcome, verbose will probably be used more for errors, and basic more for annotations. Primarily because with annotations you are more likely to want them organized by the instance structure than the schema structure, but the schema structure makes more sense for errors.

gregsdennis Jun 27, 2022
Maintainer Author

with annotations you are more likely to want them organized by the instance structure

I also have an instance structure proposed in the OP. Let's discuss that in a new thread, though.

I'm putting up a PR that does the things in this thread shortly.

gregsdennis Jul 26, 2022
Maintainer Author

@TeCHiScy, I was re-reading json-schema-org/json-schema-spec#1241 and found

I'd appreciate if there's also a seperate keyword

Do you feel the format described in this thread adequately provides this? (Specifically, the errors and annotations property values are keyword-keyed objects.)

handrews · 2022-06-17T21:37:12Z

handrews
Jun 17, 2022

Here's a minor point that may or may not need anything done about it.

Technically, the instance location for a propertyNames subschema would not be /myObject/myProperty but rather the combination of the JSON Pointer /myObject/myProperty and the Relative JSON Pointer 0# (which takes the name, in this case myProperty).

As I understand it, currently we just use /myObject/myProperty and rely on the definition of propertyNames for people to understand that that really means the string "myProperty", which of course is visible right there anyway.

Has this been a concern for anyone? If not, it might be a good idea to include a note acknowledging that for some keywords, the instance location is not identical to how the pointer resolves. Unless folks want to try to do something to express that with the relative pointer, but that adds some complications as it's only 0 and 0# that would be valid.
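
To make the distinction concrete, here is a small Python sketch. It assumes a hypothetical instance with a `myObject` member, and both function names are illustrative: `resolve` follows a plain JSON Pointer to a value, while `property_name` returns what the Relative JSON Pointer `0#` would yield for an object member, namely the member's name.

```python
def resolve(instance, pointer: str):
    """Resolve a JSON Pointer (RFC 6901) against a parsed JSON instance."""
    current = instance
    if pointer == "":
        return current
    for token in pointer.split("/")[1:]:
        # Unescape in the order required by RFC 6901: ~1 first, then ~0.
        token = token.replace("~1", "/").replace("~0", "~")
        current = current[int(token)] if isinstance(current, list) else current[token]
    return current


def property_name(pointer: str) -> str:
    """For an object member, `0#` yields the last reference token (the name)."""
    return pointer.rsplit("/", 1)[1].replace("~1", "/").replace("~0", "~")


# Hypothetical instance, used only to show the difference.
instance = {"myObject": {"myProperty": 42}}
location = "/myObject/myProperty"

value_at_pointer = resolve(instance, location)   # the member's value
name_validated = property_name(location)         # what propertyNames looks at
```

The pointer resolves to the member's value, whereas the `propertyNames` subschema is applied to the name string, which is the mismatch a note in the spec could acknowledge.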

2 replies

gregsdennis Jun 21, 2022
Maintainer Author

This kind of thing hasn't been an issue for me. I think it's pretty well understood with just the pointer. It's good that you brought it up, though. I wonder if anyone knows of a user perspective on this.

handrews Jun 21, 2022

Yeah, I can't say it's been a problem for me either, I just noticed the inconsistency. We can just leave it here to see if anyone else says anything, or I could split it out to a new discussion. I don't have strong feelings either way on this.

gregsdennis · 2022-06-27T09:00:37Z

gregsdennis
Jun 27, 2022
Maintainer Author

Preference (or maybe practical) question: In output, is it useful to have explicit errors from subschemas where the only failure is from a subschema?

For example

{
  "type": "object",
  "properties": {
    "foo": { "type": "array", "items": { "type": "integer" } }
  }
}

{ "foo": [ 1, 2, false, 4 ] }

This will fail on /foo/2 and the subschema at /properties/foo/items will produce an appropriate error (e.g. "Expected an integer"). Is it useful to receive an error from /properties/foo that basically just says, "A subschema failed validation"?

I can see the boolean schema false needing to produce an error, but I'm having a hard time conceiving of another case where a schema could produce an error.
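
For reference, a consumer that only cares about nodes with their own errors can skip the pass-through nodes fairly easily. The Python sketch below assumes a hypothetical hierarchical output for the example above in which intermediate nodes carry no `errors` of their own; the output shape and the `own_errors` helper are illustrative assumptions, not the result of running any particular implementation.

```python
from typing import Iterator, Tuple


def own_errors(unit: dict) -> Iterator[Tuple[str, str, dict]]:
    """Yield (evaluationPath, instanceLocation, errors) for nodes that
    report errors themselves, skipping nodes that merely contain failures."""
    if unit.get("errors"):
        yield unit["evaluationPath"], unit["instanceLocation"], unit["errors"]
    for child in unit.get("nested", []):
        yield from own_errors(child)


# Hypothetical output for { "foo": [ 1, 2, false, 4 ] } (assumed shape).
output = {
    "valid": False,
    "evaluationPath": "",
    "instanceLocation": "",
    "nested": [
        {
            "valid": False,
            "evaluationPath": "/properties/foo",
            "instanceLocation": "/foo",
            "nested": [
                {
                    "valid": False,
                    "evaluationPath": "/properties/foo/items",
                    "instanceLocation": "/foo/2",
                    "errors": {"type": "Expected an integer"},
                }
            ],
        }
    ],
}

failures = list(own_errors(output))
```

Here `failures` contains a single entry for `/foo/2`, and the `valid: false` flags on the ancestors already convey that a subschema failed.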

0 replies

handrews · 2022-07-27T01:18:18Z

handrews
Jul 27, 2022

From json-schema-org/json-schema-spec#1249:

Can we mandate consistent use of JSON Pointer fragments [in schemaLocation] instead [of "canonical IRIs"]? I believe we changed the wording so that "canonical IRI" is only relevant to whole resources (no fragments). So really it would be canonical IRI plus JSON Pointer fragment.

Basically, if you use the basic output format, it's most convenient if every schemaLocation URI consistently uses JSON Pointer fragments (even if empty) relative to the immediate containing resource's canonical IRI. This allows sorting them lexicographically as strings to be a decent proxy for the schema structure. Plus, if you parse the JSON Pointer into components, you can also sort array elements numerically.

This doesn't work if schemaLocation is ever recorded using a plain name fragment. If that happens, you have to go search the schema for it if it matters where it came from. If folks are following best practices, then any schema that gets an $anchor would be under $defs anyway, but people do weird things sometimes.

Whether schema resource roots have an empty fragment or no fragment is less important, but it's trivial to include an empty fragment in that case and then all of your schemaLocation IRIs are consistent, which is nice.

Mandating JSON Pointer fragments is (as far as I can tell) an easy way to lock down a possible source of variance in our standard output.
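
A rough Python sketch of the sorting argument, assuming every `schemaLocation` carries a JSON Pointer fragment. The `sort_key` helper and the example URIs are hypothetical. It uses only `urldefrag` and `unquote` from the standard library's `urllib.parse`.

```python
from urllib.parse import unquote, urldefrag


def sort_key(schema_location: str):
    """Order by resource IRI, then by JSON Pointer components."""
    base, fragment = urldefrag(schema_location)
    fragment = unquote(fragment)
    if fragment and not fragment.startswith("/"):
        # A plain-name fragment (e.g. from $anchor) carries no structure.
        raise ValueError(f"not a JSON Pointer fragment: {fragment!r}")
    tokens = fragment.split("/")[1:]
    # Numeric tokens compare as integers; tagging keeps comparisons type-safe.
    parts = [
        (0, int(t), "") if t.isascii() and t.isdigit() else (1, 0, t)
        for t in tokens
    ]
    return (base, parts)


# Hypothetical locations within one schema resource.
locations = [
    "https://example.com/schema#/allOf/10",
    "https://example.com/schema#/allOf/2",
    "https://example.com/schema#",
    "https://example.com/schema#/allOf/2/properties/foo",
]

as_strings = sorted(locations)                  # decent proxy, but 10 sorts before 2
structural = sorted(locations, key=sort_key)    # array indices ordered numerically
```

Plain string sorting already groups locations by resource and nesting; parsing the pointer additionally puts `/allOf/2` ahead of `/allOf/10`. A plain-name fragment raises in this sketch, which is the case that mandating JSON Pointer fragments would rule out.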

@gregsdennis let me know if you'd like me to make this its own discussion. I figured it should at least be mentioned here anyway.

0 replies

gregsdennis · 2022-08-30T01:06:03Z

gregsdennis
Aug 30, 2022
Maintainer Author

I think most everything major that I currently want to address here is done. There are a few smaller things:

There's still not an instance-based structure. With the schema-based (instead of keyword-based) output unit approach, instance structuring doesn't really make sense. I stared at it for a good while and couldn't make it work. Given that the results in the basic format can just be sorted by instance location, I think this is fine.
keyword isn't an explicit property in the output unit, but the keyword that generated an error or annotation will be present as a key in the errors or annotations property. I think this suffices to meet the purpose of this proposal.
dialect is still missing (cc: @jdesrosiers), but I think this idea still needs some work. It's not really incorporated into the schema overall. I think we wanted to see something augmented to vocabularies to support this idea.
Suggesting or even requiring that an implementation provide a default base URI for schemas for which a base URI cannot be determined hasn't been included. I'll open a separate issue for this, and we can discuss it there.
Moving the output to its own spec is currently undecided. I'll open an issue for this as well.

I think this leaves this discussion in a good place. I'm going to lock it and open issues for the things I've noted.

Thanks, everyone, for all of the input!

0 replies
