Restructuring Output Formats #63
Replies: 12 comments 86 replies
-
I started writing up some comments on some of these issues and it occurred to me that with a bunch of different threads going on at once, nothing is going to get done. So, I propose that we start with the things that don't seem controversial, make sure we have consensus, get it written up, and then move on to the next most important thing. I'll add a thread with the first set of proposals and a consensus measuring mechanism. (Feel free to tell me you don't like this approach. I won't be offended.)
-
For this first proposal, we're bundling several changes at once because they appear to be relatively uncontroversial. If changes to this proposal are necessary based on discussion, this comment will be edited with a change log included.
Change Log
10/12: Change proposed new name for
Use the following legend to show your support for this proposal. Remember, this isn't a vote; it's a general consensus measuring device.
-
I would change
Regarding which of these should be required, perhaps we can just specify that at least one or the other should be included, and both MUST be included if a
We also shouldn't imply that only
This has been the case since draft 2019-09, although some of the examples there mistakenly added a
-
I've always disliked this being a hard requirement, because some schemas are very small and used inline in code, and therefore have no "location" to speak of. E.g., one might want to use a small JSON Schema to validate the structure of nested data passed in as a function argument, so the base URI of
The spec is clear enough -- the initial base URI is defined according to its context, and implementations should document anything they assume.
-
Thinking about this, supporting multiple output formats sounds like a real pain as an implementor. I'd like to consider this alternative:
-
as described in #Proposal
{
"valid": false,
"keyword": "minLength",
"keywordLocation": "#/description",
"xxx": {
"received": 1,
"limit": 3
},
"schemaLocation": "https://example.com/mySchema#/description",
"instanceLocation": "",
"annotation": "description"
}
Here
-
Maybe I've been thinking about this wrong from the start. What if we only have an output unit per schema/subschema instead of per keyword? The output unit would need to be adjusted to house errors & annotations from keywords. The detailed output would go from
{
"valid": false,
"validationPath": "",
"schemaLocation": "https://example.com/polygon#",
"instanceLocation": "",
"nested": [
{
"valid": false,
"keyword": "$ref",
"dialect": "https://json-schema.org/draft/2020-12/vocab/core",
"validationPath": "/items/$ref",
"schemaLocation": "https://example.com/polygon#/$defs/point",
"instanceLocation": "/1",
"nested": [
{
"valid": false,
"keyword": "required",
"dialect": "https://json-schema.org/draft/2020-12/vocab/validation",
"validationPath": "/items/$ref/required",
"schemaLocation": "https://example.com/polygon#/$defs/point/required",
"instanceLocation": "/1",
"error": "Required property 'y' not found."
},
{
"valid": false,
"keyword": "additionalProperties",
"dialect": "https://json-schema.org/draft/2020-12/vocab/applicator",
"validationPath": "/items/$ref/additionalProperties",
"schemaLocation": "https://example.com/polygon#/$defs/point/additionalProperties",
"instanceLocation": "/1/z",
"error": "Additional property 'z' found but was invalid."
}
]
},
{
"valid": false,
"keyword": "minItems",
"dialect": "https://json-schema.org/draft/2020-12/vocab/validation",
"validationPath": "/minItems",
"schemaLocation": "https://example.com/polygon#/minItems",
"instanceLocation": "",
"error": "Expected at least 3 items but found 2"
}
]
}
to just
{
"valid": false,
"evaluationPath": "",
"schemaLocation": "https://example.com/polygon#",
"instanceLocation": "",
"errors": {
"minItems": "Expected at least 3 items but found 2"
},
"nested": [
{
"valid": false,
"evaluationPath": "/items/$ref",
"schemaLocation": "https://example.com/polygon#/$defs/point",
"instanceLocation": "/1",
"errors": {
"required": "Required property 'y' not found.",
"additionalProperties": "Additional property 'z' found but was invalid."
}
}
]
}
This seems a LOT simpler, and all the information you need is still there.
For completeness, here's something of what a valid result would look like with annotations:
{
"valid": true,
"evaluationPath": "",
"schemaLocation": "https://example.com/polygon#",
"instanceLocation": "",
"annotations": {
"items": true
},
"nested": [
{
"valid": true,
"evaluationPath": "/items/$ref",
"schemaLocation": "https://example.com/polygon#/$defs/point",
"instanceLocation": "/1",
"annotations": {
"properties": [ "x", "y" ]
}
}
]
}
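To make the per-keyword to per-subschema consolidation concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the function name `consolidate` is hypothetical, the input is assumed to look like the first (per-keyword) example above, and a keyword unit is treated as a subschema unit whenever it carries its own `nested` list.

```python
def consolidate(unit):
    """Fold per-keyword output units into one unit per subschema (sketch)."""
    out = {
        "valid": unit["valid"],
        "evaluationPath": unit["validationPath"],
        "schemaLocation": unit["schemaLocation"],
        "instanceLocation": unit["instanceLocation"],
    }
    errors = {}
    nested = []
    for child in unit.get("nested", []):
        if "nested" in child:
            # The keyword led into a subschema (e.g. "$ref"): keep it as a unit.
            nested.append(consolidate(child))
        elif "error" in child:
            # A leaf keyword failure becomes an entry in the parent's "errors".
            errors[child["keyword"]] = child["error"]
    if errors:
        out["errors"] = errors
    if nested:
        out["nested"] = nested
    return out


# Hypothetical, trimmed-down input in the per-keyword shape.
per_keyword = {
    "valid": False,
    "validationPath": "",
    "schemaLocation": "https://example.com/polygon#",
    "instanceLocation": "",
    "nested": [
        {
            "valid": False,
            "keyword": "minItems",
            "validationPath": "/minItems",
            "schemaLocation": "https://example.com/polygon#/minItems",
            "instanceLocation": "",
            "error": "Expected at least 3 items but found 2",
        }
    ],
}

per_subschema = consolidate(per_keyword)
print(per_subschema["errors"])
```

One thing this sketch makes visible: folding a keyword's error into its parent drops the keyword unit's own instanceLocation (for example "/1/z" for additionalProperties above), so that detail would have to live in the message or be given up.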
-
(Ugh... this was supposed to be part of the previous thread...) For the above example, because the nesting is only one deep, the
Schema
{
"type": "object",
"title": "root",
"properties": {
"foo": {
"type": "object",
"title": "foo-title",
"properties": {
"foo-prop": {
"const": 1,
"title": "foo-prop-title"
}
}
},
"bar": {
"type": "object",
"title": "bar-title",
"properties": {
"bar-prop": {
"const": 2,
"title": "bar-prop-title"
}
}
}
}
}
Instance
{
"foo": {"foo-prop": 1},
"bar": {"bar-prop": 2}
}
Detailed output
{
"valid": true,
"evaluationPath": "",
"instanceLocation": "",
"annotations": {
"title": "root",
"properties": ["foo", "bar"]
},
"nested": [
{
"valid": true,
"evaluationPath": "/properties/foo",
"instanceLocation": "/foo",
"annotations": {
"title": "foo-title",
"properties": ["foo-prop"]
},
"nested": [
{
"valid": true,
"evaluationPath": "/properties/foo/properties/foo-prop",
"instanceLocation": "/foo/foo-prop",
"annotations": {
"title": "foo-prop-title"
}
}
]
},
{
"valid": true,
"evaluationPath": "/properties/bar",
"instanceLocation": "/bar",
"annotations": {
"title": "bar-title",
"properties": ["bar-prop"]
},
"nested": [
{
"valid": true,
"evaluationPath": "/properties/bar/properties/bar-prop",
"instanceLocation": "/bar/bar-prop",
"annotations": {
"title": "bar-prop-title"
}
}
]
}
]
}
Basic output (same, but flattened, wrapped into the
{
"valid": true,
"nested": [
{
"valid": true,
"evaluationPath": "",
"instanceLocation": "",
"annotations": {
"title": "root",
"properties": ["foo", "bar"]
}
},
{
"valid": true,
"evaluationPath": "/properties/foo",
"instanceLocation": "/foo",
"annotations": {
"title": "foo-title",
"properties": ["foo-prop"]
}
},
{
"valid": true,
"evaluationPath": "/properties/foo/properties/foo-prop",
"instanceLocation": "/foo/foo-prop",
"annotations": {
"title": "foo-prop-title"
}
},
{
"valid": true,
"evaluationPath": "/properties/bar",
"instanceLocation": "/bar",
"annotations": {
"title": "bar-title",
"properties": ["bar-prop"]
}
},
{
"valid": true,
"evaluationPath": "/properties/bar/properties/bar-prop",
"instanceLocation": "/bar/bar-prop",
"annotations": {
"title": "bar-prop-title"
}
}
]
}
You can see that with the basic output, you can still do all of your sorting as needed. The only difference between this and the current output is that we don't have output units for keywords; the info is consolidated at the subschema level. I think this makes more sense. It also aligns with the idea that, upon validation failure, annotations are only discarded at the subschema boundary rather than by individual keywords, which is another discussion we've had recently (also prompted by @handrews).
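The relationship between the two formats above (basic being the detailed tree flattened under a single top-level unit) can be sketched in a few lines of Python. This is a rough illustration, not a reference implementation; the function name `to_basic` is made up for the example, and it assumes every unit keeps its children under "nested".

```python
def to_basic(detailed):
    """Flatten a nested (detailed) result into the basic list form (sketch)."""
    flat = []

    def visit(unit):
        # Copy everything except the children, then walk the children in order.
        flat.append({k: v for k, v in unit.items() if k != "nested"})
        for child in unit.get("nested", []):
            visit(child)

    visit(detailed)
    # The overall result keeps only "valid" and the flattened list of units.
    return {"valid": detailed["valid"], "nested": flat}


# Hypothetical, shortened version of the detailed output above.
detailed = {
    "valid": True,
    "evaluationPath": "",
    "instanceLocation": "",
    "annotations": {"title": "root", "properties": ["foo"]},
    "nested": [
        {
            "valid": True,
            "evaluationPath": "/properties/foo",
            "instanceLocation": "/foo",
            "annotations": {"title": "foo-title"},
        }
    ],
}

basic = to_basic(detailed)
print([unit["evaluationPath"] for unit in basic["nested"]])
```

Because each flattened unit still carries its own evaluationPath and instanceLocation, a consumer can sort or group the list by either one.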
-
Here's a minor point that may or may not need anything done about it. Technically, the instance location for a
As I understand it, currently we just use
Has this been a concern for anyone? If not, it might be a good idea to include a note acknowledging that for some keywords, the instance location is not identical to how the pointer resolves. Unless folks want to try to do something to express that with the relative pointer, but that adds some complications as it's only
-
Preference (or maybe practical) question: In output, is it useful to have explicit errors from subschemas where the only failure is from a subschema? For example
{
"type": "object",
"properties": {
"foo": { "type": "array", "items": { "type": "integer" } }
}
}
{ "foo": [ 1, 2, false, 4 ] }
This will fail on
I can see the boolean schema false needing to produce an error, but I'm having a hard time conceiving of another case where a schema could produce an error.
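For what it's worth, a consumer that only cares about units with messages of their own can filter the rest out, whichever way this is decided. The Python sketch below assumes the per-subschema shape proposed earlier in this discussion ("errors" as an object keyed by keyword); the function name and the error message text are invented for illustration. In the instance above, the item at /foo/2 is false, which is not an integer, so that is the only unit with an error of its own.

```python
def units_with_own_errors(unit):
    """Collect units that report at least one error of their own (sketch)."""
    found = []
    if unit.get("errors"):
        found.append({k: v for k, v in unit.items() if k != "nested"})
    for child in unit.get("nested", []):
        found.extend(units_with_own_errors(child))
    return found


# Hypothetical output for the schema and instance above.
result = {
    "valid": False,
    "evaluationPath": "",
    "instanceLocation": "",
    "nested": [
        {
            "valid": False,
            "evaluationPath": "/properties/foo",
            "instanceLocation": "/foo",
            "nested": [
                {
                    "valid": False,
                    "evaluationPath": "/properties/foo/items",
                    "instanceLocation": "/foo/2",
                    # Message wording is made up for this example.
                    "errors": {"type": "Value is not an integer."},
                }
            ],
        }
    ],
}

own = units_with_own_errors(result)
print([unit["instanceLocation"] for unit in own])
```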
-
From json-schema-org/json-schema-spec#1249:
Basically, if you use the basic output format, it's most convenient if every
This doesn't work if
Whether schema resource roots have an empty fragment or no fragment is less important, but it's trivial to include an empty fragment in that case and then all of your
Mandating JSON Pointer fragments is (as far as I can tell) an easy way to lock down a possible source of variance in our standard output. @gregsdennis let me know if you'd like me to make this its own discussion. I figured it should at least be mentioned here anyway.
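As a rough illustration of what always using a JSON Pointer fragment could look like, here is a small Python sketch. The helper name `schema_location` and its signature are assumptions made up for this example. It escapes each reference token as JSON Pointer requires ("~" becomes "~0", "/" becomes "~1"), percent-encodes whatever is not allowed in a URI fragment, and emits an empty fragment for a schema resource root.

```python
from urllib.parse import quote

# Characters a URI fragment may contain unescaped (RFC 3986), in addition to
# the unreserved characters that quote() already leaves alone.
FRAGMENT_SAFE = "/?:@!$&'()*+,;="


def schema_location(base_uri, tokens):
    """Build base URI + '#' + JSON Pointer fragment (sketch)."""
    # JSON Pointer escaping: "~" first, then "/", for each reference token.
    pointer = "".join(
        "/" + token.replace("~", "~0").replace("/", "~1") for token in tokens
    )
    # Percent-encode anything a fragment cannot carry literally.
    return base_uri + "#" + quote(pointer, safe=FRAGMENT_SAFE)


root = schema_location("https://example.com/polygon", [])
point = schema_location("https://example.com/polygon", ["$defs", "point"])
odd = schema_location("https://example.com/polygon", ["a/b", "x y"])
print(root, point, odd)
```

With this shape every location has a fragment, so a consumer can split on "#" without special-casing resource roots.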
-
I think most everything major that I currently want to address here is done. There are a few smaller things:
I think this leaves this discussion in a good place. I'm going to lock it and open issues for the things I've noted. Thanks, everyone, for all of the input!
-
The below is a summary of json-schema-org/json-schema-spec#973.
The topic has been moved here so that we can use threads to keep discussions more targeted.
The working branch is draft-next-output.
Hey everyone. I've been reading through all of the comments in this and linked issues, compiling a list of topics.
It seems that we have consensus on these things:
errors to nested/subschemas/inner/details (other suggestions?)
$false in the location)
valid is required for each node.
error is not required but is declared as the designated place to put error messaging. Tooling that consumes this output should expect an error message here.
These things still require decisions (but I've proposed some things):
absoluteKeywordLocation instead of keywordLocation
keywordLocation could be validationPath and optional (default off) to the end user; opens the door for absoluteKeywordLocation to change to schemaLocation (for terseness); implementations SHOULD support an option mechanism which determines the inclusion of validationPath.
keyword to support renaming ☝️ (and maybe dialect - import from #1065). Only makes sense when a keyword is present. For instance at the root or #/allOf/3 there is no keyword, so these would be absent.
validationPath and instanceLocation can be plain JSON Pointers and do not need to be URI-encoded
basic and detailed formats? My users have said no. (verbose should show everything.)
detailed and verbose to *-by-schema and add *-by-instance (or some other way to indicate both structure and quantity of info) to support instance-based structures (see below for examples)
  basic is a collection of terminal nodes
  detailed-by-schema not)
  detailed-by-instance I'll discuss later.
flag to be specified as "MUST" (is currently "SHOULD"); supporting at least one other format to be specified as "SHOULD".
And this isn't directly about output, but supports it:
absoluteSchemaLocation/schemaLocation. This may be in the spec already, but I figured it's worth mentioning.
Annotations
Annotations need to be part of the node, not rendered as child validation nodes. I don't think there's any argument here.
I've been thinking around this for a couple hours now, and it seems we have three orthogonal attributes that we need to consider.
1. Does the keyword provide validation?
Not all keywords provide validation. For example, the spec says nothing about validation for title and description.
My implementation handles this by always passing validation for these keywords, which works to provide the correct pass/fail outcome, but it also means that they produce an output node. So I get something like this as a node:
Question: should pure-annotation keywords produce an output node or simply be listed in parent nodes?
I can see arguments for both sides, and I think I have a pretty easy tweak to my implementation that would hide these, so I'm not invested either way.
2./3. Does the keyword produce and/or collect annotations?
These are actually two different ideas, but I think I often conflate them, so I'm going to consider them distinct here.
A keyword can produce annotations, and it can propagate annotations from its children. Most keywords do one or the other, but some do both.
title and description only produce annotations.
allOf which only propagate child annotations.
properties both produce (which of the listed properties are present) and propagate annotations.
Regardless of how an implementation manages annotations internally, I believe it's worthwhile keeping these distinctions separate when reporting them to the user. It can be useful to know whether a keyword produced an annotation or is just passing a message.
I'd like to propose two more output node properties to cover each of these ideas:
annotation property. This is just the raw value of the annotation (e.g. a string for title or an array of strings for properties).
collectedAnnotations property. This is an object where the key is the schema location (i.e. the value that would be in schemaLocation) of the keyword that produced it, and the value is the annotation itself.
The downside to listing collected annotations is duplication of annotation values. But on the upside, you can:
Examples for existing output formats
These examples take the spec scenario and show what the output would look like if we changed all of this, including my suggestions on the open discussion topics.
For reference, here's the spec scenario:
Flag (no change) 🎉
Basic
(I never really liked having a top-level node here, but I don't know how else to collect the nodes. I do like consistently having an object at the root, as opposed to an array.)
Detailed (schema-based)
Verbose (schema-structured)
Note the inclusion of the annotations from the subschemas which passed validation and how they're no longer included once we move up through a subschema that failed validation.
Instance-based structure examples
I think I've been able to marry our two approaches. Considerations:
I do need to add a results property to contain the validation results that pertain to each location, and nested as an object instead of an array made more sense here.
Verbose (instance-based)
Detailed (instance-based)
Initially I was only considering the verbose format for instance-based, but in building that example, I realized how to pare down the fluff.
The rules I'm following for this are:
validationPath is contained within another unit's validationPath, it (the first one) can be removed.
This second rule removes pass-through nodes like the kind you'd get having just resolved a $ref and arriving at a new subschema. These nodes are just a summary of what's underneath and don't add much to the result.