-
-
Notifications
You must be signed in to change notification settings - Fork 265
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add keyword id to output unit #1065
Comments
We should probably properly define a way to identify a dialect first. From 2020-12:
It has no specific identifier, but perhaps the meta-schema
The reason it's not required is because the relative location is sufficient when the navigated path contains no references. That said, maybe it's sufficient to include the
I do like the idea of including something in the output, though. |
I think identifying a dialect is sufficiently defined.
Let's not get into the choice to make
A vocabulary identifier is certainly an equivalent alternative to a dialect identifier, but I'd still prefer a keyword level identifier so we don't have a new version of every keyword for every draft even for the keywords that don't change. That would be a big help for annotation related tooling that I was working on a while ago (and hope to get back to soon). I would prefer to always include this information rather than only when the dialect changes. Consider the case when I pass off my output to a third party library for processing. That third party library shouldn't need to know anything about the original schema and wouldn't know what dialect to process the schema as if it isn't in the output, even if the dialect doesn't change. You could pass the library a default dialect to use, but it would be easier for everyone involved if that information is just always there. |
@jdesrosiers just make the keyword name a fragment on the vocabulary URI. We can declare the resources to be of media type (We should also really register |
This should be mentioned over in #973. |
That's exactly how I identify keywords now and it leaves me with the problem I described in my previous comment. {
"$id": "https://example.com/contrived-example",
"allOf": [
{
"$id": "/foo",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "Foo"
},
{
"$id": "/bar",
"$schema": "https://json-schema.org/draft/2019-09/schema",
"title": "Bar"
},
{
"$id": "/baz",
"$schema": "https://example.com/my-dialect",
"title": "Baz"
}
]
} In this example, the If each keyword has an identifier that doesn't change across drafts, we can avoid this problem. The Of course it could be difficult to determine when an new keyword identifier should be minted. For example, if we changed |
This sounds like a bigger issue than just in the output. Sure, the output is probably the primary benefactor, but it sounds like this is more about adding a property to vocabularies (maybe in line with @handrews' idea of a machine-readable vocab file) than extending output specifically. |
@gregsdennis I think I see what you mean. I'm open to discussing alternatives that also solve this problem. |
This was also mentioned here. I think restructuring the output based on subschemas instead of keywords solves this (related comment). |
@jdesrosiers can you confirm whether your concerns here were covered in #1249? |
No, I don't see how those changes address this problem. There's still no indication in the output unit of what dialect or vocab the keyword is defined in. |
@jdesrosiers @gregsdennis can we for now settle on the dialect IRI as the thing to be added? Or if we want this issue to be about a more comprehensive identification (dialect+vocab+keyword), can we split out an issue for just dialect identification? |
I question the need for dialect identification in the output. The schema doesn't declare individual keywords by their vocab; it just uses the keyword. For example, there is no If you really need to know what vocab a keyword came from, that information is traceable by going back to the schema that produced the output and looking at its meta-schema. I don't see the value in having that information in every instance of the output, especially not for every listing of a keyword.
There is: From the examples in the current doc: {
"valid": true,
"evaluationPath": "",
"schemaLocation": "https://json-schema.org/schemas/example#",
"instanceLocation": "",
"annotations": {
"title": "root",
"properties": [
"foo",
"bar"
]
}
} Properties are given here. Schema location is given here. You can look up the dialect if you want. You may have identifiers for keywords in your implementaiton, but JSON Schema defines no such identifiers. I see no reason why the output should include identifiers for keywords when no such definition exists. |
If I call an API that, somewhere on the back end, uses JSON Schema to evaluate some data and produce annotations, and then sends that annotation data back to me without sending me the schema, no, I can't just look up the dialect. This is a common problem, and is why resources typically have "self" links even though the self link URI is probably the URI that you just got it from. Because sometimes you end up with a resource representation with no way to tell where it came from. It's also why it's a good idea to have an absolute We do not need the dialect URI in every output unit. We just need a map of the absolute URIs (without fragments) in |
Can someone please post an example of how they want this fit into the output? |
Part of the proposal was to define URIs to identify each keyword. That would definitely be a requirement for adding a keyword URI in the output. I still think this is the best solution in the long run because it would allow output format post-processors to know that one keyword has the same semantics as another. For example, someone could write a dialect that has the same semantics as the standard dialect except all the keywords are translated to Spanish. Tools that can process the standard dialect output should be able to process the Spanish dialect output as well because the semantics of the keywords should be the same. Of course this would require changes to the vocabulary system in addition to just defining keyword URIs, but I think this would be a good path to be on.
I mention in the issue description why I didn't think that was a good enough solution and @handrews reiterated that view point.
Honestly I haven't had a chance to fully think through how this would fit in to the new output format. We could figure out how to incorporate a dialect ID into what we have now, but I think that would be a partial solution. I'd like to explore incorporating the idea of keyword URIs to the vocabulary system first and then see what that means for the output format. We can probably just put this issue on hold for now. |
This is almost exactly the example in your recent PR, except I changed the output units with an evaluation path prefix of {
"valid": true,
"dialects": {
"https://json-schema.org/schemas/example": "https://json-schema.org/draft/2020-12/schema",
"https://json-schema.org/schemas/other": "https://json-schema.org/draft-07/schema"
},
"nested": [
{
"valid": true,
"evaluationPath": "",
"schemaLocation": "https://json-schema.org/schemas/example#",
"instanceLocation": "",
"annotations": {
"title": "root",
"properties": [
"foo",
"bar"
]
}
},
{
"valid": true,
"evaluationPath": "/properties/foo/allOf/1",
"schemaLocation": "https://json-schema.org/schemas/example#/properties/foo/allOf/1",
"instanceLocation": "/foo",
"annotations": {
"title": "foo-title",
"properties": [
"foo-prop"
],
"additionalProperties": [
"unspecified-prop"
]
}
},
{
"valid": true,
"evaluationPath": "/properties/bar/$ref",
"schemaLocation": "https://json-schema.org/schemas/other#/$defs/bar",
"instanceLocation": "/bar",
"annotations": {
"title": "bar-title",
"properties": [
"bar-prop"
]
}
},
{
"valid": true,
"evaluationPath": "/properties/foo/allOf/1/properties/foo-prop",
"schemaLocation": "https://json-schema.org/schemas/example#/properties/foo/allOf/1/properties/foo-prop",
"instanceLocation": "/foo/foo-prop",
"annotations": {
"title": "foo-prop-title"
}
},
{
"valid": true,
"evaluationPath": "/properties/bar/$ref/properties/bar-prop",
"schemaLocation": "https://json-schema.org/schemas/other#/$defs/bar/properties/bar-prop",
"instanceLocation": "/bar/bar-prop",
"annotations": {
"title": "bar-prop-title"
}
}
]
} |
@gregsdennis the above example is only for the dialect IRI. I'm not currently concerned with anything more granular (hence me asking if we can either agree to just do this, or split this out to its own issue if folks want to keep working on the more granular thing here). So now if I want to know the dialect for that last output unit, I look at |
🤦 I initially copy-pasted the diff instead of the changed file, so the example two comments above was a mess. It's fixed now. |
I think that's doable, but defining how dialects are declared seem to still be an open topic on Keyword for identifying bootstrapping rules #217. We should nail that down before deciding how to put it in the output. |
@gregsdennis I am working on a replacement proposal for #217, but that depends on how some things regarding the SDLC play out. For now, the important part is to get it into the output format. We can update the source of the dialect IRI as needed as we change that. If we change it. |
My point is that we don't have the "dialect IRI" defined as a concept. We seem to be using the meta-schema IRI, but is that really the same thing? It doesn't make sense to include something that isn't defined yet. |
Yes, it's definitely the same thing. I thought that was made clear in the spec. If not, that's just an oversight. |
Okay. I still want to hold this open until process is defined. Less to think about. |
Coming back to this, the idea has warmed on me. I like the idea of the root output unit not actually being an output unit but a host for other evaluation data with the actual evaluation results in the From this comment and the one that follows it, I think it makes sense to have keywords identified. This aligns with the ideas to resolve keyword collisions (also from that discussion) in that their output need to be reported somehow. I imagine output will go through some more refinement soon. |
@jdesrosiers would it make sense to only include |
It could work either way, but I would include it in every node. I feel like if I was writing code to process this output, it would be annoying if I had to look in two places to determine the dialect. It would be easier if it was just always in the same place when it's needed. |
Yeah, thinking about it again, it would only make any sense at all to omit it in some nodes for the hierarchical format. The list format would still need it in every node. |
A schema using dialect-a can
$ref
a schema using dialect-b but, there is nothing in the output format that indicates the dialect that was used to determine each output unit result. This can be important for keywords whose semantics have changed over time, but their name hasn't changed. The ones that come to mind immediately aremaximum
/exclusiveMaximum
,minimum
/exclusiveMinimum
,format
, andlinks
.We could potentially determine the dialect for an output unit by using the
absouteKeywordLocation
to lookup the schema and inspect it's$schema
property, but that has two problems. First,absoluteKeywordLocation
is not required. Second, if the output is passed to a third party for processing, it might not have access to the original schema to look up it's dialect.The simplest approach would be to add a field such as
dialect
to the output unit. However, It would be more useful to have a unique identifier for each keyword. That way, when we introduce a new draft, if a keyword doesn't change, it's ID doesn't have to change. That would make it easier for tools to do things with annotations where they don't care what dialect they are as long as they have the same semantics (such astitle
anddescription
).The text was updated successfully, but these errors were encountered: