Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metaschema: error reporting is difficult to understand #1021

Closed
alanisaac opened this issue Apr 8, 2024 · 6 comments · Fixed by #1027
Closed

Metaschema: error reporting is difficult to understand #1021

alanisaac opened this issue Apr 8, 2024 · 6 comments · Fixed by #1027
Labels
bug Something isn't working enhancement New feature or request metaschema non_breaking Non Breaking, backwards compatible changes

Comments

@alanisaac
Copy link
Contributor

Background

With the changes in 3c7ca92, the metaschema was updated to refactor it and make it a bit easier to read. However, this had the side effect of making the error messages more difficult to understand.

To Reproduce

Take the existing schema. Find an event with any attribute that is marked group: primary and update it so that it also has requirement: optional, which is not allowed. Run the validator over the schema, and you will see a message like the following:

TESTING: JSON files match their metaschema definitions
   FATAL: File at events/network/tunnel_activity.json does not pass metaschema validation. Error: Unevaluated properties are not allowed ('attributes', 'caption', 'description', 'extends', 'name', 'profiles' were unexpected) at JSON path: '$'

This error message does not point to the issue with the schema.

Notes

I believe this is occurring because the anyOf constraint in the event metaschema is not matching the common event schema"$ref": "common-event-object.schema.json". Since it does not match, that entire component of the schema is thrown out, and then all attributes from it are treated as unevaluated.

@alanisaac alanisaac added bug Something isn't working non_breaking Non Breaking, backwards compatible changes metaschema labels Apr 8, 2024
@alanisaac
Copy link
Contributor Author

Also cc: @rmouritzen-splunk

@rmouritzen-splunk rmouritzen-splunk added enhancement New feature or request and removed bug Something isn't working labels Apr 8, 2024
@rmouritzen-splunk
Copy link
Contributor

rmouritzen-splunk commented Apr 8, 2024

The change mentioned was done to make the metaschema more accurate, not more readable. Prior to the change, it was allowing extra properties in certain cases. Changing the metaschema back would be a regression.

About the error message, this is the error messages coming from the jsonschema Python library (https://pypi.org/project/jsonschema/). The library is correctly flagging an error, though. So that's good, at least. From a quick web search, it seems like this is because unevaluatedProperties is a more recent addition to JSON Schema, (though it's from 2019... so about 5 years ago).

The anyOf with unevaluatedProperties solution comes from this Stack Overflow answer: https://stackoverflow.com/questions/73843324/how-to-merge-two-json-schema-without-additional-properties.

Paths forwards:

  1. Use a different library.
  2. Submit a bug to the jsonschema library.
  3. Change the metaschema JSON Schema so it both flags unexpected properties and gives good error messages. This probably means getting rid of common-event-object.schema.json and explicitly copying these definitions in to the event and object schema definitions. While possible, the duplication creates a maintenance problem.

Suggestions are welcome.

I changed this from a bug to an enhancement since the incorrect JSON is properly being flagged as an error, even though the error message isn't pointing to the root problem.

@pagbabian-splunk
Copy link
Contributor

Also, the rule for Primary and Required can only go in one direction without it being a breaking change:
i.e. A Required attribute must be Primary, but a Primary attribute may be Required or Recommended. This is because any existing Primary attributes that were not Required cannot be promoted to Required without being a breaking change.
Also, Recommended attributes that are part of a just_one or at_least_one constraint can sometimes be substituted for a single attribute being Required.
For any previous attributes that were Primary and Optional, they should be promoted to Recommended (with, or without a constraint, since technically imposing a constraint, in some cases, may also be breaking).

@alanisaac
Copy link
Contributor Author

The change mentioned was done to make the metaschema more accurate, not more readable. Prior to the change, it was allowing extra properties in certain cases. Changing the metaschema back would be a regression.

Thanks for clarifying the intent! I don't think we should change it back.

I changed this from a bug to an enhancement since the incorrect JSON is properly being flagged as an error, even though the error message isn't pointing to the root problem.

I categorized it as a bug because I consider the ability of the validator to guide folks to the root of the problem an important part of what it does. In the previous version, using the same reproduction steps it would flag the problem as being with the requirement of the attribute in question. So even though the validation is better (which is great, and I'm not trying to say otherwise!), I considered it a regression in its reporting.

All that said, the label probably doesn't matter all that much.

Change the metaschema JSON Schema so it both flags unexpected properties and gives good error messages. This probably means getting rid of common-event-object.schema.json and explicitly copying these definitions in to the event and object schema definitions. While possible, the duplication creates a maintenance problem.

As I recall that was the reason I tried this "redefinition" strategy with event

"@deprecated": true,
"description": true,
"caption": true,
"name": true,
"extends": true,
"constraints": true,
"profiles": true,
"attributes": true,
, though you are correct that it left object unconstrained.

But maybe there's a better way? If worst comes to worst, I would trade the duplication for the error messages, personally. With any luck, the validator should be helpful in flagging issues for schema developers far more often than we modify the metaschema.

@rmouritzen-splunk
Copy link
Contributor

rmouritzen-splunk commented Apr 9, 2024

If this is problematic for your use-case, we can add an explicit check Python code. There are plenty of those already.

These checks happen in addition to the metaschema JSON Schema validation (as you may know), so there's no reason we can't do both.

If you can highlight the case(s) that you think would really help on top of the metaschema check, we'll get those added.

Edit: I'll play with this today, trying the Alan's solution and anything else I can think of.

@rmouritzen-splunk
Copy link
Contributor

@alanisaac: Your PR does the trick. I say we use it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request metaschema non_breaking Non Breaking, backwards compatible changes
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants