Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Behavior when attempting to *override* an official dialect is not well defined when recursing into subschemas containing the official dialect's $schema identifier #994

Open
WilliamJamieson opened this issue Aug 31, 2022 · 25 comments
Labels
Dialects v2 Issues which will likely be addressed as part of reworked dialect support Enhancement Some new desired functionality

Comments

@WilliamJamieson
Copy link

This is related to part of #981 (comment); namely, the broken validation of default values in asdf. However, I believe the issue itself originates in the evolve:

def evolve(self, **changes):
# Essentially reproduces attr.evolve, but may involve instantiating
# a different class than this one.
cls = self.__class__
schema = changes.setdefault("schema", self.schema)
NewValidator = validator_for(schema, default=cls)
for field in attr.fields(cls):
if not field.init:
continue
attr_name = field.name # To deal with private attributes.
init_name = attr_name if attr_name[0] != "_" else attr_name[1:]
if init_name not in changes:
changes[init_name] = getattr(self, attr_name)
return NewValidator(**changes)
method first introduced in version 4.10.0.

Essentially, we are creating a a new validator using the Draft4Validator, which contains an additional validators function (under the default keyword). See https://github.com/asdf-format/asdf/blob/bf954a3921df6b3e5f36c211c18257ac97d4f423/asdf/schema.py#L705-L750 in the asdf code. However, it appears that the additional default validator is being lost when attempting to validate something. The following should produce a ValidationError with the latest version of asdf:

from asdf import schema

s = {"type": "object", "properties": {"a": {"type": "integer", "default": "foo"}}}

schema.check_schema(s)

Indeed, if one has jsonschema 4.9.1 installed, one gets a ValidationError; however, if one has jsonschema 4.15.0 installed, one gets no validation error.

After carefully stepping the the code as it is running with jsonschema 4.15.0 I have found that at some point while iterating, the evolve method returns a validator class which does not have default in its VALIDATORS attribute when it should have one. Making it so that there is no validator available when it does try to validate default. This occurs when validator_for does not return DefaultValidator:

return _META_SCHEMAS.get(schema["$schema"], DefaultValidator)
Instead it returns the Draft4Validator which does not contain default in its VALIDATORS dictionary. Meaning the validator returned by evolve has now lost the default validator for further iterations.

@Julian
Copy link
Member

Julian commented Sep 1, 2022

Hi. Thanks. I'm not sure when I'll have time to dig through asdf, so it'd certainly help if you could simplify this to something that doesn't involve asdf by diagnosing which usage of this library leads to behavior you don't expect (e.g. perhaps by producing a self contained bit of code that has this behavior).

Otherwise I'm happy to leave this open and whenever I get a chance to try to help.

@WilliamJamieson
Copy link
Author

WilliamJamieson commented Sep 1, 2022

@Julian, sorry for not creating an asdf independent reproducer yesterday. The following is a reproducer of the issue, though I think it can be simplified:

import jsonschema.validators as mvalidators

def check_schema(meta_schema, schema, validate_default=True):
    validators = mvalidators.Draft4Validator.VALIDATORS.copy()

    if validate_default:
        def _validate_default(validator, default, instance, schema):
            # Do something drastic to demonstrate this function getting executed
            raise RuntimeError("This error should be raised")

        validators.update({"default": _validate_default})

    cls = mvalidators.create(
        meta_schema=meta_schema,
        validators=validators,
        type_checker=mvalidators.Draft4Validator.TYPE_CHECKER,
        id_of=mvalidators.Draft4Validator.ID_OF,
    )
    validator = cls(meta_schema)
    validator.validate(schema)


s = {"type": "object", "properties": {"a": {"type": "integer", "default": "foo"}}}
meta_schema = {
    "$schema": "http://json-schema.org/draft-04/schema",
    "allOf": [
        {"$ref": "http://json-schema.org/draft-04/schema"},
    ]
}

# Attempt to run the `_validate_default` validator
try: 
    check_schema(meta_schema, s)
except RuntimeError:
	# We want the error to be raised to demonstrate `_validate_default` being run.
    pass
else:
    raise RuntimeError("An Error should have been raised")

# Run again without the `_validate_default` validator
check_schema(meta_schema, s, False)

Under jsonschema 4.9.1 (and below to at least 4.0.1) this will run without issue. However in jsonschema 4.15.0 (and 4.10.0, though I have not tried any of the versions in between), this does fail with:

RuntimeError: An Error should have been raised

This shows that the _validate_default function is not being called by jsonschema even though it should be.

@WilliamJamieson
Copy link
Author

Note that changing

return NewValidator(**changes)

to

            result = NewValidator(**changes)
            result.VALIDATORS.update(self.VALIDATORS)
            return result

Will fix the first part (try/except block) of the reproducer, but causes the second run of check_schema (with _valdate_default off) to fail. Checking through the code I think _validate_default is persisting as a validator in the Draft4Validator.

@Julian
Copy link
Member

Julian commented Sep 1, 2022

Nice! No worries, that looks great, will have a look in the morning, thanks for it.

@WilliamJamieson
Copy link
Author

A better fix than the one in #994 (comment) is to replace

NewValidator = validator_for(schema, default=cls)
with

            BaseValidator = validator_for(schema, default=cls)
            NewValidator = extend(
                validator=BaseValidator,
                validators=self.VALIDATORS
            )

It seems to pass my through my reproducer but fails this unit test:

def test_evolve_with_subclass(self):
"""
Subclassing validators isn't supported public API, but some users have
done it, because we don't actually error entirely when it's done :/
We need to deprecate doing so first to help as many of these users
ensure they can move to supported APIs, but this test ensures that in
the interim, we haven't broken those users.
"""
with self.assertWarns(DeprecationWarning):
@attr.s
class OhNo(self.Validator):
foo = attr.ib(factory=lambda: [1, 2, 3])
_bar = attr.ib(default=37)
validator = OhNo({}, bar=12)
self.assertEqual(validator.foo, [1, 2, 3])
new = validator.evolve(schema={"type": "integer"})
self.assertEqual(new.foo, [1, 2, 3])
self.assertEqual(new._bar, 12)

This is a bit ironic, since I was one of the ones that raised the issue that prompted that unit test.

@WilliamJamieson
Copy link
Author

A better fix than the one in #994 (comment) is to replace

Better is relative, it appears to have created a recursion error (don't have time to explore this today) in a different (mostly unrelated) part of asdf, and caused asdf to benchmark significantly slower.

@Julian
Copy link
Member

Julian commented Sep 2, 2022

I have to look a bit closer, but part of what's here is expected behavior after the bugfix -- specifically your meta schema is saying it wants to follow the rules of draft 4 (by claiming its $schema is the draft 4 meta schema URI) -- but the rules of draft 4 indeed don't behave as you're expecting with default, instead you're trying to apply additional behavior, so that's not draft 4, it's your own custom dialect's behavior. So the solution will involve specifying some different metaschema identifier and then saying you want to apply that. But I have to look a bit closer to suggest exactly what. I'm also somewhat confused about why that code uses jsonschema.validators.create instead of jsonschema.validators.extend(Draft4Validator, ...), but maybe your real code really needs to for some reason?

@WilliamJamieson
Copy link
Author

The bit of asdf at issue is here. In asdf, we require all schemas to follow the our own "yaml-schema" meta-schema which is asdf's extension of the draft 4 meta-schema. Indeed, "yaml-schema" is supposed to be written in a way which makes it a valid schema under draft 4. Note that much of the relevant asdf code I stripped out was mostly the code necessary to guess (if needed) and load asdf's meta-schema into the python objects that jsonschema consumes. This means basically had to hand code asdf's meta-schema for the example to work, which resulted in me adding the minimum portions of the yaml-schema to recreate the issue. The reason for using create and not extend is that asdf is using its own meta-schema (which is a modification of draft 4) so we need to create a validator based off that.

@WilliamJamieson
Copy link
Author

Tagging @eslavich for comments about asdf.

@eslavich
Copy link

eslavich commented Sep 5, 2022

@WilliamJamieson I'm confused by this example:

#994 (comment)

The s object is the "data" and the meta_schema object is the "schema", right? If that's the case, then I don't see why the default validator would ever run -- there is no key named "default" in the meta_schema object, so there's no reason for that function to be called. The "default" key in the s object is irrelevant because that's just data.

@eslavich
Copy link

eslavich commented Sep 7, 2022

Here's a similar but I think more realistic example:

import jsonschema.validators as mvalidators

def check_schema(meta_schema, schema, data, validate_default=True):
    validators = mvalidators.Draft4Validator.VALIDATORS.copy()

    if validate_default:
        def _validate_default(validator, default, instance, schema):
            # Do something drastic to demonstrate this function getting executed
            raise RuntimeError("This error should be raised")

        validators.update({"default": _validate_default})

    cls = mvalidators.create(
        meta_schema=meta_schema,
        validators=validators,
        type_checker=mvalidators.Draft4Validator.TYPE_CHECKER,
        id_of=mvalidators.Draft4Validator.ID_OF,
    )
    validator = cls(schema)
    validator.validate(data)


data = {"a": 14}
s = {"type": "object", "properties": {"a": {"type": "integer", "default": "foo"}}}
meta_schema = {
    "$schema": "http://json-schema.org/draft-04/schema",
    "allOf": [
        {"$ref": "http://json-schema.org/draft-04/schema"},
    ]
}

# Attempt to run the `_validate_default` validator
try: 
    check_schema(meta_schema, s, data, True)
except RuntimeError:
	# We want the error to be raised to demonstrate `_validate_default` being run.
    pass
else:
    raise RuntimeError("An Error should have been raised")

# Run again without the `_validate_default` validator
check_schema(meta_schema, s, data, False)

It succeeds on all of jsonschema 4.9, 4.10, and 4.15.

@Julian
Copy link
Member

Julian commented Sep 7, 2022

Succeeds or fails? Are you saying it's working or broken?

@WilliamJamieson
Copy link
Author

WilliamJamieson commented Sep 7, 2022

@eslavich I think you have changed what is being validated. What is breaking is our validation of schema against its respective meta_schemas. You created some data which you are validating against the schema in question.

@gazpachoking
Copy link
Contributor

I'm having the same issue since 4.10. We are using this recipe to set default values while validating, which works great on 4.9. On 4.10+ though, it loses the special properties validator whenever it descends into $refs which declare $schema.

This behavior makes sense to me, the default filling was on top of the draft4 schema that we are using, (and that was declared in the $schema,) but I'm not sure what the best way to override this behavior is and force the custom default filling to descend through the $refs.

@Julian
Copy link
Member

Julian commented Oct 29, 2022

I can try to have another look at this next week

@gazpachoking
Copy link
Contributor

gazpachoking commented Oct 29, 2022

Well, I think I just figured a reasonable workaround. Declaring that our custom extended validator validates the default schema. Not sure if this should just be added to the docs, or if there is a better way.

    CURRENT_SCHEMA_VERSION = 'http://json-schema.org/draft-04/schema#'
    # Make a couple validators that have some extensions to the default draft4 validator
    SchemaValidator = jsonschema.validators.extend(jsonschema.Draft4Validator, validators)
    SchemaValidatorWDefaults = jsonschema.validators.extend(SchemaValidator, {'properties': validate_properties_w_defaults})
    ...
    # Right before using any of our custom validators, declare that they validate the default draft4 schema
    if set_defaults:
        jsonschema.validators.validates(CURRENT_SCHEMA_VERSION)(SchemaValidatorWDefaults)
        validator = SchemaValidatorWDefaults(schema, resolver=resolver, format_checker=format_checker)
    else:
        jsonschema.validators.validates(CURRENT_SCHEMA_VERSION)(SchemaValidator)
        validator = SchemaValidator(schema, resolver=resolver, format_checker=format_checker)
    # Use the validator

EDIT: We didn't want to declare a custom $schema uri, because our schemas get used by some external tools in Javascript world where the default behavior is fine. Also, because depending on the context, we sometimes enable the default filling, and sometimes not, as can probably be inferred from above snippet.

@WilliamJamieson
Copy link
Author

ASDF managed to resolve this issue with asdf-format/asdf#1203. Those changes maybe useful to resolving the issue.

@gazpachoking
Copy link
Contributor

I tweaked my solution a little bit to make sure that the validator that sets defaults is unregistered right after being used. It caused problems in other places in our code if it remained installed.

@Julian
Copy link
Member

Julian commented Oct 31, 2022

The "right" solution should likely involve using some new dialect URI (i.e. not 'http://json-schema.org/draft-04/schema#') since indeed your schemas are not using draft 4 semantics, but I'd have to spend a bit more time putting together some concrete code to see that that works.

@gazpachoking
Copy link
Contributor

The "right" solution should likely involve using some new dialect URI (i.e. not 'http://json-schema.org/draft-04/schema#') since indeed your schemas are not using draft 4 semantics, but I'd have to spend a bit more time putting together some concrete code to see that that works.

Yeah, I figured that was the 'right' solution, and I'm pretty sure it would work. It was just an inconvenient solution on our end, since we don't want to change the validation semantics, just add the defaults processing sometimes, and we are using separate libraries on JS side, that would also have to be taught about the new dialect URI. I'm okay with current behavior, since we can forcefully override the validator for a given dialect URI (even if that isn't a recommended solution.) Might look in to fixing it properly on our end later when I have time/motivation.

@jpmckinney
Copy link

jpmckinney commented Mar 16, 2023

The documentation for extend mentions:

Any validator callables with the same name as an existing one will (silently) replace the old validator callable entirely, effectively overriding any validation done in the “parent” validator class.

I have been using extend to replace some validator callables with my own (to address #119). However, as reported by others, the behavior described above (replacement) doesn't occur for $referenced schema.

Note that the semantics of my callables are exactly the same as draft 4 semantics - my callables are just adding a few extra attributes to validation errors, so that I can make better error messages (viz #119).

I don't use create, so I can't use the solution in asdf-format/asdf#1203 without major refactoring.

@gazpachoking's solution in #994 (comment) works, though I agree this is a kind of weird workaround.

@jpmckinney
Copy link

jpmckinney commented Mar 16, 2023

To reproduce:

import jsonschema

def validate_type(validator, value, instance, schema):
    raise Exception("hit")

NewValidator = jsonschema.validators.extend(jsonschema.Draft4Validator, {"type": validate_type})

validator = NewValidator({
    "$schema": "http://json-schema.org/draft-04/schema#",
    # Note: The referenced schema has "$schema": "http://json-schema.org/draft-04/schema#".
    "$ref": "https://standard.open-contracting.org/schema/1__1__5/release-schema.json"
})

for error in validator.iter_errors({"ocid": 0, "id": "1", "date": "", "tag": ["tender"], "initiationType": "tender"}):
    # No exception raised.
    print(error.validator)

Before jsonschema 4.10, an exception would be raised (i.e. validate_type would be run).

I don't see a scenario where a user would want to override the validators for a top-level schema, but not the validators for referenced schema. If I'm using extend to "subclass" the Draft4Validator, I want that new validator to be used for all draft 4 schema referenced by the top-level schema, not just the top-level schema alone.

I suppose the jsonschema.validators.validates workaround makes sense in a way, but this doesn't seem to me to be a patch-level behavior change.

@Julian
Copy link
Member

Julian commented Mar 16, 2023

I can try to have another look at this issue once 4.18 is out --

The documentation for extend mentions:
Any validator callables with the same name as an existing one will (silently) replace the old validator callable entirely, effectively overriding any validation done in the “parent” validator class.

This line means "if you define a validator called properties you will override the properties validator" -- that's indeed the case, as you noted.

Redefining the semantics of what it means to be draft 4 is what changed, it's hard to say how much of that is "supported behavior" or not but it certainly was a bugfix with regard to (specified via the specification) correcting the behavior when referencing some dialect in a $ref which wasn't the one being used currently.

I don't see a scenario where a user would want to override the validators for a top-level schema, but not the validators for referenced schema.

I don't think anything like that is the case right now, it's purely the case where you write "$schema": "http://json-schema.org/draft-04/schema#", and expect that to mean your class and not the "real" Draft 4 one -- you'll note that if you run:

>>> validators.validator_for({
...     "$schema": "http://json-schema.org/draft-04/schema#",
...     # Note: The referenced schema has "$schema": "http://json-schema.org/draft-04/schema#".
...     "$ref": "https://standard.open-contracting.org/schema/1__1__5/release-schema.json"
... })
<class 'jsonschema.validators.Draft4Validator'>

you get Draft4Validator, not yours, so you're already doing the equivalent of passing some validator's schema to some other validator (e.g. passing a draft 6 schema to the draft 4 validator).

But all the above being said, as I say, it's possible there's a way to still make this work, or that an API tweak is needed.

@jpmckinney
Copy link

It sounds like the solution to this (perhaps narrow) use case is to use jsonschema.validators.validates to (temporarily) override the validator that is selected for http://json-schema.org/draft-04/schema#. If so, perhaps this could be added to the documentation (with the caveat that it's only recommended for this specific use case)?

@Julian Julian changed the title Validator gets lost during validation Behavior when attempting to *override* an official dialect is not well defined when recursing into subschemas containing the official dialect's $schema identifier Oct 30, 2023
odscjames pushed a commit to OpenDataServices/lib-cove that referenced this issue Dec 6, 2023
odscjames pushed a commit to OpenDataServices/lib-cove that referenced this issue Dec 6, 2023
odscjames pushed a commit to OpenDataServices/lib-cove that referenced this issue Dec 6, 2023
@Julian
Copy link
Member

Julian commented Feb 11, 2024

#1197 has another example of this kind of behavior:

from referencing import Registry, Resource
from referencing.jsonschema import DRAFT7
from collections.abc import Mapping
import jsonschema
import collections


class Config(collections.ChainMap):
    def __init__(self, *maps):
        super().__init__(*maps)
        
    def __getitem__(self, key):
        return self._chained_getitem(key)
        
    def _chained_getitem(self, key, **kwargs):
        """
        Actual implementation of ``__getitem__`` with chained behavior.

        When returning a new instance of `DeepChainMap`, the ``kwargs`` are
        passed to the constructor.  Subclasses should override this method
        instead of ``__getitem__`` in order to pass additional ``__init__``
        args.
        """

        chained_values = []

        for m in self.maps:
            try:
                chained_values.append(m[key])
            except KeyError:
                continue

        if not chained_values:
            raise KeyError(key)

        first = chained_values[0]

        # Note: Although instances of dict are also instances of Mapping,
        # isinstance(x, dict) is much faster than isinstance(x, Mapping), and
        # dict being the most common case we explicitly check for it first.
        if not isinstance(first, (dict, Mapping)):
            return first

        nested = []

        for m in chained_values:
            if isinstance(m, (dict, Mapping)):
                nested.append(m)
            else:
                break

        return self.__class__(*nested, **kwargs)
    
    
config = Config({'a': {'b': 'c', 'd': {}}})
    
main_schema = {
                '$id': 'main_schema', 
                'type': 'object', 
                'properties': 
                     {'a': {'properties': {'b': {}, 
                                           'd': {}}, 
                                          '$ref': 'schema_1'}
                      }
                 }
    


schema_1 = {'type': 'object', 
            'properties': {'b': {'enum': ['c']}, 
                           'd': {'$ref': 'd_schema'}},
             'required': ['b']} 


schema_2 = {'$id': 'schema_2', 
            'fixed_inputs': {
                            'type': 'object', 
                            'default': {}, 
                            'properties': {'e': {'type': 'integer', 'minimum': 1}} 
                            }
            }


d_schema = {'$schema': 'http://json-schema.org/draft-07/schema#', 
            '$id': 'd_schema', 
            '$ref': 'schema_2#/fixed_inputs'}




def retrieve(uri: str):
    if uri == 'schema_1':
        contents = schema_1
    elif uri == 'schema_2':
        contents = schema_2
    elif uri == 'd_schema':
        contents = d_schema
    return Resource.from_contents(contents, DRAFT7)


registry = Registry(retrieve=retrieve)

def is_my_object(checker, instance):
    return (
        isinstance(instance, (dict, Mapping))
    )
type_checker = jsonschema.Draft7Validator.TYPE_CHECKER.redefine(
        "object", is_my_object,
    )

CustomValidator = jsonschema.validators.extend(
    jsonschema.Draft7Validator,
    type_checker=type_checker,
)

validator = CustomValidator(schema=main_schema, registry=registry)
validator.validate(config)

@Julian Julian added Enhancement Some new desired functionality Dialects v2 Issues which will likely be addressed as part of reworked dialect support labels Feb 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dialects v2 Issues which will likely be addressed as part of reworked dialect support Enhancement Some new desired functionality
Projects
None yet
Development

No branches or pull requests

5 participants