Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression deserializing a subclassed source #347

Closed
pchanial opened this issue Feb 4, 2022 · 5 comments · Fixed by #348
Closed

Regression deserializing a subclassed source #347

pchanial opened this issue Feb 4, 2022 · 5 comments · Fixed by #348

Comments

@pchanial
Copy link
Contributor

pchanial commented Feb 4, 2022

The following code works correctly when cythonization is disabled:

from apischema import deserialize

@dataclass
class Foo:
    bar: str

class MyDict(dict):
    pass

data = MyDict({'bar': 'baz'})
expected = Foo(bar='baz')
actual = deserialize(Foo, data)

Cythonization triggers an LSP violation: the last line fails with the error:

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
apischema/utils.py:400: in wrapper
    return wrapped(*args, **kwargs)
apischema/deserialization/__init__.py:896: in deserialize
    return deserialization_method(
apischema/deserialization/methods.pyx:463: in apischema.deserialization.methods.SimpleObjectMethod.deserialize
    cpdef deserialize(self, object data):
apischema/deserialization/methods.pyx:464: in apischema.deserialization.methods.SimpleObjectMethod.deserialize
    return SimpleObjectMethod_deserialize(self, data)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   data2: dict = data
E   TypeError: Expected dict, got MyDict

apischema/deserialization/methods.pyx:912: TypeError
@wyfo wyfo linked a pull request Feb 6, 2022 that will close this issue
@wyfo wyfo closed this as completed in #348 Feb 6, 2022
@wyfo
Copy link
Owner

wyfo commented Feb 6, 2022

I'd indeed use Cython cast to builtin type in order to improve performance, as builtin operations can be optimized. For example, in operator used for object deserialization can use PyDict_Contains instead of PySequence_Contains (this one calling PyDict_Contains using a function pointer).

However, following your issue, I've made a quick benchmark, and the result was surprising: adding the cast (with an additional variable) is in fact slower for small lists (but it's better as expected for bigger lists). That's why I've finally removed this cast.

@wyfo
Copy link
Owner

wyfo commented Feb 6, 2022

@pchanial By the way, could you give more details about your use case with buitin subtypes?

@pchanial
Copy link
Contributor Author

pchanial commented Feb 7, 2022

Thanks for the quick fix. We were just emulating PEP 584 in some unit tests on a Python 3.8 environment, but there could be less trivial other use cases, such as relying on defaultdict, Counter or OrderedDict.
Btw, it would be great to make public some benchmarks internal to apischema so that contributors could check performance regressions.

@wyfo
Copy link
Owner

wyfo commented Feb 7, 2022

I don't think defaultdict would be a good idea because deserialization uses in operator, but PEP 584 emulation is indeed an understandable use case.

Btw, it would be great to make public some benchmarks internal to apischema so that contributors could check performance regressions.

Benchmark is not a trivial things, and apischema's one is just here to give a rough estimation of the relative performance in comparison to alternatives. But it's really poor and not reliable if you want some precise results, like tracking performance regressions.
I don't have any other benchmark available for the moment, but I don't think it matter a lot. If a performance regression is introduced, I think it will be visible directly in the code, as apischema performance mostly comes from algorithmic optimization.

@pchanial
Copy link
Contributor Author

pchanial commented Feb 8, 2022

Out of curiosity, why does the deserializer use the in operator?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants