Regression deserializing a subclassed source #347

pchanial · 2022-02-04T22:08:15Z

The following code works correctly when cythonization is disabled:

from apischema import deserialize

@dataclass
class Foo:
    bar: str

class MyDict(dict):
    pass

data = MyDict({'bar': 'baz'})
expected = Foo(bar='baz')
actual = deserialize(Foo, data)

Cythonization triggers an LSP violation: the last line fails with the error:

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
apischema/utils.py:400: in wrapper
    return wrapped(*args, **kwargs)
apischema/deserialization/__init__.py:896: in deserialize
    return deserialization_method(
apischema/deserialization/methods.pyx:463: in apischema.deserialization.methods.SimpleObjectMethod.deserialize
    cpdef deserialize(self, object data):
apischema/deserialization/methods.pyx:464: in apischema.deserialization.methods.SimpleObjectMethod.deserialize
    return SimpleObjectMethod_deserialize(self, data)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   data2: dict = data
E   TypeError: Expected dict, got MyDict

apischema/deserialization/methods.pyx:912: TypeError

wyfo · 2022-02-06T20:11:06Z

I'd indeed use Cython cast to builtin type in order to improve performance, as builtin operations can be optimized. For example, in operator used for object deserialization can use PyDict_Contains instead of PySequence_Contains (this one calling PyDict_Contains using a function pointer).

However, following your issue, I've made a quick benchmark, and the result was surprising: adding the cast (with an additional variable) is in fact slower for small lists (but it's better as expected for bigger lists). That's why I've finally removed this cast.

wyfo · 2022-02-06T20:15:38Z

@pchanial By the way, could you give more details about your use case with buitin subtypes?

pchanial · 2022-02-07T10:25:38Z

Thanks for the quick fix. We were just emulating PEP 584 in some unit tests on a Python 3.8 environment, but there could be less trivial other use cases, such as relying on defaultdict, Counter or OrderedDict.
Btw, it would be great to make public some benchmarks internal to apischema so that contributors could check performance regressions.

wyfo · 2022-02-07T11:35:18Z

I don't think defaultdict would be a good idea because deserialization uses in operator, but PEP 584 emulation is indeed an understandable use case.

Btw, it would be great to make public some benchmarks internal to apischema so that contributors could check performance regressions.

Benchmark is not a trivial things, and apischema's one is just here to give a rough estimation of the relative performance in comparison to alternatives. But it's really poor and not reliable if you want some precise results, like tracking performance regressions.
I don't have any other benchmark available for the moment, but I don't think it matter a lot. If a performance regression is introduced, I think it will be visible directly in the code, as apischema performance mostly comes from algorithmic optimization.

pchanial · 2022-02-08T00:33:00Z

Out of curiosity, why does the deserializer use the in operator?

wyfo linked a pull request Feb 6, 2022 that will close this issue

Remove cast to builtin types in deserialization methods #348

Merged

wyfo closed this as completed in #348 Feb 6, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Regression deserializing a subclassed source #347

Regression deserializing a subclassed source #347

pchanial commented Feb 4, 2022 •

edited

Loading

wyfo commented Feb 6, 2022

wyfo commented Feb 6, 2022

pchanial commented Feb 7, 2022 •

edited

Loading

wyfo commented Feb 7, 2022

pchanial commented Feb 8, 2022

Regression deserializing a subclassed source #347

Regression deserializing a subclassed source #347

Comments

pchanial commented Feb 4, 2022 • edited Loading

wyfo commented Feb 6, 2022

wyfo commented Feb 6, 2022

pchanial commented Feb 7, 2022 • edited Loading

wyfo commented Feb 7, 2022

pchanial commented Feb 8, 2022

pchanial commented Feb 4, 2022 •

edited

Loading

pchanial commented Feb 7, 2022 •

edited

Loading