Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dictionary type to HPy #458

Merged
merged 3 commits into from
Nov 30, 2023
Merged

Conversation

DuToitSpies
Copy link
Contributor

Adds h_DictType to HPy, as this is needed for the Cython port.

@fangerer
Copy link
Contributor

The error in tests / Check autogen does not seem to be related to the changes in that PR.
I'll take a look ...

@fangerer fangerer requested a review from mattip November 27, 2023 14:38
@mattip
Copy link
Contributor

mattip commented Nov 29, 2023

I am a bit troubled by the many shape objects. They are used to access low-level, implementation specific APIs for the different objects, right? Isn't there a way to tell cython "hey, don't do that"? The type objects are opaque on PyPy, so from my very small world this is unneeded complication.

@hodgestar
Copy link
Contributor

I am a bit troubled by the many shape objects. They are used to access low-level, implementation specific APIs for the different objects, right?

The shapes provide a means to access the internal struct of built-in objects after the PyObject header part. It mimics what happens for user defined subclasses.

It's needed for cases where an extension subclasses built-in types, I think. Someone else please correct me if needed.

@mattip has a point here though -- to what extent are these additional pieces of data really a CPython implementation detail? One could make a good argument that they are entirely an implementation detail.

The question then is whether we can provide a better porting path for existing libraries, or whether implementations just have to mimic this and possibly be slow?

@fangerer
Copy link
Contributor

fangerer commented Nov 30, 2023

Isn't there a way to tell cython "hey, don't do that"?

Yes there is but that is not unrelated to the shapes. What Du Toit needs for his Cython/HPy work is essentially HPyContext.h_DictType. Further, he just added the additional built-in shape because dict is a built-in type.

The type objects are opaque on PyPy, so from my very small world this is unneeded complication.

I can understand this. The built-in shapes are not intuitive. I would like to raise your attention to #169 (and the implementing PR #335). Since the issue and PR are a lot of reading, I try to summarize here:

  • The built-in types and their internal representation is opaque in HPy.
  • We don't technically need the built-in shapes but they are hints for the runtime. E.g. in GraalPy we also consider them such that we know what Java object to create (e.g. we have PDict which is equivalent to PyDictObject).
  • CPython works a lot with embedding structs into structs. This also happens if you create a HPy type with base type dict and have the HPyTypeSpec's basicsize set. In CPython, this would allocate an object of size sizeof(PyDictObject) + basicsize. In order to return the right pointer for CustomType_AsStruct, we need to know the base type. We could store this information at the type but this means, each time you call CustomType_AsStruct, we would have two additional indirections (read the type + read HPyType_Extra_t). So, one of our design decisions was, this should be constant. And to do so, we came up with the built-in shapes because they are, IMO (and Antonio agreed back then), the generalization of that. Every runtime is free to ignore the built-in shapes and store the additional info at the type.

One could make a good argument that they are entirely an implementation detail.

Yes, you can. It could be considered as an implementation detail. But this is similar to, e.g., HPy_GetItem_s. Why do we have it. Wouldn't it be better to just have HPy_GetItem and if the key happens to be created from a C string, then this is an impl detail, right?
I see the built-in shapes as a tradeoff between performance and usability.
Also, I think the built-in shape didn't make things worse. Please remind that we previously had the legacy flags and the shapes are just a generalization of that.

@hodgestar
Copy link
Contributor

@fangerer Maybe I misunderstood how the shapes are used then. Do we just need their size? And do we only really need them on CPython?

@fangerer
Copy link
Contributor

And do we only really need them on CPython?

We don't strictly need them at all. As I wrote, we could store the information in HPyType_Extra_t at the type. I think that can best be understood by looking at the debug mode version of ctx_AsStruct_*:

HPyType_BuiltinShape actual_shape = _HPyType_GetBuiltinShape(uctx, uh_type); \

The built-in types are just an additional information that can help interpreters.

Do we just need their size?

No, the size would not help alternative implementations. E.g. in GraalPy, we are not allocating objects by size (as CPython does), we are allocating them using a Java class. But it helps GraalPy because we could store the user data pointer in a well-known location of the appropriate Java class.

@fangerer
Copy link
Contributor

fangerer commented Nov 30, 2023

Even if this PR raised some discussions about the built-in shapes, I will merge it because this PR is just adding one (missing) shape. So, if built-in shapes are the problem, not merging this PR would solve it.
@hodgestar @mattip Any concerns merging this PR?

@hodgestar
Copy link
Contributor

@fangerer Very happy for this to be merged. Let's add the shape discussion for the agenda of the next dev call.

@fangerer fangerer merged commit a5c70cd into hpyproject:master Nov 30, 2023
38 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants