Grand Unified Python Object Layout #553
Comments
Sounds promising! It'd be a big chunk of work though. There might be some overlap between “legacy mystery layout” and “Primary”. Perhaps it would work to allow multiple Primaries in the MRO, with single inheritance only, if the subclass has “intimate knowledge” of the base's primary struct? The proposal looks essentially compatible with PEP 697:
Could we have a |
I read that as:
Considering a recent Discourse topic, we might want:
|
Of course, everything we do needs to support mystery legacy [insert feature here]. It's just one bit of information, so shouldn't be a problem.
Maybe, although I'm inclined to just treat "legacy mystery layout” as a black box to start with.
Sure, we can make |
I think there is a difference when subclassing a class of variable sized objects, |
It is functionally equivalent, yes.
No, because we can have a primary class and a variable sized class in an MRO, which means that variable sized data will not be in the primary position. |
Regarding the size of variable sized structs: However, we need to allow classes to compute |
PEP 697 primarily specifies an API. It also, necessarily, describes how the API will work with (a subset of) the current class layouts.
Maybe you can do that, but you can't ignore it entirely.
So, what happens to the [edit: you just answered the following]
|
What exactly does |
Hello, I became aware of this discussion through https://discuss.python.org/t/24136 and would like to chime in on one aspect. Please excuse me if I have overlooked other important aspects. @markshannon wrote:
Please note that having a length in the sense of The best example is the built-in For other examples, consider a type representing an (immutable) tree, or a small multidimensional array: both can be usefully represented as variable size That's why in a redesign of python object layout it may be worthwhile to consider getting rid of any equivalent of today's |
This effectively merges What's missing is a |
Could you remind us of the use case for this? |
Telling multiple similar getters/setters apart. |
See also: making |
Could the "legacy mystery layout" of
I find the design of |
This is likely to need to end up as PEP, but I want to sketch out the idea here first.
General idea
All Python objects should share a common layout that makes the layout visible to the VM, and provides the performance needed by C extensions. In order to ensure that it is fast enough we should use it for internal classes, like tuples, lists and dicts. If it is fast enough for those classes, it will be good enough for NumPy and Cython. Persuading them to use a new API will be another issue, but we need to provide the capability first.
Requirements
Performance
Important classes ("important" must be user defined, or it won't get used) should be able to get at least the speed of manually defined C code. Because the VM will have visibility into the classes, we should be able to get better performance in some cases.
Composability
Most classes should be freely composable, including through multiple inheritance.
Flexibility
The design shouldn't put unnecessary constraints on either the VM implementation or the C extension code.
Conceptual interface
One layout per class
The layout of a Python object depends on its class. All instances of a class have the same layout from the point of view of C extensions (the VM may use "object shapes" or some other optimization resulting in differing physical layout, in a way that is not visible to C extensions).
C extensions define a C struct and requirements, not layout
When defining a C extension type, the C struct will be described to the VM in terms of size and alignment, and the details of accessible fields in it, much like PEP 697, but without the control over layout that PEP 697 gives. Layout is up to the VM.
Layouts aren't inherited
If class C inherits from class B, its layout is not inherited. In other words, if class B defines a struct S, the offset of S within an object C() may differ from the offset of S within an object B().
Extension class properties
Each extension-defined class can have the following properties, which are not inherited.
list, tuple
tuple (but not dict or list). Implies HasLength
For each of the above properties, no class can have more than one class in its MRO (including itself) with that property.
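The uniqueness rule above can be sketched as a simple check over an MRO. This is only an illustration: the flag names (taken loosely from the "HasLength", "primary" and "variable-sized" wording in this proposal) and the function `mro_properties_are_unique` are hypothetical, and each class is assumed to be summarised by a bitmask of the properties it declares itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical property flags; names are illustrative, not a proposed API. */
enum {
    PROP_HAS_LENGTH     = 1 << 0,
    PROP_VARIABLE_SIZED = 1 << 1,
    PROP_PRIMARY        = 1 << 2,
};

/* mro_props[i] holds the flags declared by the i-th class in the MRO itself
 * (properties are not inherited, so nothing is propagated between entries).
 * Returns 1 if no property is declared by more than one class, 0 otherwise. */
static int mro_properties_are_unique(const unsigned *mro_props, size_t n)
{
    unsigned seen = 0;
    for (size_t i = 0; i < n; i++) {
        if (seen & mro_props[i]) {
            return 0;  /* a property shows up for the second time */
        }
        seen |= mro_props[i];
    }
    return 1;
}
```

Under these assumptions, a class creation routine would call such a check once, when the MRO is computed, and reject the class if it fails.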
The C-API.
To get the offset of a struct:
intptr_t PyObject_GetStructOffset(PyTypeObject *self_class, PyTypeObject *declaring_class);
will return the offset, in bytes, from the owning PyObject * to the required struct.
The offset may be negative; a return value of -1 indicates an error.
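As a rough sketch of how an extension might use such an offset, the following mock compiles without Python.h. Everything here is a stand-in: `MockType`, `MockObject`, `MockInstance`, `PointData`, `mock_get_struct_offset` and `point_data` are hypothetical names, and the mock simply places the struct right after the object header, whereas the proposal leaves placement to the VM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for PyTypeObject and PyObject. */
typedef struct { int id; } MockType;
typedef struct { MockType *type; } MockObject;

/* Hypothetical struct that an extension class wants stored in its instances. */
typedef struct { double x, y; } PointData;

/* One possible layout the VM could pick: the struct follows the header. */
typedef struct { MockObject head; PointData data; } MockInstance;

/* Mock of the proposed PyObject_GetStructOffset. As in the text, -1 is
 * reserved to signal an error. */
static intptr_t mock_get_struct_offset(MockType *self_class,
                                       MockType *declaring_class)
{
    if (self_class == NULL || declaring_class == NULL) {
        return -1;
    }
    return (intptr_t)offsetof(MockInstance, data);
}

/* Locate the extension's struct inside an object, or return NULL on error. */
static PointData *point_data(MockObject *obj, MockType *declaring_class)
{
    intptr_t offset = mock_get_struct_offset(obj->type, declaring_class);
    if (offset == -1) {
        return NULL;
    }
    return (PointData *)((char *)obj + offset);
}
```

The extension code never hard-codes where `PointData` lives; it only adds whatever offset it is given to the object pointer.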
PyObject_GetStructOffset is a pure function of the triple (sys.implementation, self_class, declaring_class), meaning that the offset may be cached globally during execution but should not be cached in source code or to disk.
The offset of the start of primary structs is an unstable API constant
const intptr_t PyUnstable_PRIMARY_STRUCT_OFFSET
is a compile-time constant (which I expected to always be positive, but let's leave that open for now).
This allows "primary" classes to get performance on a par with the current implementation of, for example, PyTupleObject.
Declaring fields.
If we want to expose a C field to the VM, we need to declare its type and offset.
In addition, we want to declare various attributes, much as are already declared in PyMemberDef.
One additional capability we would like to add, that PyMemberDef doesn't have, is to have fields that can be directly read but have a function setter, which suggests the following struct:
If get is NULL, it implies the field can be read directly, unless the WRITE_ONLY flag is set.
If set is NULL, it implies the field can be written directly, unless the READ_ONLY flag is set.
A possible implementation.
Objects are laid out as follows:
__dict__/values, __weakrefs__, etc.)
The object pointer (PyObject *) points into the middle of the core object, as it has done since the cycle GC was introduced.
The offset from the object pointer to the end of the core object will be a multiple of the maximum C alignment and will be PyUnstable_PRIMARY_STRUCT_OFFSET.
Example.
We stated above that if the "Grand Unified Python Object Layout" could not support classes like tuple or dict, it would not be reasonable to expect C extensions to use it.
We would define tuple with the following struct:
And declare that tuple is a primary, variable-sized type.
We can then define the accessor functions efficiently as:
The length property could be defined as follows (we wouldn't do this in practice as ().length isn't a thing)
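To give a feel for the tuple example, here is a minimal mock of a primary, variable-sized type. It is not the proposal's own code: `MockObject`, `MockTupleStruct`, `MOCK_PRIMARY_STRUCT_OFFSET` and the `mock_tuple_*` functions are hypothetical stand-ins, and the constant is assumed to equal the size of a one-pointer mock header rather than any real CPython value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for the part of the core object at and after the object pointer. */
typedef struct MockObject { void *type; } MockObject;

/* Stand-in for PyUnstable_PRIMARY_STRUCT_OFFSET (assumed value). */
#define MOCK_PRIMARY_STRUCT_OFFSET ((intptr_t)sizeof(MockObject))

/* Hypothetical primary struct for a tuple: a length plus inline items. */
typedef struct {
    intptr_t length;
    MockObject *items[];  /* flexible array member: variable-sized part */
} MockTupleStruct;

/* The primary struct sits at a compile-time constant offset, so no lookup
 * is needed to reach it. */
static MockTupleStruct *tuple_struct(MockObject *t)
{
    return (MockTupleStruct *)((char *)t + MOCK_PRIMARY_STRUCT_OFFSET);
}

static intptr_t mock_tuple_size(MockObject *t)
{
    return tuple_struct(t)->length;
}

/* Bounds-checked item access; returns NULL when the index is out of range. */
static MockObject *mock_tuple_get_item(MockObject *t, intptr_t i)
{
    MockTupleStruct *s = tuple_struct(t);
    if (i < 0 || i >= s->length) {
        return NULL;
    }
    return s->items[i];
}

/* Allocate header, primary struct and n item slots in one zeroed block. */
static MockObject *mock_tuple_new(intptr_t n)
{
    if (n < 0) {
        return NULL;
    }
    size_t size = sizeof(MockObject) + sizeof(MockTupleStruct)
                  + (size_t)n * sizeof(MockObject *);
    MockObject *t = calloc(1, size);
    if (t == NULL) {
        return NULL;
    }
    tuple_struct(t)->length = n;
    return t;
}
```

Because the offset is a constant, the accessors compile down to a fixed-offset load, which is the property the proposal relies on to match the speed of the current PyTupleObject.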
isn't a thing)The text was updated successfully, but these errors were encountered: