Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redesign legacy types and HPy_CAST #83

Closed
antocuni opened this issue Sep 20, 2020 · 4 comments
Closed

Redesign legacy types and HPy_CAST #83

antocuni opened this issue Sep 20, 2020 · 4 comments

Comments

@antocuni
Copy link
Collaborator

After my email to hpy-dev, I and @arigo discussed a possible solution on IRC.
Here is a summary of the solution we found:

To talk more concretely, let's start from existing Python/C code like this:

typedef struct {
    PyObject_HEAD
    int x;
    int y;
} PyPointObject;

// this is a method of Point
PyObject* Point_foo(PyObject *self, PyObject *arg)
{
    PyPointObject *p = (PyPointObject *)self;
    // ...
}

// this is an unrelated function which happens to cast a PyObject* into a
// PyPointObject*
void bar(PyObject *point)
{
    PyPointObject *p = (PyPointObject *)point;
    // ...
}

The idea is that an HPy type (as described by HPyType_Spec) can be:

  1. Pure HPy:

    • struct PyPointObject does NOT contain any header.

    • all methods and slots are written in HPy, and .legacy_slots == NULL.

    • if we obtain a PyObject * (e.g. via HPy_AsPyObject()), it is NOT possible to cast it to PyPointObject *.

  2. Legacy type:

    • struct PyPointObject MUST start with PyObject_HEAD

    • HPyType_Spec MUST contain: .legacy_headersize = offsetof(PyPointObject, x), (where x is the first field) and it is allowed to use .legacy_slots

    • It is possible to cast PyObject * to PyPointObject *

Properties of this solution:

  1. We pay the space penalty for PyObject_HEAD only if it's explicitly requested

  2. When we kill PyObject_HEAD, .legacy_headersize becomes automatically 0, and HPyType_FromSpec will immediately complain if we are still using .legacy_slots.

  3. The C-casts inside legacy methods of the type (such as Point_foo above) are automatically valid, since if we kill PyObject_HEAD we can't have legacy methods (as explained above)

The remaining problem of this solution is how to "cast" an HPy handle into a PyPointObject* efficiently: Currently, we have _HPy_Cast(), but it is impossible to implement it efficiently, because we need to know whether the
type is pure or legacy, and we don't want to pay the cost of looking up the type at runtime. So, we kill _HPy_Cast and introduce two new API functions: HPy_CastPure and HPy_CastLegacy. Both of them can be implemented efficiently at runtime, and as an additionaly bonuns we can actively check that we are not calling them on the wrong type in debug mode.

In order to simplify daily usage, we will introduce/document the convention that the author of PyPointObject * will also define a macro:

#define PyPointObject_CAST(ctx, h) ((PyPointObject*)HPy_CastLegacy(ctx, h))

This way, the rest of the code can simply call PyPointObject_CAST without having to worry whether it's a pure or legacy type. Then, when the porting is completed we can kill PyObject_HEAD and simply adapt the macro to use HPy_CastPure. If we forget and/or use the wrong call to HPy_Cast*, the debug mode will catch the error very soon.

There is a remaining proble, though. The C-casts in unrelated methods (such as bar above) are problematic: there is no nice/automatic way to detect/avoid/emit a warning in case of a cast from PyObject * to PyPointObject *. Once we kill PyObject_HEAD, all those casts become invalid and will result in subtle bugs/segfaults. One easy way to mitigate the problem is to rename struct PyPointObject into e.g. struct HPyPointObject, and typedef HPyPointObject PyPointObject: the new hpy-friendly code will use HPyPointObject, while the old code can continue to use PyPointObject: when we finally kill PyObject_HEAD we also need to kill the typedef: this way, if by chance there is any code around which still casts to PyPointObject, we will get a nice compilation error.

So to summarize, the hpy-ification of the original code will start like this:

// step 1 of the porting: start to hpy-ify the code
typedef struct {
    PyObject_HEAD // will be killed
    int x;
    int y;
} HPyPointObject;
typedef HPyPointObject PyPointObject; // will be killed

#define HPyPointObject_CAST(ctx, h) ((HPyPointObject*)HPy_CastLegacy(ctx, h))

// this is a LEGACY method of Point
PyObject* Point_foo(PyObject *self, PyObject *arg)
{
    PyPointObject *p = (PyPointObject *)self; // still works
    // ...
}

void bar(PyObject *point)
{
    PyPointObject *p = (PyPointObject *)point; // still works
    // ...
}


HPyType_Spec PointType_Spec = {
    .name = "Point",
    .legacy_headersize = offsetof(HPyPointObject, x),
    .legacy_slots = { ... }
}

Then, we can hpy-ify some of the code:

// step 2
typedef struct {
    PyObject_HEAD // still a legacy type for now
    int x;
    int y;
} HPyPointObject;

...

HPyDef_METH(...)
HPy Point_foo(HPyContext ctx, HPy self, HPy arg)
{
    HPyPointObject *p = HPyPointObject_CAST(ctx, self);
    // ...
}
...

Finally, we can turn the type into a pure-hpy type:

typedef struct {
    //PyObject_HEAD // KILLED!
    int x;
    int y;
} HPyPointObject;
// typedef HPyPointObject PyPointObject; // KILLED!

// note: this is calling HPy_CastPure now
#define HPyPointObject_CAST(ctx, h) ((HPyPointObject*)HPy_CastPure(ctx, h))

...

HPyType_Spec PointType_Spec = {
    .name = "Point",
    // .legacy_headersize = offsetof(HPyPointObject, x), // KILLED!
    // .legacy_slots = { ... } // KILLED!
    ...
}

At this point, bar will no longer compile because struct PyPointObject* no longer exists. Assuming it's still part of legacy code which manages PyObject *, it will need to be manually adapted in this way:

void bar(PyObject *point)
{
    HPy h_point = HPy_FromPyObject(ctx, point);
    HPyPointObject *p = HPyPointObject_CAST(ctx, h_point);
    // ...
}

Open question: should we find a better name for HPy_Cast{Pure,Legacy}? They don't really do a "cast" (the actual cast is done by the macro), but they are returning a pointer to the C implementation. Maybe something like
HPy_GetStructImplPtr is more appropriate, but I can't find any good name, so suggestions are welcome.

@antocuni
Copy link
Collaborator Author

antocuni commented Sep 24, 2020

An idea which just came to my mind: instead of telling the user to define a macro like HPyPointObject_CAST, we could suggest to use a static inline function instead, which is just better and more type safe:

static inline HPyPointObject *HPyPointObject_Cast(HPyContext ctx, HPy h) {
    return (HPyPointObject*)HPy_CastPure(ctx, h);
}

And, while we are at it, we could even provide a macro to generate it:

HPyCast_DEFINE_PURE(HPyPointObject_Cast, HPyPointObject);

@hodgestar
Copy link
Contributor

Regarding naming: HPy_Cast and HPy_CastLegacy seem slightly better to me than having Pure at the end of the headless version.

@vstinner
Copy link
Contributor

FYI moving PyObject members into a separated "hidden" structure was discussed at: https://bugs.python.org/issue39573#msg366473

The idea would be to make PyObject an empty structure and allocate a new hidden "PyObjectHead" structure before a PyObject* pointer. It is already done for PyGC_Head header.

@hodgestar
Copy link
Contributor

Implemented in #156.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants