Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Basic module state support #328

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

steve-s
Copy link
Contributor

@steve-s steve-s commented Jun 3, 2022

Adds support for specifying and using module state including the traverse and destroy slots. Module state can contain any plain C data and HPyFields.

There is one thing missing: ability to query the module state from a class. In CPython this plays along with calling conventions that pass along defining class. HPy does not provide those calling conventions, but we can add them.

Introducing module state API similar to CPython will ease the migration to HPy for existing extensions that use CPython module state, and it adds one missing feature to complement the HPyGlobal: a place for plain C per-interpreter state, e.g., some int counter cannot be stored in HPyGlobal (unless wrapped in Python object).

Before we introduce module state, we may consider some alternatives:

We can also support module state and HPy specific alternative. If we merge this PR that would be the situation: we'd have more generic module state and specialized HPy specific concept of HPyGlobal.

Module context idea:

Like HPyContext it would be like an "argument", i.e., it would contain HPy handles directly and the user would have to HPy_Dup them if needed. They would be closed implicitly at the end of the "downcall".

This would allow to avoid the issue with HPyGlobal that to use a global, one has to do:

HPy type = HPyGlobal_Load(ctx, MyType);
HPy obj = HPy_New(ctx, type);
HPy_Close(ctx, type);

which is extra 2 calls. With something like a context it could be just HPy_New(ctx, moduleCtx->myType), like it would be with some constant from HPyContext.

Alongside a calling convention that passes the module context as an argument, there would have to be a function to fetch the module context in places where it is not (easily) available and then probably some "close module context" function.

What is not clear is how the module context would be specified. Since it does not contain HPyFields, user cannot just assign HPy handles to it. Logically the module context would be created by the Python engine as an argument for each call (implementation may cache it). It may be specified by using offsetof. Complete example:

struct {
    HPy my_type1;
    int my_counter;
} MyModuleContext;

// Note: something like argument clinic could provide nicer way to handle this
HPyDef_METH(myabs, "myabs", myabs_impl, HPyFunc_ModCtx_O)
foo(HPyContext *ctx, MyModuleContext *moduleCtx, HPy self, HPy arg) {
    return HPy_New(ctx, moduleCtx->my_type1);
}

ModuleDef module = {
  //...
  context_size = sizeof(MyModuleContext),
};

static HPy init_my_module_impl(HPyContext *ctx) {
    HPy type_handle = HPyType_FromSpec(ctx, &module);
    // ...
    HPyModuleContext_Add(ctx, offsetof(MyModuleContext, my_type1), type_handle);
    HPy_Close(ctx, type_handle);
    // ...
}

the implementation would have to remember all offsets and Python objects whose handles are expected to be found at those offsets. Then it would allocate memory of the right size and set the right HPy handles to the right offsets.

Adds support for specifying and using module state including the
traverse and destroy slots. What is missing is the ability to query
module state from a defining class. For that to be useful, HPy would
have to add support for calling convention that passes along the
defining class.

Module state: add support for destroy slot

Module state: more tests and cleanup

Module state: embed user data instead of using indirection
@steve-s
Copy link
Contributor Author

steve-s commented Aug 16, 2022

One more thing to consider regarding NumPy port including its API. NumPy API exposes several global PyObject* variables, such as PyArray_Type. With HPyGlobal the conversion is straightforward (modulo the issue with compatible HPyContext struct layouts in the calling extensions and NumPy), with module state it does not seem so? AFAIK there is no way in CPython to get module state "out of thin air" and especially module state of another module.

@mattip
Copy link
Contributor

mattip commented Dec 1, 2022

I think this API is used by recordclass

@steve-s
Copy link
Contributor Author

steve-s commented Dec 1, 2022

Update:

  • in the dev call on 1.12.2022 we decided to go on with merging this as-is, I will just polish the code and fix the CI failures
  • therefore: last call for reviews of the code/the idea as a whole
  • what is left for followup work: we will need to add a new calling convention for the slots, such that users can access module state there (module is not passed to slots, see CPython docs) -- we may add the same calling convention as CPython added to support this, or we may add some calling convention tailored for module state only.

@steve-s
Copy link
Contributor Author

steve-s commented Dec 5, 2022

I would like to take back the call to just polish and merge this PR. I've read more on module state in CPython and I think we should figure out the whole thing first, especially how to access the module state in various other contexts (CPython docs on the topic):

  • methods attached to a type (in CPython a new calling convention, can work for us fine)
  • slots (in CPython new API that takes PyModuleDef, if we take that route, we would have to map HPyModuleDef to PyModuleDef -- doable, but bigger change worth upfront design)

On top of that: in the light of multi-phase module init, which allows to create multiple modules from one extension (in single interpreter), it seems to me that CPython's way of accessing module state from slots may have a fundamental flaw. Since module state is identified by PyModuleDef and that may be shared among multiple module instances.

With this understanding I am now also convinced that module state is necessary feature to complement or replace HPyGlobal, because HPyGlobal is fundamentally per extension and there's no simple way to make it per module instance. At the same time, I think there may be a value to HPyGlobal, because it provides more efficient and significantly simpler alternative for module state (no need to define traverse, etc.).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants