
c api next level changes

Simon Cross edited this page Nov 19, 2020 · 6 revisions

What Needs to Change and Why

This document presents further details on the changes envisioned in Taking the C API to the Next Level.

Not return borrowed references

A function returns a borrowed reference to a Python object obj if instead of returning a new reference to obj, it loans the caller an existing reference.

This saves the caller from having to call Py_DECREF(obj) but at the cost of exposing the lifetime of the reference as part of the API and preventing the Python implementation from knowing when the caller has finished using the borrowed reference.

As a simple example, imagine that a Python module contained t = (1, 2, 3) and that a Python implementation wished to efficiently store that tuple as int t[] = {1, 2, 3}. PyTuple_GetItem returns a borrowed reference, so calling PyTuple_GetItem(obj_t, 0) would require creating a new reference that could never be freed even though the caller would likely only require it for a short time.
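The cost described above can be sketched in plain C. This is a toy model, not CPython code: `ToyTuple`, `ToyInt`, `toy_tuple_get_item_borrowed`, `toy_tuple_get_item_new` and the other names are hypothetical and exist only for illustration. The tuple stores raw numbers, so any accessor that returns an object has to box the number first. The borrowed-style accessor must keep that box alive for as long as the tuple exists, while the new-reference-style accessor lets the caller release it.

```c
#include <stdlib.h>

/* Hypothetical toy model, not CPython: a tuple stored as raw C numbers. */
typedef struct { long value; } ToyInt;      /* a boxed integer object */

typedef struct {
    long    items[3];   /* compact storage: no objects, just numbers */
    ToyInt *boxed[3];   /* boxes that had to be created for borrowers */
} ToyTuple;

static int live_boxes = 0;  /* number of boxed integers currently allocated */

static ToyInt *toy_box(long value) {
    ToyInt *box = malloc(sizeof *box);
    if (box == NULL) abort();
    box->value = value;
    live_boxes++;
    return box;
}

/* Borrowed-reference style: the caller never says when it is done, so the
 * box is cached and must stay alive for as long as the tuple does. */
ToyInt *toy_tuple_get_item_borrowed(ToyTuple *t, int i) {
    if (t->boxed[i] == NULL)
        t->boxed[i] = toy_box(t->items[i]);
    return t->boxed[i];
}

/* New-reference style: the caller owns the box and releases it explicitly. */
ToyInt *toy_tuple_get_item_new(const ToyTuple *t, int i) {
    return toy_box(t->items[i]);
}

/* Release a box obtained from toy_tuple_get_item_new. */
void toy_int_release(ToyInt *box) {
    free(box);
    live_boxes--;
}

int toy_live_boxes(void) { return live_boxes; }
```

In this sketch a box handed out by `toy_tuple_get_item_new` disappears as soon as the caller releases it, whereas a box handed out by `toy_tuple_get_item_borrowed` stays allocated even if the caller only needed it briefly.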

Not steal references

A function steals a reference to a Python object obj when it takes over the responsibility of freeing the reference from the caller.

This exposes the lifetime of the stolen reference as part of the API.

For example, PyList_SetItem steals the reference to the item passed to it. The caller might then continue to use the reference (even though they shouldn't) and rely on the reference continuing to be valid for as long as the list exists.

Stolen references also make it harder to write correct code. Instead of being able to easily check where references are freed by reading the C code, one must also remember the long list of API functions that steal a reference. For example, PyList_SetItem steals a reference, but PyList_Insert and PyList_Append do not.
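The difference between the two conventions can be illustrated with a toy reference-counting model. The sketch below is not CPython: `ToyObject`, `ToyList`, `toy_list_set_item` and `toy_list_append` are hypothetical names that only mimic the ownership rules of PyList_SetItem and PyList_Append described above.

```c
#include <stddef.h>

/* Hypothetical toy model of reference counting, not CPython itself. */
typedef struct { int refcount; } ToyObject;

#define TOY_LIST_CAPACITY 8
typedef struct {
    ToyObject *items[TOY_LIST_CAPACITY];
    size_t     size;
} ToyList;

void toy_incref(ToyObject *o) { o->refcount++; }
void toy_decref(ToyObject *o) { o->refcount--; }

/* Steals: the list takes over the caller's reference, mimicking
 * PyList_SetItem. The caller must NOT decref the item afterwards. */
int toy_list_set_item(ToyList *list, size_t i, ToyObject *item) {
    if (i >= list->size)
        return -1;
    if (list->items[i] != NULL)
        toy_decref(list->items[i]);  /* drop the reference to the old item */
    list->items[i] = item;           /* no incref: the reference is stolen */
    return 0;
}

/* Does not steal: the list takes its own reference, mimicking
 * PyList_Append. The caller still owns its reference and must decref it. */
int toy_list_append(ToyList *list, ToyObject *item) {
    if (list->size == TOY_LIST_CAPACITY)
        return -1;
    toy_incref(item);
    list->items[list->size++] = item;
    return 0;
}
```

Nothing in the two signatures distinguishes the stealing function from the non-stealing one, which is exactly why the caller has to remember which is which.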

Not expose reference counting as part of the API

The current API exposes reference counting via Py_INCREF and Py_DECREF. Implementing the semantics of this API requires maintaining a counter for each object -- i.e. emulating reference counting.

It also requires references to be long-lived -- a reference must be valid for as long as the reference count is non-zero (i.e. for the object's entire lifetime).

It would be better to use an interface that allowed the caller of the API to explicitly communicate its own requirements via obj = Py_I_Need_A_New_Reference(...) and Py_I_Am_Done_With_This_Reference(obj) API functions. These would allow for shorter lived references that can be freed as soon as an individual caller is done with them.
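One way such an interface could be implemented is with a table of short-lived handles. The sketch below is only an assumption about how this might look: `ToyHandle`, `toy_new_reference`, `toy_done_with_reference` and the fixed-size table are hypothetical, with the two functions playing the roles of Py_I_Need_A_New_Reference and Py_I_Am_Done_With_This_Reference.

```c
#include <stddef.h>

/* Hypothetical sketch: handles are small integers indexing a table owned by
 * the implementation. Nothing here is an existing API. */
#define TOY_MAX_HANDLES 16
typedef int ToyHandle;                  /* 0 means "invalid handle" */

typedef struct { const char *name; } ToyObject;

static ToyObject *handle_table[TOY_MAX_HANDLES];  /* slot 0 stays unused */

/* Plays the role of Py_I_Need_A_New_Reference: hand out a fresh handle. */
ToyHandle toy_new_reference(ToyObject *obj) {
    for (ToyHandle h = 1; h < TOY_MAX_HANDLES; h++) {
        if (handle_table[h] == NULL) {
            handle_table[h] = obj;
            return h;
        }
    }
    return 0;  /* table full */
}

/* Plays the role of Py_I_Am_Done_With_This_Reference: the slot becomes
 * reusable at once, even if other handles to the same object are open. */
void toy_done_with_reference(ToyHandle h) {
    if (h > 0 && h < TOY_MAX_HANDLES)
        handle_table[h] = NULL;
}

/* Resolve a handle to its object, or NULL if the handle is not open. */
ToyObject *toy_deref(ToyHandle h) {
    return (h > 0 && h < TOY_MAX_HANDLES) ? handle_table[h] : NULL;
}

/* Count the handles that are currently open. */
int toy_open_handles(void) {
    int n = 0;
    for (ToyHandle h = 1; h < TOY_MAX_HANDLES; h++)
        if (handle_table[h] != NULL)
            n++;
    return n;
}
```

Each caller closes only its own handle, so the implementation learns exactly when that caller is finished without having to maintain a per-object counter.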

Not rely on stable references or pointers for object identity

In the existing API, the reference to an object is guaranteed to remain the same and to point to the same location in memory throughout the lifetime of an object. This allows one to conveniently check whether two references are references to the same object using if (ref1 == ref2) in C.

The downsides are that all the references for each object must be identical and must never change during the lifetime of the object.

Since the reference to the object is also a pointer, the object's location in memory and its storage must never change.
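A possible alternative, sketched here under our own assumptions, is to ask the implementation whether two references denote the same object instead of comparing them with ==. In this toy model (`ToyHandle`, `toy_open`, `toy_is_same` and `toy_move` are all hypothetical names), two references to one object are different values, and the object can be relocated without invalidating either of them.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch, not an existing API. */
typedef struct { long value; } ToyObject;
typedef struct { int slot; } ToyHandle;     /* slot -1 means "invalid" */

#define TOY_SLOTS 8
static ToyObject *table[TOY_SLOTS];  /* handle slot -> current object address */
static int next_slot = 0;

/* Every call returns a distinct handle, even for the same object. */
ToyHandle toy_open(ToyObject *obj) {
    ToyHandle h = { -1 };
    if (next_slot < TOY_SLOTS) {
        h.slot = next_slot;
        table[next_slot++] = obj;
    }
    return h;
}

/* Identity is a question put to the implementation, not a pointer compare.
 * Both handles are assumed to be valid. */
bool toy_is_same(ToyHandle a, ToyHandle b) {
    return table[a.slot] == table[b.slot];
}

/* The implementation relocates an object, for example during compaction,
 * and redirects every handle that referred to the old address. */
void toy_move(ToyObject *from, ToyObject *to) {
    *to = *from;
    for (int i = 0; i < next_slot; i++)
        if (table[i] == from)
            table[i] = to;
}

long toy_value(ToyHandle h) { return table[h.slot]->value; }
```

With this design neither of the two constraints listed above applies: the references need not be identical, and the object is free to move.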

Not expose the memory layout of Python objects as part of the API

The existing API exposes the memory layout of Python objects. For example, one can directly access PyListObject.ob_item[i] and PyObject.ob_type.

This makes it difficult to provide alternative implementations of the semantics of Python objects since the existing C memory structures need to be populated and maintained.
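As an illustration of what hiding the layout makes possible, the sketch below keeps the list type opaque and offers only accessor functions. All names (`ToyList`, `toy_list_size`, `toy_list_get_long`, `toy_list_from_range`) are hypothetical. The implementation chosen here stores an arithmetic progression and has no item array at all, something that code reading an ob_item-style field directly could never work with.

```c
#include <stddef.h>

/* Public side of the hypothetical API: the struct is opaque to users. */
typedef struct ToyList ToyList;
size_t toy_list_size(const ToyList *list);
int    toy_list_get_long(const ToyList *list, size_t i, long *out);

/* Implementation side: this particular list is an arithmetic progression,
 * so there is no array of items to expose. */
struct ToyList {
    size_t size;
    long   start;
    long   step;
};

/* Constructor used by the implementation (and by this sketch's tests). */
ToyList toy_list_from_range(long start, long step, size_t size) {
    ToyList list = { size, start, step };
    return list;
}

size_t toy_list_size(const ToyList *list) { return list->size; }

/* Returns 0 and stores the item in *out, or -1 if i is out of range. */
int toy_list_get_long(const ToyList *list, size_t i, long *out) {
    if (i >= list->size)
        return -1;
    *out = list->start + (long)i * list->step;
    return 0;
}
```

Because callers only ever go through the accessors, the implementation could switch to a conventional array later without any change to calling code.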

Not expose static types

Traditionally new types were created by statically defining a PyTypeObject in C. In addition to exposing the memory layout of these types, these static types represent shared global state (within C) and behave differently to types (i.e. classes) defined from within Python code.

Types may also be created dynamically and allocated on the heap. These new types more closely match the behaviour of types defined in Python code and are not shared global state.

Not exposing static types would create a simpler, more consistent API and avoid the fixed global state.

Expose Python (the language), not a specific Python implementation version

Ideally we'd like to expose the semantics of the Python language and avoid exposing implementation details where we can.

For example, the Python code a["x"] looks up item "x" on object a. The code and high-level semantics are the same regardless of whether a is a list, a dictionary, or an instance of a user-defined class.

We'd like the C API to reflect this and provide only one set of methods for accessing items in a regardless of the type of a. So, ideally the new API would implement only Py_GetItem and not PyDict_GetItem or PyList_GetItem.

Note:

This is not intended as an excuse to rewrite the C API, only as a guide
when making design choices. Any new API should remain familiar to users
of the existing API.

Expose constructs generally useful in C

The C API should be an interface between the C language and the Python language. Its interfaces should consume and return values with common native C types such as int, char * and double.

We should avoid exposing C structs that are specific to a particular Python implementation.

For example, Py_GetItem_i(obj, i), which looks up an item by a C long index i on a reference to a Python object, is a good API function because it provides Python language semantics via a generic C interface.

In contrast, Py_Type returns a PyTypeObject, which is less good because it is a complex structure defined by a particular Python implementation version. A better Py_Type would return an ordinary reference to the object's type (i.e. a PyObject * in the existing C API and a more opaque reference in the new API).
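The Py_GetItem_i idea, a single item-lookup function that takes a plain C long and works on any container, can be sketched as follows. This is a toy model with hypothetical names (`ToyObject`, `ToyEntry`, `toy_get_item_i`), and for brevity the items themselves are C longs rather than object references. Negative list indices are treated as out of range here, unlike in Python.

```c
#include <stddef.h>

/* Hypothetical toy objects: a list of longs or a dict from long to long. */
typedef enum { TOY_LIST, TOY_DICT } ToyKind;
typedef struct { long key; long value; } ToyEntry;

typedef struct {
    ToyKind         kind;
    size_t          size;
    const long     *list_items;  /* used when kind == TOY_LIST */
    const ToyEntry *dict_items;  /* used when kind == TOY_DICT */
} ToyObject;

/* One entry point for item lookup by C long, whatever the container type.
 * Returns 0 and stores the item in *out, or -1 if the lookup fails (the
 * counterpart of IndexError or KeyError). */
int toy_get_item_i(const ToyObject *obj, long i, long *out) {
    switch (obj->kind) {
    case TOY_LIST:
        if (i < 0 || (size_t)i >= obj->size)
            return -1;
        *out = obj->list_items[i];
        return 0;
    case TOY_DICT:
        for (size_t n = 0; n < obj->size; n++) {
            if (obj->dict_items[n].key == i) {
                *out = obj->dict_items[n].value;
                return 0;
            }
        }
        return -1;
    }
    return -1;
}
```

The caller never needs to know which kind of container it holds, and the only types crossing the interface are a reference and native C values.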

Provide an explicit execution context

C extension code implicitly executes inside a particular interpreter. Explicitly providing this context to API functions and C extension methods will allow C code to access the correct interpreter without having to maintain static global state.

For example, if the context were passed as ctx then the constants such as None or ValueError might be retrieved with ctx->None or ctx->ValueError and the C extension could be sure it had the correct instances for the interpreter it is executing under.
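A minimal sketch of this idea follows, assuming a context struct that carries per-interpreter constants. The `ToyContext` layout and the function `toy_ext_check_positive` are hypothetical and only show the calling convention: the extension function receives ctx as its first argument and reads None and ValueError from it instead of from global variables.

```c
#include <stddef.h>

/* Hypothetical toy model: each interpreter has its own constant objects. */
typedef struct {
    const char *name;
    int         interp_id;   /* which interpreter owns this object */
} ToyObject;

typedef struct {
    int        interp_id;
    ToyObject *None;         /* this interpreter's None */
    ToyObject *ValueError;   /* this interpreter's ValueError */
} ToyContext;

/* An extension function that receives the context explicitly. In this
 * simplified sketch it returns the interpreter's None when n is positive
 * and the interpreter's ValueError otherwise. No global state is used. */
ToyObject *toy_ext_check_positive(ToyContext *ctx, long n) {
    return n > 0 ? ctx->None : ctx->ValueError;
}
```

Calling the same function with two different contexts yields objects belonging to two different interpreters, which is not possible when the constants live in static globals.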
