
Build Python packages using the limited API #42

Open
vyasr opened this issue Apr 18, 2024 · 25 comments

Comments

@vyasr
Contributor

vyasr commented Apr 18, 2024

Python has a limited API that is guaranteed to be stable across minor releases. Extension code that restricts itself to the limited API is guaranteed to also compile on future minor versions of Python within the same major family. More importantly, all symbols in the current (and some historical) versions of the limited API are part of Python's stable ABI, which also does not change between Python minor versions, so extensions compiled against one Python version continue working on future versions of Python.

Currently RAPIDS builds a single wheel per Python version. If we were to compile using the Python stable ABI, we would be able to instead build a single wheel that works for all Python versions that we support. There would be a number of benefits here:

  • Reduced build time: This benefit is largely reduced by Support dynamic linking between RAPIDS wheels #33, since if we build the C++ components as standalone wheels they are already Python-independent (except when we actually use the Python C API in our own C libraries; the only example that I'm currently aware of in RAPIDS is ucxx). The Python components alone are generally small and easy to build. We'll still benefit, but the benefits will be much smaller.
  • Reduced testing time: Currently we run tests across a number of Python versions for our packages on every PR. We often struggle with what versions need to be tested each time. If we were to only build a single wheel that runs on all Python versions, it would be much easier to justify a consistent strategy of always testing e.g. the earliest and latest Python versions. We may still want to test more broadly in nightlies, but really the only failure mode here is if a patch release is made for a Python version that is neither the earliest nor the latest, and that patch release contains breaking changes. That is certainly possible (e.g. the recent dask failure that forced us to make a last-minute patch), but it's infrequent enough that we don't need to be testing regularly.
  • Wider support matrix: Since we'll have a single binary that works for all Python versions, maintaining the full support matrix will be a lot easier and we won't feel as much pressure to drop earlier versions in order to support newer ones.
  • Day 0 support: Our wheels will work for new Python versions as soon as they're released. Of course, if there are breaking changes then we'll have to address those, but in the average case where things do work users won't be stuck waiting on us.
  • Better installation experience: Having a wheel that automatically works across Python versions will reduce the frequency of issues that are raised around our pip installs.

Here are the tasks (some ours, some external) that need to be accomplished to make this possible:

  • Making Cython compatible with the limited API: Cython has preliminary support for the limited API. However, this support is still experimental, and most code still won't compile. I have been making improvements to Cython itself to fix this, and I now have a local development branch of Cython where I can compile most of RAPIDS (with additional changes to RAPIDS libraries). We won't be able to move forward with releasing production abi3 wheels until this support in Cython is released. This is going to be the biggest bottleneck for us.
  • nanobind support for the limited API: nanobind can already produce abi3 wheels when compiled with Python 3.12 or later. Right now we use nanobind in pylibcugraphops, and nowhere else.
  • Removing C API usage in our code: RAPIDS makes very minimal direct usage of the Python C API. The predominant use case that I see is creating memoryviews in order to access some buffers directly. We can fix this by constructing buffers directly. The other thing we'll want to do is remove usage of the NumPy C API, which has no promise of supporting the limited API AFAIK. That will be addressed in Remove usage of the NumPy C API #41. Other use cases can be addressed incrementally.
  • Intermediate vs. long-term: If Cython support for the limited API ends up being released before RAPIDS drops support for Python 3.10, we may be in an intermediate state where we still need to build a version-specific wheel for 3.10 while building an abi3 wheel for 3.11+ (and 3.12+ for pylibcugraphops due to nanobind). If that is the case, it shouldn't cause much difficulty since it'll just involve adding a tiny bit of logic on top of our existing GH workflows.
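To make the build side of this concrete, here is a minimal, hypothetical setuptools sketch (the package and file names are made up for illustration). Defining `Py_LIMITED_API` as `0x030B0000` restricts the extension to the Python 3.11 limited API, and `py_limited_api=True` tells the wheel machinery to emit an `abi3` tag:

```python
# Hypothetical setup.py sketch for an abi3 wheel targeting Python >= 3.11.
# "mypkg" and its sources are illustrative names, not a real RAPIDS package.
from setuptools import Extension, setup

ext = Extension(
    "mypkg._core",
    sources=["mypkg/_core.c"],
    # Restrict the extension to the Python 3.11 limited API...
    define_macros=[("Py_LIMITED_API", "0x030B0000")],
    # ...and tag the resulting wheel as abi3.
    py_limited_api=True,
)

if __name__ == "__main__":
    setup(ext_modules=[ext])
```

The resulting wheel would carry a `cp311-abi3` tag rather than a per-version `cp311-cp311` tag, which is what allows one binary to serve 3.11 and later.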

At this stage, it is not yet clear whether the tradeoffs required will be worthwhile, or at what point the ecosystem's support for the limited API will be reliable enough for us to use in production. However, it shouldn't be too much work to get us to the point of at least being able to experiment with limited API builds, so we can start answering questions around performance and complexity fairly soon. I expect that we can pretty easily remove explicit reliance on any APIs that are not part of the stable ABI, at which point this really becomes a question of the level of support our binding tools provide and if/when we're comfortable with those.

@jakirkham
Member

It is worth noting that the Python Buffer Protocol C API landed in Python 3.11 (additional ref). So I think that is a minimum for us

Also find this listing of functions in the Limited and Stable API quite helpful

@vyasr
Contributor Author

vyasr commented Apr 18, 2024

Yes. I have been able to build most of RAPIDS using Cython's limited API support (along with some additional changes I have locally) on Python 3.11. Python 3.11 is definitely a must. But as I said above in the "intermediate vs. long-term" bullet, we could still benefit before dropping Python<3.11 support by building one wheel for each older Python version and an abi3 wheel for Python 3.11+.

@vyasr
Contributor Author

vyasr commented Apr 26, 2024

I've made PRs to rmm, raft, and cuml that address the issues in those repos. I've also taken steps to remove ucxx's usage of the NumPy C API (#41), which in turn removes one of its primary incompatibilities. The last major issue in RAPIDS code that I see is the usage of the `array` module in the `Array` class that is vendored by both kvikio and ucxx (and ucx-py). If that can be removed, then I think we'll be in good shape on the RAPIDS end, and we'll just be waiting on support for this feature in Cython itself. @jakirkham expressed interest in helping out with that in the process of making that `Array` class more broadly usable.

@da-woods

A small warning here:

There are definitely places where Cython is substituting private C API for private Python API, so future compatibility definitely isn't guaranteed (it'll just be a runtime failure rather than a compile-time failure). We'll see how that evolves - I hope to be able to make some of these warnings rather than failures (since it's largely just non-essential introspection support).

We're also having to build a few more runtime version-checks into our code. Which is obviously a little risky because although you're compiling the same thing, you're taking different paths on different Python versions.

So the upshot is that your testing matrix probably doesn't reduce to a single version. (From Cython's point of view the testing matrix probably expands, because we really should be testing combinations like Py_LIMITED_API=0x03090000 with Python 3.12 and that gets big quite quickly so I don't know how we're going to do that)

@vyasr
Contributor Author

vyasr commented Apr 30, 2024

Thanks for chiming in here @da-woods! I appreciate your comments. I agree that there is more complexity around testing here than a simple set-and-forget single version. At present, RAPIDS typically supports 2 or 3 Python versions at a time. We tend to lag a bit behind NEP 29/SPEC 0 timelines, so we support older versions a bit longer at the expense of not supporting new ones until they've been out for a while. A significant part of the resource constraint equation for us is certainly on the testing side, since running our full test suites on multiple Python versions adds up quickly.

The way I had envisioned this working, if we did move forward, is that we would build on the oldest supported Python (e.g. Py_LIMITED_API=0x03090000) and then run tests on the earliest and latest Pythons we supported (e.g. 3.9 and 3.11). The big benefit of using the limited API in this space would be that we could bump up the latest supported Python version without needing to move the earliest. The assumption would be that by the time a new Python version was released (e.g. 3.12), we would have gone through enough patch releases of the previous release (3.11) to trust that nothing would break in future patch releases.

Of course, in practice that's probably not true: CPython certainly doesn't always strictly follow SemVer rules for patch releases, and to be fair Hyrum's law certainly applies to a project at that scale. Beyond that, Cython's use of CPython internals means that we could be broken even by patch releases. In practice, what this would probably mean is that we would run tests as mentioned above on a frequent basis (on every PR), then run a larger test matrix infrequently (say, nightly or weekly). IOW, even with limited API builds we would definitely still want to do broader testing to ensure that such builds are actually as compatible as they claim to be. However, I'd hope that the scale of that testing would be reduced.

rapids-bot bot pushed a commit to rapidsai/cuml that referenced this issue Apr 30, 2024
This PR removes usage of the only method in raft's Cython that is not part of the Python limited API. Contributes to rapidsai/build-planning#42

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #5871
rapids-bot bot pushed a commit to rapidsai/rmm that referenced this issue Apr 30, 2024
This PR removes usage of the only method in rmm's Cython that is not part of the Python limited API. Contributes to rapidsai/build-planning#42

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - https://github.com/jakirkham

Approvers:
  - https://github.com/jakirkham

URL: #1545
rapids-bot bot pushed a commit to rapidsai/raft that referenced this issue May 7, 2024
This PR removes usage of the only method in raft's Cython that is not part of the Python limited API. Contributes to rapidsai/build-planning#42

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #2282
abc99lr pushed a commit to abc99lr/raft that referenced this issue May 10, 2024
This PR removes usage of the only method in raft's Cython that is not part of the Python limited API. Contributes to rapidsai/build-planning#42

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#2282
@vyasr
Contributor Author

vyasr commented Sep 24, 2024

With the latest versions of branch-24.10, which contain a number of changes I made over the past few months for limited API compatibility along with the removal of pyarrow and numpy as build requirements in cudf and cuspatial, most of RAPIDS now builds with the limited API flag on. I have run some smoke tests and things generally work OK, but I haven't done anything extensive. ucxx and kvikio remain outstanding since we need to rewrite the `Array` class to not use the Python `array` module's C API, which does not support the limited API. The latest tests can be seen in rapidsai/devcontainers#278.

@da-woods

I don't know if it's any help, but the quickest non-`array.array` way I've found to allocate memory is:

```cython
cdef mview_cast = memoryview.cast

cdef Py_ssize_t[::1] new_Py_ssize_t_array(Py_ssize_t n):
    return mview_cast(
        PyMemoryView_FromObject(
            PyByteArray_FromStringAndSize(NULL, n * sizeof(Py_ssize_t))),
        "q")
```

It isn't as good, but it's surprisingly close given how much it actually does. "Only" 70% slower.
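For anyone following along, the same `memoryview.cast` trick can be sketched in pure Python (this illustrates the approach only; it is not the Cython helper itself):

```python
import struct

def new_ssize_t_buffer(n):
    # Zero-initialized, writable storage viewed as signed 64-bit ints ("q"),
    # mirroring the bytearray + memoryview.cast trick above.
    return memoryview(bytearray(n * struct.calcsize("q"))).cast("q")

buf = new_ssize_t_buffer(4)
buf[0] = 42
print(len(buf), buf[0])  # 4 42
```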

You probably have to replace

```cython
mv = PyMemoryView_FromObject(obj)
pybuf = PyMemoryView_GET_BUFFER(mv)
```

with `PyObject_GetBuffer` and an appropriate `PyBuffer_Release` in the destructor. But you probably should be doing that anyway: you're currently keeping a pointer to the data while not retaining a buffer reference, which means things like `bytearray` could potentially be resized out from under you.
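That resizing hazard is easy to demonstrate from pure Python: while a buffer export is live, CPython refuses to resize the `bytearray`, which is exactly the protection a retained buffer reference buys you:

```python
ba = bytearray(8)
mv = memoryview(ba)          # live buffer export pins the storage

try:
    ba.extend(b"\x00")       # any resize attempt now fails
except BufferError:
    print("resize blocked while an export exists")

mv.release()                 # drop the export...
ba.extend(b"\x00")           # ...and resizing works again
print(len(ba))  # 9
```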

@vyasr
Contributor Author

vyasr commented Sep 27, 2024

Thanks for the tip David! That could be helpful, but I'll have to look at the Array class more closely to be sure. I suspect that there are larger refactorings of our code base that could be done to make this unnecessary.

rapids-bot bot pushed a commit to rapidsai/ucx-py that referenced this issue Oct 22, 2024
In `Array`, `Py_ssize_t[::1]` objects are currently backed by [CPython `array`'s]( https://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html#cpython-array-module ) with some internal bits expressed in Cython. However these are not compatible with [Python's Limited API and Stable ABI]( https://docs.python.org/3/c-api/stable.html#c-api-stability ). To address that, switch to [Cython's own `array` type]( https://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html#cython-arrays ). As this is baked into Cython and doesn't use anything special, it is compatible with Python's Limited API and Stable ABI.

xref: rapidsai/build-planning#42

Authors:
  - https://github.com/jakirkham

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #1087
rapids-bot bot pushed a commit to rapidsai/ucxx that referenced this issue Oct 22, 2024
In `Array`, `Py_ssize_t[::1]` objects are currently backed by [CPython `array`'s]( https://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html#cpython-array-module ) with some internal bits expressed in Cython. However these are not compatible with [Python's Limited API and Stable ABI]( https://docs.python.org/3/c-api/stable.html#c-api-stability ). To address that, switch to [Cython's own `array` type]( https://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html#cython-arrays ). As this is baked into Cython and doesn't use anything special, it is compatible with Python's Limited API and Stable ABI.

xref: rapidsai/build-planning#42

Authors:
  - https://github.com/jakirkham

Approvers:
  - Mads R. B. Kristensen (https://github.com/madsbk)
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #307
rapids-bot bot pushed a commit to rapidsai/kvikio that referenced this issue Oct 22, 2024
In `Array`, `Py_ssize_t[::1]` objects are currently backed by [CPython `array`'s]( https://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html#cpython-array-module ) with some internal bits expressed in Cython. However these are not compatible with [Python's Limited API and Stable ABI]( https://docs.python.org/3/c-api/stable.html#c-api-stability ). To address that, switch to [Cython's own `array` type]( https://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html#cython-arrays ). As this is baked into Cython and doesn't use anything special, it is compatible with Python's Limited API and Stable ABI.

xref: rapidsai/build-planning#42

Authors:
  - https://github.com/jakirkham

Approvers:
  - Mads R. B. Kristensen (https://github.com/madsbk)

URL: #504
@jakirkham
Member

Had played with a few different approaches to rewriting `new_Py_ssize_t_array`, including David's `PyByteArray_FromStringAndSize` suggestion. In the end found that simply using Cython's own array type worked reasonably well without compromising too much on performance or clarity. So we have rewritten `new_Py_ssize_t_array` accordingly

That should address this piece of the issue. Though am sure there are other things we may still need to do

@vyasr
Contributor Author

vyasr commented Oct 28, 2024

Wow that's awesome. Thanks for trying that out John! I'll refresh rapidsai/devcontainers#278 for ucxx and kvikio and see what breaks next.

@vyasr
Contributor Author

vyasr commented Oct 28, 2024

Looks like ucxx is using a few functions that aren't part of the limited API (`PyUnicode_1BYTE_DATA`, `PyObject_CallMethodObjArgs`). I'll look into updating those the next time I have a bit of free time.

@jameslamb
Member

Looks like ucxx is using a few functions that aren't part of the limited API (PyUnicode_1BYTE_DATA, PyObject_CallMethodObjArgs)

Linking some background on that that might be helpful: rapidsai/ucxx#276 (comment)

@jakirkham
Member

Looking at the CPython API docs for `PyObject_CallMethodObjArgs`, it says it is part of the Stable ABI.

That said, the PR James linked to uses `PyObject_CallMethodOneArg`, which appears to be absent from the Stable ABI. Looking at the implementation of `PyObject_CallMethodOneArg`, it is mainly a convenience wrapper around `PyObject_VectorcallMethod`, which is part of the Stable ABI. So it should be possible to rewrite using the Stable ABI.

`PyUnicode_1BYTE_DATA` appears to be more complicated, as much of Python's C API for Unicode objects is not part of the Stable ABI. It might be worth talking to Peter to better understand what we would like to accomplish in that code. For example, if we just need some C strings to create that error message, we might be better off setting them globally and using them directly in that error construction call

@jakirkham
Member

It is worth noting that Python 3.13's free-threading build (GIL disabled) is not compatible with the Limited C API or Stable ABI

Of course the non-free-threading (GIL enabled) Python 3.13 can use the Limited C API and Stable ABI

Ref: https://docs.python.org/3/howto/free-threading-extensions.html#building-extensions-for-the-free-threaded-build

@vyasr
Contributor Author

vyasr commented Oct 28, 2024

Yeah I'm not thinking too much about the free-threading builds yet. Given the phrasing:

The free-threaded build does not currently support the Limited C API or the stable ABI.

(emphasis mine). It seems to me like they can't make free-threaded builds the default without something akin to a major version bump unless they get the limited/stable ABI working, because without that it would suddenly be valid to install a bunch of incompatible packages into an environment. Either that, or they try to get installers (pip/uv/etc.) to handle this, but even then you'll see tons of errors from people with older versions of those installers in environments that don't get updated. For the moment I'm OK waiting to see what direction this goes, but I'd personally be pretty surprised if they decided to break the limited API altogether to get free threading in as default.

@jakirkham
Member

The reason I mention it is most of the value of the Stable ABI and Limited API is building a package for one Python version and reusing that package for multiple versions. However with this change in Python 3.13 and the fact that the Python Buffer Protocol was only added in Python 3.11, it means we can only confidently build something for Python 3.11 and allow it to be installed for Python 3.12.

Before Python 3.11 we lack the Python Buffer Protocol, so have to build per Python version there.

Starting with Python 3.13, we lack an ability to constrain a package to only non-free-threading builds. IOW we have to assume a package using the Stable ABI and Limited API could wind up being installed on a free-threading build of Python unexpectedly. So we would need to build for Python 3.13 separately (and if we want to build for free-threading we would need to add that as an additional build).

IOW we are stuck with an island of compatibility at the moment with Python 3.11 and 3.12 on the Stable ABI and Limited API.
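As a side note, whether a given interpreter is a free-threaded build can be detected at runtime; on CPython 3.13+ `sysconfig` exposes a `Py_GIL_DISABLED` config variable (on older versions the variable is simply absent), which can be useful when reasoning about these compatibility islands:

```python
import sys
import sysconfig

# Py_GIL_DISABLED is 1 on free-threaded 3.13+ builds, 0 on regular 3.13+
# builds, and None (undefined) on older versions.
flag = sysconfig.get_config_var("Py_GIL_DISABLED")
free_threaded = bool(flag)
print(sys.version_info[:2], "free-threaded:", free_threaded)
```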

@vyasr
Contributor Author

vyasr commented Oct 29, 2024

Starting with Python 3.13, we lack an ability to constrain a package to only non-free-threading builds. IOW we have to assume a package using the Stable ABI and Limited API could wind up being installed on a free-threading build of Python unexpectedly.

This can't be true... The page you linked above says:

C API extensions need to be built specifically for the free-threaded build. The wheels, shared libraries, and binaries are indicated by a t suffix.

Installers have to respect this. If they don't, free-threaded builds would start breaking all over the place when incompatible extensions are installed. My assumption is that for as long as free-threaded builds are not the default build, if we want to support them we will have to build separate wheels for them, and if that is the case we could use limited API builds for the default builds and then separate builds for each Python version under free threading.

I don't think anyone knows what will happen at the point when free-threading becomes the default yet, though.

@jakirkham
Member

What I'm trying to say though is there is not a way to separate them AFAIK. Though if you would like to outline a proposal, would be interested to read it

@vyasr
Contributor Author

vyasr commented Oct 31, 2024

I don't understand what you mean by "there is not a way to separate them".

The wheels, shared libraries, and binaries are indicated by a t suffix.

The package files are delineated, so as long as you appropriately tag wheels that are built with the limited API and installers are made aware of this too, what more do we need?

@jakirkham
Member

Could you please outline a proposal of how you see this being used?

How will we build for/support/package RAPIDS for each Python version in the range 3.10-3.13 (including free-threading)?

Think that will make it easier to identify and discuss edge cases

@vyasr
Contributor Author

vyasr commented Oct 31, 2024

Let's use 3.11-3.14 so that we have two free-threading releases, and let's just pick rmm to simplify the discussion. I would build one abi3 wheel for the non-free-threaded builds that works on all Python>=3.11, and then two free-threaded wheels, one each for 3.13 and 3.14. When 3.15 is released, I would add a new 3.15 free-threaded build, with no change to the non-free-threaded build.

Is that what you were looking for in a proposal?

Of course, there are plenty of open questions to address:

  • We have to assume the free-threaded build will eventually become the only build.
  • I haven't tracked discussions on what the plan is for the limited API in the final version of the free-threaded build of Python.
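Purely to make the shape of that matrix concrete, the proposal above could be sketched as follows (the tag strings are illustrative, not output of any build tool):

```python
def planned_wheel_tags(versions=("3.11", "3.12", "3.13", "3.14")):
    """One abi3 wheel covers all GIL-enabled interpreters >= 3.11; each
    free-threaded (3.13+) interpreter gets its own per-version cpXYt wheel."""
    tags = ["cp311-abi3"]
    for v in versions:
        major, minor = v.split(".")
        if int(minor) >= 13:
            tags.append(f"cp{major}{minor}-cp{major}{minor}t")
    return tags

print(planned_wheel_tags())
# ['cp311-abi3', 'cp313-cp313t', 'cp314-cp314t']
```

Adding 3.15 to the inputs would append only `cp315-cp315t`, leaving the abi3 wheel untouched, which is the "no change to the non-free-threaded build" point above.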

@jakirkham
Member

jakirkham commented Oct 31, 2024

Thanks Vyas! 🙏

That's a good start

Now, for the package using the limited API/stable ABI, how will the package metadata capture its intended compatibility? IOW what would go in pyproject.toml?

@vyasr
Contributor Author

vyasr commented Oct 31, 2024

I don't think anything changes in pyproject.toml. It's still the same package. When you build the wheel you have to specify the appropriate flags, and that changes the output filename. I don't know which build backends currently support the wheel name appropriately (since we have to add the t suffix), so we may have to tag wheels separately. For conda packages I think we would use a separate label until conda offers something different (labels are what Stan's post here implicitly suggests).

@jameslamb
Member

you have to specify the appropriate flags and it changes the output filename

Just want to mention PyPA has a tool called abi3audit (https://github.com/pypa/abi3audit). It'd be helpful in this effort... it doesn't have a repair command similar to auditwheel, but it could at least be used to raise errors in CI if we try to tag a wheel as following the stable ABI when it actually doesn't.

The "Motivation" section of the docs explains this well: https://github.com/pypa/abi3audit?tab=readme-ov-file#motivation

Context for creation of that tool: pypa/auditwheel#395

@vyasr
Contributor Author

vyasr commented Oct 31, 2024

You can see an example of using abi3audit in this pynvjitlink PR.
