
API: Make numpy.h compatible with both NumPy 1.x and 2.x #5050

Merged
merged 25 commits into from
Mar 26, 2024

Conversation

seberg
Contributor

@seberg seberg commented Mar 6, 2024

This makes the changes necessary for numpy/numpy#25943 and finalizes other NumPy 2 changes (the GetParams function has been deprecated/disabled for a while).

I have tested this with SciPy, and SciPy is happy. Some users (TBH, I don't expect there are many) may be unhappy if they actually access the structured dtype info directly.
My feeling on that last point is that if it turns out to be a problem, it can be fixed in a follow-up. Users who can't migrate to NumPy 2 immediately could pin to an older version of pybind11, I presume.

I am very happy if someone takes over here (not sure what else is needed), but I wanted to do the main changes to show that it isn't terribly hard.

Suggested changelog entry:

* ``pybind11`` now supports compiling for NumPy 2. The required changes are small and
  should not matter to most projects. However, if you experience issues you can use
  ``#define PYBIND11_NUMPY_1_ONLY`` to disable the new support for now.

Maybe additional information if you would like:

The two relevant changes are that:
* ``dtype.flags()`` is now a ``uint64`` and ``dtype.alignment()`` an ``ssize_t`` (and NumPy 2.x may return a value larger than ``int`` for ``itemsize()``).
* The long-deprecated NumPy function ``PyArray_GetArrayParamsFromObject`` is no longer available.

Due to NumPy changes, you may experience difficulties updating to NumPy 2.
Please see the [NumPy 2 migration guide](https://numpy.org/devdocs/numpy_2_0_migration_guide.html) for details.
For example, a more direct change could be that the default integer ``"int_"``
(and ``"uint"``) is now ``ssize_t``-sized and no longer ``long`` (this affects 64-bit Windows).

@rwgk
Collaborator

rwgk commented Mar 6, 2024

Thanks @seberg for getting the ball rolling.

Users who can't migrate to NumPy 2 immediately could pin to an older version of pybind11, I presume.

My assessment/opinion:

Having been in situations like that, I think that's pretty terrible. Measured across the entire pybind11 ecosystem, I'd expect a lot of lost time.

I believe a more environmentally friendly approach is to fork the header: pybind11/numpy2.h (and to add guards so that only one can be used in a given translation unit)

Obviously that's significantly more work right here (e.g. we'll need to test both), but compared to what we'd put the rest of the world through otherwise it's nothing.

Tagging contributors to numpy.h, to see who has free bandwidth to work on this (I don't, at least not for a couple months):

@wjakob @jagerman @aldanor @dean0x7d @Skylion007 @EthanSteinberg

@seberg
Contributor Author

seberg commented Mar 6, 2024

Well, the only ones likely to notice would be users of structured dtypes, so I would think that is few, and they can work around it with manual casts too, but I don't know. I was thinking of that as a worst case, until helpers are added to cover use cases which might be lost.

@rwgk
Collaborator

rwgk commented Mar 6, 2024

so I would think that is few

Sounds fair, I just don't know at all how many people will be affected.

I was thinking as a worst case, until helpers are added to cover usecases which might be lost.

I don't know that either.

Lots of uncertainty on my part == not good.

You could help me understand more by adding tests for exactly what will be affected/broken. That will probably give us ideas for ways to best manage the numpy2 transition.

Or wait to see if someone with more relevant background steps up here.

But there is another question buzzing in my mind: do you expect that there will be more numpy2-specific tweaks like this needed in the future?

I'm asking that question because I'm still wondering:

Is it better to freeze pybind11/numpy.h and to fork-and-modify to pybind11/numpy2.h?

Or stick with only pybind11/numpy.h and hope we can control the probably increasing messiness of it in the long run?

@EthanSteinberg
Collaborator

@seberg Thank you for taking the initiative here, but I'm really unsure that this is the best path forward.

We really shouldn't be using NumPy's C API at all. We need to migrate over to NumPy's Python API.

That would make us robust to the NumPy 1 vs 2 switch and also significantly reduce the complexity and brittleness of our code.

@seberg Do you think you could take on migrating us to the Python API?

@seberg
Contributor Author

seberg commented Mar 7, 2024

Probably not right now; I don't know how quickly it can be done, and time's running out (how much depends on whether e.g. matplotlib needs it; I am hoping not, but I don't know).

I will note again that the only thing this breaks significantly is direct struct access to the dtype by the user for structured dtypes.
The user could reinterpret_cast that, especially if they include the NumPy headers (SciPy does that).

If you worry about those struct fields, I can add them to descr, or document the cast to the versioned struct for access?

ssize_t alignment;
PyObject *metadata;
Py_hash_t hash;
void *reserved_null;
seberg (author) suggested a change on this diff hunk:

-    void *reserved_null;
+    void *reserved_null[2];

@seberg
Contributor Author

seberg commented Mar 8, 2024

OK, it sounds to me like creating a numpy2.h is probably the easier solution here, if just to avoid worrying, so I added a commit to do that; the user would just get an error when running with NumPy 2.0 that explains to include numpy2.h.
That commit just uses some #ifdefs; not sure it is the nicest way, but duplicating files would be worse.

(Had segfaults running nox locally, so checking that it works here; planning to change the tests to use numpy2.h later. Not sure about testing in general; if releases are light-weight I might just risk doing a bug-fix release if numpy2.h has an issue.)

EDIT: sorry for the noise, lots of silly mistakes, although now also a crash in the tests which I haven't understood yet...

@seberg seberg force-pushed the numpy2-compat branch 5 times, most recently from dc9b59a to 1a76257 Compare March 8, 2024 11:01
This means that users of `numpy.h` cannot be broken, but need to
update to `numpy2.h` if they want to compile for NumPy 2.

Using macros simply, and didn't bother to try to remove unnecessary
code paths.
@seberg
Contributor Author

seberg commented Mar 8, 2024

So maybe a #define is better (not sure, mainly due to the use in eigen/tensor and matrix)? Either way, I am a bit lost on the segfault and have to pause for a while. The urgency probably depends on whether matplotlib nightlies manage without changes.

@rwgk
Collaborator

rwgk commented Mar 8, 2024

Thanks @seberg for all the work! Two quick comments:

I'm super busy with other things at the moment but will try to find a block of time to carefully look here asap.

@EthanSteinberg wrote:

We need to migrate over to Numpy's Python API

That's my highly-preferred solution, too, but I'm under the impression that'll be an XL-size effort. I wouldn't want to twist anyone's arm to do that just to unblock numpy2 support. I believe it'll be sufficient if we ensure that numpy1 and numpy2 are supported simultaneously for some period of time. I want to keep an open mind to how exactly that's done.

Which brings me to a related question: is there an official EOL for numpy1?

@rgommers
Contributor

Which brings me to a related question: is there an official EOL for numpy1?

Not before Q3/Q4 2025, according to https://scientific-python.org/specs/spec-0000/. Some users/packages may keep support for longer than that, since 2 years is not a very long window (sufficient for open source, not for certain types of industry and academic users).

I will note again that the only thing this breaks significantly is direct struct access to the dtype by the user for structured dtypes.

It's important to emphasize that this really is about very few users. It would not surprise me if it's zero users, because:

  • Using structured dtypes from Python is already pretty niche
  • Using structured dtypes via NumPy's C API is much more niche
  • Using pybind11 to do so is even more niche

Is it better to freeze pybind11/numpy.h and to fork-and-modify to pybind11/numpy2.h?

Or stick with only pybind11/numpy.h and hope we can control the probably increasing messiness of it in the long run?

I'd vote for not creating a numpy2.h that all pybind11+numpy users need to opt into, just to avoid a hypothetical issue with structured dtypes. This PR is currently 113 lines long, and addresses the largest-ever breaking change in NumPy's C API since 1.0 in 2006. That is a pretty small diff, and there shouldn't be much change after that. So the risk of numpy.h becoming significantly messier is quite low.

On the other hand, if you do go with numpy2.h, you're breaking every single user of pybind11/numpy.h - that'd be a lot of churn.

Also note that most packages do not include upper bounds in their pybind11 dependency declaration, and they may be doing releases shortly after numpy 2.0.0rc1 is out (so in 10 days maybe). If you release a pybind11 version with a numpy2.h after that, you'll break such already-released packages. This is relevant unless @seberg goes through with raising an error on any pybind11 usage of private API (in that case, no pybind11-using packages will be able to do a release).

Why not go with the minimal #ifdef PYBIND11_NUMPY2_SUPPORT change in this PR, do a release, and then in case some user reports an issue, deal with it then? (That will certainly be less urgent, and worst-case it's still possible to create a numpy2.h at that point, only for the affected users.)

@rwgk
Collaborator

rwgk commented Mar 10, 2024

Meta: At the moment I only have a few minutes to look here. I'm still hoping someone with more relevant background than I have will step up to take the lead seeing this through. I realize it's urgent, so I'll help what I can if nobody else steps up, but I might have weird questions and ideas at first. (I also want to add that the mere existence of the current pybind11/numpy.h implementation badly rubs me the wrong way, but I'll try to hide that as much as I can.)


Why not go with the minimal #ifdef PYBIND11_NUMPY2_SUPPORT

Sounds good.

But a couple more quick questions (I hope they will help me pick up here later when I have a block of time):

  • What will be the recommended way to define PYBIND11_NUMPY2_SUPPORT?

  • I strongly feel we need to add a test here (pybind11 GHA) using numpy2. Just one platform would be OK with me, e.g. by adding a new job in .github/workflows/ci.yml.

  • And we need to have some guidance somewhere (doc or .h) on what to do in common situations, e.g. what happens if a given application uses one extension built with PYBIND11_NUMPY2_SUPPORT and another without? I see this PR already has this guard:

#ifndef PYBIND11_NUMPY2_SUPPORT
    if (major_version >= 2) {
        throw std::runtime_error("module compiled without NumPy 2 support. Please "
                                 "define PYBIND11_NUMPY2_SUPPORT before the `numpy2.h` "
                                 "to enable it.");
    }
#endif

Do we also need a guard for the opposite situation?

@henryiii
Collaborator

Aren’t structured dtypes the way you access arbitrary C structs from NumPy? If that’s what you mean, then many pybind11 projects use them, including two of the three projects I work on (boost-histogram and awkward-array), which have tons of structured data. But I assume if this only affects some method of access, it wouldn’t be a problem. We mostly have arrays of structured data that need to be cast into NumPy structured arrays, and I can’t see that changing?

I’m off on paternity leave currently, so I can’t investigate thoroughly yet.

@rgommers
Contributor

What will be the recommended way to define PYBIND11_NUMPY2_SUPPORT?

Hmm, that should be done automatically if possible. Erroring out unless the user defines it would also be a breaking change for every user, just like having to opt in to numpy2.h (and in that case, numpy2.h is easier to use and less breaking).

@seberg I guess the problem you ran into is that you need it at preprocessing time, so you can't use the major_version that is already present in numpy.h, and you can't get it from numpy's numpyconfig.h because of a missing include path?

I strongly feel we need to add a test here (pybind11 GHA) using numpy2. Just one platform would be OK with me, e.g. by adding a new job in .github/workflows/ci.yml.

Makes sense. Let me offer some help here, to do more than pitch in suggestions. I'll work on this by tomorrow unless someone else has it in progress already.

And we need to have some guidance somewhere (doc or .h), what to do in common situations. — E.g. what happens if a given application uses one extension built with PYBIND11_NUMPY2_SUPPORT and another without?

Then one extension will be compatible with 1.x and 2.x, and the other won't be. So the resulting install will only work on 1.x. Not pybind11's concern I think? Just user error, or that package's numpy 2.0 support is WIP. There's nothing to do or recommend here - numpy itself will give a clear error message if the package is used with 2.0.

Aren’t structured dtypes the way you access arbitrary C structs from NumPy? If that’s what you mean, then many pybind11 projects use them, including two of the three projects I work on (boost-histogram and awkward array), which have tons of structured data. But I assume if this only affects some method of access, it wouldn’t be a problem.

Yes that's it, and yes only some accesses are affected (not array creation). I searched both code bases for use of things like alignment and flags, but that came up empty. Hard to do with a code search of course; would be good to see those projects testing with 2.0 nightlies/pre-releases.

@seberg
Contributor Author

seberg commented Mar 11, 2024

pybind11 has some hooks to create a structured dtype and expose access to that same one. Those are indeed unaffected; only direct struct access of fields and names could be affected.

In either case, that only matters if you make NumPy 2 support the default or only way, and the last iteration made it so that users have to opt in.
(Annoying because everyone needs to do that, but it ensures you definitely can't break anyone's extension module on NumPy 1.x.)

and you can't get it from numpy's numpyconfig.h because of a missing include path?

pybind11 doesn't even require NumPy to be available at install time as far as I understand. If it did, there would probably not be a need for making a decision here, since the PYBIND11_NUMPY2_SUPPORT logic could just be tied to the NumPy version at install time.
But since pybind11 chooses to side-step NumPy, it is now in the position that it must decide.

NumPy can (and should) fail hard with a "this module wasn't compiled for 2.x" error/print out. That message can include a note to ensure to use pybind11 > X.Y if you use pybind11.

The newer pybind11 (i.e. this PR) can be a bit more verbose about how to get full compatibility, which I already added to the import error, although one could expand on that.

we need to add a test here (pybind11 GHA) using numpy2

I'll try to have a look, but no promises yet.

@rwgk
Collaborator

rwgk commented Mar 19, 2024

Meta: Please do not force push to this PR anymore. Force pushes make it difficult to follow the progress and they invalidate commit hashes. Trial-and-error is totally fine.

I played with 242300d interactively and it seems to be working as I intended.

E.g.

#include <pybind11/pybind11.h>
#define PYBIND11_NUMPY_1_ONLY
#include <pybind11/numpy.h>

triggers the #error.

However, playing with this also resulted in segfaults, I believe because we're linking multiple translation units into pybind11_tests.so, thereby creating link incompatibilities also known as ODR violations.

Linking multiple translation units into a pybind11-based extension is probably not very common outside pybind11/tests, but probably there are a fair number of such use cases in the wild. This is a pretty mean problem to run into, difficult to root-cause.

I don't have a good idea for detecting at runtime (when an extension module is imported) if incompatible translation units were linked together. (It would be easy if we had production.cpp files and something like libpybind11.so, but with just headers I don't know what we could do to produce a clear and clean error.)

@seberg
Contributor Author

seberg commented Mar 20, 2024

I played with 242300d interactively and it seems to be working as I intended.

Fair, test_eigen_matrix.cpp starts with #include <pybind11/eigen/matrix.h> so things like:

#include <pybind11/eigen/matrix.h>
#define PYBIND11_NUMPY_1_ONLY
#include <pybind11/numpy.h>

shouldn't trigger it, I think. But I guess it may nudge many to put the define as early as possible, which is safer.

I don't have a good idea for detecting at runtime

Two questions:

  1. Are the problems here only (or at least almost only) due to only partially recompiling some of the tests even though pybind11 is newer?
    If yes, wouldn't you run into this in other parts of pybind11 all the time? Is all of the rest of pybind11 really 100% ABI-compatible in all of its "public" API with new versions!?
  2. If not, and you really worry about it... couldn't you break linking by changing things to: npy_api::get(int numpy_2_compatible = 1)? (You could pass the value and check it, but as it is only done once anyway...)
    (Or make it npy2_api and make npy_api just a define alias for it, but not sure that isn't more hacky.)

@henryiii
Collaborator

henryiii commented Mar 20, 2024

Do we really need the "1 only" fallback? Do we know of a single package that would require it? numpy.h is not really public API for us. I think this define is a temporary, emergency fallback (though users can just use an old version of pybind11 if they can't support the NumPy 2 form).

Using this PR fixed support for NumPy 2 in boost-histogram without any user changes, and I'm using structured data types, etc., and I think from what @rwgk said this passes Google's testing without ever needing the back-compat define. I'd rather recommend including the define as-is now, but not advertising it, and see if anyone complains. If we never need to tell people about the define, we can then remove it in the next version. Or I'm also okay with removing it unless we find a package that needs it.

If someone did need the define, we could recommend they add it in their build system (so it's always defined), or always at the top of all pybind11 modules. But I'm hoping pretty much no one needs it.
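For example, a build-system-level define might look like the following (a sketch modeled on pybind11's documented manual build command; the file and module names are placeholders):

```shell
# Pass the define on the compiler command line so it applies to every
# translation unit, independent of include order inside the sources.
c++ -O3 -shared -std=c++14 -fPIC -DPYBIND11_NUMPY_1_ONLY \
    $(python3 -m pybind11 --includes) module.cpp \
    -o module$(python3-config --extension-suffix)
```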

run: |
python3 -m pip install -r tests/requirements.txt
python3 -m pip install --upgrade --pre -i https://pypi.anaconda.org/scientific-python-nightly-wheels/simple numpy scipy
A collaborator commented:

I think both are on regular PyPI now.

seberg replied:

Right, plus I keep forgetting that >=2.0.0b1 works well, so changed.


@rwgk
Collaborator

rwgk commented Mar 20, 2024

A couple quick remarks:

@seberg wrote:

Fair, test_eigen_matrix.cpp starts with #include <pybind11/eigen/matrix.h> so things like:

#include <pybind11/eigen/matrix.h>
#define PYBIND11_NUMPY_1_ONLY
#include <pybind11/numpy.h>

shouldn't trigger it, I think.

Confirmed. Sorry my thinking here was incorrect. I need to spend some time to see what we can do.

@henryiii wrote:

Do we really need the "1 only" fallback? ... But I'm hoping pretty much no one needs it.

Sounds good to me!

I won't try too hard then to fix my idea with the #error. (Maybe the imperfect nudging is good enough?)

@rwgk
Collaborator

rwgk commented Mar 21, 2024

This also doesn't work as I wished:

#include <pybind11/numpy.h>
#define PYBIND11_NUMPY_1_ONLY
#include <pybind11/eigen/matrix.h>

After thinking about it more, I came to believe my wish for a reliable #error would need changes in every single public pybind11 header file, to put a guard at the top. I think that would be far more trouble than it's worth.

@seberg wrote:

But I guess it may nudge many to put define as early as possible, which is safer.

Keep it? (Because it is a very simple implementation, and does at least help a little bit.)

@rwgk left a review comment:

Some minor suggestions and a few questions, possibly naive (I don't know a lot about numpy internals).

@rwgk
Collaborator

rwgk commented Mar 21, 2024

Before replying to specific comments:

I'm probably having somewhat of an unusual perspective, working at Google: Almost every single change I make (in pybind11 or PyCLIF) triggers tens of millions of tests. Quite often, everything works, except, say, 10. Then I might spend anywhere between 5 minutes and 1+ week to find out what is special about those 10 out of 10 million use cases, and I need to fix it somehow, before I can move on. Therefore I've grown to be hyper sensitive to any form of subtle changes, especially removing tests, or changes for which there is no documented rationale.

I'm always thinking: The external world pays the same price, multiplied by a significant unknown factor, because multiple people have to ramp up root causing. What's "nice" though for core library developers: it's not them having to pay that price. The bill is footed by the user community.

@rwgk left a follow-up review comment:

Nice, thanks! With the extra comments I don't feel so clueless anymore looking through the diffs. Those are exactly the kind of clues that'll help me (and I think others) root causing the one in a million failures I wrote about before.

@rgommers left a review comment:

This LGTM from the perspective of NumPy usage. I also tested with SciPy after reverting the workaround for ->elsize that we were carrying, and that all worked like a charm.

Signed-off-by: Henry Schreiner <[email protected]>
@rwgk
Collaborator

rwgk commented Mar 26, 2024

Tests pass: Looks Good To Me!

@henryiii henryiii merged commit 705efcc into pybind:master Mar 26, 2024
84 checks passed
@seberg seberg deleted the numpy2-compat branch March 27, 2024 07:05