API: Restructure the dtype struct to be new dtype friendly #25943
I assume this is a blocker for the 2.0.0 release?
I did another code search and found one usage in newish code, here to stuff some extra metadata into the descriptor (it looks like to work around a numpy limitation). Seems like a nice thing to have and no strong reason to remove it, let's keep it.
One could imagine always providing a per-descriptor allocator like stringdtype does, for example, as a way to deal with thread safety and add nogil support. So let's do it. My only thought is that only having one extra slot might be limiting, could we define it in such a way that it can be extensible (e.g. using something like the
Sure, let's leave it, and in the future we can refactor numpy to use an accessor macro with a better name. There's a ton of downstream code using elsize too so this would cause some significant downstream code churn.
Let me know if you want help. Even if it is just to brute force dissect out the part(s) of the codes related.
Well, we have to force those to churn and use |
OK, the SciPy issues came also down to I have updated the docs. We should maybe discuss this briefly. (EDIT: It is unfortunate that the docs don't build, I think we can wait for SciPy to work, but if more fails, may need to see how to get the changes in) I will look into |
Two notes:
This modifies the main dtype/descriptor struct to:

* Use intp for the elsize (and alignment)
* A uint64 for flag space
* To actually remove c_metadata and fields related to structured dtypes.

It thus *breaks ABI* and the unfortunate souls who require access to `->elsize` or similar fields will have to vendor `npy_2_compat.h` or do something similar. (This should not be super many though, most can use PyArray_ITEMSIZE).

The changes also require moving a few other places to be run-time and fixing allocation of structs.

This commit is not complete: New warnings and errors still need to be fixed.
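To illustrate why direct `->elsize` access stops being safe once the struct layout depends on the NumPy version, here is a minimal, self-contained sketch of a runtime-dispatching accessor. Everything in it is hypothetical: `legacy_descr`, `new_descr`, `demo_runtime_version`, and `demo_descr_elsize` are stand-ins invented for this example and are not the real NumPy structs or the contents of `npy_2_compat.h`. It only assumes that a compatibility layer picks the layout from a version value that is known at run time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "1.x-like" layout: int-sized element size, narrow flags. */
typedef struct {
    char kind;
    char flags;
    int elsize;
    int alignment;
} legacy_descr;

/* Hypothetical "2.x-like" layout: wider flags and pointer-sized sizes. */
typedef struct {
    char kind;
    uint64_t flags;
    intptr_t elsize;
    intptr_t alignment;
} new_descr;

/* Assumption: set once, after the library has been imported, to the
 * major version of the runtime that actually created the descriptors. */
static int demo_runtime_version = 2;

/* Read the element size through the layout that matches the runtime.
 * Callers never touch ->elsize directly, so they keep working when the
 * field changes type or moves within the struct. */
static intptr_t demo_descr_elsize(const void *descr)
{
    assert(descr != NULL);
    if (demo_runtime_version >= 2) {
        return ((const new_descr *)descr)->elsize;
    }
    return (intptr_t)((const legacy_descr *)descr)->elsize;
}
```

With `demo_runtime_version` set to 1, a `legacy_descr` initialised as `{'i', 0, 4, 4}` yields 4 through the accessor; with it set to 2, a `new_descr` initialised as `{'f', 0, 8, 8}` yields 8. The trade-off of this design is that the accessor is only meaningful after the version value has been initialised.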
(for once shipping f2py makes it easier)
Note that the (rare) aligned struct case will not allow unpickling NumPy 2.x files with NumPy 1.x. This could be added if necessary. Unpickling 1.x files in 2.x is unproblematic
2 should work (unless I miscounted badly).
I think this should be ready, but of course someone should have a look over. @lithomas1 once this is merged, pandas will need some small updates to the json c reader (IIRC), shouldn't be more than the similar ones that arrow needs (@jorisvandenbossche is aware). If the arrow code-paths are hit, this might crash pandas, although building the wheel would still be fine. Right now, I don't think it is looking too bad, but of course I don't know... if it ends up being bad we might have to keep |
Makes sense to me, and lays the groundwork for dropping the old struct in a few years when all user-defined dtypes move to NumPy2. I still need a bit of convincing around the reserved space, but not a big deal since the whole structure is meant to be opaque.
Edit: the spare fields are needed for subclassing. Makes sense.
Co-authored-by: Matti Picus <[email protected]>
Tests are failing but not because of this PR, probably due to #21760. I will comment there. The error is (note the extra
I will merge this to keep the momentum going. Let's see what breaks downstream...
Thanks @seberg
Thanks for the heads up. Can someone from numpy hit the build button on the nightlies now that this is merged?
Yesterday it sounded a bit like you wanted to do that @rgommers, so you can make sure to do it on SciPy quickly after as well?
I have already triggered the builds, at least those not on cirrus.
Hi - I found that this PR broke the openxla build, due to this line: https://github.com/openxla/xla/blob/24eaeeab8e7465cef0fc655cd9fd8f6060485a27/xla/python/nb_numpy.h#L55 What would be the best way to access |
Right, the caveat is, it needs array_import now, but you probably already have that.
Thanks. Unfortunately, what the code has access to there is just a |
Maybe I need to use compiler directives to conditionally define |
Yes unfortunately this access pattern needs to be dealt with using e.g. an |
Yes, just |
Ah, I see now that the |
Note this PR is an actual ABI breaking change: No downstream project should be expected to work without recompiling (which means that our doc builds will just crash also).
This PR does a few changes, I would like to keep docs in a follow-up, but happy either way. Right now marking as draft because:
The actual changes are the following:

* `elsize` and `alignment` now use `intp` (I don't think it matters for alignment but seemed fine).
* `c_metadata`, `subarray`, `fields`, and `names` are now fully gone. (`metadata` is still there).

Plus cleanup and moving things between headers as they are now version dependent.
@ngoldbaum three things I would like your opinion on:

* I kept `metadata` for the simple reason that it seemed at least somewhat useful and used and because of the preparation that was easier. I am happy to remove it.
* I added a `void *reserved` slot that we define to be NULL but can give a meaning in the future. I don't care too much about it, we have flags and subclasses, but it is possible...
* I kept the `elsize` name, just to keep the diff smaller. But we could rename both or just the accessor to `ITEMSIZE`.

None of them seem like a huge deal, compared to getting rid of the other fields and bumping the size.