fix: consolidate regular indexing #1943
…nly be idempotent for 1D arrays
@jpivarski this PR touches the indexing logic, and needs a careful eye. It makes two big changes:
(2) is the tricky point. First, a short history recap:
This PR addresses #1358, which exposes the lack of symmetry between NumpyArray and RegularArray for indexing.
>>> a = ak.Array([[0, 1, 2], [3, 4], [5]])
>>> a[ak.argmin(a, axis=1, keepdims=True)]
<Array [[0], [3], [5]] type='3 * var * ?int64'>
>>> a[ak.argmin(a, axis=1, keepdims=True, mask_identity=False)]
<Array [[[0, 1, 2]], [[0, ...]], [[0, 1, 2]]] type='3 * 1 * var * int64'>
The mask from I think the question here is not "is this correct?" because if
then the observed behaviour is consistent with this. It's only that, from a UX perspective, it's only the difference between Are you comfortable with this policy? (And indeed, anyone else on the team!)
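For reference, the NumPy-style reading of a regular `N * 1` integer index can be reproduced with plain NumPy. This is a minimal sketch, not Awkward code: the array `x` is a made-up regular stand-in for the ragged `a` above, and all variable names are illustrative assumptions. It shows that an integer array used as an index selects whole rows along axis 0 (adding dimensions), whereas a per-row pick needs `np.take_along_axis`.

```python
import numpy as np

# Made-up regular (non-ragged) stand-in for the array in the example above.
x = np.array([[2, 0, 1],
              [4, 3, 5],
              [6, 7, 8]])

# Position of each row's minimum, kept as a length-1 dimension: shape (3, 1).
idx = np.argmin(x, axis=1)[:, np.newaxis]

# NumPy advanced indexing: idx picks whole rows of x along axis 0,
# so the result has shape idx.shape + x.shape[1:] == (3, 1, 3).
numpy_style = x[idx]

# Per-row selection (one element from each row) has shape (3, 1).
per_row = np.take_along_axis(x, idx, axis=1)

print(numpy_style.shape)  # (3, 1, 3)
print(per_row.tolist())   # [[0], [3], [6]]
```

The `(3, 1, 3)` shape mirrors the `3 * 1 * var * int64` type in the second result above, while the `(3, 1)` per-row pick corresponds to the first.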
Actually, I intended for any Iterable of strings to count as a fields selection, including Awkward Arrays of strings. Empty Iterables are only ambiguous (field-selection or row-selection by integers?) if untyped, and Awkward Arrays are one way of providing a runtime type. NumPy arrays are another.
I'd prefer to keep that feature. I guess there weren't any tests preventing you from changing it, but I did check that it worked while developing it.
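The rule described above (any Iterable of strings selects fields, and an empty Iterable is ambiguous only when it carries no runtime type) might be sketched as follows. `classify_slice_item` is a hypothetical helper written for illustration; it is not Awkward Array's implementation, and the `dtype.kind` check is an assumption modelled on NumPy's string dtypes.

```python
from collections.abc import Iterable


def classify_slice_item(item):
    """Hypothetical helper: return 'fields', 'rows', or 'ambiguous'."""
    if isinstance(item, str):
        return "fields"  # a single field name
    if not isinstance(item, Iterable):
        return "rows"  # e.g. an int or a slice object
    values = list(item)
    if values:
        # Non-empty: the elements themselves reveal the intent.
        if all(isinstance(v, str) for v in values):
            return "fields"
        return "rows"
    # Empty: only a runtime type can disambiguate (assumed NumPy-like dtype).
    dtype = getattr(item, "dtype", None)
    if dtype is None:
        return "ambiguous"
    return "fields" if getattr(dtype, "kind", None) in ("U", "S") else "rows"


print(classify_slice_item(["x", "y"]))  # fields
print(classify_slice_item([0, 2]))      # rows
print(classify_slice_item([]))          # ambiguous
```

An empty typed container, such as a NumPy array with a string dtype, would land in the final branch and count as a field selection.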
On the other example, with ?int64 vs int64 toggling between Awkward and NumPy slicing, I can see why that happens and I agree that it's confusing. We might at some point have to deprecate that behavior (warning on NumPy-style slicing and then phasing it out, forcing users to explicitly wrap as NumPy at some point...?), but not now: it's too close to release time and that would be a major, major change.
So let's leave the confusing but "correct" argmax behavior. I'd like to switch back to letting any Iterable of strings select fields, though, if it's not too much trouble.
As a precursor statement, I would not advocate changing slicing dramatically at this point. So, agreed! Also, I don't have a proposal here - this seems to me to be a fundamental constraint with our indexing; we support many indexing features, and they are not mutually exclusive, so we have to choose according to some scheme. My long-running feeling has generally been that it's better to have useful type information (e.g. "I reduced this axis, and so I have a length-1 dimension") over pandering to the shortfalls of our indexing mechanism (e.g. "I reduced a var dimension, so it stays var"). And, to be clear, I don't think any of this is "wrong" or "right", it just has its pros and cons.
Actually, this seems like the "best" solution to me in the long-long term. Introducing a new accessor like I raised this just to make sure I'm not doing anything daft; this is very fundamental code I'm changing (fixing), and I wanted to make sure that we're all on the same page.
I've reverted b4456fc and added new tests that enshrine this behavior :)
Thanks for reverting the Awkward Array of strings behavior.
I approve the intention of this PR, and I have a question about only one line of code. When you've answered it for yourself, you can merge.
I can't check the code deeply, but that one line of code was the only one that looks suspicious to me.
Fixes #1358 by using maybe_to_NumpyArray. RegularArrays that succeed with maybe_to_NumpyArray() follow NumPy indexing. Previously, they followed Awkward indexing.
📚 The documentation for this PR will be available at https://awkward-array.readthedocs.io/en/agoose77-fix-regular-indexing/ once Read the Docs has finished building 🔨