feat: add `RegularArray._reduce_next` implementation #1811
@jpivarski I haven't added any kernel tests yet to the kernel test data. Do you have any suggestions about how best to do this; do I just need to churn through some examples and add them to the data-file?
I'd missed a couple of places where our option-types eagerly coerce |
This kernel no longer requires post-sorting now that we have no `gaps` kernel.
(GitHub ate my first review.)
This looks very well thought-through, and the large, explanatory comments are very helpful. It's not a situation in which I think the comments will get out of date because these things are not changed often or rapidly. I don't see anything missing, like a kernel implementation without a specification.
> I haven't added any kernel tests yet to the kernel test data. Do you have any suggestions about how best to do this; do I just need to churn through some examples and add them to the data-file?
The test data was mostly made automatically. The hard part was choosing test inputs; the outputs were determined by running the kernels. It's easier to hand-craft some test inputs soon after having written the kernel, so if you have an idea in mind about test inputs that would not trivially skip the code, a good mix of valid and invalid inputs, then you can add those, run the function, and just insert the observed outputs as expected outputs. (I.e. we're not pretending to predict the function's behavior, we're just pinning it in place so we'll notice if it changes or if the CUDA version doesn't agree.)
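The workflow described above (choose inputs, run the kernel, record whatever comes out) might look like the sketch below. `toy_reduce_sum` is a hypothetical stand-in, not one of the project's kernels, and the record layout is an assumption rather than the format of the real test data file.

```python
def toy_reduce_sum(fromptr, parents, outlength):
    """Hypothetical stand-in kernel: sum `fromptr` into bins named by `parents`."""
    if len(fromptr) != len(parents):
        raise ValueError("fromptr and parents must have the same length")
    toptr = [0] * outlength
    for value, parent in zip(fromptr, parents):
        if not 0 <= parent < outlength:
            raise ValueError("parent index out of range")
        toptr[parent] += value
    return toptr


def pin(kernel, inputs):
    """Run the kernel once and record what it did, success or failure."""
    try:
        return {"inputs": inputs, "outputs": kernel(**inputs), "error": False}
    except ValueError:
        return {"inputs": inputs, "outputs": None, "error": True}


# Hand-crafted inputs: one valid case with an empty bin, one invalid case.
samples = [
    {"fromptr": [1, 2, 3, 4], "parents": [0, 0, 2, 2], "outlength": 3},
    {"fromptr": [1, 2], "parents": [0, 5], "outlength": 2},
]

# The observed results become the expected results for future runs.
pinned = [pin(toy_reduce_sum, sample) for sample in samples]
```

Re-running `pin` on the same inputs later and comparing against `pinned` is what flags a behavior change, or a disagreement between two implementations of the same kernel.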
# If the result of `_reduce_next` is a list, and we're not applying at this
# depth, then it will have offsets given by the boundaries in parents.
# This means that we need to look at the _contents_ to which the `outindex`
# belongs to add the option type
This comment is for the code that was removed.
Co-authored-by: Jim Pivarski <[email protected]>
@jpivarski I've made the changes you requested, and importantly removed most of |
That's right, it is just a […]

Very likely, this happened because the v1 C++ IndexedArray and IndexedOptionArray were a single class, and more work is needed for the IndexedOptionArray (the […]

Do you want to add kernel test samples? (Not all of the kernels have them.) Otherwise, the PR is done and can be merged.
Yes, this was my assessment too. And, it would have been risky to try and simplify this at v1→v2 time; much safer to do this now that things are stable and working.
My brain is starting to hurt from spending so much time on reducers. Is this something you'd have the cycles for? If not, then let's merge this and I'll make a mental note to get to this down the line.
I was saying that it's optional. So I'll merge it now (after tests). Same for #1813.
Our current reduction through a `RegularArray` is done by casting to and from a `ListOffsetArray`. As described in #1790, this has two consequences:

The first point can be fixed (this code excerpt from `RegularArray` + the changes to `ListOffsetArray` in this PR):

Fix for `RegularArray._reduce_next`

This second point means that just calling `toRegularArray()` on the `ListOffsetArray._reduce_next` reduction result is not sufficient; in the process of reducing the regular child of a ragged array with empty lists, we lose information about these empty lists, which then need care in order to be reconstructed. In addition to this bug, the whole process of going to and from `ListOffsetArray64` here is lossy, and involves multiple kernels, which reduces performance.

The actual kernels required to implement `RegularArray._reduce_next` seem fairly trivial. So, I wrote this PR to implement them (only for reduction).

This kind of code is really hard to reason about, though, so any extra pairs of eyes on the assumptions that I've made here would be very helpful. I might have got this horribly wrong; it's easy to get the wrong mental model, I've found.
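To make the parents-based bookkeeping concrete, here is a minimal pure-Python sketch; it is not the kernel added in this PR. `regular_parents` and `reduce_sum` are hypothetical helper names, and the sketch assumes that each element's parent is simply the index of the list that contains it. It illustrates one way an empty list can go missing: no element names it as a parent, so it only survives if the output length is carried separately.

```python
def regular_parents(length, size):
    """Parents for the flattened content of a regular array:
    element k belongs to row k // size."""
    return [k // size for k in range(length * size)]


def reduce_sum(content, parents, outlength):
    """Sum `content` into `outlength` bins; a bin that no parent
    points at keeps the identity for addition, 0."""
    out = [0] * outlength
    for value, parent in zip(content, parents):
        out[parent] += value
    return out


# Regular array with 2 rows of size 3: [[1, 2, 3], [4, 5, 6]]
content = [1, 2, 3, 4, 5, 6]
parents = regular_parents(2, 3)
row_sums = reduce_sum(content, parents, outlength=2)

# Ragged array [[1, 2], [], [3]]: nothing names list 1 as its parent,
# so only the explicit outlength keeps it in the result.
ragged_sums = reduce_sum([1, 2, 3], [0, 0, 2], outlength=3)

# Ragged array [[1, 2], [3], []]: inferring the length from the
# parents alone silently drops the trailing empty list.
inferred_length = max([0, 0, 1]) + 1
```

Here `inferred_length` comes out one short of the true number of lists, which is the kind of lost information that has to be reconstructed by other means.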
Specifically, this PR:
- Changes `ListOffsetArray` to return a `RegularArray` instead of `ListOffsetArray` when `keepdims=True`. This seems wrong at first, but the return result of `_reduce_next` is actually the parent layout. It's the parent's responsibility to coerce this to the correct type. `RegularArray` is lower overhead than a ragged type.
- Removes the `findgaps` kernel (we still assume that parents need to be locally contiguous, i.e. `1 1 3 3 2 2` vs `1 3 1 3 2 2`). I think this is reasonable, as there are good reasons for requiring local contiguity, but fewer for global.
- Makes `keepdims` always insert a length=1 axis to ensure broadcastability.

This PR does not:
- Implement `RegularArray._sort_next` etc. These should be done at some point (or at least, the ragged/regular type preservation improved), but there's future work on unifying `sort` and `argsort` that will make this slightly easier.

📚 The documentation for this PR will be available at https://awkward-array.readthedocs.io/en/agoose77-fix-proper-regulararray/ once Read the Docs has finished building 🔨
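Two of the points in the list above, local contiguity of parents and the length-1 axis from `keepdims`, can be sketched in plain Python. The function names here are hypothetical and this is only an illustration of the ideas, not code from the PR.

```python
def is_locally_contiguous(parents):
    """True if every parent value occupies a single unbroken run.
    The runs themselves do not have to be in sorted order."""
    seen = set()
    previous = None
    for parent in parents:
        if parent != previous:
            if parent in seen:
                return False  # this value already had a run that ended
            seen.add(parent)
            previous = parent
    return True


def sum_keepdims(rows):
    """Sum each row but keep a length-1 inner axis, like keepdims=True."""
    return [[sum(row)] for row in rows]


def subtract_broadcast(rows, totals):
    """Broadcast each length-1 total against the row it came from."""
    return [[x - total[0] for x in row] for row, total in zip(rows, totals)]


ok = is_locally_contiguous([1, 1, 3, 3, 2, 2])
not_ok = is_locally_contiguous([1, 3, 1, 3, 2, 2])

rows = [[1, 2, 3], [4, 5, 6]]
totals = sum_keepdims(rows)
centered = subtract_broadcast(rows, totals)
```

Because every reduced row keeps exactly one element, the result lines up with the original rows and can be combined with them element by element.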