feat: add `RegularArray._reduce_next` implementation #1811

agoose77 · 2022-10-19T11:06:31Z

Our current reduction through a RegularArray is done by casting to and from a ListOffsetArray. As described in #1790, this has two consequences:

1. We lose regular dimensions quite easily.
2. We lose information in the case of zero-length dimensions.

The first point can be fixed with the following excerpt from RegularArray, together with the changes to ListOffsetArray in this PR:

Fix for RegularArray._reduce_next

        branch, depth = self.branch_depth

        if depth == negaxis:
            if keepdims:
                return ak.contents.RegularArray(
                    out.content.toRegularArray(),
                    1,
                    self.length,
                    None,
                    None,
                    self._nplike,
                )
            else:
                return out.toRegularArray()

        if keepdims and depth == negaxis + 1:
            outcontent = out.content
            assert isinstance(
                outcontent, (ak.contents.ListOffsetArray, ak.contents.RegularArray)
            )
            # Determined by self._content._reduce_next(..., len(self), ...)
            outcontent = ak.contents.RegularArray(
                outcontent.content,
                size=1,
                zeros_length=len(self),
            )
            out = ak.contents.ListOffsetArray(
                out.offsets,
                outcontent,
                out.identifier,
                out.parameters,
                out.nplike
            )
        elif depth > negaxis + 1:
            outcontent = out.content
            assert isinstance(
                outcontent, (ak.contents.ListOffsetArray, ak.contents.RegularArray)
            )
            # Determined by self._content._reduce_next(..., len(self), ...)
            outcontent = ak.contents.RegularArray(
                outcontent.content,
                size=self._size,
                zeros_length=len(self),
            )
            out = ak.contents.ListOffsetArray(
                out.offsets,
                outcontent,
                out.identifier,
                out.parameters,
                out.nplike
            )

The second point means that just calling toRegularArray() on the result of ListOffsetArray._reduce_next is not sufficient: in the process of reducing the regular child of a ragged array with empty lists, we lose information about these empty lists, which then need care to be reconstructed. In addition to this bug, the whole process of converting to and from ListOffsetArray64 is lossy and involves multiple kernels, which reduces performance.
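As a rough illustration of how a round trip through offsets can drop information, here is a minimal pure-Python model. It is not Awkward's implementation, and `regular_to_offsets` and `infer_regular_size` are hypothetical helper names. It only covers the simplest case, a regular dimension with zero lists, where the offsets no longer say what the size was.

```python
def regular_to_offsets(length, size):
    """Offsets that a regular (length, size) dimension has as a ragged list."""
    return [i * size for i in range(length + 1)]


def infer_regular_size(offsets):
    """Recover the regular size from offsets, or None if it cannot be known."""
    counts = {stop - start for start, stop in zip(offsets, offsets[1:])}
    if not counts:
        # No lists at all: every size is consistent with these offsets.
        return None
    if len(counts) > 1:
        raise ValueError("lists are not all the same length")
    return counts.pop()


# A (2, 3) regular dimension survives the round trip...
two_by_three = infer_regular_size(regular_to_offsets(2, 3))
# ...but a (0, 3) regular dimension does not: offsets == [0] for any size.
zero_by_three = infer_regular_size(regular_to_offsets(0, 3))
```

In this model the size has to be carried separately, which is the kind of bookkeeping a dedicated regular code path avoids.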

The actual kernels required to implement RegularArray._reduce_next seem fairly trivial. So, I wrote this PR to implement them (only for reduction).

This kind of code is really hard to reason about, though, so any extra pairs of eyes on the assumptions that I've made here would be very helpful. I might have got this horribly wrong; it's easy to get the wrong mental model, I've found.

Specifically, this PR:

- Changes ListOffsetArray to return a RegularArray instead of a ListOffsetArray when keepdims=True. This seems wrong at first, but the result of _reduce_next is actually the parent layout, and it's the parent's responsibility to coerce this to the correct type. A RegularArray has lower overhead than a ragged type.
- Fixes cases where regular types were lost in reduction.
- Assumes that parent ordering does not matter now that we have removed the findgaps kernel. We still assume that parents need to be locally contiguous (i.e. 1 1 3 3 2 2 rather than 1 3 1 3 2 2). I think this is reasonable, as there are good reasons for requiring local contiguity, but fewer for requiring a global ordering.
- Fixes the tests that assume we lose regularity.
- Makes keepdims always insert a length=1 axis to ensure broadcastability.
- Closes "Unable to reduce regular dimension in mixed layout" #1790.
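The parents assumption and the keepdims behaviour listed above can be sketched in plain Python. This is a toy model under my own assumptions, not the kernels from this PR; `is_locally_contiguous`, `sum_by_parents`, and `with_keepdims` are hypothetical names.

```python
def is_locally_contiguous(parents):
    """True if each parent value occupies a single unbroken run."""
    seen = set()
    previous = None
    for parent in parents:
        if parent != previous:
            if parent in seen:
                return False
            seen.add(parent)
            previous = parent
    return True


def sum_by_parents(content, parents, outlength):
    """Sum `content` into `outlength` bins selected by `parents`."""
    out = [0] * outlength
    for value, parent in zip(content, parents):
        out[parent] += value
    return out


def with_keepdims(reduced):
    """Wrap each reduced value in a length-1 list (a regular axis of size 1)."""
    return [[value] for value in reduced]


content = [1, 2, 3, 4, 5, 6]
parents = [1, 1, 3, 3, 2, 2]  # locally contiguous, but not globally sorted
reduced = sum_by_parents(content, parents, 4)
kept = with_keepdims(reduced)
```

Every entry of `kept` has length 1, so it broadcasts against any list of the same outer length, which is the property the keepdims change is after.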

This PR does not:

- Do the same for RegularArray._sort_next etc. These should be done at some point (or at least, the ragged/regular type preservation improved), but there's future work on unifying sort and argsort that will make this slightly easier.

📚 The documentation for this PR will be available at https://awkward-array.readthedocs.io/en/agoose77-fix-proper-regulararray/ once Read the Docs has finished building 🔨

agoose77 · 2022-10-19T15:44:02Z

@jpivarski I haven't added any kernel tests yet to the kernel test data. Do you have any suggestions about how best to do this; do I just need to churn through some examples and add them to the data-file?

codecov · 2022-10-19T15:56:36Z

Codecov Report

Merging #1811 (7b0133e) into main (569f183) will increase coverage by 0.00%.
The diff coverage is 86.11%.

Additional details and impacted files

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/awkward/_slicing.py | `86.04% <ø> (+0.33%)` | ⬆️ |
| src/awkward/contents/unmaskedarray.py | `71.67% <0.00%> (+0.91%)` | ⬆️ |
| src/awkward/contents/indexedoptionarray.py | `88.90% <72.72%> (-0.18%)` | ⬇️ |
| src/awkward/contents/bytemaskedarray.py | `88.04% <77.77%> (-0.29%)` | ⬇️ |
| src/awkward/contents/regulararray.py | `89.92% <90.47%> (+0.70%)` | ⬆️ |
| src/awkward/contents/indexedarray.py | `79.46% <100.00%> (-0.50%)` | ⬇️ |
| src/awkward/contents/listoffsetarray.py | `79.47% <100.00%> (-0.06%)` | ⬇️ |
| src/awkward/nplikes.py | `66.60% <0.00%> (-0.20%)` | ⬇️ |
| src/awkward/_typetracer.py | `74.71% <0.00%> (-0.19%)` | ⬇️ |
... and 6 more

agoose77 · 2022-10-20T11:29:35Z

I'd missed a couple of places where our option-types eagerly coerce RegularArrays into ListOffsetArrays. This fixes the few tests that had `var` introduced through reduction.

jpivarski

(GitHub ate my first review.)

This looks very well thought-through, and the large, explanatory comments are very helpful. It's not a situation in which I think the comments will get out of date because these things are not changed often or rapidly. I don't see anything missing, like a kernel implementation without a specification.

> I haven't added any kernel tests yet to the kernel test data. Do you have any suggestions about how best to do this; do I just need to churn through some examples and add them to the data-file?

The test data was mostly made automatically. The hard part was choosing test inputs; the outputs were determined by running the kernels. It's easier to hand-craft some test inputs soon after having written the kernel, so if you have an idea in mind about test inputs that would not trivially skip the code, a good mix of valid and invalid inputs, then you can add those, run the function, and just insert the observed outputs as expected outputs. (I.e. we're not pretending to predict the function's behavior, we're just pinning it in place so we'll notice if it changes or if the CUDA version doesn't agree.)
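The "pin the observed outputs" workflow described above might look roughly like this. It is only a sketch: `toy_kernel` is a hypothetical stand-in for a real kernel, and the sample layout is made up for illustration rather than taken from the actual kernel test data file.

```python
def toy_kernel(parents, outlength):
    """Hypothetical stand-in for a kernel: count entries per parent."""
    counts = [0] * outlength
    for parent in parents:
        if not 0 <= parent < outlength:
            raise ValueError("parent index out of range")
        counts[parent] += 1
    return counts


def run_once(kernel, inputs):
    """Run the kernel and record either its output or the fact that it failed."""
    try:
        return {"outputs": kernel(**inputs), "error": False}
    except ValueError:
        return {"outputs": None, "error": True}


def pin_outputs(kernel, test_inputs):
    """Build samples by inserting observed outputs as the expected outputs."""
    return [{"inputs": inputs, **run_once(kernel, inputs)} for inputs in test_inputs]


def check_pinned(kernel, samples):
    """Re-run the kernel and report whether its behavior still matches."""
    return all(
        run_once(kernel, s["inputs"]) == {"outputs": s["outputs"], "error": s["error"]}
        for s in samples
    )


# Hand-crafted inputs: a mix of valid, empty, and invalid cases.
test_inputs = [
    {"parents": [0, 0, 2], "outlength": 3},
    {"parents": [], "outlength": 2},
    {"parents": [5], "outlength": 2},
]
samples = pin_outputs(toy_kernel, test_inputs)
```

The same `samples` could then be replayed against another implementation of the kernel (for example a CUDA one) with `check_pinned` to see whether the two agree.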

src/awkward/contents/indexedarray.py

jpivarski · 2022-10-20T17:31:56Z

src/awkward/contents/indexedoptionarray.py

            # If the result of `_reduce_next` is a list, and we're not applying at this
            # depth, then it will have offsets given by the boundaries in parents.
            # This means that we need to look at the _contents_ to which the `outindex`
            # belongs to add the option type


This comment is for the code that was removed.

Suggested change

# If the result of `_reduce_next` is a list, and we're not applying at this
# depth, then it will have offsets given by the boundaries in parents.
# This means that we need to look at the _contents_ to which the `outindex`
# belongs to add the option type

src/awkward/contents/listoffsetarray.py

src/awkward/contents/regulararray.py

agoose77 · 2022-10-20T18:36:14Z

@jpivarski I've made the changes you requested, and importantly removed most of IndexedArray._reduce_next. Could you check that I've not made an oversight here? My reasoning is that IndexedArray is effectively a deferred carry, which is realised during reduction. The nextparents, starts, etc. all apply to the result of applying self._content.carry(self._index), so it should just work™.
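To spell out the "deferred carry" reasoning, here is a toy pure-Python model. It is not the Awkward code; `carry`, `sum_by_parents`, and `indexed_reduce` are hypothetical names, and a sum stands in for an arbitrary reducer.

```python
def carry(content, index):
    """Materialise the indexed view: pick content[i] for each i in index."""
    return [content[i] for i in index]


def sum_by_parents(content, parents, outlength):
    """Sum `content` into `outlength` bins selected by `parents`."""
    out = [0] * outlength
    for value, parent in zip(content, parents):
        out[parent] += value
    return out


def indexed_reduce(content, index, parents, outlength):
    """Reduce an indexed array by realising the carry, then reducing as usual."""
    # `parents` already describes the carried elements, so nothing else changes.
    return sum_by_parents(carry(content, index), parents, outlength)


content = [10, 20, 30, 40]
index = [3, 0, 0, 2]    # the indexed array views content as [40, 10, 10, 30]
parents = [0, 0, 1, 1]  # first two elements go to bin 0, last two to bin 1
result = indexed_reduce(content, index, parents, 2)
```

Because `parents` lines up with the carried elements one to one, no extra index bookkeeping is needed in this model.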

jpivarski · 2022-10-20T19:03:09Z

That's right, it is just a carry!

Very likely, this happened because the v1 C++ IndexedArray and IndexedOptionArray were a single class, and more work is needed for the IndexedOptionArray (the `else` clause of `if (index[i] >= 0)` in `awkward_IndexedArray_reduce_next_64`). I don't think I ever noticed that the IndexedArray case is simpler. Now that they're two Python classes, the ways in which IndexedArray can collapse down are more apparent.
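For comparison, the extra work in the option-type case can be sketched like this. This is a pure-Python approximation written from the description above, not a transcription of `awkward_IndexedArray_reduce_next_64`; `split_option_index` is a hypothetical name.

```python
def split_option_index(index, parents):
    """Separate valid entries from missing ones (negative index values)."""
    nextcarry, nextparents, outindex = [], [], []
    for idx, parent in zip(index, parents):
        if idx >= 0:
            # Valid entry: it will sit at the next free position after the carry.
            outindex.append(len(nextcarry))
            nextcarry.append(idx)
            nextparents.append(parent)
        else:
            # Missing entry: it is not carried, and is marked as missing.
            outindex.append(-1)
    return nextcarry, nextparents, outindex


index = [2, -1, 0, -1, 1]
parents = [0, 0, 1, 1, 1]
nextcarry, nextparents, outindex = split_option_index(index, parents)
```

With no negative entries, `nextcarry` equals `index`, `nextparents` equals `parents`, and `outindex` is just `0, 1, 2, ...`, which is why the plain IndexedArray case reduces to a carry.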

Do you want to add kernel test samples? (Not all of the kernels have them.) Otherwise, the PR is done and can be merged.

agoose77 · 2022-10-20T19:31:28Z

> Very likely, this happened because the v1 C++ IndexedArray and IndexedOptionArray were a single class, and more work is needed for the IndexedOptionArray (the `else` clause of `if (index[i] >= 0)` in `awkward_IndexedArray_reduce_next_64`).

Yes, this was my assessment too. And, it would have been risky to try and simplify this at v1→v2 time; much safer to do this now that things are stable and working.

agoose77 · 2022-10-20T19:34:02Z

> Do you want to add kernel test samples? (Not all of the kernels have them.) Otherwise, the PR is done and can be merged.

My brain is starting to hurt from spending so much time on reducers - is this something you'd have the cycles for? If not, then let's merge this and I'll make a mental note to get to this down the line.

jpivarski · 2022-10-20T19:43:28Z

I was saying that it's optional. So I'll merge it now (after tests).

Same for #1813.

agoose77 marked this pull request as ready for review October 19, 2022 15:38

agoose77 requested a review from jpivarski October 19, 2022 15:39

agoose77 added the pr-needs-backport label (This PR needs a counterpart to backport to older versions) Oct 19, 2022

agoose77 mentioned this pull request Oct 19, 2022

feat: add RegularArray._reduce_next implementation (backport) #1813

Merged

1 task

agoose77 force-pushed the agoose77/fix-proper-regulararray branch from c62b82b to f671788 Compare October 20, 2022 12:45

agoose77 closed this Oct 20, 2022

agoose77 force-pushed the agoose77/fix-proper-regulararray branch from f671788 to 2511d4b Compare October 20, 2022 12:48

agoose77 added 7 commits October 20, 2022 13:51

feat: add RegularArray kernels

27aea68

feat: add RegularArray._reduce_next impl

4703c36

feat: make keepdims always add regular dimension

3c7c33a

fix: ensure we convert list-types to ListOffsetArray

c14b9e8

test: fix tests after change in behavior

6c7c59c

refactor: remove unneeded argsort call

cf96359

This kernel no longer requires post-sorting now that we have no `gaps` kernel.

chore: cleanup duplicate case

b55d989

agoose77 reopened this Oct 20, 2022

test: add test for fixed issue

c745f0a

jpivarski approved these changes Oct 20, 2022

agoose77 and others added 5 commits October 20, 2022 19:18

refactor: simplify IndexedArray._reduce_next

eb6dbda

Update src/awkward/contents/regulararray.py

0230794

Co-authored-by: Jim Pivarski <[email protected]>

refactor: further simplify IndexedArray._reduce_next

a1c57e6

docs: improve comments for `IndexedOption

f66fa21

._reduce_next`

Merge remote-tracking branch 'origin/agoose77/fix-proper-regulararray' into agoose77/fix-proper-regulararray

52aee95

Merge branch 'main' into agoose77/fix-proper-regulararray

74b8c0a

chore: remove unused _represents_regular flag

7b0133e

jpivarski enabled auto-merge (squash) October 20, 2022 19:43

jpivarski merged commit fe2ff3f into main Oct 20, 2022

jpivarski deleted the agoose77/fix-proper-regulararray branch October 20, 2022 19:55

agoose77 mentioned this pull request Dec 3, 2022

fix: consolidate regular indexing #1943

Merged

agoose77 mentioned this pull request Jun 5, 2023

fix: starts handling in RegularArray._reduce_next #2492

Merged
