feat: add ak.drop_none() #1904

ioanaif · 2022-11-21T14:37:52Z

This PR adds the drop_none functionality, as requested in #832.

ak.drop_none(array, axis) - removes missing values (None) from a given array.

For example, in the following array,

a = ak.Array([[[0]], [[None]], [[1], None], [[2, None]]])

The None values will be removed, resulting in

>>> ak.drop_none(a)
<Array [[[0]], [[]], [[1]], [[2]]] type='4 * var * var * int64'>

The default is axis=None, which removes missing values at every level. To remove them at a single level only, specify an axis:

>>> ak.drop_none(a, axis=1)
<Array [[[0]], [[None]], [[1]], [[2, None]]] type='4 * var * var * ?int64'>
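
As a rough mental model of these semantics (not the implementation in this PR, which works on Awkward layouts and kernels), the same behavior can be sketched on plain nested Python lists. The helper name `drop_none_lists` is hypothetical, and the sketch assumes only lists, with axis either None or a non-negative integer.

```python
def drop_none_lists(data, axis=None):
    """Plain-Python model of dropping None from nested lists.

    Hypothetical helper for illustration only; it is not part of Awkward Array.
    """
    if not isinstance(data, list):
        # A leaf value, or a None sitting above the requested axis.
        return data
    if axis is None:
        # Remove None at this level, then keep descending into every sublist.
        return [drop_none_lists(item, None) for item in data if item is not None]
    if axis == 0:
        # Remove None at this level only; deeper levels are left untouched.
        return [item for item in data if item is not None]
    return [drop_none_lists(item, axis - 1) for item in data]


a = [[[0]], [[None]], [[1], None], [[2, None]]]

print(drop_none_lists(a))          # [[[0]], [[]], [[1]], [[2]]]
print(drop_none_lists(a, axis=1))  # [[[0]], [[None]], [[1]], [[2, None]]]
```

The two printed lists agree with the `ak.drop_none` outputs quoted above for the same input.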

codecov · 2022-11-22T10:36:36Z

Codecov Report

Merging #1904 (b19bed8) into main (3fc4adb) will increase coverage by 0.07%.
The diff coverage is 97.40%.

Additional details and impacted files

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/awkward/contents/regulararray.py | `88.62% <50.00%> (-0.16%)` | ⬇️ |
| src/awkward/contents/content.py | `72.95% <75.00%> (+0.01%)` | ⬆️ |
| src/awkward/contents/bitmaskedarray.py | `69.85% <100.00%> (+0.57%)` | ⬆️ |
| src/awkward/contents/bytemaskedarray.py | `88.39% <100.00%> (+0.06%)` | ⬆️ |
| src/awkward/contents/indexedoptionarray.py | `88.55% <100.00%> (+0.04%)` | ⬆️ |
| src/awkward/contents/listarray.py | `90.45% <100.00%> (+0.08%)` | ⬆️ |
| src/awkward/contents/listoffsetarray.py | `79.82% <100.00%> (+0.33%)` | ⬆️ |
| src/awkward/contents/unmaskedarray.py | `73.00% <100.00%> (+0.24%)` | ⬆️ |
| src/awkward/operations/__init__.py | `100.00% <100.00%> (ø)` | |
| src/awkward/operations/ak_drop_none.py | `100.00% <100.00%> (ø)` | |
... and 1 more

jpivarski

This is a good implementation and I can see that the tests are exhaustive, using the full suite of layouts.

I think it's good and ready to merge, as soon as the build-docs issue is fixed. If it's fixed in #1905 first, we'll merge that into main and then merge main into here. (One way or the other.)

src/awkward/operations/ak_drop_none.py

agoose77

Nice work @ioanaif!! I needed this feature so much during my analysis work.

I'm on holiday, but I know that this is a big feature and wanted to try and offer an additional set of eyes.

src/awkward/contents/listoffsetarray.py

src/awkward/contents/content.py

Co-authored-by: Angus Hollands <[email protected]>

agoose77 · 2022-11-22T17:48:05Z

I'm done touching this PR now (hands off) - I fixed the whitespace in my suggestion that didn't merge properly :)

jpivarski

I looked at the tests too hastily: this is not wrapping its result as a high-level array, which it should by default.

There's also an error (below) that I noticed when trying the RecordArray example, which should be possible. I don't know why RecordArray reveals it but NumpyArray doesn't. I'm looking more closely at this now...

src/awkward/operations/ak_drop_none.py

src/awkward/contents/indexedoptionarray.py

jpivarski

Okay, I'm sorry, but I missed some things. The to_list tests in test_1904-drop-none.py are insensitive to the difference between high-level and low-level arrays, so that's why we didn't catch that issue.

I suspect that having _drop_none return different types for different nodes (Content vs (Index, Content)) is probably the indirect cause, though I don't know what mechanism is responsible for it. Anyway, it will be safer to have all the _drop_none methods return the same type; the top-level drop_none (no underscore) is in a good position to drop the unnecessary outindex at the end.

Also, this should at least pass through records: when users want to remove missing values, they'll want to remove them from within records just as much as from recordless arrays. A good guide for this could be is_none, which also looks inside of arrays, and is_none and drop_none should be pretty similar.

>>> ak.is_none(ak.Array([[{"x": [1]}], [{"x": [None]}]]), axis=2).show()
[[{x: [False]}],
 [{x: [True]}]]

>>> ak.is_none(ak.Array([[{"x": [1], "y": [[2]]}], [{"x": [None], "y": [[None]]}]]), axis=-1).show()
[[{x: False, y: [False]}],
 [{x: False, y: [False]}]]

In the above, x and y have lists of different depths, but axis=-1 counts up from the bottom. That's why most functions keep calling wrap_axis_if_negative until it finally becomes positive (after the recursion has passed through the record and it's seeing a single, unambiguous depth).
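
To make the "counts up from the bottom" idea concrete, here is a small sketch on a toy type tree. Everything in it is an assumption for illustration: the tuple encoding of types, and the names `branch_depth`, `maybe_posaxis_model`, and `resolved_axes`, are hypothetical stand-ins rather than Awkward's own code. Option types are left out because they do not add depth.

```python
def branch_depth(node):
    """Return (is_branching, depth) for a toy type tree.

    A node is "leaf", ("list", inner), or ("record", {field_name: inner}).
    """
    if node == "leaf":
        return False, 1
    kind, inner = node
    if kind == "list":
        branching, depth = branch_depth(inner)
        return branching, depth + 1
    if not inner:
        return False, 1  # a record with no fields
    results = [branch_depth(child) for child in inner.values()]
    depths = {depth for _, depth in results}
    # Branching means the fields below do not all share one depth.
    branching = any(flag for flag, _ in results) or len(depths) > 1
    return branching, min(depths)


def maybe_posaxis_model(node, axis, depth):
    """Resolve a negative axis if the depth below this node is unambiguous."""
    if axis >= 0:
        return axis
    branching, additional = branch_depth(node)
    if branching:
        return None  # still ambiguous; try again deeper in the recursion
    return axis + depth + additional - 1


def resolved_axes(node, axis, depth=1, path="array"):
    """Report the non-negative axis that each branch ends up using."""
    posaxis = maybe_posaxis_model(node, axis, depth)
    if posaxis is not None:
        return {path: posaxis}
    kind, inner = node
    if kind == "list":
        return resolved_axes(inner, axis, depth + 1, path)
    out = {}
    for name, child in inner.items():
        # Records do not add a dimension, so depth is unchanged.
        out.update(resolved_axes(child, axis, depth, name))
    return out


# Toy type for: var * {x: var * int64, y: var * var * int64}
tree = ("list", ("record", {"x": ("list", "leaf"), "y": ("list", ("list", "leaf"))}))

print(resolved_axes(tree, -1))  # {'x': 2, 'y': 3}
print(resolved_axes(tree, 1))   # {'array': 1}
```

In this sketch a negative axis cannot be resolved at the top of the tree, because x and y have different depths; it only becomes a definite number once the recursion is inside a single field.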

I think these are the only two issues, though. There are tests in test_1904-drop-none.py that involve records; I'll take a look at what happened there now.

jpivarski

This looks like it's almost completely done. I see that you're using the new maybe_posaxis.

There are some cases that the tests didn't (couldn't) cover because the tests are based on slicing with is_none, which isn't possible if a record introduces different depths. I've worked through some examples that could be used as direct tests.

Start from this array, which has missing values at all levels and different depths for the x and y branches of the record:

array = ak.Array(
    [None,
     [None, {"x": [1], "y": [[2]]}],
     [{"x": [3], "y": [None]}, {"x": [None], "y": [[None]]}]
    ])

>>> array.show()
[None,
 [None, {x: [1], y: [[2]]}],
 [{x: [3], y: [None]}, {x: [None], y: [[None]]}]]

First, with axis=0, the results from is_none and drop_none are both correct because only the None outside of any lists is at axis=0.

>>> ak.is_none(array, axis=0).show()
[True,
 False,
 False]
>>> ak.drop_none(array, axis=0).show()
[[None, {x: [1], y: [[2]]}],
 [{x: [3], y: [None]}, {x: [None], y: [[None]]}]]

Next, with axis=1, is_none and drop_none are still both correct.

>>> ak.is_none(array, axis=1).show()
[None,
 [True, False],
 [False, False]]
>>> ak.drop_none(array, axis=1).show()
[None,
 [{x: [1], y: [[2]]}],
 [{x: [3], y: [None]}, {x: [None], y: [[None]]}]]

Next, with axis=2, is_none only produces x: [True] or y: [True] for a None at exactly this level; anything deeper is hidden inside a False and anything shallower remains visible as None, so is_none is correct.

However, drop_none is producing the right values but putting them in the wrong places: the x: [3] should be in the previous record, so drop_none is incorrect in this example.

>>> ak.is_none(array, axis=2).show()
[None,
 [None, {x: [False], y: [False]}],
 [{x: [False], y: [True]}, {x: [True], y: [False]}]]
>>> ak.drop_none(array, axis=2).show()
[None,
 [None, {x: [1], y: [[2]]}],
 [{x: [], y: []}, {x: [3], y: [[None]]}]]
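
Since these cases cannot be tested by slicing with is_none, a plain-Python reference could supply the expected values instead. The sketch below is only that: `drop_none_model` is a hypothetical helper working on Python lists and dicts, it assumes a non-negative axis, and it silently returns the input unchanged if the axis is deeper than the data.

```python
def drop_none_model(data, axis):
    """Reference model: drop None items found at list depth `axis` (axis >= 0).

    Hypothetical helper for building expected test values; not Awkward code.
    """
    if isinstance(data, dict):
        # A record does not add a dimension, so every field sees the same axis.
        return {name: drop_none_model(value, axis) for name, value in data.items()}
    if not isinstance(data, list):
        # A leaf value, or a None above the requested axis: left untouched.
        return data
    if axis == 0:
        return [item for item in data if item is not None]
    return [drop_none_model(item, axis - 1) for item in data]


array = [
    None,
    [None, {"x": [1], "y": [[2]]}],
    [{"x": [3], "y": [None]}, {"x": [None], "y": [[None]]}],
]

# With axis=2, x: [3] stays in its own record, y: [None] becomes y: [],
# and x: [None] becomes x: [].
print(drop_none_model(array, axis=2))
```

For axis=0 and axis=1 this model reproduces the drop_none outputs shown earlier; for axis=2 it keeps x: [3] in the first record of the last list, which is where it belongs.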

Next, with axis=-1, the deepest x values have one [, ] while the deepest y values have two [[, ]]. The is_none function reports x to be [False] or [True] with one bracket, and it reports y to be [[False]] or [[True]] with two brackets, so is_none is correct.

drop_none correctly leaves y: [None] (one bracket) alone because it is above axis=-1, and it either keeps x: [3] or empties it to x: [], also with one bracket, because that is what axis=-1 means for x. However, the x: [3] should be in the previous record, so drop_none is incorrect for this example.

>>> ak.is_none(array, axis=-1).show()
[None,
 [None, {x: [False], y: [[False]]}],
 [{x: [False], y: [None]}, {x: [True], y: [[True]]}]]
>>> ak.drop_none(array, axis=-1).show()
[None,
 [None, {x: [1], y: [[2]]}],
 [{x: [], y: [None]}, {x: [3], y: [[]]}]]

Finally, with axis=-2, is_none should say x: False and (potentially) x: True (no instances in this array) with zero brackets and y: [False] and y: [True] with one bracket. It does, and is_none is correct.

Thinking about what drop_none should do in this case: it can't remove an x field without also removing a y field, so axis=-2 should raise an exception for an array like this, but it doesn't:

>>> ak.is_none(array, axis=-2).show()
[None,
 [None, {x: False, y: [False]}],
 [{x: False, y: [True]}, {x: False, y: [False]}]]
>>> ak.drop_none(array, axis=-2).show()
[None,
 [None, {x: [1], y: [[2]]}],
 [{x: [3], y: []}, {x: [], y: [[None]]}]]

To be concrete: the original array has x: [1], x: [3], and x: [None] at this axis, none of which is missing. If one were missing, drop_none would need to remove that x value, yet axis=-2 for y does not act at this level. You can implement this exception where the recursion reaches the RecordArray: if posaxis == depth - 1 for some of its fields but not others, it should raise an np.AxisError.

(is_none raises an np.AxisError if axis=-3, which is impossible for what is_none wants to do.)
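
The check described above might look roughly like the following sketch. It is not the code in this PR: `check_record_axis` is a hypothetical function, `AxisErrorStandIn` stands in for np.AxisError so that the example needs no imports, and each field is described only by an assumed depth count (leaf = 1, each list adds 1).

```python
class AxisErrorStandIn(ValueError, IndexError):
    """Stand-in for np.AxisError, used here to keep the sketch import-free."""


def check_record_axis(field_depths, axis, depth):
    """Raise if `axis` acts at the record's level for some fields but not others.

    field_depths maps each field name to the depth of its content (leaf = 1);
    depth is the depth at which the record itself sits.
    """
    acts_here = {}
    for name, additional in field_depths.items():
        # A negative axis is resolved separately inside each field.
        posaxis = axis if axis >= 0 else axis + depth + additional - 1
        acts_here[name] = posaxis == depth - 1
    if len(set(acts_here.values())) > 1:
        raise AxisErrorStandIn(
            f"axis={axis} acts at the record level for some fields only: {acts_here}"
        )
    return acts_here


# Record at depth 2 with x: var * int64 (depth 2) and y: var * var * int64 (depth 3).
fields = {"x": 2, "y": 3}

print(check_record_axis(fields, axis=-1, depth=2))  # {'x': False, 'y': False}

try:
    check_record_axis(fields, axis=-2, depth=2)
except AxisErrorStandIn as err:
    print("raised:", err)
```

Doing this before recursing into the fields means the error is reported as an axis problem rather than surfacing later as a length mismatch when the record is rebuilt.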

I need a better array2 for this case, so that I'm not talking in hypotheticals.

array2 = ak.Array(
    [None,
     [None, {"x": [1], "y": [[2]]}],
     [{"x": None, "y": [None]}, {"x": [None], "y": [[None]]}]
    ])

>>> array2.show()
[None,
 [None, {x: [1], y: [[2]]}],
 [{x: None, y: [None]}, {x: [None], y: [[None]]}]]

Now is_none at axis=-2 is

>>> ak.is_none(array2, axis=-2).show()
[None,
 [None, {x: False, y: [False]}],
 [{x: True, y: [True]}, {x: False, y: [False]}]]

which has x: False wherever x has a list and x: True wherever x has a direct None.

Trying to run drop_none on this raises a ValueError in the RecordArray constructor, but it should have raised an np.AxisError before attempting to compute and construct the RecordArray.

These can be directly dropped in as new unit tests. ~~This PR should also increase the awkward-cpp version number to 3 in awkward-cpp/pyproject.toml.~~

I think the above is just one error in indexing plus needing to add one check-and-raise-exception. It's an enormous amount of work to get this far.

jpivarski · 2022-12-13T19:38:41Z

~~This PR should also increase the awkward-cpp version number to 3 in awkward-cpp/pyproject.toml.~~

Actually, I'm going to stick to the habit of always changing version numbers as direct commits to main, so that the fact that a version has changed is highly visible. I'll do that right now.

agoose77

I pushed some docs fixes directly, but I've left any code suggestions here as review comments. Nice work @ioanaif :)

src/awkward/operations/ak_drop_none.py

jpivarski

I believe this is done! (Let's see the tests pass.)

Thanks for all of the hard work on this; it was definitely a lot more involved than I had thought it was going to be when I first suggested it. But now it works in all the extreme cases and we know it's not going to come up again as a bug in somebody's analysis.

It reminds me of something... found it:

This applies a lot more often than I'd like it to.

ioanaif · 2022-12-16T16:28:21Z

Yay! Hope all corner cases have been discovered! 🥳🥳

ioanaif force-pushed the ioanaif/add-drop-none-feature branch 3 times, most recently from db3a86f to ef611f4 Compare November 21, 2022 15:38

ioanaif and others added 2 commits November 21, 2022 16:49

feat: add ak.drop_none()

6a49e53

style: pre-commit fixes

76e72c9

ioanaif force-pushed the ioanaif/add-drop-none-feature branch from a625376 to 76e72c9 Compare November 21, 2022 15:49

ioanaif added 3 commits November 21, 2022 16:59

Moved cpp kernel to awkward-cpp

899825c

Fix renamed var names

ae8006b

IndexedOptionArray is not part of the roles dict, changed to IndexedA…

c671c46

…rray

ioanaif force-pushed the ioanaif/add-drop-none-feature branch from 28c7ad7 to c671c46 Compare November 22, 2022 10:30

ioanaif requested a review from jpivarski November 22, 2022 10:49

ioanaif mentioned this pull request Nov 22, 2022

feat: made 'very optional' arguments keyword-only #1905

Merged

jpivarski approved these changes Nov 22, 2022

src/awkward/operations/ak_drop_none.py (resolved)

agoose77 reviewed Nov 22, 2022

src/awkward/contents/listoffsetarray.py (resolved)

Merge branch 'main' into ioanaif/add-drop-none-feature

d1761b6

agoose77 reviewed Nov 22, 2022

src/awkward/contents/content.py (resolved)

jpivarski and others added 3 commits November 22, 2022 11:42

Add _drop_none to the Content interface/protocol.

3fea24e

Co-authored-by: Angus Hollands <[email protected]>

style: pre-commit fixes

e13ecb5

fix: correct whitespace & add return annotation

e72ebaa

ioanaif enabled auto-merge (squash) November 22, 2022 17:49

jpivarski requested changes Nov 22, 2022

src/awkward/operations/ak_drop_none.py (outdated, resolved)

jpivarski disabled auto-merge November 22, 2022 17:50

jpivarski reviewed Nov 22, 2022

src/awkward/contents/indexedoptionarray.py (outdated, resolved)

jpivarski requested changes Nov 22, 2022

jpivarski mentioned this pull request Nov 22, 2022

Long-range metadata checks during ak.Array creation #1910

Closed

ioanaif and others added 3 commits November 22, 2022 21:30

Merge branch 'main' into ioanaif/add-drop-none-feature

e02c796

Fix method signature, fix highlevel issues

77fdba7

style: pre-commit fixes

4643a3d

pre-commit-ci bot temporarily deployed to docs-preview December 13, 2022 17:32 Inactive

Updates for refactorings

ae2a84b

ioanaif temporarily deployed to docs-preview December 13, 2022 17:47 — with GitHub Actions Inactive

ioanaif requested a review from jpivarski December 13, 2022 17:50

jpivarski requested changes Dec 13, 2022

jpivarski added a commit that referenced this pull request Dec 13, 2022

chore: increase awkward-cpp version number for #1904 and #2001

307fa10

docs: include drop_none in rendered docs

b87e15f

agoose77 reviewed Dec 13, 2022

src/awkward/operations/ak_drop_none.py (outdated, resolved)

src/awkward/operations/ak_drop_none.py (outdated, resolved)

agoose77 temporarily deployed to docs-preview December 13, 2022 22:43 — with GitHub Actions Inactive

ioanaif and others added 2 commits December 14, 2022 09:15

Remove unused vars

aa59da7

style: pre-commit fixes

345d804

pre-commit-ci bot temporarily deployed to docs-preview December 14, 2022 08:28 Inactive

This was referenced Dec 14, 2022

Cannot install awkward from HEAD: awkward-cpp==3 not found #2003

Closed

Trying to get a non-standard entry in awkward array gives an error (code worked in 1.X.X) #2006

Closed

ioanaif added 2 commits December 16, 2022 10:47

Store none_indexes in queue to deal with multiple branches

846ef18

Merge branch 'main' into ioanaif/add-drop-none-feature

ea09638

ioanaif temporarily deployed to docs-preview December 16, 2022 10:04 — with GitHub Actions Inactive

ioanaif requested a review from jpivarski December 16, 2022 10:32

Add the review examples as tests.

32880ee

jpivarski approved these changes Dec 16, 2022

jpivarski temporarily deployed to docs-preview December 16, 2022 15:33 — with GitHub Actions Inactive

This should resolve the last test failures: make them raise np.AxisEr…

b19bed8

…ror, as they should.

jpivarski approved these changes Dec 16, 2022

jpivarski temporarily deployed to docs-preview December 16, 2022 16:18 — with GitHub Actions Inactive

jpivarski merged commit eb13f07 into main Dec 16, 2022

jpivarski deleted the ioanaif/add-drop-none-feature branch December 16, 2022 16:30

jpivarski added a commit that referenced this pull request Dec 16, 2022

Update awkward-cpp to version 4, for #1904 (new drop_none kernels).

89f7686

jpivarski mentioned this pull request Jan 5, 2023

Add a drop_none() #832

Closed
