Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: replace simplify_optiontype and simplify_uniontype with simplified classmethod #1928

Merged

Conversation

jpivarski
Copy link
Member

@jpivarski jpivarski commented Nov 30, 2022

Instead of constructing non-canonical arrays and then simplifying them into something canonical, these @classmethods only construct canonical arrays from the start.

The next PR after this will enforce canonicity (level 1) in the Content constructors. This is the first half of #1910.


📚 The documentation for this PR will be available at https://awkward-array.readthedocs.io/en/jpivarski-replace-simplify-method-with-classmethod-constructor/ once Read the Docs has finished building 🔨

@jpivarski jpivarski added the pr-next-release Required for the next release label Nov 30, 2022
@jpivarski jpivarski changed the title Replace simplify_optiontype and simplify_uniontype with simplified classmethod refactor: replace simplify_optiontype and simplify_uniontype with simplified classmethod Nov 30, 2022
@codecov
Copy link

codecov bot commented Nov 30, 2022

Codecov Report

Merging #1928 (c226220) into main (1157d22) will decrease coverage by 0.14%.
The diff coverage is 76.03%.

Additional details and impacted files
Impacted Files Coverage Δ
src/awkward/_broadcasting.py 93.39% <ø> (ø)
src/awkward/_slicing.py 85.84% <0.00%> (-0.27%) ⬇️
src/awkward/operations/ak_mask.py 95.23% <ø> (ø)
src/awkward/operations/ak_nan_to_none.py 23.52% <0.00%> (ø)
src/awkward/operations/ak_to_categorical.py 92.15% <ø> (-0.30%) ⬇️
src/awkward/index.py 84.10% <15.38%> (-6.48%) ⬇️
src/awkward/forms/indexedform.py 80.85% <41.17%> (-8.76%) ⬇️
src/awkward/forms/bitmaskedform.py 82.22% <46.66%> (-3.86%) ⬇️
src/awkward/forms/bytemaskedform.py 82.14% <46.66%> (-4.16%) ⬇️
src/awkward/forms/unmaskedform.py 81.94% <46.66%> (-4.95%) ⬇️
... and 32 more

src/awkward/forms/bitmaskedform.py Outdated Show resolved Hide resolved
src/awkward/forms/form.py Outdated Show resolved Hide resolved
next = item.simplify_optiontype()
return normalise_item_nested(next)
return normalise_item_nested(item)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the few that doesn't replace a simplify_optiontype with a simplified classmethod—not a pure refactor—so it's possible that non-canonical layouts pass through here.

However, I think this was sanitizing user input, and after constructors forbid non-canonical structures, user input can't be insane. So while this opens the door to error, the next PR will shut it.

return [self.simplify_optiontype()]
return [self]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's another unbalanced simplify_optiontypesimplified pair, but in the future, self can't be non-canonical.

Comment on lines -1022 to +1018
return [self.simplify_optiontype()]
return [self]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another unbalanced simplify_optiontypesimplified pair.

Comment on lines -1592 to +1591
return [self.simplify_optiontype()]
return [self]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another unbalanced simplify_optiontypesimplified pair.

Comment on lines -482 to +479
return [self.simplify_optiontype()]
return [self]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another unbalanced simplify_optiontypesimplified pair.

return continuation().simplify_optiontype()
return continuation()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another unbalanced simplify_optiontypesimplified pair.

Comment on lines -86 to -87
if layout.is_option:
layout = layout.simplify_optiontype()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last unbalanced simplify_optiontypesimplified pair.

@jpivarski
Copy link
Member Author

@agoose77 The PyLint test is failing because nplike was removed from the argument list. (I'm not sure how...)

@agoose77
Copy link
Collaborator

@jpivarski ak_from_buffers takes a backend object now instead of nplike, which means the argument name has changed. I thought we'd okayed that w.r.t people like dask-awkward who will need to update. If you're still happy with #1922, I can make PRs for dask-awkward et al

This commit enables backends to pass through `regularize_backend`, and
ensures that we pass a backend (not nplike) to `from_buffers`
@agoose77
Copy link
Collaborator

I took your comment as a green light to push some fixes. Feel free to revert / rebase them if you're not happy.

Comment on lines -325 to +329
attempt = item.simplify_uniontype()
if isinstance(attempt, ak.contents.UnionArray):
raise ak._errors.wrap_error(
TypeError(
"irreducible unions (different types at the same level in an array) can't be used as slices"
)
raise ak._errors.wrap_error(
TypeError(
"irreducible unions (different types at the same level in an array) can't be used as slices"
)

return normalise_item_nested(attempt)
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is not a simplify_uniontypesimplified pair because it's sanitizing input, just in case (like the option-type exceptions noted above).

Comment on lines 192 to 193
for i, self_cont in enumerate(contents):
for i, self_cont in enumerate(self_contents):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This had been an error in the translation of simplify_uniontype to simplified. I don't see any similar errors.

Comment on lines 617 to +626
def _getitem_next_jagged_generic(self, slicestarts, slicestops, slicecontent, tail):
simplified = self.simplify_uniontype()
if isinstance(simplified, ak.contents.UnionArray):
if isinstance(self, ak.contents.UnionArray):
raise ak._errors.index_error(
self,
ak.contents.ListArray(
slicestarts, slicestops, slicecontent, parameters=None
),
"cannot apply jagged slices to irreducible union arrays",
)
return simplified._getitem_next_jagged(
slicestarts, slicestops, slicecontent, tail
)
return self._getitem_next_jagged(slicestarts, slicestops, slicecontent, tail)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is not a simplify_uniontypesimplified pair because it's sanitizing input. There was a test providing insane inputs, so the simplified part of the pair went into the testing code.

Comment on lines -1296 to +1303
simplified = self.simplify_uniontype(True, True)
simplified = type(self).simplified(
self._tags,
self._index,
self._contents,
parameters=self._parameters,
backend=self._backend,
merge=True,
mergebool=True,
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one re-simplifies the inputs because the merging strategy is more aggressive than normal (mergebool=True), so this is to prevent any changes in behavior.

Comment on lines -1305 to +1320
simplified = self.simplify_uniontype(True, True)
simplified = type(self).simplified(
self._tags,
self._index,
self._contents,
parameters=self._parameters,
backend=self._backend,
merge=True,
mergebool=True,
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

Comment on lines -1325 to +1347
simplified = self.simplify_uniontype(mergebool=True)
simplified = type(self).simplified(
self._tags,
self._index,
self._contents,
parameters=self._parameters,
backend=self._backend,
mergebool=True,
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

Comment on lines -1348 to +1377
simplified = self.simplify_uniontype(mergebool=True)
simplified = type(self).simplified(
self._tags,
self._index,
self._contents,
parameters=self._parameters,
backend=self._backend,
mergebool=True,
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

Comment on lines -1373 to +1409
simplified = self.simplify_uniontype(mergebool=True)
simplified = type(self).simplified(
self._tags,
self._index,
self._contents,
parameters=self._parameters,
backend=self._backend,
mergebool=True,
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

Comment on lines -118 to +126
out = out.simplify_uniontype(merge=merge, mergebool=mergebool)
out = type(out).simplified(
out._tags,
out._index,
out._contents,
parameters=out._parameters,
backend=out._backend,
merge=merge,
mergebool=mergebool,
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above (mergebool might be True).

@jpivarski jpivarski marked this pull request as ready for review December 1, 2022 02:37
@jpivarski
Copy link
Member Author

@agoose77, this is now fully implemented and ready for review. Each step is called out in code comments, and the last step is this: the simplified classmethods now convert attempts to make

option[union[X, Y, ...]]

into

union[option[X], option[Y], ...]

That is, Arrow convention is now forced. The way that option-type push-down is implemented (there are many choices!) is this: one of the UnionArray contents, tag_for_missing (preferentially a RecordArray) is selected to be wrapped with IndexedOptionArray.simplified. If it's another option-type, more merging will happen. All of the other contents are wrapped with UnmaskedArray.simplified, which is a trivial operation, regardless of whether the thing being wrapped is option-type.

We can change the way tag_for_missing is selected at any time. Maybe it should be preferentially something that's already an option-type?

What this PR does not implement is the canonicity requirements in the Content constructors. That's why #1910 is not closed by this PR. But now that we have a constructor that makes canonical versions of any Content, the next PR just has to enforce those requirements, discover if we are using non-canonical constructions anywhere, and build them with simplified instead.

src/awkward/contents/unionarray.py Outdated Show resolved Hide resolved
src/awkward/contents/unionarray.py Outdated Show resolved Hide resolved
src/awkward/contents/unionarray.py Outdated Show resolved Hide resolved
src/awkward/contents/unionarray.py Outdated Show resolved Hide resolved
src/awkward/contents/unionarray.py Outdated Show resolved Hide resolved
src/awkward/operations/ak_where.py Outdated Show resolved Hide resolved
@jpivarski jpivarski enabled auto-merge (squash) December 1, 2022 17:47
@jpivarski jpivarski merged commit 31ce657 into main Dec 1, 2022
@jpivarski jpivarski deleted the jpivarski/replace-simplify-method-with-classmethod-constructor branch December 1, 2022 18:01
@jpivarski jpivarski removed the pr-next-release Required for the next release label Feb 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants