Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RuntimeError when slicing two dimensions using a masked array #1305

Closed
jrueb opened this issue Feb 21, 2022 · 2 comments · Fixed by #1598
Closed

RuntimeError when slicing two dimensions using a masked array #1305

jrueb opened this issue Feb 21, 2022 · 2 comments · Fixed by #1598
Labels
bug The problem described is something that must be fixed

Comments

@jrueb
Copy link
Contributor

jrueb commented Feb 21, 2022

Version of Awkward Array

1.8.0rc3

Description and code to reproduce

Please consider the following

import awkward as ak


x = ak.Array([[5.]])
y = ak.Array([True])

x[y, 0]
x[y.mask[[True]], 0]

The last line will raise the following error
RuntimeError: ListOffsetArray::getitem_next(SliceAt): !advanced.is_empty_advanced()
I don't think the line should raise an error and I am unsure what the error message is telling me.

@jrueb jrueb added the bug (unverified) The problem described would be a bug, but needs to be triaged label Feb 21, 2022
@jpivarski jpivarski added bug The problem described is something that must be fixed and removed bug (unverified) The problem described would be a bug, but needs to be triaged labels Feb 21, 2022
@jpivarski
Copy link
Member

That's right. This has reached some unconsidered code path. It is faithfully reproduced in v2:

import awkward as ak
x = ak._v2.Array([[5.]])
y = ak._v2.Array([True])
x[ak._v2.mask(y, [True]), 0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jpivarski/irishep/awkward-1.0/awkward/_v2/highlevel.py", line 1010, in __getitem__
    out = self._layout[where]
  File "/home/jpivarski/irishep/awkward-1.0/awkward/_v2/contents/content.py", line 486, in __getitem__
    out = next._getitem_next(nextwhere[0], nextwhere[1:], None)
  File "/home/jpivarski/irishep/awkward-1.0/awkward/_v2/contents/regulararray.py", line 590, in _getitem_next
    return self._getitem_next_missing(head, tail, advanced)
  File "/home/jpivarski/irishep/awkward-1.0/awkward/_v2/contents/content.py", line 406, in _getitem_next_missing
    nextcontent = self._getitem_next(headcontent, tail, advanced)
  File "/home/jpivarski/irishep/awkward-1.0/awkward/_v2/contents/regulararray.py", line 492, in _getitem_next
    out = nextcontent._getitem_next(nexthead, nexttail, nextadvanced)
  File "/home/jpivarski/irishep/awkward-1.0/awkward/_v2/contents/listarray.py", line 511, in _getitem_next
    assert advanced is None
AssertionError

and it would probably be easier to debug in v2, as that's Python code.

The "advanced" Index is for passing down information needed in advanced indexing, which is to say, slicing more than one dimension by an array. That happens here because the integer 0 gets promoted to an array because it's in a tuple (NumPy rules: arrays and integers in a slice tuple get broadcasted together, which promotes integers into arrays).

When descending through a layout tree, we might have an advanced Index at some level or we might not. (When we first hit an array or index that has been broadcasted to one, we create the advanced Index and pass it to the whole subtree.) Early on, this was represented in v1 as an empty Index, but there were valid cases in which an advanced Index can exist and have zero length. So in v1, the is_empty_advanced boolean keeps track of whether this Index represents an existing Index or not.

In v2, we use None to say it isn't there. (Yay, Python!)

The slice of x has two dimensions, one of which is an Awkward Array with nullable data (y.mask[[True]]) and the other is an integer, promoted to an array. Whereas NumPy expresses slices on multiple dimensions as multiple arrays in a tuple,

array[slice_dimension_1, slice_dimension_2]

Awkward Array supports this as much as it has to and extends it in a different way: by requiring the nested structure of the slicer and the sliced to match (broadcastable):

jagged_jagged_array[another_jagged_jagged_array]

What happens when you mix the two is minimally tested:

jagged_jagged_array[a_jagged_array, deeper_index]

A semantics has been defined: the deeper_index continues the recursion in the layout tree from the point where a_jagged_array stops. The case you described may or may not be allowed (I'd have to think hard about that), but this is unquestionably a bug in the sense that if you weren't supposed to do that, you'd get a ValueError, IndexError, or TypeError with an informative message, not a RuntimeError (v1's C++) or AssertionError (v2), indicating a programming error.

@jpivarski
Copy link
Member

The assertions in _getitem_next were all built around the assumption that multiple tuple items in the slice have all been broadcasted, and there's a preprocessing step that guarantees that it has been broadcasted for NumPy-like slices. There were zero test cases of mixing Awkward-like and NumPy-like slices in the test suite: it's just something we hadn't considered.

Therefore, it's also something we don't support. The code written above now raises an intentional error, and hopefully it would be clear what's going on and how to proceed if you run into it again:

>>> array = ak._v2.Array([[3.14]])
>>> first_slice = ak._v2.Array([True, None])[:1]   # type: 1 * ?bool
>>> first_slice
<Array [True] type='1 * ?bool'>
>>> second_slice = 0
>>> array[first_slice, second_slice]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jpivarski/irishep/awkward/awkward/_v2/highlevel.py", line 959, in __getitem__
    out = self._layout[where]
  File "/home/jpivarski/irishep/awkward/awkward/_v2/contents/content.py", line 485, in __getitem__
    return self._getitem(where)
  File "/home/jpivarski/irishep/awkward/awkward/_v2/contents/content.py", line 512, in _getitem
    nextwhere = ak._v2._slicing.getitem_broadcast(items)
  File "/home/jpivarski/irishep/awkward/awkward/_v2/_slicing.py", line 46, in getitem_broadcast
    raise ak._v2._util.error(
TypeError: while attempting to slice (from <stdin>, line 1)

    <Array [[3.14]] type='1 * var * float64'>

with

    (<Array [True] type='1 * ?bool'>, 0)

Error details: cannot mix Awkward slicing (using an array with missing or variable-length lists in the
slice) with NumPy advanced slicing (using more than one broadcastable array or integer in the slice),
though you can perform multiple slices

@jpivarski jpivarski linked a pull request Aug 15, 2022 that will close this issue
jpivarski added a commit that referenced this issue Aug 15, 2022
* Fixes #1305 by raising an exception on mixed Awkward/NumPy slices.

* Don't let Index._metadata be an nplike.
agoose77 added a commit that referenced this issue Aug 26, 2022
agoose77 added a commit that referenced this issue Aug 26, 2022
* Fix: tentative fix for empty slice lists

* Test: validate fix

* Update tests/v2/test_1593-empty-slice-list-record.py

Co-authored-by: ioanaif <[email protected]>

* Fix: convert `[]` into `slice(0, 0)`

This bypasses the `slice_fields` mechanism, which is really orthogonal to the content length `_getitem` implementations.

* Refactor: check explicitly for iterables with lengths

* Test: fix to match v1

* Fix: normalise [] to empty array

* Fix: return `NumpyLike.array` instead of index

* Fix: support empty Index64

* Docs: add small docstring elucidating getitem_broadcast

* Fix: update `getitem_next_array_wrap` to accept `outer_length` argument

* Fix: remove list special-case

* Fix: pass list length to `getitem_next_array_wrap`

* feat: raise error if unsupported indexing used

This prevents the second-case of advanced indexing referred to in https://numpy.org/doc/stable/user/basics.indexing.html#basics-indexing:~:text=In%20the%20first,just%20like%20slicing from being used

* test: fix use of v1 record

* fix: respect zero-length of regular array

* test: respect new error during indexing

* test: fix expected output of slice

* test: fix expected output of slice

* test: fix expected output of slice

* fix: return view in `TypeTracerArray.view`

* feat: add `TypeTracerArray.astype`

* feat: raise `ValueError` for mixed indices

This changes #1305

* refactor: move NumPy index validation into broadcasting

This also renames various utility functions to make the concerns more apparent.

* fix: actually raise exception

* refactor: remove unneeded length check

Co-authored-by: ioanaif <[email protected]>
Saransh-cpp pushed a commit to Saransh-cpp/awkward that referenced this issue Aug 29, 2022
…#1597)

* Fix: tentative fix for empty slice lists

* Test: validate fix

* Update tests/v2/test_1593-empty-slice-list-record.py

Co-authored-by: ioanaif <[email protected]>

* Fix: convert `[]` into `slice(0, 0)`

This bypasses the `slice_fields` mechanism, which is really orthogonal to the content length `_getitem` implementations.

* Refactor: check explicitly for iterables with lengths

* Test: fix to match v1

* Fix: normalise [] to empty array

* Fix: return `NumpyLike.array` instead of index

* Fix: support empty Index64

* Docs: add small docstring elucidating getitem_broadcast

* Fix: update `getitem_next_array_wrap` to accept `outer_length` argument

* Fix: remove list special-case

* Fix: pass list length to `getitem_next_array_wrap`

* feat: raise error if unsupported indexing used

This prevents the second-case of advanced indexing referred to in https://numpy.org/doc/stable/user/basics.indexing.html#basics-indexing:~:text=In%20the%20first,just%20like%20slicing from being used

* test: fix use of v1 record

* fix: respect zero-length of regular array

* test: respect new error during indexing

* test: fix expected output of slice

* test: fix expected output of slice

* test: fix expected output of slice

* fix: return view in `TypeTracerArray.view`

* feat: add `TypeTracerArray.astype`

* feat: raise `ValueError` for mixed indices

This changes scikit-hep#1305

* refactor: move NumPy index validation into broadcasting

This also renames various utility functions to make the concerns more apparent.

* fix: actually raise exception

* refactor: remove unneeded length check

Co-authored-by: ioanaif <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug The problem described is something that must be fixed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants