Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C++ refactoring: starting _getitem_next. #1031

Merged
merged 42 commits into from
Jul 29, 2021

Conversation

jpivarski
Copy link
Member

@jpivarski jpivarski commented Jul 22, 2021

This PR implements all types of slices allowed by NumPy on Awkward Arrays. Awkward's extensions of NumPy's slicing rules will be covered in a future PR.

In principle, there are 12 classes to cover: BitMaskedArray, ByteMaskedArray, EmptyArray, IndexedArray, IndexedOptionArray, ListArray, ListOffsetArray, NumpyArray, RecordArray, RegularArray, UnionArray, and UnmaskedArray with 10 types of slice items: no-slice (head == ()), integer (int), range-slice (slice), field (str), fields (list(str)), np.newaxis (which is None), Ellipsis (...), flat array (Index64), jagged array (ListOffsetArray), and option-type array (IndexedOptionArray). 120 cases.

However, putting off Awkward's extensions reduces this to only 8 types of slice by dropping jagged arrays and option-type arrays. (They can be left as NotImplementedError.) 96 cases.

Also, I did the "weird" classes, so that what remains would be more straightforward: all of EmptyArray (because it's a leaf), NumpyArray (because it's a leaf and very different—much simpler—in Python), RecordArray (because fields are fundamentally different from other slices), and RegularArray (because fake RegularArrays are used to structure the getitem procedure). 64 cases.

Also, I did the "weird" slice types, so that what remains would be more straightforward: no-slice, field, fields, np.newaxis, and Ellipsis. 24 cases.

Here's what remains, for you to finish off this PR:

  • BitMaskedArray
    • integer slice
    • range-slice
    • flat array
  • ByteMaskedArray
    • integer slice
    • range-slice
    • flat array
  • IndexedArray
    • integer slice
    • range-slice
    • flat array
  • IndexedOptionArray
    • integer slice
    • range-slice
    • flat array
  • ListArray
    • integer slice
    • range-slice
    • flat array
  • ListOffsetArray
    • integer slice
    • range-slice
    • flat array
  • UnionArray
    • integer slice
    • range-slice
    • flat array
  • UnmaskedArray
    • integer slice
    • range-slice
    • flat array

As you finish implementing each one, uncomment the corresponding tests in test_1031-start-getitem_next.py. When the PR is done, all of the code should be uncommented. test_1031b-start-getitem_next-specialized.py is for any additional tests you need to add while working things out.

@jpivarski jpivarski changed the title Starting _getitem_next. C++ factoring: starting _getitem_next. Jul 22, 2021
@jpivarski jpivarski marked this pull request as draft July 22, 2021 23:36
@jpivarski
Copy link
Member Author

@ioanaif As a hint for how to go about translating and debugging, I showed some examples of my own debugging: 86377cc adds debugging code to compare the Python with the original C++ and 3878e10 is the subsequent commit that removes that debugging code (not sure which would be easier to read).

Once the C++ codebase is compiled, adding std::cout is fast to recompile (if you're using the localbuild.py), so you can add equivalent print statements on both sides when something is different, to try to figure out why something is different. The new Python code goes through all of the same steps as the C++ (in these "non-weird" cases), so you can check the intermediate Indexes to be sure that they're exactly the same at each step (or narrow in on where they're different).

@jpivarski jpivarski changed the title C++ factoring: starting _getitem_next. C++ refactoring: starting _getitem_next. Jul 26, 2021
@jpivarski
Copy link
Member Author

I should have done this when you first started!

@all-contributors please add @ioanaif for code, test

@allcontributors
Copy link
Contributor

@jpivarski

I've put up a pull request to add @ioanaif! 🎉

@jpivarski jpivarski marked this pull request as ready for review July 28, 2021 17:25
Copy link
Member Author

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's great! I just made a few tweaks: "ISOPTION" was from when IndexedArray and IndexedOptionArray were templated versions of the same class; now they're distinct classes. The "nextcarry_outindex" function is only used for handling the option-type, so IndexedArray doesn't need it (and anyway, it's exactly the same function).

@jpivarski jpivarski enabled auto-merge (squash) July 29, 2021 00:42
@jpivarski jpivarski merged commit 585569b into main Jul 29, 2021
@jpivarski jpivarski deleted the cpp-refactor/start-getitem_next branch July 29, 2021 01:14
jpivarski added a commit that referenced this pull request Oct 7, 2021
jpivarski added a commit that referenced this pull request Oct 12, 2021
* Starting to implement the type tracer for Awkward-Dask.

* Actually testing it.

* It predicts Forms and Types.

* Convenience methods for making typetracers.

* TypeTracer.instance().

* TypeTracerArray.__getitem__(slice).

* Black.

* Through RecordArray.

* Through BitMaskedArray.

* Added checks throughout 0896.

* Rename test-data.yml to kernel-test-data.yml.

* Instrumented tests from #959.

* Instrumented tests from #1031.

* numpy.broadcast_shapes is too new.

* Through 0011_v2-listarray in the v2 tests.

* Python 2.7 needs Ellipsis to be spelled out.

* Through 0020_v2-support-unsigned-indexes in the v2 tests.

* Through 0023_v2-regular-array in the v2 tests.

* Through all slicing in the v2 tests; next is 0070_v2-argmin-and-argmax.

* Through test_0070_v2-argmin-and-argmax.py in the v2 tests; ak.to_list was wrong, and subsequently the reducer implementation.

* First use of 'fill' (had to split into 'fill_zero', 'fill_other').

* Through 0111_v2-jagged-and-masked-getitem in the v2 tests; added nplike.known_shape.

* Through 0115_v2-generic-reducer-operation in the v2 tests.

* Through 1059_v2-localindex in the v2 tests.

* Added typetracer to all v2 tests (where possible).

* Fixed pre-commit.

* Fixed merge.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants