Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add to_cudf #3051

Merged
merged 26 commits into from
Sep 13, 2024
Merged

feat: Add to_cudf #3051

merged 26 commits into from
Sep 13, 2024

Conversation

martindurant
Copy link
Contributor

Not too much here yet, just started writing out the template to let others get involved

Martin Durant and others added 2 commits March 13, 2024 14:17
Copy link

codecov bot commented Apr 17, 2024

Codecov Report

Attention: Patch coverage is 25.58140% with 64 lines in your changes missing coverage. Please review.

Project coverage is 82.06%. Comparing base (b749e49) to head (950daeb).
Report is 158 commits behind head on main.

Files with missing lines Patch % Lines
src/awkward/contents/listoffsetarray.py 11.11% 16 Missing ⚠️
src/awkward/contents/bitmaskedarray.py 15.38% 11 Missing ⚠️
src/awkward/contents/numpyarray.py 15.38% 11 Missing ⚠️
src/awkward/contents/bytemaskedarray.py 18.18% 9 Missing ⚠️
src/awkward/contents/indexedarray.py 20.00% 4 Missing ⚠️
src/awkward/contents/recordarray.py 20.00% 4 Missing ⚠️
src/awkward/contents/emptyarray.py 25.00% 3 Missing ⚠️
src/awkward/operations/ak_to_cudf.py 75.00% 2 Missing ⚠️
src/awkward/contents/content.py 50.00% 1 Missing ⚠️
src/awkward/contents/indexedoptionarray.py 50.00% 1 Missing ⚠️
... and 2 more
Additional details and impacted files
Files with missing lines Coverage Δ
src/awkward/operations/__init__.py 100.00% <100.00%> (ø)
src/awkward/contents/content.py 74.91% <50.00%> (+0.09%) ⬆️
src/awkward/contents/indexedoptionarray.py 88.41% <50.00%> (+0.20%) ⬆️
src/awkward/contents/listarray.py 89.34% <50.00%> (-0.10%) ⬇️
src/awkward/contents/unmaskedarray.py 74.32% <50.00%> (+0.12%) ⬆️
src/awkward/operations/ak_to_cudf.py 75.00% <75.00%> (ø)
src/awkward/contents/emptyarray.py 75.23% <25.00%> (-0.39%) ⬇️
src/awkward/contents/indexedarray.py 81.68% <20.00%> (+0.64%) ⬆️
src/awkward/contents/recordarray.py 84.62% <20.00%> (-0.57%) ⬇️
src/awkward/contents/bytemaskedarray.py 87.67% <18.18%> (-1.98%) ⬇️
... and 3 more

... and 93 files with indirect coverage changes

@martindurant
Copy link
Contributor Author

I think we intended to come back to this PR and implement string conversion? At the moment, you have the following behaviour:

s = cudf.Series([["he", "hi"], [], ["ho"]])

In [6]: import awkward_pandas.cudf

In [7]: s.ak
Out[7]: <awkward_pandas.cudf.CudfAwkwardAccessor at 0x7ff1cb7a2d40>

In [8]: s.ak.to_output()  # lost string-ness
Out[8]: 
0    [[104, 101], [104, 105]]
1                          []
2                [[104, 111]]
dtype: list

In [9]: s.ak.array  # cupy-backed ak array
Out[9]: <Array [['he', 'hi'], [], ['ho']] type='3 * var * ?string'>

In [10]: ak.to_cupy(s.ak.array)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
 ...

File /floppy/code/awkward/src/awkward/_backends/cupy.py:39, in CupyBackend.__getitem__(self, index)
     38 _cuda_kernels = cuda.initialize_cuda_kernels(cupy)
---> 39 func = _cuda_kernels[index]
     40 if func is not None:

KeyError: ('awkward_NumpyArray_prepare_utf8_to_utf32_padded', <class 'numpy.uint8'>, <class 'numpy.int32'>, <class 'numpy.int64'>)

I'm not entirely sure why the two routes are different, but either way we do need this to work.

This PR is pinned to release 2.6.3, so it's also slipped behind the main branch, and we could do with some plan to complete it.

src/awkward/_errors.py Outdated Show resolved Hide resolved
@martindurant martindurant marked this pull request as ready for review August 13, 2024 20:20
@martindurant
Copy link
Contributor Author

@jpivarski , probably plenty of work left here and classes not covered, but I thought you would like to have a look, now that I have at least a few tests.

@martindurant
Copy link
Contributor Author

Missing: IndexedArray; does this map to categorical?

@jpivarski
Copy link
Member

IndexedArray is only mapped to Arrow's categorical ("dictionary") if "__array__": "categorical". Otherwise, the IndexedArray gets projected. The reason is that Arrow's categorical segfaults if the contents are not unique, and Awkward only guarantees uniqueness if requested, for instance if the array was constructed via ak.str.to_categorical.

Martin Durant and others added 4 commits August 14, 2024 21:27
@martindurant
Copy link
Contributor Author

@jpivarski , this PR is still languishing here. What do you think is needed to get it released, at least in experimental form?

Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that the import cupy was removed from top-level imports. (Anyway, tests wouldn't pass without that.) This all looks good—I don't know why I didn't see it when I was scanning through the open PRs for any that had been waiting.

I also just tested tests-cuda on my GPU, and everything passes.

So I'll merge it!

@jpivarski jpivarski merged commit 1c368f1 into scikit-hep:main Sep 13, 2024
45 checks passed
@martindurant martindurant deleted the to_cudf branch September 13, 2024 18:00
@tacaswell
Copy link
Contributor

This reverted commit rapidjson is on which breaks building with gcc 14 (xref #3110).

I could not quickly tell if this was intentional or accidental from the commits in this PR.

@tacaswell
Copy link
Contributor

#3249 to fix it.

@jpivarski
Copy link
Member

It was accidental. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants