[FEA] Support for Dask groupby cumulative sum, count #10296
Ran across this today and am happy to pick this up.
This PR came up as part of solving #10296, which has to go through the `reindex` codepath with a `fill_value`. It does a number of things:
- Aligns our `reindex` signature with pandas
- Moves our `_reindex` helper from `DataFrame` to `IndexedFrame`; previously, `Series` promoted itself to a frame and called the `DataFrame` function
- Adds support for `fill_value`
- Refactors the relatively old tests for this functionality to test `fill_value` better and reduce code overall

Authors:
- https://github.com/brandon-b-miller

Approvers:
- Michael Wang (https://github.com/isVoid)
- Ashwin Srinath (https://github.com/shwina)

URL: #10815
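To make the `fill_value` behaviour concrete, here is a toy, pure-Python model of reindex semantics: labels already present keep their values, and labels missing from the original data get `fill_value` instead of a null. This is only an illustration; `reindex_with_fill` is a hypothetical helper, not cuDF's `IndexedFrame._reindex`, and it defaults to `None` where pandas would fill with NaN.

```python
from typing import Any, Dict, Hashable, List, Sequence, Tuple


def reindex_with_fill(
    data: Dict[Hashable, Any],
    new_index: Sequence[Hashable],
    fill_value: Any = None,
) -> List[Tuple[Hashable, Any]]:
    """Toy model of reindex semantics (hypothetical helper, not cuDF code).

    Labels already present keep their values; labels missing from `data`
    get `fill_value`. Returns (label, value) pairs in `new_index` order so
    duplicate labels in `new_index` are preserved.
    """
    return [(label, data.get(label, fill_value)) for label in new_index]


# "c" is not in the original data, so it receives the fill value 0.
print(reindex_with_fill({"a": 1, "b": 2}, ["a", "c", "b"], fill_value=0))
# [('a', 1), ('c', 0), ('b', 2)]
```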
In cuDF, we support `groupby.cumcount` like pandas. Dask supports groupby cumulative count on CPUs but not on GPUs. From the traceback, it looks like Dask is using a `Grouper` object, and we go down a codepath where the `cudf.Grouper` appears to fail an instance check (perhaps because it is wrapped in a list or tuple) based on `if isinstance(by, cudf.Grouper) and by.freq`.

EDIT: Generalizing, as this appears to happen for both groupby cumulative count and cumulative sum operations.
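For reference, a pure-Python sketch of the results these operations should produce, mirroring pandas `groupby.cumcount` (0-based numbering of rows within each group) and `groupby.cumsum` (running sum within each group). The partitioned variant is an assumption about how a distributed backend could stitch per-partition results together, and shows where a `fill_value` of 0 naturally appears; it is not Dask's or cuDF's actual code, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def group_cumcount(keys: Sequence[Hashable]) -> List[int]:
    """0-based position of each row within its group, in row order."""
    seen: Dict[Hashable, int] = defaultdict(int)
    out = []
    for k in keys:
        out.append(seen[k])
        seen[k] += 1
    return out


def group_cumsum(keys: Sequence[Hashable], values: Sequence[float]) -> List[float]:
    """Running sum of `values` within each group, in row order."""
    running: Dict[Hashable, float] = defaultdict(int)
    out = []
    for k, v in zip(keys, values):
        running[k] += v
        out.append(running[k])
    return out


def partitioned_group_cumsum(
    partitions: Sequence[Sequence[Tuple[Hashable, float]]],
) -> List[float]:
    """Group-wise cumsum over ordered partitions of (key, value) rows.

    Hypothetical sketch: each partition is scanned on its own, then shifted
    by the per-group totals carried over from earlier partitions. A group
    with no rows in earlier partitions gets an offset of 0, analogous to a
    reindex with fill_value=0.
    """
    carry: Dict[Hashable, float] = {}
    out: List[float] = []
    for part in partitions:
        keys = [k for k, _ in part]
        local = group_cumsum(keys, [v for _, v in part])
        # Shift local results by totals from earlier partitions (0 if unseen).
        out.extend(c + carry.get(k, 0) for k, c in zip(keys, local))
        # The last local value per group is that group's partition total.
        last: Dict[Hashable, float] = {}
        for k, c in zip(keys, local):
            last[k] = c
        for k, total in last.items():
            carry[k] = carry.get(k, 0) + total
    return out


keys = ["a", "b", "a", "a", "b"]
print(group_cumcount(keys))  # [0, 0, 1, 2, 1]
print(group_cumsum(keys, [1, 2, 3, 4, 5]))  # [1, 2, 4, 8, 7]
print(partitioned_group_cumsum([[("a", 1), ("b", 2)], [("a", 3), ("a", 4), ("b", 5)]]))  # [1, 2, 4, 8, 7]
```

Splitting the same rows across two partitions gives the same answer as the single-pass version, which is the property a GPU-backed Dask implementation would need to preserve.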