Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Register cudf.core.groupby.Grouper objects to dask grouper_dispatch #10838

Merged

Conversation

brandon-b-miller
Copy link
Contributor

This PR registers uses the (presumably shortly merged) dask Grouper dispatch to register cudf.core.groupby.Grouper objects to cudf.DataFrame objects. This should allow our own Grouper objects to be used in critical places in dask rather than pandas objects.

This solution is favorable IMO rather than changing cuDF to handle pandas grouper objects directly.

Xref dask/dask#9074

@brandon-b-miller brandon-b-miller added Python Affects Python cuDF API. dask Dask issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels May 12, 2022
@brandon-b-miller brandon-b-miller requested a review from a team as a code owner May 12, 2022 15:19
@brandon-b-miller brandon-b-miller changed the title Register cudf.core.groupby.Grouper objects to dask get_grouper Register cudf.core.groupby.Grouper objects to dask grouper_dispatch May 18, 2022
@brandon-b-miller
Copy link
Contributor Author

@galipremsagar do I need to bump the dask version here to account for dask/dask#9074 ?

@galipremsagar
Copy link
Contributor

galipremsagar commented May 23, 2022

@galipremsagar do I need to bump the dask version here to account for dask/dask#9074 ?

Ideally yes, but since there is no dask version released yet. We will have to move the dispatch importing and registering to the try/except block just below it. Similar to percentile_lookup dispatch. Like:

try:
    from dask.dataframe.dispatch import grouper_dispatch
    @grouper_dispatch.register((cudf.Series, cudf.DataFrame))
     def get_grouper_cudf(obj):
        return cudf.core.groupby.Grouper
except ImportError:
    pass

@brandon-b-miller
Copy link
Contributor Author

tysm @galipremsagar !

@brandon-b-miller
Copy link
Contributor Author

Is it worth mirroring the tests added in dask/dask@e531cd0 ?

@galipremsagar
Copy link
Contributor

Is it worth mirroring the tests added in dask/dask@e531cd0 ?

Would it be easier to add an API call test that will go through the dispatch mechanisam? Instead of dispatch testing in dask/dask@e531cd0, since that is already taken care of in upstream.

@brandon-b-miller
Copy link
Contributor Author

Is it worth mirroring the tests added in dask/dask@e531cd0 ?

Would it be easier to add an API call test that will go through the dispatch mechanisam? Instead of dispatch testing in dask/dask@e531cd0, since that is already taken care of in upstream.

In that case I think it's covered by #10889 , so maybe it's OK for now.

@galipremsagar
Copy link
Contributor

Is it worth mirroring the tests added in dask/dask@e531cd0 ?

Would it be easier to add an API call test that will go through the dispatch mechanisam? Instead of dispatch testing in dask/dask@e531cd0, since that is already taken care of in upstream.

In that case I think it's covered by #10889 , so maybe it's OK for now.

Cool, then that sounds good to me.

@galipremsagar
Copy link
Contributor

rerun tests

@galipremsagar galipremsagar added the 5 - Ready to Merge Testing and reviews complete, ready to merge label May 25, 2022
@galipremsagar
Copy link
Contributor

@gpucibot merge

@codecov
Copy link

codecov bot commented May 25, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.08@efd2c39). Click here to learn what that means.
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-22.08   #10838   +/-   ##
===============================================
  Coverage                ?   86.61%           
===============================================
  Files                   ?      144           
  Lines                   ?    23511           
  Branches                ?        0           
===============================================
  Hits                    ?    20364           
  Misses                  ?     3147           
  Partials                ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update efd2c39...5718428. Read the comment docs.

@galipremsagar
Copy link
Contributor

@brandon-b-miller could you re-target this to 22.08 since we are pinning dask version to 2022.05.1 which anyway doesn't contain the dispatch?

@brandon-b-miller brandon-b-miller changed the base branch from branch-22.06 to branch-22.08 May 26, 2022 13:08
@rapids-bot rapids-bot bot merged commit 651c814 into rapidsai:branch-22.08 May 26, 2022
brandon-b-miller added a commit to brandon-b-miller/cudf that referenced this pull request May 26, 2022
…h` (rapidsai#10838)

This PR registers uses the (presumably shortly merged) dask `Grouper` dispatch to register `cudf.core.groupby.Grouper` objects to `cudf.DataFrame` objects. This should allow our own Grouper objects to be used in critical places in dask rather than pandas objects. 

This solution is favorable IMO rather than changing cuDF to handle pandas grouper objects directly. 

Xref dask/dask#9074

Authors:
  - https://github.com/brandon-b-miller

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: rapidsai#10838
rapids-bot bot pushed a commit that referenced this pull request Jul 27, 2022
Closes #10296

These _should_ actually just work if the following PRs get merged, after which this diff might be really small:

#10815
#10838
dask/dask#9074

Authors:
  - https://github.com/brandon-b-miller
  - Charles Blackmon-Luca (https://github.com/charlesbluca)

Approvers:
  - Charles Blackmon-Luca (https://github.com/charlesbluca)

URL: #10889
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge dask Dask issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants