Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reindex Improvements #10815

Merged

Conversation

brandon-b-miller
Copy link
Contributor

@brandon-b-miller brandon-b-miller commented May 9, 2022

This PR came up as part of solving #10296 which has to go through the reindex codepath with a fill_value. It does a number of things:

  • Aligns our reindex signature with pandas
  • Moves our _reindex helper to IndexedFrame from DataFrame whereas Series used to be promoting itself to a frame and calling the dataframe function
  • Provides support for fill_value
  • Refactors the relatively old tests for this functionality to support testing fill_value better and reduce code overall

@github-actions github-actions bot added the Python Affects Python cuDF API. label May 9, 2022
@brandon-b-miller brandon-b-miller marked this pull request as ready for review May 11, 2022 15:34
@brandon-b-miller brandon-b-miller requested a review from a team as a code owner May 11, 2022 15:34
@brandon-b-miller brandon-b-miller added feature request New feature or request non-breaking Non-breaking change labels May 11, 2022
Comment on lines 2207 to 2210
index : Index, Series-convertible, optional, default None
Shorthand for ``df.reindex(labels=index_labels, axis=0)``
columns : array-like, optional, default None
Shorthand for ``df.reindex(labels=column_names, axis=1)``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are index_labels and column_names? I think it's better to give concrete description compared to the shorthand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. I made a couple of updates to this docstring:

pandas docs for reference

  • In newer versions of pandas, the kwargs index, columns, labels, and axis are not individually documented despite being part of the function signature in the docs. Instead they are grouped under something called keywords for axes. This threw me off a bit at first and I wondered if we might want to keep them separate in cuDF. Instead I pulled the parameter descriptions from the older pandas 1.0 docs, which broke out most of the parameters. That said, I did choose to break out columns and index by themselves and wrote my own description for those kwargs. This is all just what makes sense to me most, so happy to make changes here.
  • I moved the note about the calling conventions down to the examples section
  • I updated the cuDF examples to be the same as the pandas ones and added the one about fill_value.
  • I omitted the example that fills a numeric column with a string and casts it to string. This behavior is not supported yet. I am looking into making it happen inside of this PR but can also leave it as a follow up.

python/cudf/cudf/core/indexed_frame.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/indexed_frame.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/indexed_frame.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/indexed_frame.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/indexed_frame.py Show resolved Hide resolved
python/cudf/cudf/core/series.py Show resolved Hide resolved
@codecov
Copy link

codecov bot commented May 18, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.08@ad00e44). Click here to learn what that means.
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-22.08   #10815   +/-   ##
===============================================
  Coverage                ?   86.32%           
===============================================
  Files                   ?      144           
  Lines                   ?    22706           
  Branches                ?        0           
===============================================
  Hits                    ?    19601           
  Misses                  ?     3105           
  Partials                ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ad00e44...d24b0e1. Read the comment docs.

@brandon-b-miller
Copy link
Contributor Author

Everything look good to go here? @shwina @isVoid

@vyasr
Copy link
Contributor

vyasr commented May 25, 2022

Hmm I wish I'd seen this earlier, I would have reviewed, sorry about that. Since we're in code freeze at this point should we push this to 22.08? I don't see anything here as a critical bug fix (although improving pandas alignment is nice I don't think it's urgent), what do you think @brandon-b-miller?

@brandon-b-miller brandon-b-miller changed the base branch from branch-22.06 to branch-22.08 May 25, 2022 18:54
@brandon-b-miller
Copy link
Contributor Author

Retargeted

Copy link
Contributor

@isVoid isVoid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few questions, but I don't think these are blocking questions.

python/cudf/cudf/core/dataframe.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/dataframe.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/dataframe.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/indexed_frame.py Show resolved Hide resolved
python/cudf/cudf/core/indexed_frame.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/indexed_frame.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/indexed_frame.py Outdated Show resolved Hide resolved
@brandon-b-miller
Copy link
Contributor Author

everything look ok to you here @shwina ?

@shwina
Copy link
Contributor

shwina commented Jun 7, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit aec9007 into rapidsai:branch-22.08 Jun 7, 2022
rapids-bot bot pushed a commit that referenced this pull request Jul 27, 2022
Closes #10296

These _should_ actually just work if the following PRs get merged, after which this diff might be really small:

#10815
#10838
dask/dask#9074

Authors:
  - https://github.com/brandon-b-miller
  - Charles Blackmon-Luca (https://github.com/charlesbluca)

Approvers:
  - Charles Blackmon-Luca (https://github.com/charlesbluca)

URL: #10889
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants