Reindex Improvements #10815
python/cudf/cudf/core/dataframe.py
index : Index, Series-convertible, optional, default None
    Shorthand for ``df.reindex(labels=index_labels, axis=0)``
columns : array-like, optional, default None
    Shorthand for ``df.reindex(labels=column_names, axis=1)``
What are `index_labels` and `column_names`? I think it's better to give a concrete description than the shorthand.
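For readers following the question above, here is a minimal sketch of what the two shorthands in the docstring mean. It uses pandas rather than cuDF (an assumption, so that it runs without a GPU), and `index_labels` and `column_names` are just placeholder variable names holding the labels to conform to, not part of any API.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])

# Placeholder label lists: "w" and "c" do not exist in df.
index_labels = ["x", "z", "w"]
column_names = ["b", "c"]

# Row reindexing: the `index` keyword and `labels` + `axis=0` are two
# spellings of the same request.
by_keyword_rows = df.reindex(index=index_labels)
by_axis_rows = df.reindex(labels=index_labels, axis=0)

# Column reindexing: `columns` versus `labels` + `axis=1`.
by_keyword_cols = df.reindex(columns=column_names)
by_axis_cols = df.reindex(labels=column_names, axis=1)

# Labels missing from the original frame come back as null rows/columns.
print(by_keyword_rows)
print(by_keyword_cols)
```

Both conventions produce the same frame, which is why a concrete description of what the labels are may read more clearly than the shorthand.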
Thanks for pointing this out. I made a couple of updates to this docstring:
- In newer versions of pandas, the kwargs `index`, `columns`, `labels`, and `axis` are not individually documented despite being part of the function signature in the docs. Instead they are grouped under something called `keywords for axes`. This threw me off a bit at first and I wondered if we might want to keep them separate in cuDF. Instead I pulled the parameter descriptions from the older pandas 1.0 docs, which broke out most of the parameters. That said, I did choose to break out `columns` and `index` by themselves and wrote my own descriptions for those kwargs. This is all just what makes the most sense to me, so I'm happy to make changes here.
- I moved the note about the calling conventions down to the examples section.
- I updated the cuDF examples to be the same as the pandas ones and added the one about `fill_value`.
- I omitted the example that fills a numeric column with a string and casts it to string. This behavior is not supported yet. I am looking into making it happen inside of this PR but can also leave it as a follow-up.
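To illustrate the `fill_value` cases mentioned in the list above, here is a small sketch in pandas (an assumption: pandas is used as the reference behavior, since per the comment the string-fill case was not yet supported in cuDF at the time). The frame and labels are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]}, index=["x", "y", "z"])

# Numeric fill: the new label "w" gets 0 instead of a null.
filled = df.reindex(index=["x", "w"], fill_value=0)

# String fill on a numeric column: pandas accepts this and the column
# ends up holding a mix of the original number and the string.
mixed = df.reindex(index=["x", "w"], fill_value="missing")

print(filled)
print(mixed)
```

The second call is the example that the comment says was left out of the cuDF docstring.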
Codecov Report
@@ Coverage Diff @@
## branch-22.08 #10815 +/- ##
===============================================
Coverage ? 86.32%
===============================================
Files ? 144
Lines ? 22706
Branches ? 0
===============================================
Hits ? 19601
Misses ? 3105
Partials ? 0
Hmm, I wish I'd seen this earlier; I would have reviewed, sorry about that. Since we're in code freeze at this point, should we push this to 22.08? I don't see anything here as a critical bug fix (although improving pandas alignment is nice, I don't think it's urgent). What do you think, @brandon-b-miller?
Retargeted
I have a few questions, but I don't think these are blocking questions.
Co-authored-by: Michael Wang <[email protected]>
…iller/cudf into enh-align-reindex-sig-pandas
everything look ok to you here @shwina?
@gpucibot merge
Closes #10296

These _should_ actually just work if the following PRs get merged, after which this diff might be really small:
- #10815
- #10838
- dask/dask#9074

Authors:
- https://github.com/brandon-b-miller
- Charles Blackmon-Luca (https://github.com/charlesbluca)

Approvers:
- Charles Blackmon-Luca (https://github.com/charlesbluca)

URL: #10889
This PR came up as part of solving #10296, which has to go through the `reindex` codepath with a `fill_value`. It does a number of things:
- Aligns the `reindex` signature with pandas
- Moves the `_reindex` helper to `IndexedFrame` from `DataFrame`, whereas `Series` used to promote itself to a frame and call the dataframe function
- [...] `fill_value`
- [...] `fill_value` better and reduce code overall