Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Keep single-series DataFrames as DataFrames in preprocessing #3069

Merged
merged 9 commits into from
Nov 16, 2020

Conversation

wphicks
Copy link
Contributor

@wphicks wphicks commented Oct 27, 2020

Ensure that single-series DataFrames are not converted to Series during preprocessing

Resolve #3059

@wphicks wphicks requested a review from a team as a code owner October 27, 2020 22:43
@GPUtester
Copy link
Contributor

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

@wphicks
Copy link
Contributor Author

wphicks commented Oct 27, 2020

@viclafargue Would you mind taking a look at this one?

Copy link
Contributor

@viclafargue viclafargue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for tracking down the bug in #3059! I'm just a bit concerned about the possible case in which one preprocessor can receive a Series but has to output a Dataframe. But if all tests pass, it should be fine. Otherwise, LGTM !

@wphicks
Copy link
Contributor Author

wphicks commented Oct 28, 2020

Thanks for the review @viclafargue! To make sure I understand, you're thinking about a preprocessor that takes in a single Series but then blows it out into something 2D, right? That is indeed something I hadn't thought about. Let me update to_output_type as well to handle that case appropriately; otherwise we'd get a ValueError from the CumlArray code.

@wphicks
Copy link
Contributor Author

wphicks commented Oct 28, 2020

@viclafargue Mind taking a look at the update? I think this should handle the case you pointed out without requiring further specialized code in each preprocessor.

@viclafargue
Copy link
Contributor

viclafargue commented Oct 28, 2020

@viclafargue Mind taking a look at the update? I think this should handle the case you pointed out without requiring further specialized code in each preprocessor.

The update and its testing look good to me. Thanks @wphicks!

@wphicks
Copy link
Contributor Author

wphicks commented Oct 28, 2020

rerun tests

@codecov-io
Copy link

codecov-io commented Oct 29, 2020

Codecov Report

Merging #3069 (b3c2719) into branch-0.17 (25b29f5) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##           branch-0.17    #3069   +/-   ##
============================================
  Coverage        69.38%   69.39%           
============================================
  Files              193      193           
  Lines            14704    14708    +4     
============================================
+ Hits             10203    10207    +4     
  Misses            4501     4501           
Impacted Files Coverage Δ
...on/cuml/_thirdparty/sklearn/preprocessing/_data.py 63.54% <100.00%> (+0.08%) ⬆️
python/cuml/thirdparty_adapters/adapters.py 88.55% <100.00%> (+0.11%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 25b29f5...b3c2719. Read the comment docs.

@dantegd dantegd added 5 - Ready to Merge Testing and reviews complete, ready to merge Cython / Python Cython or Python issue labels Oct 29, 2020
@wphicks wphicks added 4 - Waiting on Author Waiting for author to respond to review and removed 5 - Ready to Merge Testing and reviews complete, ready to merge labels Oct 29, 2020
@wphicks
Copy link
Contributor Author

wphicks commented Oct 29, 2020

Looks like that test failure may be genuine. I'll look into it and update.

@wphicks
Copy link
Contributor Author

wphicks commented Oct 31, 2020

I was not able to reproduce this locally with my existing conda env, so I'm suspicious of an upstream change. I've been trying to update my local conda env, but it keeps hanging indefinitely. I'll have to take a closer look a little bit later.

@wphicks
Copy link
Contributor Author

wphicks commented Nov 10, 2020

Finally fixed the error that CI picked up. I tweaked the normalize function to prevent returned norms from being coerced to DataFrames unnecessarily.

@viclafargue Would you mind doing a re-review on this one?

@wphicks wphicks removed the 4 - Waiting on Author Waiting for author to respond to review label Nov 10, 2020
@wphicks wphicks added the 4 - Waiting on Reviewer Waiting for reviewer to review or respond label Nov 10, 2020
@wphicks wphicks added 5 - Ready to Merge Testing and reviews complete, ready to merge 4 - Waiting on Reviewer Waiting for reviewer to review or respond and removed 4 - Waiting on Reviewer Waiting for reviewer to review or respond 5 - Ready to Merge Testing and reviews complete, ready to merge labels Nov 16, 2020
Copy link
Contributor

@viclafargue viclafargue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me! Thanks for working on that @wphicks

@wphicks wphicks added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 4 - Waiting on Reviewer Waiting for reviewer to review or respond labels Nov 16, 2020
@dantegd dantegd merged commit 59497fa into rapidsai:branch-0.17 Nov 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge Cython / Python Cython or Python issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] 2+ cuML preprocessors for single variable in Pipeline
5 participants