[REVIEW] Exposing model_selection in a similar way to scikit-learn #3329

ptartan21 · 2020-12-28T23:33:38Z

Resolving #3267. It seems that model_selection is already properly exposed through cuml.preprocessing.model_selection. A small test suite for train_test_split is included in this PR to demonstrate that it works as desired.

GPUtester · 2020-12-28T23:33:39Z

Can one of the admins verify this patch?

wphicks · 2020-12-29T16:08:44Z

I believe the issue raised in #3267 is that this feature is exposed in a different place from sklearn. A proper test for resolution of this issue should include the line from cuml.model_selection import train_test_split.

ptartan21 · 2020-12-29T17:14:50Z

I misunderstood the issue - it should be good now. Thanks!

wphicks

Left some inline comments. Also, was there a particular reason model_selection.py was renamed and moved into a model_selection directory as part of this PR? I'm not constitutionally opposed to it, but it seems like an unrelated change that might better be performed as part of a PR adding other model selection features.

wphicks · 2020-12-29T17:24:02Z

python/cuml/model_selection/__init__.py

@@ -0,0 +1,4 @@
+from ._split import train_test_split


Let's switch this to an absolute import, as recommended by PEP 8.

wphicks · 2020-12-29T17:31:55Z

python/cuml/preprocessing/__init__.py

@@ -13,7 +13,6 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
-from cuml.preprocessing.model_selection import train_test_split


If we want to remove this exposure, we need to go through a deprecation process. This will break at least one of our demos, so I recommend splitting it off into a separate PR.

Would it be appropriate to replace this with from cuml.model_selection import train_test_split for now?

I think in general it's better to have __init__ files import from the base source file rather than going through an additional layer of indirection.

That makes sense, but is there a way to do this without duplicating code? If we do not move model_selection.py out of preprocessing, train_test_split could be imported from both cuml.preprocessing.model_selection and cuml.model_selection, which may not be desirable.

We'll want it to be importable from both locations until we've gone through a deprecation process. We might start the deprecation process with this PR, though. We'll want a warning if users try to import from the old location, and then after a release cycle, we can eliminate it entirely.

We should also get our examples updated accordingly.
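One possible mechanism for "a warning if users try to import from the old location" is a module-level `__getattr__` hook (PEP 562). The sketch below is an assumption, not what this PR implemented: `make_deprecated_module` is a hypothetical helper, and the `pkg.*` modules are stand-ins so it runs without cuML installed.

```python
import types
import warnings


def make_deprecated_module(old_name, new_module, names):
    """Hypothetical helper: build a stand-in for a deprecated module.

    Accessing any attribute listed in `names` forwards to `new_module` and
    emits a FutureWarning via the module-level __getattr__ hook (PEP 562).
    """
    shim = types.ModuleType(old_name)

    def __getattr__(attr):
        if attr in names:
            warnings.warn(
                f"{old_name} is deprecated; import {attr} from "
                f"{new_module.__name__} instead.",
                FutureWarning,
                stacklevel=2,
            )
            return getattr(new_module, attr)
        raise AttributeError(f"module {old_name!r} has no attribute {attr!r}")

    # Python 3.7+ consults __getattr__ in the module's namespace when normal
    # attribute lookup fails.
    shim.__getattr__ = __getattr__
    return shim


# Stand-in modules (hypothetical) so the sketch runs without cuML.
new_mod = types.ModuleType("pkg.model_selection")
new_mod.train_test_split = lambda *arrays, **kwargs: arrays  # placeholder
old_mod = make_deprecated_module(
    "pkg.preprocessing.model_selection", new_mod, {"train_test_split"}
)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    tts = old_mod.train_test_split  # forwarded, with a FutureWarning

print(tts is new_mod.train_test_split, caught[0].category.__name__)
```

Unlike a single `warnings.warn` call at the top of the old module, which runs only the first time the module is imported (later imports are served from `sys.modules`), this variant warns on every access. A real shim would still need to live at the old import path so that `from ... import` statements find it.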

wphicks · 2020-12-29T17:32:20Z

python/cuml/test/test_preprocessing.py

@@ -24,13 +24,15 @@
    PolynomialFeatures as cuPolynomialFeatures, \
    SimpleImputer as cuSimpleImputer, \
    RobustScaler as cuRobustScaler, \
-    KBinsDiscretizer as cuKBinsDiscretizer
+    KBinsDiscretizer as cuKBinsDiscretizer 


Extra space at the end; our linters should pick it up, but mentioning it since I noticed it.

ptartan21 · 2020-12-29T18:00:21Z

model_selection.py was moved to model_selection and renamed in order to mimic scikit-learn's API/module structure (https://github.com/scikit-learn/scikit-learn/tree/0.24.X/sklearn/model_selection) for train_test_split and other data splitting functions that could be added in the future.

wphicks · 2020-12-29T18:10:35Z

All right! I think that's okay then.

wphicks

Looks like you might have accidentally resurrected the old model_selection.py. Let's eliminate the duplicate code and make sure the warnings end up in the proper location.

ptartan21 · 2020-12-30T23:59:51Z

At first I wasn't sure how to deprecate without duplicating the code, which is why I re-included the old model_selection.py. The solution I have now is to raise a warning and then call the implementation from cuml.model_selection.

wphicks · 2021-01-01T22:50:34Z

Ah, I see what you were getting at, but that shouldn't be necessary. model_selection.py can now just be:

from cuml.model_selection._split import train_test_split
# INSERT DEPRECATION WARNING HERE

No need to introduce the _new_* functions, and having them there will make the deprecation process more annoying than simply removing model_selection.py in a later commit. I guess technically to be absolutely certain we don't break anything, we should import all symbols formerly exposed in model_selection.py, but train_test_split is the most important one.
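For illustration, a self-contained sketch of that layout with the placeholder filled in. It builds a throwaway package named `demo_pkg` (hypothetical, standing in for cuml) in a temporary directory: the code lives in `model_selection/_split.py`, and the old `preprocessing/model_selection.py` becomes a re-export shim that uses an absolute import and issues a warning on import. The `FutureWarning` category and message text are assumptions, not necessarily what the PR used.

```python
import importlib
import sys
import tempfile
import textwrap
import warnings
from pathlib import Path

# Hypothetical package mirroring the layout discussed in this PR.
FILES = {
    "demo_pkg/__init__.py": "",
    "demo_pkg/model_selection/__init__.py":
        "from demo_pkg.model_selection._split import train_test_split\n",
    "demo_pkg/model_selection/_split.py":
        "def train_test_split(*arrays, **kwargs):\n    return arrays\n",
    "demo_pkg/preprocessing/__init__.py": "",
    # The deprecated location: re-export only, no duplicated code.
    "demo_pkg/preprocessing/model_selection.py": textwrap.dedent("""\
        import warnings

        from demo_pkg.model_selection._split import train_test_split  # noqa

        warnings.warn(
            "demo_pkg.preprocessing.model_selection is deprecated; "
            "import from demo_pkg.model_selection instead.",
            FutureWarning,
            stacklevel=2,
        )
        """),
}

root = Path(tempfile.mkdtemp())
for rel_path, source in FILES.items():
    path = root / rel_path
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(source)
sys.path.insert(0, str(root))
importlib.invalidate_caches()  # make the freshly written files importable

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    from demo_pkg.model_selection import train_test_split as new_tts
    from demo_pkg.preprocessing.model_selection import (
        train_test_split as old_tts,
    )

# Both paths resolve to the same function object; only the old one warns.
print(old_tts is new_tts, [w.category.__name__ for w in caught])
```

Removing the deprecated path later then amounts to deleting the shim file, which is the advantage over introducing wrapper functions.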

ptartan21 · 2021-01-03T06:01:55Z

This makes a lot of sense. Thank you!

wphicks

Looking great! I think we're almost there. One question and one tweak, and then I think we can move this thing forward. Thanks for sticking with it!

python/cuml/model_selection/__init__.py

python/cuml/preprocessing/model_selection.py

wphicks

LGTM! Thanks again for the work on this.

JohnZed · 2021-01-05T21:35:47Z

Ok to test

wphicks · 2021-01-06T15:41:23Z

Build errors are unrelated to this PR. Once #3316 is merged, we should merge mainline into this branch and test from there.

codecov-io · 2021-01-11T21:53:46Z

Codecov Report

Merging #3329 (9167de9) into branch-0.18 (550121b) will increase coverage by 0.04%.
The diff coverage is 84.34%.

@@               Coverage Diff               @@
##           branch-0.18    #3329      +/-   ##
===============================================
+ Coverage        71.48%   71.53%   +0.04%     
===============================================
  Files              207      208       +1     
  Lines            16748    16816      +68     
===============================================
+ Hits             11973    12029      +56     
- Misses            4775     4787      +12

Impacted Files	Coverage Δ
python/cuml/decomposition/incremental_pca.py	`94.70% <ø> (ø)`
python/cuml/dask/ensemble/base.py	`19.69% <30.43%> (+0.36%)`	⬆️
python/cuml/ensemble/randomforestregressor.pyx	`70.83% <44.44%> (ø)`
...ython/cuml/dask/ensemble/randomforestclassifier.py	`30.00% <50.00%> (+0.51%)`	⬆️
python/cuml/dask/ensemble/randomforestregressor.py	`35.08% <50.00%> (+0.54%)`	⬆️
python/cuml/fil/fil.pyx	`91.87% <60.00%> (-1.88%)`	⬇️
python/cuml/ensemble/randomforestclassifier.pyx	`73.72% <66.66%> (ø)`
python/cuml/model_selection/_split.py	`90.35% <90.35%> (ø)`
python/cuml/manifold/t_sne.pyx	`79.42% <98.30%> (+3.34%)`	⬆️
python/cuml/__init__.py	`100.00% <100.00%> (ø)`
... and 12 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9ad71f0...9167de9.

ptartan21 added 2 commits December 28, 2020 23:24

model_selection train_test_split exposure test

4cfffb5

style fix

8905c74

ptartan21 requested a review from a team as a code owner December 28, 2020 23:33

style fix

8fb7a5d

exposed model_selection in a similar way to scikit-learn

350f372

ptartan21 added 2 commits December 29, 2020 17:32

removed redundant test cases for train_test_split

45a31bd

clean

4678dee

ptartan21 changed the title ~~[REVIEW] Test model_selection exposure~~ [REVIEW] Exposing model_selection in a similar way to scikit-learn Dec 29, 2020

wphicks requested changes Dec 29, 2020

deprecation warnings for cuml.preprocessing.model_selection

4154afd

ptartan21 requested a review from wphicks December 30, 2020 20:41

wphicks requested changes Dec 30, 2020

deprecation without code duplication

77d303d

ptartan21 requested a review from wphicks December 31, 2020 00:01

modify import to simplify future deprecation

c7c1831

wphicks requested changes Jan 4, 2021

python/cuml/model_selection/__init__.py (outdated, resolved)

python/cuml/preprocessing/model_selection.py (outdated, resolved)

changed exposure of _stratify_split and _approximate_mode from cuml.model_selection

5107ce0

wphicks approved these changes Jan 4, 2021

wphicks linked an issue Jan 4, 2021 that may be closed by this pull request

[FEA] Expose model_selection in a similar way to Scikit-learn #3267

Closed

wphicks added the "feature request" and "non-breaking" labels Jan 4, 2021

wphicks added the "5 - Ready to Merge" label Jan 4, 2021

style fix

1af3ba8

Merge branch 'branch-0.18' into fea-expose-model-selection

9167de9

JohnZed approved these changes Jan 13, 2021

JohnZed added the "6 - Okay to Auto-Merge" label Jan 13, 2021

rapids-bot bot merged commit ecd508c into rapidsai:branch-0.18 Jan 13, 2021
