[REVIEW] Exposing model_selection in a similar way to scikit-learn #3329
Can one of the admins verify this patch?
I believe the issue raised in #3267 is that this feature is exposed in a different place from sklearn. A proper test for resolution of this issue should include the line
I misunderstood the issue. It should be good now. Thanks!
Left some inline comments. Also, was there a particular reason `model_selection.py` was renamed and moved into a `model_selection` directory as part of this PR? I'm not constitutionally opposed to it, but it seems like an unrelated change that might better be performed as part of a PR adding other model selection features.
@@ -0,0 +1,4 @@
from ._split import train_test_split
Let's switch this to an absolute import as recommended by PEP 8.
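To make the absolute-import suggestion concrete, here is a minimal sketch of the layout under discussion. It uses a hypothetical package `toylib` in place of `cuml` so it runs without cuML installed; only the import line written into `model_selection/__init__.py` reflects the review comment, and the `train_test_split` body is a placeholder.

```python
import importlib
import pathlib
import sys
import tempfile

# Build a throwaway package on disk (hypothetical name "toylib"):
#   toylib/__init__.py
#   toylib/model_selection/__init__.py  -> public re-export
#   toylib/model_selection/_split.py    -> actual definition
root = pathlib.Path(tempfile.mkdtemp())
pkg_dir = root / "toylib" / "model_selection"
pkg_dir.mkdir(parents=True)
(root / "toylib" / "__init__.py").write_text("")
(pkg_dir / "_split.py").write_text(
    "def train_test_split(*arrays, **kwargs):\n"
    "    return arrays  # placeholder body\n"
)
# Absolute import, as suggested in the review, rather than `from ._split import ...`.
(pkg_dir / "__init__.py").write_text(
    "from toylib.model_selection._split import train_test_split\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()
ms = importlib.import_module("toylib.model_selection")
split_mod = importlib.import_module("toylib.model_selection._split")

# The public name is the same function object that _split.py defines.
assert ms.train_test_split is split_mod.train_test_split
```

The relative form `from ._split import train_test_split` would bind the same object; the absolute form just spells out the full module path, which PEP 8 prefers while still allowing explicit relative imports.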
@@ -13,7 +13,6 @@
# See the License for the specific language governing permissions and
# limitations under the License.
#
from cuml.preprocessing.model_selection import train_test_split
If we want to remove this exposure, we need to go through a deprecation process. This will break at least one of our demos, so I recommend splitting it off into a separate PR.
Would it be appropriate to replace this with `from cuml.model_selection import train_test_split` for now?
I think in general it's better to have `__init__` files import from the base source file rather than going through an additional layer of indirection.
That makes sense, but is there a way to do this without duplicating code? If we do not move `model_selection.py` out of `preprocessing`, `train_test_split` could be imported from both `cuml.preprocessing.model_selection` and `cuml.model_selection`, which may not be desirable.
We'll want it to be importable from both locations until we've gone through a deprecation process. We might start the deprecation process with this PR, though. We'll want a warning if users try to import from the old location, and then after a release cycle, we can eliminate it entirely.
We should also get our examples updated accordingly.
@@ -24,13 +24,15 @@
PolynomialFeatures as cuPolynomialFeatures, \
SimpleImputer as cuSimpleImputer, \
RobustScaler as cuRobustScaler, \
KBinsDiscretizer as cuKBinsDiscretizer
KBinsDiscretizer as cuKBinsDiscretizer
Extra space at the end; our linters should pick it up, but mentioning it since I noticed it.
All right! I think that's okay then.
Looks like you might have accidentally resurrected the old model_selection.py. Let's eliminate the duplicate code and make sure the warnings end up in the proper location.
At first I wasn't sure how to deprecate without duplicating the code, which was why I reincluded the old
Ah, I see what you were getting at, but that shouldn't be necessary.
from cuml.model_selection._split import train_test_split
# INSERT DEPRECATION WARNING HERE
No need to introduce the
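The placeholder above can be filled in roughly as follows. This is a minimal sketch, not the code that was merged: it uses a hypothetical package `toy_ml` in place of `cuml`, and the choice of `FutureWarning` (rather than `DeprecationWarning`, which Python's default filters hide unless it is triggered from `__main__`) is our assumption. The old module only re-exports the single implementation, so no code is duplicated.

```python
import importlib
import pathlib
import sys
import tempfile
import warnings

# Hypothetical package "toy_ml" standing in for cuml:
#   toy_ml/model_selection/_split.py         new home of train_test_split
#   toy_ml/model_selection/__init__.py       public re-export
#   toy_ml/preprocessing/model_selection.py  deprecated shim (old location)
FILES = {
    "toy_ml/__init__.py": "",
    "toy_ml/model_selection/__init__.py":
        "from toy_ml.model_selection._split import train_test_split\n",
    "toy_ml/model_selection/_split.py":
        "def train_test_split(*arrays, **kwargs):\n"
        "    return arrays  # placeholder body\n",
    "toy_ml/preprocessing/__init__.py": "",
    # The shim re-exports the one implementation and warns when first imported.
    "toy_ml/preprocessing/model_selection.py": (
        "import warnings\n"
        "\n"
        "from toy_ml.model_selection._split import train_test_split  # noqa: F401\n"
        "\n"
        "warnings.warn(\n"
        "    'toy_ml.preprocessing.model_selection is deprecated; '\n"
        "    'use toy_ml.model_selection instead.',\n"
        "    FutureWarning,\n"
        "    stacklevel=2,\n"
        ")\n"
    ),
}

root = pathlib.Path(tempfile.mkdtemp())
for rel_path, source in FILES.items():
    path = root / rel_path
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(source)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Record warnings raised while importing from the old location.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old = importlib.import_module("toy_ml.preprocessing.model_selection")
new = importlib.import_module("toy_ml.model_selection")

# Both paths expose the same function object; the old path emitted a warning.
assert old.train_test_split is new.train_test_split
assert any(issubclass(w.category, FutureWarning) for w in caught)
```

Because Python caches modules in `sys.modules`, the warning fires only on the first import of the old path in a given process. Once the deprecation cycle described earlier in the thread has run its course, the shim file can simply be deleted.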
This makes a lot of sense. Thank you!
Looking great! I think we're almost there. One question and one tweak, and then I think we can move this thing forward. Thanks for sticking with it!
LGTM! Thanks again for the work on this.
OK to test
Build errors are unrelated to this PR. Once #3316 is merged, we should merge mainline into this branch and test from there.
Codecov Report
@@ Coverage Diff @@
## branch-0.18 #3329 +/- ##
===============================================
+ Coverage 71.48% 71.53% +0.04%
===============================================
Files 207 208 +1
Lines 16748 16816 +68
===============================================
+ Hits 11973 12029 +56
- Misses 4775 4787 +12
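As a quick sanity check on the report, the numbers are self-consistent: hits plus misses equals lines in both columns, and coverage is hits divided by lines. The sketch below assumes Codecov displays percentages rounded down to two decimals (our understanding of its default, not something stated in this thread); under that assumption it reproduces the 71.48% and 71.53% figures.

```python
import math

# Figures copied from the Codecov diff above (base = branch-0.18, head = #3329).
base = {"lines": 16748, "hits": 11973, "misses": 4775}
head = {"lines": 16816, "hits": 12029, "misses": 4787}


def coverage_pct(report, precision=2):
    """Hits / lines as a percentage, rounded down to `precision` decimals.

    Rounding down is an assumption about Codecov's display setting.
    """
    scale = 10 ** precision
    return math.floor(100 * report["hits"] / report["lines"] * scale) / scale


for report in (base, head):
    # Every tracked line is either hit or missed.
    assert report["hits"] + report["misses"] == report["lines"]

print(coverage_pct(base), coverage_pct(head))  # 71.48 71.53
```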
Resolving #3267. It seems that `model_selection` is already properly exposed through `cuml.preprocessing.model_selection`. A small test suite for `train_test_split` is included in this PR to demonstrate that it works as desired.
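The PR's actual tests are not reproduced in this thread. As a rough illustration of properties such a suite might check (split sizes, every row landing in exactly one split, features and labels staying aligned, reproducibility with a fixed seed), here is a pytest-style sketch. It runs against a hypothetical pure-Python stand-in rather than `cuml.model_selection.train_test_split`, so it needs no GPU. The `seed` parameter and the round-up rule for `test_size` belong to the stand-in and are assumptions, not cuML's documented behavior; a real test would import from `cuml.model_selection` instead.

```python
import math
import random


def train_test_split(X, y, test_size=0.25, seed=None):
    """Hypothetical stand-in: shuffle row indices, put ceil(test_size * n) in test."""
    n = len(X)
    if len(y) != n:
        raise ValueError("X and y must have the same number of rows")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(test_size * n)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [y[i] for i in train_idx], [y[i] for i in test_idx])


def test_split_sizes():
    X, y = list(range(10)), list(range(10))
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, seed=0)
    assert len(X_test) == len(y_test) == 2
    assert len(X_train) == len(y_train) == 8


def test_rows_partition_and_stay_aligned():
    X = [[i, i * 2] for i in range(20)]
    y = list(range(20))
    X_train, X_test, y_train, y_test = train_test_split(X, y, seed=42)
    # Every row lands in exactly one split.
    assert sorted(y_train + y_test) == y
    # Features and labels are shuffled together.
    assert all(row[0] == label
               for row, label in zip(X_train + X_test, y_train + y_test))


def test_same_seed_is_reproducible():
    X, y = list(range(50)), list(range(50))
    assert train_test_split(X, y, seed=7) == train_test_split(X, y, seed=7)
```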