Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Gracefully accept scikit-learn-specific parameters in estimator APIs as pass-throughs #3461

Open
beckernick opened this issue Feb 4, 2021 · 2 comments
Labels
Cython / Python Cython or Python issue feature request New feature or request inactive-30d

Comments

@beckernick
Copy link
Member

beckernick commented Feb 4, 2021

Some estimators do not accept all parameters accepted by their corresponding scikit-learn estimator, while others do. For API compatibility, it would be nice to support these arguments to constructors as "pass-through" parameters and perhaps raise a warning or error if they are passed and not None. This additional API compatibility would improve the process of building cuML into downstream libraries and applications.

RandomForestClassifier and Regressor currently take this pass-through approach. cuDF takes a similar pass-through approach to pandas compatibility to enable using some methods with Dask, where these "unsupported" parameters are expected to be None (or the supported default).

Today, Random Forest takes an explicit, hard-coded approach to this task:

sklearn_params = {"criterion": criterion,
"min_weight_fraction_leaf": min_weight_fraction_leaf,
"max_leaf_nodes": max_leaf_nodes,
"min_impurity_split": min_impurity_split,
"oob_score": oob_score, "n_jobs": n_jobs,
"warm_start": warm_start,
"class_weight": class_weight}
for key, vals in sklearn_params.items():
if vals:
raise TypeError(
" The Scikit-learn variable ", key,
" is not supported in cuML,"
" please read the cuML documentation at "
"(https://docs.rapids.ai/api/cuml/nightly/"
"api.html#random-forest) for more information")

@dantegd were chatting about this offline. Hard-coding to the current API is one way to do this. Another potential way he suggested to broadly enable this for other estimators might be to wrap this concept into a decorator that inspects the corresponding function signature.

@beckernick beckernick added feature request New feature or request Cython / Python Cython or Python issue labels Feb 4, 2021
@beckernick
Copy link
Member Author

I suspect any solution would also need to innocuously handle the following scenario, which may call down to the Base class depending on the estimator implementation.

clf = Estimator()
clf.set_params(**{"n_jobs": None})

@github-actions
Copy link

github-actions bot commented Mar 7, 2021

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

rapids-bot bot pushed a commit that referenced this issue Sep 9, 2021
…ghbors Estimator (#4178)

This pull request partially solves [[FEA] #3461](#3461).

This quick-fix has been created to enable cuML's NearestNeighbor estimator to gracefully accept sklearns 'n_jobs' parameter as a pass-through.  

The purpose of making this quick fix is to allow Imbalanced-Learn samplers to rely on cuML's NearestNeighbor estimator, without producing an error when setting the estimators n_jobs parameter `.set_params(**{"n_jobs": self.n_jobs})` [1](https://github.com/scikit-learn-contrib/imbalanced-learn/blob/edf6eae2c00f7fa6d76ee381f5b625155061a725/imblearn/over_sampling/_adasyn.py#L112)

Authors:
  - https://github.com/NV-jpt

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #4178
rapids-bot bot pushed a commit that referenced this issue Nov 15, 2021
… NearestNeighbors Estimator" (#4267)

This pull request partially solves [[FEA] #3461](#3461)

This quick-fix has been created to enable cuML's NearestNeighbor estimator to gracefully accept sklearns 'n_jobs' parameter as a pass-through.

The purpose of making this quick fix is to allow Imbalanced-Learn samplers to rely on cuML's NearestNeighbor estimator, without producing an error when setting the estimators n_jobs parameter .set_params(**{"n_jobs": self.n_jobs})

The[ original PR ](#4178 address this issue was not sufficient, as [`set_params()`](https://github.com/rapidsai/cuml/blob/067344041b1563b19301e2e69240a56605a67997/python/cuml/common/base.pyx#L248) will still raise a ValueError if "n_jobs" is not returned by [`get_param_names()`](https://github.com/rapidsai/cuml/blob/067344041b1563b19301e2e69240a56605a67997/python/cuml/neighbors/nearest_neighbors.pyx#L453)

Authors:
  - https://github.com/NV-jpt
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - William Hicks (https://github.com/wphicks)

URL: #4267
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this issue Oct 9, 2023
…ghbors Estimator (rapidsai#4178)

This pull request partially solves [[FEA] rapidsai#3461](rapidsai#3461).

This quick-fix has been created to enable cuML's NearestNeighbor estimator to gracefully accept sklearns 'n_jobs' parameter as a pass-through.  

The purpose of making this quick fix is to allow Imbalanced-Learn samplers to rely on cuML's NearestNeighbor estimator, without producing an error when setting the estimators n_jobs parameter `.set_params(**{"n_jobs": self.n_jobs})` [1](https://github.com/scikit-learn-contrib/imbalanced-learn/blob/edf6eae2c00f7fa6d76ee381f5b625155061a725/imblearn/over_sampling/_adasyn.py#L112)

Authors:
  - https://github.com/NV-jpt

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4178
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this issue Oct 9, 2023
… NearestNeighbors Estimator" (rapidsai#4267)

This pull request partially solves [[FEA] rapidsai#3461](rapidsai#3461)

This quick-fix has been created to enable cuML's NearestNeighbor estimator to gracefully accept sklearns 'n_jobs' parameter as a pass-through.

The purpose of making this quick fix is to allow Imbalanced-Learn samplers to rely on cuML's NearestNeighbor estimator, without producing an error when setting the estimators n_jobs parameter .set_params(**{"n_jobs": self.n_jobs})

The[ original PR ](rapidsai#4178 address this issue was not sufficient, as [`set_params()`](https://github.com/rapidsai/cuml/blob/2fee231ac28d982f64c4a746c25be19750812e81/python/cuml/common/base.pyx#L248) will still raise a ValueError if "n_jobs" is not returned by [`get_param_names()`](https://github.com/rapidsai/cuml/blob/2fee231ac28d982f64c4a746c25be19750812e81/python/cuml/neighbors/nearest_neighbors.pyx#L453)

Authors:
  - https://github.com/NV-jpt
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - William Hicks (https://github.com/wphicks)

URL: rapidsai#4267
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Cython / Python Cython or Python issue feature request New feature or request inactive-30d
Projects
None yet
Development

No branches or pull requests

1 participant