[FEA] Gracefully accept scikit-learn-specific parameters in estimator APIs as pass-throughs #3461
Comments
I suspect any solution would also need to innocuously handle the following scenario, which may call down to the Base class depending on the estimator implementation: `clf = Estimator()` followed by `clf.set_params(**{"n_jobs": None})`.
…ghbors Estimator (#4178) This pull request partially solves [[FEA] #3461](#3461). This quick fix enables cuML's NearestNeighbors estimator to gracefully accept scikit-learn's `n_jobs` parameter as a pass-through. Its purpose is to allow Imbalanced-Learn samplers to rely on cuML's NearestNeighbors estimator without producing an error when they set the estimator's `n_jobs` parameter via `.set_params(**{"n_jobs": self.n_jobs})` [1](https://github.com/scikit-learn-contrib/imbalanced-learn/blob/edf6eae2c00f7fa6d76ee381f5b625155061a725/imblearn/over_sampling/_adasyn.py#L112).
Authors: - https://github.com/NV-jpt
Approvers: - Dante Gama Dessavre (https://github.com/dantegd)
URL: #4178
… NearestNeighbors Estimator" (#4267) This pull request partially solves [[FEA] #3461](#3461). This quick fix enables cuML's NearestNeighbors estimator to gracefully accept scikit-learn's `n_jobs` parameter as a pass-through. Its purpose is to allow Imbalanced-Learn samplers to rely on cuML's NearestNeighbors estimator without producing an error when they set the estimator's `n_jobs` parameter via `.set_params(**{"n_jobs": self.n_jobs})`. The [original PR](#4178) addressing this issue was not sufficient, because [`set_params()`](https://github.com/rapidsai/cuml/blob/067344041b1563b19301e2e69240a56605a67997/python/cuml/common/base.pyx#L248) will still raise a ValueError if "n_jobs" is not returned by [`get_param_names()`](https://github.com/rapidsai/cuml/blob/067344041b1563b19301e2e69240a56605a67997/python/cuml/neighbors/nearest_neighbors.pyx#L453).
Authors: - https://github.com/NV-jpt - Dante Gama Dessavre (https://github.com/dantegd)
Approvers: - William Hicks (https://github.com/wphicks)
URL: #4267
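To illustrate why accepting `n_jobs` in the constructor alone may not be enough, here is a minimal sketch of the interaction described above. `ToyBase`, `ToyNeighbors`, and `ToyNeighborsConstructorOnly` are hypothetical stand-ins and not cuML's actual classes; the only assumption carried over from the PR description is that `set_params()` rejects any key that `get_param_names()` does not return.

```python
class ToyBase:
    """Hypothetical stand-in for an estimator base class (not cuML's Base)."""

    def get_param_names(self):
        return []

    def set_params(self, **params):
        # Reject any key the estimator does not advertise as a parameter.
        valid = self.get_param_names()
        for key, value in params.items():
            if key not in valid:
                raise ValueError(
                    f"Invalid parameter {key!r} for {type(self).__name__}"
                )
            setattr(self, key, value)
        return self


class ToyNeighborsConstructorOnly(ToyBase):
    """Accepts n_jobs in __init__ but does not advertise it."""

    def __init__(self, n_neighbors=5, n_jobs=None):
        self.n_neighbors = n_neighbors
        self.n_jobs = n_jobs  # stored for compatibility, never used

    def get_param_names(self):
        return super().get_param_names() + ["n_neighbors"]


class ToyNeighbors(ToyNeighborsConstructorOnly):
    """Also advertises n_jobs, so set_params() accepts it."""

    def get_param_names(self):
        return super().get_param_names() + ["n_jobs"]


def try_set_n_jobs(estimator):
    """Return True if set_params(n_jobs=None) succeeds, False on ValueError."""
    try:
        estimator.set_params(**{"n_jobs": None})
        return True
    except ValueError:
        return False
```

With these toy classes, `try_set_n_jobs(ToyNeighborsConstructorOnly())` fails while `try_set_n_jobs(ToyNeighbors())` succeeds, which mirrors the gap between the two PRs as described in their messages.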
Some estimators do not accept all parameters accepted by their corresponding scikit-learn estimator, while others do. For API compatibility, it would be nice to support these arguments to constructors as "pass-through" parameters and perhaps raise a warning or error if they are passed and not None. This additional API compatibility would improve the process of building cuML into downstream libraries and applications.
RandomForestClassifier and Regressor currently take this pass-through approach. cuDF takes a similar pass-through approach to pandas compatibility to enable using some methods with Dask, where these "unsupported" parameters are expected to be None (or the supported default).
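As a rough sketch of the pass-through idea (not the actual Random Forest code), a constructor could accept a scikit-learn-only argument, store it, and warn when it is set to anything other than its expected default. `ToyForest` and `PASS_THROUGH_DEFAULTS` are hypothetical names, and the choice of a `UserWarning` rather than an error is an assumption.

```python
import warnings

# Hypothetical table: scikit-learn-only parameters and the value treated as "unset".
PASS_THROUGH_DEFAULTS = {"n_jobs": None}


class ToyForest:
    """Hypothetical estimator that tolerates scikit-learn-only parameters."""

    def __init__(self, n_estimators=100, n_jobs=None):
        self.n_estimators = n_estimators
        passed = {"n_jobs": n_jobs}
        for name, value in passed.items():
            if value != PASS_THROUGH_DEFAULTS[name]:
                # The parameter is accepted for API compatibility only.
                warnings.warn(
                    f"Parameter {name!r} is accepted for scikit-learn "
                    f"compatibility but has no effect (got {value!r}).",
                    UserWarning,
                )
            setattr(self, name, value)
```

Here `ToyForest()` and `ToyForest(n_jobs=None)` are silent, while `ToyForest(n_jobs=4)` emits a warning and still constructs the estimator.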
Today, Random Forest takes an explicit, hard-coded approach to this task:
cuml/python/cuml/ensemble/randomforest_common.pyx
Lines 75 to 90 in 54ea23c
@dantegd and I were chatting about this offline. Hard-coding to the current API is one way to do this. Another approach he suggested, which could enable this broadly for other estimators, is to wrap the concept in a decorator that inspects the corresponding function signature.
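One possible shape for the decorator idea, purely as a sketch: compare the constructor signature of a reference estimator with that of the decorated class, and swallow keyword arguments that only the reference accepts. `accept_pass_through`, `ReferenceEstimator`, and `ToyEstimator` are hypothetical names; warning only when the value is not None is an assumption taken from the proposal above.

```python
import functools
import inspect
import warnings


def accept_pass_through(reference):
    """Class decorator: tolerate keyword arguments that reference.__init__
    accepts but the decorated class's __init__ does not."""
    ref_params = set(inspect.signature(reference.__init__).parameters) - {"self"}

    def decorate(cls):
        original_init = cls.__init__
        own_params = set(inspect.signature(original_init).parameters) - {"self"}
        extra = ref_params - own_params  # reference-only parameter names

        @functools.wraps(original_init)
        def __init__(self, *args, **kwargs):
            for name in extra & set(kwargs):
                value = kwargs.pop(name)
                if value is not None:
                    warnings.warn(
                        f"Parameter {name!r} is ignored by {cls.__name__}.",
                        UserWarning,
                    )
                setattr(self, name, value)
            # Unknown keywords still reach the original __init__ and fail there.
            original_init(self, *args, **kwargs)

        cls.__init__ = __init__
        return cls

    return decorate


class ReferenceEstimator:
    """Stand-in for a scikit-learn estimator's constructor signature."""

    def __init__(self, n_neighbors=5, n_jobs=None):
        self.n_neighbors = n_neighbors
        self.n_jobs = n_jobs


@accept_pass_through(ReferenceEstimator)
class ToyEstimator:
    def __init__(self, n_neighbors=5):
        self.n_neighbors = n_neighbors
```

This sketch only covers keyword arguments to the constructor. The `set_params()` scenario from the comments would need similar handling, and code that introspects the constructor signature would still not see the pass-through names.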