
[python-package] make scikit-learn estimator tags compatible with scikit-learn>=1.6.0dev #6651

Open: wants to merge 12 commits into base: master


vnherdeiro
Contributor

@vnherdeiro vnherdeiro commented Sep 11, 2024

Fixes #6653

Trying to fix the latest CI job. scikit-learn 1.6.0.dev deprecates BaseEstimator._more_tags() in favor of __sklearn_tags__().

see https://scikit-learn.org/dev/whats_new/v1.6.html and scikit-learn/scikit-learn#29677

@vnherdeiro
Contributor Author

Update:

The change introduced in scikit-learn/scikit-learn#29677 makes it hard to subclass a sklearn estimator in a codebase that must stay compatible with both sklearn < 1.6.0 and sklearn >= 1.6.0. Essentially, the former looks up ._more_tags() and ignores __sklearn_tags__(), while the latter looks up __sklearn_tags__() and forbids the existence of a ._more_tags() method.
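To make the conflict concrete, here is a toy, stdlib-only model of the two lookup behaviors described above. The functions old_style_get_tags() and new_style_get_tags() and the Tags dataclass are hypothetical simplifications, not scikit-learn's actual tag machinery; the "forbid _more_tags()" rule models the 1.6.0.dev behavior as described in this comment, before scikit-learn/scikit-learn#29801.

```python
from dataclasses import dataclass


@dataclass
class Tags:
    # Hypothetical stand-in for the dataclass returned by __sklearn_tags__().
    allow_nan: bool = False


def old_style_get_tags(est):
    # Toy model of scikit-learn<1.6: only _more_tags() is consulted,
    # so __sklearn_tags__() is silently ignored.
    tags = {"allow_nan": False}
    more_tags = getattr(est, "_more_tags", None)
    if more_tags is not None:
        tags.update(more_tags())
    return tags


def new_style_get_tags(est):
    # Toy model of scikit-learn 1.6.0.dev as described above:
    # merely defining _more_tags() is an error.
    if hasattr(est, "_more_tags"):
        raise TypeError("_more_tags() is not supported, define __sklearn_tags__()")
    return est.__sklearn_tags__()


class OldStyle:
    def _more_tags(self):
        return {"allow_nan": True}


class NewStyle:
    def __sklearn_tags__(self):
        return Tags(allow_nan=True)
```

Neither class works under both models: old_style_get_tags(NewStyle()) silently falls back to allow_nan=False, and new_style_get_tags(OldStyle()) raises TypeError.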

The issue is discussed here:
scikit-learn/scikit-learn#29801

It looks like the restriction against having both ._more_tags() and __sklearn_tags__() simultaneously will be relaxed. If that goes through, let's park this PR until lightgbm decides to require scikit-learn>=1.6.0.

@adrinjalali

@vnherdeiro note that it's already possible to support both with this method (scikit-learn/scikit-learn#29677 (comment)); however, the version check and @available_if will become unnecessary once we merge scikit-learn/scikit-learn#29801.
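One way to read "the version check and @available_if" is sketched below: a method guarded by a predicate becomes invisible to hasattr() when the predicate is false, so _more_tags() only exists on older scikit-learn. This is an assumption about what the linked comment proposes, not a copy of it. The _AvailableIf descriptor is a simplified re-implementation of the idea behind sklearn.utils.metaestimators.available_if, and SKLEARN_VERSION is a hypothetical hard-coded tuple standing in for a parsed sklearn.__version__.

```python
# Hypothetical installed version; real code would parse sklearn.__version__.
SKLEARN_VERSION = (1, 6, 0)


class _AvailableIf:
    """Descriptor that hides a method unless check(instance) is true."""

    def __init__(self, fn, check):
        self.fn = fn
        self.check = check

    def __get__(self, obj, owner=None):
        if obj is None:
            return self.fn
        if not self.check(obj):
            # Raising AttributeError makes hasattr(obj, name) return False.
            raise AttributeError(f"{self.fn.__name__} is not available")
        # Bind the wrapped function to the instance, like a normal method.
        return self.fn.__get__(obj, owner)


def available_if(check):
    # Simplified stand-in for sklearn.utils.metaestimators.available_if.
    def decorator(fn):
        return _AvailableIf(fn, check)

    return decorator


def _sklearn_older_than_1_6(_estimator):
    return SKLEARN_VERSION < (1, 6)


class Estimator:
    @available_if(_sklearn_older_than_1_6)
    def _more_tags(self):
        return {"allow_nan": True}
```

With SKLEARN_VERSION = (1, 6, 0), hasattr(Estimator(), "_more_tags") is False; with an older version tuple, the method is visible and callable as usual.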

@vnherdeiro
Contributor Author

vnherdeiro commented Sep 12, 2024 via email

@jameslamb
Collaborator

jameslamb commented Sep 15, 2024

Thanks for starting on this, @vnherdeiro. I've documented it in an issue, #6653 (and added that to the PR description).

Note there that I intentionally put the exact error messages in plain text instead of just referring to _more_tags(); that helps people find this work from search engines.

Note also that the _more_tags() change is only 1 of 3 breaking changes in scikit-learn that lightgbm will have to adjust to in order to get those tests passing again with scikit-learn==1.6.0.

Collaborator

@jameslamb jameslamb left a comment


Thanks for starting on this! Please see scikit-learn/scikit-learn#29801 (comment):

The story becomes "If you want to support multiple scikit-learn versions, define both."

I think we should leave _more_tags() untouched and add __sklearn_tags__(), and have self.__sklearn_tags__() call self._more_tags() to get its data, so we don't define things like _xfail_checks twice.
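A minimal sketch of that delegation pattern, assuming simplified stand-ins: the Tags dataclass and its xfail_checks field are hypothetical (scikit-learn's real Tags has more fields and nested dataclasses), and Base plays the role of the parent class that supplies default tags. The tag values live only in _more_tags(), which older scikit-learn reads directly, while __sklearn_tags__() starts from the parent's defaults and copies them over.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Tags:
    # Hypothetical, heavily simplified stand-in for scikit-learn>=1.6's Tags.
    allow_nan: bool = False
    xfail_checks: Dict[str, str] = field(default_factory=dict)


class Base:
    def __sklearn_tags__(self):
        # Parent class supplies the defaults.
        return Tags()


class Model(Base):
    def _more_tags(self):
        # Single source of truth for the tag values.
        return {
            "allow_nan": True,
            "_xfail_checks": {"check_example": "hypothetical reason"},
        }

    def __sklearn_tags__(self):
        # Start from the parent's tags, then fill them in from _more_tags().
        tags = super().__sklearn_tags__()
        more_tags = self._more_tags()
        tags.allow_nan = more_tags["allow_nan"]
        tags.xfail_checks = dict(more_tags["_xfail_checks"])
        return tags
```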

Do you have time to do that in the next few days? We need to fix this to unblock CI here, so if you don't have time to fix it this week please let me know and I will work on this.

@jameslamb jameslamb changed the title __sklearn_tags__ replacing sklearn's BaseEstimator._more_tags_ [python-package] make scikit-learn tags compatible with scikit-learn>=1.16 Sep 15, 2024
@jameslamb jameslamb changed the title [python-package] make scikit-learn tags compatible with scikit-learn>=1.16 [python-package] make scikit-learn estimator tags compatible with scikit-learn>=1.16 Sep 15, 2024
@vnherdeiro
Contributor Author

@jameslamb I've just pushed a __sklearn_tags__() that converts the output of _more_tags(). I added a warning for keys outside the currently handled set, to catch future changes to what _more_tags() returns (it doesn't seem to change much, though).
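The conversion-plus-warning idea could look roughly like this. The InputTags dataclass is a simplified, hypothetical stand-in with only three fields, and more_tags_to_input_tags() and _HANDLED_KEYS are illustrative names, not the code pushed to this PR. The X_types values "2darray" and "sparse" are legacy scikit-learn tag values.

```python
import warnings
from dataclasses import dataclass


@dataclass
class InputTags:
    # Hypothetical, simplified stand-in for scikit-learn>=1.6's InputTags.
    allow_nan: bool = False
    sparse: bool = False
    two_d_array: bool = True


# Keys of the _more_tags() dict that this sketch knows how to convert.
_HANDLED_KEYS = {"allow_nan", "X_types"}


def more_tags_to_input_tags(more_tags):
    # Warn about keys this conversion does not handle, so changes to
    # _more_tags() do not silently get lost.
    unhandled = set(more_tags) - _HANDLED_KEYS
    if unhandled:
        warnings.warn(
            f"_more_tags() returned keys that are not converted: {sorted(unhandled)}",
            stacklevel=2,
        )
    x_types = more_tags.get("X_types", ["2darray"])
    return InputTags(
        allow_nan=more_tags.get("allow_nan", False),
        sparse="sparse" in x_types,
        two_d_array="2darray" in x_types,
    )
```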

@vnherdeiro vnherdeiro changed the title [python-package] make scikit-learn estimator tags compatible with scikit-learn>=1.16 [python-package] make scikit-learn estimator tags compatible with scikit-learn>=1.6.0dev Sep 15, 2024

@adrinjalali adrinjalali left a comment


Not a maintainer here, but coming from the sklearn side. Leaving some thoughts in the hope that they help.

python-package/lightgbm/sklearn.py (resolved)
python-package/lightgbm/sklearn.py (outdated, resolved)
Collaborator

@jameslamb jameslamb left a comment


Thanks for this.

I've reviewed the dataclasses at https://github.com/scikit-learn/scikit-learn/blob/e2ee93156bd3692722a39130c011eea313628690/sklearn/utils/_tags.py and agree with the choices you've made about how to map the dictionary-formatted values from _more_tags() to the dataclass attributes scikit-learn now prefers.

Please see the other comments about simplifying this.

python-package/lightgbm/sklearn.py (outdated, resolved)
python-package/lightgbm/sklearn.py (resolved)
@vnherdeiro
Contributor Author

@jameslamb I've addressed your comments! Thanks for the review!


@adrinjalali adrinjalali left a comment


I'd probably include a test to make sure X_types is exactly as it is here, so that when somebody changes it in _more_tags in the future, the corresponding tags in __sklearn_tags__ are also changed (along with the test itself).
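A sketch of such a test in pytest style, using a toy Model class and a hypothetical two-field InputTags stand-in rather than LightGBM's estimators. Pinning the exact X_types list means any change to _more_tags() fails the test until both the test and the __sklearn_tags__() mapping are updated together.

```python
from dataclasses import dataclass


@dataclass
class InputTags:
    # Hypothetical stand-in for scikit-learn>=1.6's InputTags.
    sparse: bool = False
    two_d_array: bool = True


class Model:
    def _more_tags(self):
        return {"X_types": ["2darray", "sparse", "1dlabels"]}

    def __sklearn_tags__(self):
        x_types = self._more_tags()["X_types"]
        return InputTags(sparse="sparse" in x_types, two_d_array="2darray" in x_types)


def test_tags_agree():
    est = Model()
    # Pin the exact value so an edit to _more_tags() forces a test update.
    assert est._more_tags()["X_types"] == ["2darray", "sparse", "1dlabels"]
    tags = est.__sklearn_tags__()
    assert tags.sparse is True
    assert tags.two_d_array is True
```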

@jameslamb
Collaborator

I'd probably include a test to make sure X_types is exactly as is here, so that when somebody changes it in the future in _more_tags, the corresponding tags in __sklearn_tags__ is also changed (and the test itself)

I started looking into this and realized that LGBMModel._more_tags() is being overridden in LGBMClassifier / LGBMRegressor / LGBMRanker.
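A toy illustration of why that matters once __sklearn_tags__() calls self._more_tags(): attribute lookup only runs the most-derived override, so a subclass that returns a fresh dict drops the parent's entries, while one that starts from super()._more_tags() keeps them. The class names are hypothetical stand-ins for LGBMModel and its subclasses.

```python
class LGBMModelLike:
    def _more_tags(self):
        # Returns a fresh dict on every call, so callers may mutate it.
        return {"allow_nan": True, "X_types": ["2darray", "sparse"]}


class OverridingRegressor(LGBMModelLike):
    # Replaces the parent's dict entirely: allow_nan and X_types are lost.
    def _more_tags(self):
        return {"_xfail_checks": {"check_example": "hypothetical reason"}}


class ExtendingRegressor(LGBMModelLike):
    # Starts from the parent's tags and only adds what it needs.
    def _more_tags(self):
        tags = super()._more_tags()
        tags.update({"_xfail_checks": {"check_example": "hypothetical reason"}})
        return tags
```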

I'll push commits here adding this test and fixing that.

@jameslamb
Collaborator

This is proving to be very challenging to get right, because LGBMRegressor / LGBMClassifier have an MRO like this:

python -c "import lightgbm; print(lightgbm.LGBMRegressor.__mro__)"
# (<class 'lightgbm.sklearn.LGBMRegressor'>,
# <class 'sklearn.base.RegressorMixin'>,
# <class 'lightgbm.sklearn.LGBMModel'>,
# <class 'sklearn.base.BaseEstimator'>,
# <class 'sklearn.utils._estimator_html_repr._HTMLDocumentationLinkMixin'>,
# <class 'sklearn.utils._metadata_requests._MetadataRequester'>,
# <class 'object'>)

(we do that intentionally, following the advice from "BaseEstimator and mixins" at https://scikit-learn.org/stable/developers/develop.html#rolling-your-own-estimator)
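The ordering above has a consequence that can be checked with toy classes (hypothetical stand-ins for the scikit-learn and LightGBM classes): because the mixin is listed before the model class, any method the mixin defines shadows the model's version of the same method.

```python
class BaseEstimatorLike:
    def describe(self):
        return "BaseEstimatorLike"


class ModelLike(BaseEstimatorLike):
    def describe(self):
        return "ModelLike"


class MixinLike:
    # If a mixin ever defines the same method, it wins, because it
    # comes before ModelLike in the MRO.
    def describe(self):
        return "MixinLike"


class RegressorLike(MixinLike, ModelLike):
    pass
```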

I'm finding it difficult to preserve the LightGBM-specific changes that we want (that @vnherdeiro has implemented here) without them being overwritten by the _more_tags() and __sklearn_tags__() coming from sklearn.base.BaseEstimator, and with protection against such methods possibly being added to sklearn.base.RegressorMixin in the future.
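One common way to get that protection is cooperative inheritance: every class's tags method calls super() and only modifies the tags it owns, so contributions from all classes in the MRO are combined. The sketch below uses plain dicts and hypothetical class names rather than scikit-learn's Tags dataclass; the non-cooperative mixin shows the failure mode where model-specific changes are lost.

```python
class BaseEstimatorLike:
    def __sklearn_tags__(self):
        # Defaults supplied by the root of the hierarchy.
        return {"estimator_type": None, "allow_nan": False}


class ModelLike(BaseEstimatorLike):
    def __sklearn_tags__(self):
        tags = super().__sklearn_tags__()
        tags["allow_nan"] = True  # model-specific change
        return tags


class CooperativeMixin:
    def __sklearn_tags__(self):
        # In RegressorLike's MRO, super() here resolves to ModelLike.
        tags = super().__sklearn_tags__()
        tags["estimator_type"] = "regressor"
        return tags


class NonCooperativeMixin:
    def __sklearn_tags__(self):
        # Builds tags from scratch, so ModelLike's method never runs.
        return {"estimator_type": "regressor", "allow_nan": False}


class RegressorLike(CooperativeMixin, ModelLike):
    pass


class BrokenRegressorLike(NonCooperativeMixin, ModelLike):
    pass
```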

Will come back to this tomorrow, when I can, and will try to put together a clear reproducible example. The amount of indirection here means that'll take a bit more time than I have today.


Successfully merging this pull request may close these issues.

[ci] [python-package] scikit-learn compatibility tests fail with scikit-learn 1.6.dev0
3 participants