Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ci] [docs] docs rendering failing with 'undefined label: 'metadata_routing' #5954

Closed
jameslamb opened this issue Jul 1, 2023 · 7 comments · Fixed by #5956
Closed

[ci] [docs] docs rendering failing with 'undefined label: 'metadata_routing' #5954

jameslamb opened this issue Jul 1, 2023 · 7 comments · Fixed by #5956
Labels

Comments

@jameslamb
Copy link
Collaborator

jameslamb commented Jul 1, 2023

Description

The check-docs CI job is failing with this error that seems to be occurring while rendering the sphinx docs to HTML

Warning, treated as error:
/home/runner/work/LightGBM/LightGBM/python-package/lightgbm/dask.py:docstring of lightgbm.dask.DaskLGBMClassifier:1:undefined label: 'metadata_routing'

(build link)

Reproducible example

I see that in multiple CI builds here, including the latest on master (build link).

I'm able to reproduce this locally in a Python 3.10 environment on macOS, with the following:

mamba install --yes \
    breathe \
    doxygen \
    scikit-learn \
    sphinx \
    sphinx_rtd_theme \
    'scikit-learn>=1.3'

Additional Notes

When downgrading from scikit-learn 1.3.0 to 1.2.2, this error goes away and the docs render correctly. I strongly suspect that it's related to changes in the most recent version of scikit-learn.

@jameslamb
Copy link
Collaborator Author

strongly suspect that it's related to changes in the most recent version of scikit-learn

scikit-learn v1.3.0 was released 2 days ago (link), which coincides with when I first started seeing these CI failures.

I see the following in the release notes (link)

Metadata routing’s related base methods are included in this release. This feature is only available via the enable_metadata_routing feature flag which can be enabled using sklearn.set_config and sklearn.config_context. For now this feature is mostly useful for third party developers to prepare their code base for metadata routing, and we strongly recommend that they also hide it behind the same feature flag, rather than having it enabled by default.

I'm still reading through this to understand what it means and why it'd show up as a breaking change in the documentation generation.

@jameslamb
Copy link
Collaborator Author

jameslamb commented Jul 1, 2023

I think I see the problem.

All of lightgbm's scikit-learn estimators inherit from sklearn.base.BaseEstimator. For example:

from sklearn.base import BaseEstimator, ClassifierMixin, RegressorMixin

class LGBMModel(_LGBMModelBase):

class LGBMClassifier(_LGBMClassifierBase, LGBMModel):

class DaskLGBMClassifier(LGBMClassifier, _DaskLGBMModel):

When we render this project's sphinx docs using sphinx.ext.autodoc, all of the public properties and methods (anything not beginning with a _) get a documentation entry. See, for example, https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html.

Prior to v1.3.0, sklearn.base.BaseEstimator had only two public methods and 0 public attributes... get_params() and set_params(). Both of those are overridden by LightGBM, and the docstrings are overridden as well.

def get_params(self, deep: bool = True) -> Dict[str, Any]:
"""Get parameters for this estimator.

def set_params(self, **params: Any) -> "LGBMModel":
"""Set the parameters of this estimator.

As of v1.3.0, there's now a third public method, BaseEstimator.get_metadata_routing.

from sklearn.base import BaseEstimator
[x for x in dir(BaseEstimator) if not x.startswith("_")]

# ['get_metadata_routing', 'get_params', 'set_params']

Since that isn't overridden by lightgbm, its docstring gets passed directly through to lightgbm's docs-generation process. Unfortunately, that docstring contains a reference to another part of scikit-learn's documentation, which means results in the error reported at the top of this issue.

from sklearn.base import BaseEstimator
BaseEstimator.get_metadata_routing.__doc__

shows:

'Get metadata routing of this object.\n\n        Please check :ref:`User Guide <metadata_routing>` on how the routing\n        mechanism works.\n\n        Returns\n        -------\n        routing : MetadataRequest\n            A :class:`~utils.metadata_routing.MetadataRequest` encapsulating\n            routing information.\n        '

See that <metadata_routing>? That's a documentation link that's assuming you're building alongside scikit-learn's own docs 🙃

It's a reference to this scikit-learn-only :ref: https://github.com/scikit-learn/scikit-learn/blob/b88b53985d9ddf8cec60414934c55a5745e8b958/doc/user_guide.rst#L42


Ok, so what do we do?

I'm going to do the following right now to unblock development in this repo:

  • put up a PR adding get_metadata_routing() to the public API of LGBMModel, with our own docstring putting a ceiling on scikit-learn in docs generation

And then after that, I'll do the following:

  • open a PR in scikit-learn removing that reference to the User Guide, so other projects like xgboost that implement scikit-learn-compatible estimators don't experience this same issue
  • open a feature request here in LightGBM describing the need to add support for this new "metadata routing" concept to this project's scikit-learn estimators

@jameslamb
Copy link
Collaborator Author

cc @trivialfis @hcho3, you might see the same thing the next time xgboost's docs build, since xgboost scikit-learn estimators also inherit from sklearn.base.

https://github.com/dmlc/xgboost/blob/f90771eec64cea7b1cb4e41a31000c8c54fcc6d8/python-package/xgboost/compat.py#L48-L50

Hopefully this write-up can save you some time.

@jameslamb jameslamb changed the title [ci] [docs] docs rendering failing with 'undefined label: 'metadata_routing'' [ci] [docs] docs rendering failing with 'undefined label: 'metadata_routing' Jul 2, 2023
@trivialfis
Copy link

Thank you for the detailed analysis @jameslamb ! I saw this PR: #5956 by @thomasjpfan . XGBoost is using intersphinx as well for making references to related projects like numpy and pandas.

@Jacob-Stevens-Haas
Copy link

Jacob-Stevens-Haas commented Jul 4, 2023

From another repo that inherits BaseEstimator and ran into the same issue, thanks for the research 🎉 !

@jameslamb
Copy link
Collaborator Author

Sure, glad it helped you!

@github-actions
Copy link

github-actions bot commented Oct 4, 2023

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot removed the blocking label Oct 4, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 4, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants