[python-package] fix mypy errors about missing annotations and incompatible types #5672

Merged (14 commits, Mar 9, 2023)
Changes shown from 6 commits
13 changes: 8 additions & 5 deletions python-package/lightgbm/callback.py
@@ -16,7 +16,8 @@
_EvalResultDict = Dict[str, Dict[str, List[Any]]]
_EvalResultTuple = Union[
List[_LGBM_BoosterEvalMethodResultType],
List[Tuple[str, str, float, bool, float]]
List[Tuple[str, str, float, bool, float]],
None
Collaborator:
Please don't include None in this type hint. That makes it difficult to use in other parts of the code where None isn't a valid value.

]
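The reviewer's concern can be illustrated with a minimal sketch (hypothetical alias and function names, not LightGBM's actual definitions): once `None` is a member of the alias, every consumer has to narrow it away before use, even in code paths where `None` can never actually occur.

```python
from typing import Tuple, Union

# Hypothetical alias mirroring the diff: adding None to the Union forces
# a narrowing check in every function that accepts this type.
EvalResult = Union[Tuple[str, str, float, bool], None]

def format_eval_result(value: EvalResult) -> str:
    # Without this guard, mypy rejects the indexing below for the None case.
    if value is None:
        raise ValueError("Wrong metric value")
    return f"{value[0]}'s {value[1]}: {value[2]:g}"
```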


@@ -51,6 +52,8 @@ def __init__(self, best_iteration: int, best_score: _EvalResultTuple) -> None:

def _format_eval_result(value: _EvalResultTuple, show_stdv: bool = True) -> str:
"""Format metric string."""
if value is None:
raise ValueError("Wrong metric value")
if len(value) == 4:
return f"{value[0]}'s {value[1]}: {value[2]:g}"
elif len(value) == 5:
@@ -254,10 +257,10 @@ def __init__(
self._reset_storages()

def _reset_storages(self) -> None:
self.best_score = []
self.best_iter = []
self.best_score_list = []
self.cmp_op = []
self.best_score: List[float] = []
self.best_iter: List[int] = []
self.best_score_list: List[_EvalResultTuple] = []
(jameslamb marked this conversation as resolved)
self.cmp_op: List[Callable[[float, float], bool]] = []
self.first_metric = ''

def _gt_delta(self, curr_score: float, best_score: float, delta: float) -> bool:
24 changes: 20 additions & 4 deletions python-package/lightgbm/dask.py
@@ -1174,7 +1174,7 @@ def fit(  # type: ignore[override]
eval_init_score: Optional[List[_DaskCollection]] = None,
eval_metric: Optional[_LGBM_ScikitEvalMetricType] = None,
**kwargs: Any
) -> "DaskLGBMClassifier":
) -> "_DaskLGBMModel":
Collaborator:
I don't think this is the right fix. I'd expect DaskLGBMClassifier.fit() to return a DaskLGBMClassifier (which, by the way, is a child of _DaskLGBMModel).

But I understand why mypy thinks that's not happening... self._lgb_dask_fit() is inherited from parent class _DaskLGBMModel.

Instead of changing the return type hints, can you please try changing the body of these fit() methods so they're like this pseudo-code?

def fit(...) -> "DaskLGBMClassifier":
    self._lgb_dask_fit(...)
    return self

And see if:

  • those mypy errors about the return type go away
  • the unit tests all still pass

Collaborator:
I noticed you've removed these Dask changes from this PR and they aren't included in the other mypy-related PR you opened (#5731).

Are you interested in trying them out? If not, I'll fix these issues related to lightgbm.dask return types.

Contributor (author):
Since we're doing # type: ignore[override] it seemed kind of redundant to me.

Collaborator:
It is not. No worries, I'll attempt this particular fix.

Collaborator:
Created #5756.

To add more details on why the presence of # type: ignore[override] doesn't make this concern redundant... that check and these errors about return types are about different things.

The [override] check is saying "hey, you told me this was a child of {ParentClass}, but it has a method with a different signature than {ParentClass}, which means it can't be automatically used anywhere that {ParentClass} is used".

The errors about return types are saying "hey you told me that this fit() method returns an instance of {ChildClass} but it's returning an instance of {ParentClass}". That's a different thing, and important to fix for at least 2 reasons:

  1. {ChildClass} might define additional attributes that {ParentClass} doesn't (e.g. extra configuration captured through the constructor)
  2. People writing code that depends on lightgbm may wish to perform checks like isinstance(model, lgb.DaskLGBMRegressor). Such checks would fail today if called on the result of fit(), since it's returning an internal class lgb._DaskLGBMModel.

"""Docstring is inherited from the lightgbm.LGBMClassifier.fit."""
return self._lgb_dask_fit(
model_factory=LGBMClassifier,
@@ -1256,13 +1256,29 @@ def predict(
X_SHAP_values_shape="Dask Array of shape = [n_samples, n_features + 1] or shape = [n_samples, (n_features + 1) * n_classes] or (if multi-class and using sparse inputs) a list of ``n_classes`` Dask Arrays of shape = [n_samples, n_features + 1]"
)

def predict_proba(self, X: _DaskMatrixLike, **kwargs: Any) -> dask_Array:
def predict_proba(
(jameslamb marked this conversation as resolved)
self,
X: _DaskMatrixLike,
raw_score: bool = False,
start_iteration: int = 0,
num_iteration: Optional[int] = None,
pred_leaf: bool = False,
pred_contrib: bool = False,
validate_features: bool = False,
**kwargs: Any
) -> dask_Array:
"""Docstring is inherited from the lightgbm.LGBMClassifier.predict_proba."""
return _predict(
model=self.to_local(),
data=X,
pred_proba=True,
client=_get_dask_client(self.client),
raw_score=raw_score,
start_iteration=start_iteration,
num_iteration=num_iteration,
pred_leaf=pred_leaf,
pred_contrib=pred_contrib,
validate_features=validate_features,
**kwargs
)
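The shape of the change above can be sketched generically (hypothetical function, not the real `_predict` plumbing): spelling out the parent's keyword parameters instead of hiding them in `**kwargs` lets a type checker validate each argument at the call site, while unknown extras still pass through.

```python
from typing import Any, Optional

def predict_proba_sketch(
    X: list,
    raw_score: bool = False,
    start_iteration: int = 0,
    num_iteration: Optional[int] = None,
    **kwargs: Any,
) -> dict:
    # Named parameters are forwarded explicitly; anything else rides along.
    return {
        "raw_score": raw_score,
        "start_iteration": start_iteration,
        "num_iteration": num_iteration,
        **kwargs,
    }
```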

@@ -1361,7 +1377,7 @@ def fit(  # type: ignore[override]
eval_init_score: Optional[List[_DaskVectorLike]] = None,
eval_metric: Optional[_LGBM_ScikitEvalMetricType] = None,
**kwargs: Any
) -> "DaskLGBMRegressor":
) -> "_DaskLGBMModel":
"""Docstring is inherited from the lightgbm.LGBMRegressor.fit."""
return self._lgb_dask_fit(
model_factory=LGBMRegressor,
@@ -1533,7 +1549,7 @@ def fit(  # type: ignore[override]
eval_metric: Optional[_LGBM_ScikitEvalMetricType] = None,
eval_at: Union[List[int], Tuple[int, ...]] = (1, 2, 3, 4, 5),
**kwargs: Any
) -> "DaskLGBMRanker":
) -> "_DaskLGBMModel":
"""Docstring is inherited from the lightgbm.LGBMRanker.fit."""
return self._lgb_dask_fit(
model_factory=LGBMRanker,
4 changes: 2 additions & 2 deletions python-package/lightgbm/sklearn.py
@@ -897,7 +897,7 @@ def n_features_(self) -> int:
return self._n_features

@property
def n_features_in_(self) -> int:
def n_features_in_(self) -> Optional[int]:
(jameslamb marked this conversation as resolved)
""":obj:`int`: The number of features of fitted model."""
if not self.__sklearn_is_fitted__():
raise LGBMNotFittedError('No n_features_in found. Need to call fit beforehand.')
@@ -911,7 +911,7 @@ def best_score_(self) -> _LGBM_BoosterBestScoreType:
return self._best_score

@property
def best_iteration_(self) -> int:
def best_iteration_(self) -> Optional[int]:
""":obj:`int`: The best iteration of fitted model if ``early_stopping()`` callback has been specified."""
if not self.__sklearn_is_fitted__():
raise LGBMNotFittedError('No best_iteration found. Need to call fit with early_stopping callback beforehand.')
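Why `Optional[int]` is plausible for this property: even after fitting, the underlying value can legitimately be `None` when the early-stopping callback wasn't used. A toy sketch of that state machine (hypothetical class, not the real estimator):

```python
from typing import Optional

class ModelSketch:
    def __init__(self) -> None:
        self._fitted = False
        self._best_iteration: Optional[int] = None  # unset until fit()

    def fit(self, early_stopping: bool = False) -> "ModelSketch":
        self._fitted = True
        if early_stopping:
            self._best_iteration = 7  # placeholder best iteration
        return self

    @property
    def best_iteration_(self) -> Optional[int]:
        if not self._fitted:
            raise RuntimeError("Need to call fit beforehand.")
        # Legitimately None when early stopping wasn't used, so the
        # return hint must admit it.
        return self._best_iteration
```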