-
-
Notifications
You must be signed in to change notification settings - Fork 8.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Type Hints for Python Package #7742
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this, please let me know if you need any help
python-package/xgboost/data.py
Outdated
missing, | ||
nthread, | ||
data: DataType, | ||
missing: float, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
float is actually not accurate, we can also take np.float32 and related types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using FloatCompatible = Union[float, np.float32, np.float64]
(do we also need to add np.float16
?)
Sorry I wasn't able to follow up recently... Can I ask what |
python-package/xgboost/_typing.py
Outdated
DataType = Any | ||
|
||
# xgboost accepts some other possible types in practice due to historical reason, which is | ||
# lesser tested. For now we encourage users to pass a simple list of string. | ||
FeatureNames = Optional[List[str]] | ||
FeatureInfo = Optional[Sequence[str]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we change to Sequence[str]
here, and add Optional
in the signatures when needed? Some functions require feature_names
to be not None, for example, https://github.com/dmlc/xgboost/blob/master/python-package/xgboost/core.py#L2562-L2563
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think mypy suggests using Sequence
for function input and more specific type like List
and Tuple
as function return types. https://mypy.readthedocs.io/en/stable/generics.html?highlight=variance#variance-of-generic-types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we're talking about the same thing... I meant to use FeatureInfo = Sequence[str]
here, and when referring to it, use Optional[FeatureInfo]
I've also drafted a commit to better illustrate my idea: bridgream@5e3d4af
This isn't something major, though...
No worries, mypy needs to mature and we are not rushing into it. cat code is the encoded categorical feature from cudf. Let's just mark it as a list for now. |
Hi, @bridgream could you please resolve the conflicts? |
Hi @trivialfis , could you please restart the Travis CI job? It looks the job failed at random. |
Please ignore it for now. It's SSH policy is changed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems we are a lot closer to fully integrated typing. ;-)
@@ -1603,7 +1609,7 @@ def attr(self, key: str) -> Optional[str]: | |||
return py_str(ret.value) | |||
return None | |||
|
|||
def attributes(self) -> Dict[str, str]: | |||
def attributes(self) -> Dict[str, Optional[str]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think dict[str, str] should be correct. Have you found a case where only the key is available?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the case is right above this function at line 1591:
def attr(self, key: str) -> Optional[str]:
attr(key)
is called in attributes()
at line 1626:
return {n: self.attr(n) for n in attr_names}
@@ -132,7 +140,7 @@ def get_config(): | |||
set_config: Set global XGBoost configuration | |||
get_config: Get current values of the global configuration | |||
""") | |||
def config_context(**new_config): | |||
def config_context(**new_config: Any) -> Iterator[None]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a more general generator or an iterator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first glance, it seems that Generator
is more proper than Iterator
, because config_context
is implemented as a Generator
here. However, as config_context
is essentially a context manager, it's not wrong to annotate the return type as an Iterator
. In fact, Iterator
is more general than Generator
, as Generator
is a subclass of Iterator
:
assert issubclass(Generator, Iterator)
For mypy, however, the choice remains controversial as tracked by this open issue: python/typeshed#2772
On a side note, mypy says
The return type of a generator function should be "Generator" or one of its supertypes
when I use a wrong type hint for config_context
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, Iterator is more general than Generator,
Thank you for sharing. I need to remind myself about coroutines. In that case Iterator should be fine.
python-package/xgboost/core.py
Outdated
@@ -1347,7 +1353,7 @@ class Booster: | |||
|
|||
def __init__( | |||
self, | |||
params: Optional[Dict] = None, | |||
params: Optional[Union[Dict, List]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you have defined an alias in the _typing module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this can be fixed together with the previous comment, by introducing BoosterParam
.
Rebased master. There are 15 mypy errors remaining but they all look a bit complex to tackle. |
@bridgream Hi, do you think it's ready to be merged? |
Hi @trivialfis , thanks for the commits! Yes, I believe we're ready. |
Great work! |
Following #7707, I plan to resolve most mypy errors across the python package if not all.
We should probably populate compat.py with stubs for even more packages, such as cupy.