Annotations guidelines #255
Thanks for starting this discussion @crusaderky. I'm very much in favour of adding more guidance around good practices for type annotations generally. I'm less excited about having a style guide like this tucked away in the documentation somewhere that we nitpick PRs over, especially community contributions by folks who don't work on Dask full time. I'm not convinced the added friction, in terms of both contributor time and reviewer time, is worth the tradeoff. I like the way we enforce Your guidelines above also focus a lot on "what" we should be doing, but not on "why" it is beneficial for us to do it. Personally I like type annotations because VSCode gives me more helpful hinting. Other projects like
Thanks for starting a discussion/guidelines document on this. Another thing I have noticed is that when fully following the To avoid this, I suggest adding a rule to the docstring guidelines to avoid specifying types in docstrings.
Thanks for starting this discussion @crusaderky, it's useful to air a lot of this out. Most of what you have here is sensible to me, and I'd be happy to have it be made quasi-official. But I also agree with @jacobtomlinson that having these stylistic/linting guidelines be enforced with CI tools is far superior to nit-picking PR-by-PR. Unfortunately, I'm not aware of any configuration options for mypy that really address most of the stuff here. So I think I'd be in favor of making something like this a recommendation rather than a requirement. A few specific thoughts:
This is a really nice rule that is hard to enforce (a lot of contributors don't know about the reasoning for this, and digressing into type variance in a PR review is a tough ask). It would be super valuable if a tool like mypy could suggest "hey, it looks like you aren't mutating this parameter, consider using
This was surprising to me, and I disagree with it. I see that this is listed in the typeshed style guide, which itself defers to this issue, which I don't find particularly convincing. The argument seems to be something like "it's annoying to check for different return types", which is true! But to me that reflects either (1) an underlying problem with the API, or (2) something that should be checked. To look at a special case: a huge fraction of bugs in a generic Python project (and I don't think dask is an exception) come from not checking for optional
This is similar to the above: I sort of agree with this, but with the caveat that a lot of overloads probably means we wrote a confusing API. I wonder if there is a distinction to be made between annotating old code and annotating new code. If we have some old code that would need a lot of overloads, it makes sense to me that we avoid them. But if new code would need a lot of overloads, I think it would make sense to take a step back and reconsider the signature of the function.
I'd like to +1 the idea of making CI enforcement the one and only requirement. That is, if things are passing in mypy then that particular patch is pretty much good to go, but small recommendations could be useful in PR discussion if clear/easy to address. The "return types should never be unions" topic is a perfect example IMO. It's not configurable with mypy and things like this exist in the codebase (e.g. Chiming in on this comment from @jacobtomlinson:
I think good type annotations are great developer documentation. I really liked Łukasz Langa's talk from PyCon this year, in which he specifically mentions inline documentation. The rest of the talk is great too.
Good point. I updated the opening post.
I agree - we should not reject a PR based on subtle transgressions of these guidelines.
A more consistent code style across the project - one that is shared with typeshed and hopefully more projects - makes it easier for newcomers to read the code.
I just enabled
in dask/distributed and I plan to do the same in dask/dask.
I agree.
I agree. I've added a paragraph above stating that these guidelines should not slow down review.
Not that I'm aware of I'm afraid.
It's (1) 99% of the time. An egregious in-house case is `distributed.Client`. Most of its public methods return I've added a paragraph stating that, when writing new API, you should make an effort to design it in a way that avoids unions in the return type.
Could you clarify
Could you elaborate? I've never seen `NoneType` used in annotations.
In some cases, you can avoid mentioning either by using

```
In [1]: from types import NoneType
In [2]: type(None)
Out[2]: NoneType
In [3]: isinstance(None, type)
Out[3]: False
In [4]: isinstance(NoneType, type)
Out[4]: True
In [5]: isinstance(None, NoneType)
Out[5]: True
In [6]: isinstance(None, None)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [6], in <cell line: 1>()
----> 1 isinstance(None, None)
TypeError: isinstance() arg 2 must be a type, a tuple of types, or a union
In [7]: int | None == typing.Optional[int]
Out[7]: True
In [8]: int | NoneType == typing.Optional[int]
Out[8]: True
```

The conservative answer would be to stick with
I think
Copying this from dask/dask#9233. I'm happy to help with the typing push. I just had two issues I wanted to discuss:

Dispatch

Basically we have an object containing a dictionary that maps types to functions that apply to those types, e.g. if we pass in a tuple to

```python
make_meta_dispatch = Dispatch("make_meta_dispatch")
```

becomes

```python
from __future__ import annotations

from typing import overload

import numpy as np
import pandas as pd

from dask.utils import Dispatch


class MakeMetaDispatch(Dispatch):
    @overload
    def __call__(self, arg: tuple[str, np.dtype], *args, **kwargs) -> pd.Series: ...
    @overload
    def __call__(self, arg: dict[str, np.dtype], *args, **kwargs) -> pd.DataFrame: ...
    def __call__(self, arg, *args, **kwargs):
        # Delegate to the implementation registered for the type of arg and return its result
        return super().__call__(arg, *args, **kwargs)


make_meta_dispatch = MakeMetaDispatch("make_meta_dispatch")
```

I think this would work, but it's a bit unfortunate because the dispatch system is no longer decentralised as it was before: all the overload types have to be specified when the dispatcher is created, even though the implementations themselves do not. I'm not sure this would be acceptable to the designer of this multiple dispatch system in the first place, but maybe it would be an improvement?

Docstrings

As far as I can tell, dask is using numpy docstrings, but I don't believe that system supports type annotations. What this effectively means is that you have to repeat yourself: once when you define the type signature and again in the first line of the docstring. I'm not sure there's any easy solution that doesn't just involve using a different docstring style (like Google, which seems to handle this).
Do I understand correctly that this paragraph is strictly about
The paragraph in the opening comment, "Redundant annotations", addresses this.
No, it's not just about
There have been some discordant opinions regarding annotations recently, so I'd like to reach a consensus on project-wide guidelines.
The below is a first draft and represents my personal opinion only - please comment if you would like to add/amend.
General guidelines
This means that the style for annotating dask can change over time as new PEPs are created or amended upstream.
`__init__` return type is not annotated.
Do not use unions in return types, e.g. `def f() -> int | str` should be replaced with `def f() -> Any`.
When designing new API, you should avoid having multiple return types unless you want the user to explicitly check for them; consider instead using TypeVars or breaking your function into two separate ones.
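As a minimal sketch (the function `double` is hypothetical), a constrained TypeVar can replace a union return type: the return type follows the argument type, so the caller never has to check it. mypy checks the body once per constraint.

```python
from typing import TypeVar

# T can be exactly int or exactly str
T = TypeVar("T", int, str)


# Instead of: def double(x: int | str) -> int | str
# the return type is tied to the argument type, so no isinstance() is
# needed on the caller's side.
def double(x: T) -> T:
    return x * 2
```

When the two code paths genuinely behave differently, splitting them into two functions (e.g. hypothetical `double_int` and `double_str`) is the other option mentioned above.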
mypy
Use `# type: ignore` only when working around it would be unreasonable.
When you use `# type: ignore` to work around a bug in mypy, you should add a FIXME with a cross-reference to the mypy issue on GitHub.
Specific vs. generic
Specific is better than generic, unless it unreasonably harms readability.
For example, this is bad:
This is good:
However, this is bad:
In the above case, it is better to be more generic for the sake of readability:
Frequently, components of very complex signatures are used repeatedly across a module; you should consider using `TypeAlias` (in the example above, there could be a TypeAlias for the dict keys).
You should use `@overload`, but only when the added verbosity is outweighed by the benefit.
Parameters and return types
Parameter types should be as abstract as possible: accept anything that is valid.
Return types should be as concrete as possible (however, do not use unions in return types, as explained above).
Prefer immutable collections in parameters whenever possible.
e.g.
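A minimal sketch of these three rules, using hypothetical functions. Parameters use read-only abstract types (`Iterable`, `Sequence`, `Mapping`), while return types are concrete `list`s. Read-only types also help with variance: `Sequence` is covariant, so mypy accepts a `list[str]` where `Sequence[object]` is expected, but would reject it for `list[object]`, because `list` is invariant.

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping, Sequence


# Accepts any iterable of strings (list, tuple, set, generator, ...) and
# returns a concrete list, so callers get the full list API.
def normalize(names: Iterable[str]) -> list[str]:
    return sorted(name.strip().lower() for name in names)


# Read-only parameter types document that neither argument is mutated.
def rename(columns: Sequence[str], mapping: Mapping[str, str]) -> list[str]:
    return [mapping.get(c, c) for c in columns]


# Sequence[object] accepts list[str]; list[object] would not, for mypy.
def show(values: Sequence[object]) -> str:
    return ", ".join(str(v) for v in values)
```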
Backporting
You may use annotations from the very latest version of Python, as follows:
The `TYPE_CHECKING` guard is necessary as `typing_extensions` is not a runtime dependency.
The TODO comment is strongly recommended to make future upgrade work easier.
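A minimal sketch of the pattern, assuming `TypeGuard` (added to `typing` in Python 3.10) as the feature being backported; the function `is_str_list` is hypothetical. Because of `from __future__ import annotations`, the return annotation is stored as a string and never evaluated, so `TypeGuard` only needs to be importable by the type checker:

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # TODO: import from typing once the minimum supported Python version is 3.10
    from typing_extensions import TypeGuard


def is_str_list(values: list[object]) -> TypeGuard[list[str]]:
    """Tell the type checker that, if this returns True, every element is a str."""
    return all(isinstance(v, str) for v in values)
```

At runtime `TYPE_CHECKING` is `False`, so `typing_extensions` is never imported.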
`TYPE_CHECKING` should be used only when actually necessary.
Delayed annotations
At the time of writing, dask supports Python 3.8+ at runtime, but uses Python 3.9+ annotations.
Don't use `Union`, `Optional`, `Dict`, etc. or quoted annotations unless necessary.
Notable exceptions are `cast` and subclassing (see next paragraph), which are interpreted at runtime:
In this case, quoted annotations are preferred to `Union` etc.
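A minimal sketch with hypothetical functions: signatures can use the modern syntax freely, while the first argument of `cast` is a runtime expression and is therefore quoted:

```python
from __future__ import annotations

from typing import cast


def scale(values: dict[str, int | float], factor: float) -> dict[str, float]:
    # With the __future__ import this signature is stored as strings and never
    # evaluated, so `dict[...]` and `int | float` are safe on Python 3.8.
    return {k: v * factor for k, v in values.items()}


def load(raw: object) -> dict[str, int | float]:
    # cast() evaluates its first argument at runtime. Unquoted,
    # dict[str, int | float] would raise TypeError on Python 3.8 and 3.9,
    # so it is quoted rather than rewritten as Dict[str, Union[int, float]].
    # cast() returns raw unchanged; it only informs the type checker.
    return cast("dict[str, int | float]", raw)
```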
typing vs. collections.abc
Always import everything that's possible from `collections.abc`.
Import from `typing` only when an object doesn't exist elsewhere.
Again, this requires `from __future__ import annotations` to work in Python 3.8:
The only exception is when subclassing a collection. This is the only way to make it work in Python 3.8:
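A minimal sketch of both cases with a hypothetical read-only mapping class. `Iterator` comes from `collections.abc` and is only used in annotations; `Mapping` is subclassed (subscripting `collections.abc.Mapping` at runtime needs Python 3.9+), so it is imported from `typing` alone and that same import is reused in annotations:

```python
from __future__ import annotations

from collections.abc import Iterator
from typing import Mapping


class FrozenCounts(Mapping[str, int]):
    """Hypothetical read-only mapping of names to counts."""

    _data: dict[str, int]

    def __init__(self, data: Mapping[str, int]):
        self._data = dict(data)

    def __getitem__(self, key: str) -> int:
        return self._data[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)


def total(counts: Mapping[str, int]) -> int:
    # Annotation-only use of the same typing.Mapping import
    return sum(counts.values())
```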
In this case, you should import the `Mapping` class only from `typing`. Don't import it from `collections.abc` as well.
Class annotations
You should always annotate all instance variables, both public and private, in the class header (C++ style).
Class variables should be explicitly marked with `ClassVar`.
mypy implements some intelligence to infer the type of instance variables from the `__init__` method; it should not be relied upon.
Don't:
Do:
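A minimal sketch of both styles with a hypothetical class. It assumes Sphinx autodoc, which reads `#:` comments as attribute documentation, so the Sphinx documentation stays next to the annotations:

```python
from __future__ import annotations

from typing import ClassVar


# Don't: attribute types are left to mypy's inference from __init__
class WorkerStateBad:
    def __init__(self, address, nthreads):
        self.address = address
        self.nthreads = nthreads
        self._busy = False


# Do: declare every instance variable, public and private, in the class header
class WorkerState:
    #: Shared by all instances, hence ClassVar
    default_nthreads: ClassVar[int] = 1

    #: Network address of the worker
    address: str
    #: Number of threads available on the worker
    nthreads: int
    _busy: bool

    def __init__(self, address: str, nthreads: int | None = None):
        self.address = address
        self.nthreads = nthreads if nthreads is not None else self.default_nthreads
        self._busy = False
```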
It is a good idea to keep Sphinx documentation together with annotations.
It is a good idea to use annotations to avoid explicitly declaring slots.
e.g.
In Sphinx:
Redundant annotations
Redundancy should be avoided.
Type specifications should also be avoided in Sphinx documentation when redundant with the annotations in the code.
For example:
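A minimal sketch with a hypothetical function in numpydoc style. Each parameter is listed as `total` rather than `total : int`, since the annotation already states the type (numpydoc allows omitting the colon when the type is absent):

```python
from __future__ import annotations


def split_evenly(total: int, npartitions: int) -> list[int]:
    """Split ``total`` items into ``npartitions`` near-equal partitions.

    Parameters
    ----------
    total
        Number of items to split
    npartitions
        Number of output partitions
    """
    base, extra = divmod(total, npartitions)
    # The first `extra` partitions receive one additional item
    return [base + 1 if i < extra else base for i in range(npartitions)]
```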
None
Parameters that default to `None` should say so explicitly in their annotation, e.g. `x: int | None = None` rather than `x: int = None`.
You should always prefer `| None` to `Optional` or `| NoneType`.