Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First draft of DEP for static-ish typechecking for Django #65

Closed
wants to merge 5 commits into from
Closed
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
253 changes: 253 additions & 0 deletions draft/0484-type-hints.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,253 @@
# DEP 0484: Static type checking for Django

| | |
| --- | --- |
| **DEP:** | 0484 |
| **Author:** | Maksim Kurnikov |
| **Implementation team:** | Maksim Kurnikov |
| **Shepherd:** | Carlton Gibson |
| **Type:** | Feature |
| **Status:** | Draft |
| **Created:** | 2019-10-08 |
| **Last modified:** | 2019-10-08 |

## Abstract

Add mypy (and other type checker) support for Django.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about other type checkers. They are not tested, plugins are mypy specific.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless someone has very specific input here mypy can be the assumed path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make it clear by mentioning this explicitly in the DEP. But I'd like to see a list of alternatives and a one-sentence explanation on why we are not using one of them.



## Specification

I propose to add type hinting support for Django via mypy and PEP484. All at once it's too big of a change, so I want to propose an incremental migration, using both stub files and inline type annotations.

https://www.python.org/dev/peps/pep-0484/#stub-files

Back in a day, there was some friction about gradually improving the type checking coverage of different parts of Python ecosystem. So PEP561 was accepted based on the discussion.

It defines how PEP484-based typecheckers would look for a type annotations information across the different places.

https://www.python.org/dev/peps/pep-0561

Specifically, it defines a "Type Checker Method Resolution Order"
https://www.python.org/dev/peps/pep-0561/#type-checker-module-resolution-order

> 1. Stubs or Python source manually put in the beginning of the path. Type checkers SHOULD provide this to allow the user complete control of which stubs to use, and to patch broken stubs/inline types from packages. In mypy the $MYPYPATH environment variable can be used for this.
> 2. User code - the files the type checker is running on.
> 3. Stub packages - these packages SHOULD supersede any installed inline package. They can be found at foopkg-stubs for package foopkg.
> 4. Inline packages - if there is nothing overriding the installed package, and it opts into type checking, inline types SHOULD be used.
> 5. Typeshed (if used) - Provides the stdlib types and several third party libraries.

What is means for Django, it that we can split type annotations into stub files, and inline annotations. Where there will be a corresponding `.pyi` file, mypy would use that, otherwise fallback to inline type annotations.

There's an existing `django-stubs` package where most of the Django codebase files have a `.pyi` counterpart with type annotations.

https://github.com/typeddjango/django-stubs

It also has some plugin code, which takes care of the dynamic nature of Django models.

It's desirable that this package would be usable alongside the Django type annotations migration.


### Incremental migration path:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we include our internal milestones here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question was about how does the development of django-stubs itself affects the integration process.

1. Add `py.typed` file inside the Django top-level module, to mark that it has inline annotations.
See https://www.python.org/dev/peps/pep-0561/#packaging-type-information

2. Add `__class_getitem__` implementation for the `QuerySet` class to support generic instantiation.

3. Decide on file-by-file based, whether it's appropriate to have inline type annotation, or have it separate for the sake of readability. For those files, merge annotations from `django-stubs`, removing those files in the library.

4. Adopt `django-stubs` as an official Django library to catch more bugs, push users a bit more towards type annotations and prepare them for a change.

5. Do some work on a `merge-pyi` side to make it complete enough for `django-stubs` and Django. For that, we can react out for mypy folks and work with them.

6. Add stubs checking CI step:
1. Use `merge-pyi` to merge `django-stubs` into the Django codebase.
2. Run `mypy` and report errors.

This would allow us to keep `django-stubs` in sync with Django codebase, and prevent false-positives to happen.

7. Based on gained experience, merge more stubs into the codebase.


## Notes

### Overload clutter

Django is very dynamic, so some functions have a lot of different signatures, which could not be expressed in the codebase and require `@overload` clauses
https://www.python.org/dev/peps/pep-0484/#function-method-overloading

An example would be a `Field` - it should behave different whether it's invoked on model class, or model instance. Class returns `Field` object itself, and instance resolve field into an underlying python object
```python
# _ST - set type
# _GT - get type
# self: _T -> _T allows mypy to extract type of `self` and return it

class Field(Generic[_ST, _GT])
@overload
def __get__(self: _T, instance: None, owner) -> _T: ...
# Model instance access
@overload
def __get__(self, instance: Model, owner) -> _GT: ...
# non-Model instances
@overload
def __get__(self: _T, instance, owner) -> _T: ...
```


### How django-stubs currently implemented

`django-stubs` uses a mix of static analysis provided by mypy, and runtime type inference from Django own introspection facilities.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be helpful to distinguish between which type-checking features are provided by the stubs and which are provided from the plugin.
Maybe make it clear that the part that uses runtime type inference is the mypy plugin currently located in the django-stubs project, and that most of this section refers to features that the mypy plugin provides beyond what is provided in the stubs.

For example, newly introduced typechecking of `QuerySet.filter` uses Django _meta API to extract possible lookups for every field, to resolve kwargs like `name__iexact`.

### What is currently implemented (and therefore possible)

1. Fields inference.

```python
class User(models.Model):
name = models.CharField()
surname = models.CharField(null=True)

user = User()
user.name # inferred type: str
user.surname # inferred type: Optional[str]

# objects is added to every model
User.objects.get() # inferred type: User
User.objects.filter(unknown=True) # will fail with "no such field"
User.objects.filter(name=True) # will fail with "incompatible types 'bool' and 'str'"
User.objects.filter(name__iexact=True) # will fail with "incompatible types 'bool' and 'str'"
User.objects.filter(name='hello') # passes
User.objects.filter(name__iexact='hello') # passes
```

2. Typechecking for `__init__` and `create()`
```python
class User(models.Model):
name = models.CharField()
User(name=1) # fail
User(unknown=1) # fail
User(name='hello') # pass
```
same for `create()` with different `Optional`ity conditions.


3. RelatedField's support, support for different apps in the RelatedField's to= argument

```python
class User:
pass
class Profile:
user = models.OneToOneField(to=User, related_name='profile')

Profile().user # inferred type 'User'
User().profile # inferred type 'Profile'
```

```python
class CustomProfile:
user = models.ForeignKey(to='some_custom_app.User')
CustomProfile().user # will be correctly inferred as 'some_custom_app.User'
```

4. Support for unannotated third-party base models,
```python
class User(ThirdPartyModel):
pass
```
will be recognized as correct model.

5. `values`, `values_list` support

```python
class User:
name = models.CharField()
surname = models.CharField()
User.objects.values_list('name', 'surname')[0] # will return Tuple[str, str]
```

6. settings support
```python
from django.conf import settings
settings.INSTALLED_APPS # will be inferred as Sequence[str]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another example should be used as I think AppConfigs instances are also allowed in there.

https://github.com/django/django/blob/ed112fadc1cfa400dbee6080bf82fd7536ea4c72/django/apps/registry.py#L88-L89

```

7. `get_user_model()` infers current model class
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
7. `get_user_model()` infers current model class
7. `get_user_model()` infers current user model class



### Current issues and limitations of django-stubs

1. Generic parameters of `QuerySet`.

For example, we have a model
```python
class User:
name = models.CharField()
```

1. A simple `QuerySet` which is a result of `User.objects.filter()` returns `QuerySet[User]`.
Copy link

@syastrov syastrov Oct 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is getting a QuerySet[User] an issue or limitation? I could see it as looking more complex to the user, but I think of all of these generic parameters discussed in this section, this is the most intuitive. Also, you can still use QuerySet in your type annotations (without the benefit of constraining the type).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't see it as a limitation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. QuerySet by itself as an annotation will disable all model-specific typechecking.
  2. QuerySet[MyModel] is very intuitive, yes, I agree.

Problem here is that in the definition we have to add more than one generic argument. And this leads to the fact that you can't really use QuerySet[MyModel] anymore, but have to specify second one too (there's no default vals for generics in PEP484 AFAIK).

If you won't do it, and instead try to use get_type_analyze_hook to fill missing generic arguments, like we did in the past for the django-stubs, it's going to break all other typecheckers besides mypy.

I think the most intuitive AND future-proof way to use QuerySet with three args: 1) for model, 2) for return type, ( 3) for annotated vars, but this one hasn't been implemented in django-stubs yet), and just document it thoroughly. If users want proper typechecking, they would use two-three-arg notation, else they would just go with QuerySet.

Does it make sense?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's another option, to leave only one generic param, as in QuerySet[MyModel] and implement all values(), values_list() in the plugin. Anyway, I'll go with the simplest one, without typechecking for values(), values_list, annotate() for now. They could be added later.


2. When we add `values_list('name')` method to the picture, we need to remember (and encode in the generic params) both the fact that it's a `QuerySet` of the `User` model, and that the return item will be a tuple object of `name`.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could argue that this type QuerySet[User, Tuple[str]] just better represents the actual behavior, but I do see how it looks more complicated to the user if it should show up in an error message. Still, you are free to ignore that complexity if you are not using type-checking.

So, it becomes `QuerySet[User, Tuple[str]]`.

3. To implement `.annotate(upper=Upper('name'))` we need to remember all the fields that created from `annotate`, so it becomes

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that users' type hints could be a lot simpler in most cases if we encouraged them to use simpler abstractions like Collection[...] rather than QuerySet[..., ..., ...] in their type hints. QuerySet is a complex beast, and unless you need to use all of its bells and whistles, you can usually get along fine with a read-only view of its data. When using annotate, as far as I know it's not possible to make a query that updates data, therefore you could argue that for most use cases (e.g. if you are not building up complex QuerySets across multiple functions, in which case, things ARE complicated and the extra types can help you), a simpler type would suffice. For example:

from typing import TypedDict, Collection

class UserForPrinting(TypedDict):
  upper: str

def print_users(users: Collection[UserForPrinting]):
  for user in users:
    print(user)

print_users(User.objects.annotate(upper=Upper('name')).values('upper'))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When using annotate, as far as I know it's not possible to make a query that updates data

It's possible to use annotate(foo=...).update(bar=F('foo'))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops :) Thanks for pointing that out.

`QuerySet[User, Tuple[str], TypedDict('upper': str)]`

2. Manager inheritance.

```python
class BaseUser(models.Model):
class Meta:
abstract = True

objects = BaseUserManager()

class User(BaseUser):
objects = UserManager()
```
Mypy will flag those `objects` managers as incompatible as they violate Liskov Substitution principle.

3. Generic parameters for `Field`

```python
class User:
name = models.CharField()
surname = models.CharField(null=True)
```

`name` and `surname` props are recognized by mypy as generic descriptors. Here's the stub for the `Field`

```python
class Field(Generic[_ST, _GT]):
def __set__(self, instance, value: _ST) -> None: ...
# class access
@overload
def __get__(self: _T, instance: None, owner) -> _T: ...
# Model instance access
@overload
def __get__(self, instance: Model, owner) -> _GT: ...
# non-Model instances
@overload
def __get__(self: _T, instance, owner) -> _T: ...

class CharField(Field[_ST, _GT]):
_pyi_private_set_type: Union[str, int, Combinable]
_pyi_private_get_type: str
```

In the plugin `django-stubs` dynamically marks `name` and `surname` as `CharField[Optional[Union[str, int, Combinable]], Optional[str]]`. We cannot use (as far as I know),

```python
class CharField(Field[Union[str, int, Combinable], str]):
Copy link

@syastrov syastrov Oct 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I have a solution to this particular problem, which is only possible since the latest mypy version, 0.730 released a few weeks ago (due to python/mypy#7188).
If we overload the __new__ method and use Literal type to match when null is True or False, then we can alter which Generic parameters are passed into Field, when it is constructed.

I will give a complete working example:

file a.pyi:

from typing import (
    Generic,
    Optional,
    TypeVar,
    Union,
    overload,
)

from typing_extensions import Literal

class Combinable: ...

class Model: ...


_T = TypeVar("_T", bound="Field")
_ST = TypeVar("_ST")
_GT = TypeVar("_GT")

class Field(Generic[_ST, _GT]):
    # Here I tried to do the following to avoid duplicating this in each subclass, 
    # but I can't find a way to refer to the correct subclass (rather than Field[_ST, _GT] 
    # it should return _T[_ST, _GT], where _T is the subclass, e.g. CharField) :
    # @overload
    # def __new__(cls, null: Literal[False] = False, *args, **kwargs) -> Field[_ST, _GT]: ...
    # @overload
    # def __new__(cls, null: Literal[True], *args, **kwargs) -> Field[Optional[_ST], Optional[_GT]]: ...

    @overload
    def __get__(self: _T, instance: None, owner) -> _T: ...
    # Model instance access
    @overload
    def __get__(self, instance: Model, owner) -> _GT: ...
    # non-Model instances
    @overload
    def __get__(self: _T, instance, owner) -> _T: ...

CharField_ST = Union[str, int, Combinable] 
CharField_GT = str
class CharField(Field[_ST, _GT]):
    @overload
    def __new__(cls, null: Literal[False] = False, *args, **kwargs) -> CharField[CharField_ST, CharField_GT]: ...
    @overload
    def __new__(cls, null: Literal[True], *args, **kwargs) -> CharField[Optional[CharField_ST], Optional[CharField_GT]]: ...

file b.py:

from a import CharField, Model

class MyModel(Model):
    f1 = CharField()
    f2 = CharField(null=True)
    f3 = CharField(null=False)

reveal_type(CharField())
reveal_type(CharField(null=True))
reveal_type(CharField(null=False))
reveal_type(MyModel().f1)
reveal_type(MyModel().f2)
reveal_type(MyModel().f3)

Then, you can see that mypy type-checks applies the correct types!

$ mypy b.py
b.py:8: note: Revealed type is 'a.CharField[Union[builtins.str, builtins.int, a.Combinable], builtins.str]'
b.py:9: note: Revealed type is 'a.CharField[Union[builtins.str, builtins.int, a.Combinable, None], Union[builtins.str, None]]'
b.py:10: note: Revealed type is 'a.CharField[Union[builtins.str, builtins.int, a.Combinable], builtins.str]'
b.py:11: note: Revealed type is 'builtins.str*'
b.py:12: note: Revealed type is 'Union[builtins.str, None]'
b.py:13: note: Revealed type is 'builtins.str*'

pass
```
because then we won't be able to change generic params for `CharField` dynamically.

And it also creates a UX issue, as `Field` has two generic params which makes zero sense semantically.

4. `BaseManager.from_queryset()`, `QuerySet.as_manager()`

Not implementable as of now, see
https://github.com/python/mypy/issues/2813
https://github.com/python/mypy/issues/7266