Make it easier to work with circular module dependencies #481

JukkaL · 2014-10-25T20:39:14Z

It's sometimes painful to work around issues related to circular module dependencies. Some examples:

Mypy complains about not being able to infer a type because of a circular dependency.
Mypy type checks okay but at runtime something is undefined because of a circular dependency.
A type annotation introduces a circular module dependency where there was none before.

Not sure how these should be resolved. Some ideas:

Add a flag that dumps information about any circular module dependencies in the target program. This should help with debugging.
Warn about 'from imports' that are part of a circular module dependency, as these could fail at runtime.

JukkaL · 2014-10-25T20:52:15Z

Another idea is improving type inference when there are circular module dependencies. A multi-phase approach might work: if a function fails to type check because it depends on something that has no inferred type yet, don't report the errors but put the function in a work list. After a circular module group has been type checked, type check any functions in the work list again. This can be repeated until the work list is empty, up to a certain maximum number of iterations.

JukkaL · 2016-03-17T19:16:53Z

I tentatively added this to the 0.4.0 milestone, as this could be a big issue as users start annotating code with big dependency cycles as we have at Dropbox.

Some additional things we could do to help this:

Have a stable ordering when type checking modules within import cycles, as different orders may cause mypy to complain about different things (such as not being able to infer a type for something).
Prioritize certain dependencies higher when determining type checking order (discussed at The mother of all pull requests: add --incremental #1292). For example, from m import y could be higher precedence than import m, and if a class in a module subclasses a class defined in another module, make that dependency a high priority one.
Infer types for (most) decorated functions before the final type checking pass. Either do this during semantic analysis (we already do some, but it's pretty limited) or introduce another type checking pass. Decorated functions often cause trouble as mypy sometimes only infers their types during the final pass, when it can be too late as modules must be type checked in some specific, linear order.
Make incremental type checking do something clever when there are cycles to avoid type checking the entire cycle if there is a local change and the cycle depends on a changed file (or the change is within the cycle).
If mypy hasn't inferred the type of something during type checking, give a more useful message such as pointing to the definition of the thing without an inferred type and asking the user to add an annotation. This doesn't help with decorated functions as they can't be annotated, but see below.
Make it possible to annotate the type of a decorated function (the external type). Currently only the type of the innermost function can be annotated, but the decorators can modify the type.

JukkaL · 2016-03-18T17:55:18Z

A specific issue about decorators: #1303

JukkaL · 2016-03-22T21:59:13Z

Here is more detailed idea for import prioritization:

We'd have two levels of module import priorities, high and low.
from m import x (when x is not a module) at module top level (not within a function) has high priority. Also import m or from mm import m has high priority if there is an expression that refers to m at the module top level, or if there is a type reference that contains m that is not in a comment and that is not string literal quoted.
All other imports have low priority. This includes imports within functions and import m where m is only used within a function.
If we detect an import cycle, use only the high priority imports to topo sort modules within the cycle. This approximates runtime import initialization order.

The intuition here is that runtime import initialization order is the "natural" order to type check modules, and it will ensure that base classes are checked before derived classes when they are defined in different modules. This should make many annotations to self.x = x assignments that I've seen in the code of our internal customers at Dropbox unnecessary.

ddfisher · 2016-04-05T21:44:34Z

Some other thoughts:

It's probably worth taking a look at some common import cycle ordering errors to see if we can break down mypy's passes a bit more to fix them.
It could be nice for debugging/understanding problems to have a mode where mypy prints out a list of all import cycles in the code.
I'm not sure how hard this is, but we could try to process imports in the same order as Python itself, which might help significantly.

JukkaL · 2016-10-04T15:48:45Z

Additional thoughts:

We could generalize two-pass type checking within a module to do multiple type checking passes over an import cycle. During each pass, we'd add functions which require types that haven't been inferred yet to a work list. In the next pass we'd only type check functions in the work list. We are finished when the work list is empty after a pass.
If it would be possible to annotate all module and class level definitions (we'd need support for arbitrary callable types with keyword arguments etc., and annotating types of decorated functions), we could perhaps ask users to fully annotate all (non-trivial) definitions within import cycles. This way type inference would be intra-module and it wouldn't matter in which order we process an import cycle.

Failure to infer types for decorated functions is a common problem. A decorator can arbitrarily modify the signature of the decorated function, but we can't declare the signature of the decorated function anywhere, so in order to call a decorated function, we must have first type checked the definition. Example of how this could work, using Python 3.6 variable annotation syntax:

decorated: Callable[[int], str]
@decorator
def decorated(...) -> ...: 
    ...

gvanrossum · 2016-10-12T15:54:12Z

I want to propose a specific design: generalize the "deferred checking" implemented in checker.py to work within a cycle of imports. (I'm not claiming the idea is original, I'm just working out the details.)

How it currently works: During type checking, types are inferred for variables (a concept which includes attributes). When a function contains a variable reference and the type of the variable has not yet been inferred, the function is added to a queue of "deferred nodes" and checking the function is abandoned. Then, when all type checking for a module is done, the functions in the queue are re-checked, under the assumption that all variables have now been inferred, and the re-checks will succeed. If this hypothesis fails, the error "Cannot determine type of 'Thing'" is issued. (There is only one place where that specific error is issued; it is issued immediately when not occurring inside a function or during the last pass.)

The proposal: Lift the processing of the deferred queue out of TypeChecker.visit_file(). That function's logic is roughly:

initialize a bunch of stuff specific to the file;
type-check all definitions in the file;
check the second pass;
do some processing on __all__.

It looks like steps (3) and (4) commute, and then we could refactor things a bit so that the caller (process_stale_sccs() in build.py) has access to the queue and controls when the second pass runs. It then can be changed so that it runs the first pass for all nodes in a cycle, collecting deferred queues for each, and then runs all the second passes. (And possibly third passes, etc.)

How this helps: First, an example of how a forward reference within one file is handled by the current mechanism.

def f() -> int:
    return x
x = 1 + 1

In the first pass, f is checked before x, so when x is analyzed, it has no type yet, and f is deferred. At the end of the first pass, x has a type (inferred from 1 + 1 as int) and f is deferred. In the second pass, f is rechecked, and this time the reference to x poses no problem, so f checks clean.

But what if we have a function that references a variable in another module? And what if through import cycles that other module is checked after the one with f? Let's set it up:

# a.py
import b
def f() -> int:
    return b.x

# b.py
import a
x = 1 + 1

If we run mypy a.py b.py things check clean (mypy detects the import cycle and decides to process b.py before a.py). But if we run mypy b.py a.py the processing order is reversed (a.py goes before b.py) and we get errors:

a.py: note: In function "f":
a.py:3: error: Cannot determine type of 'x'

Now, in this case we might have solved this by better analysis of module dependencies, e.g. by noticing that b doesn't really use anything from a while a uses b.x, concluding that b should be processed before a. But what is there's a similar dependency going in the other direction?

# a.py
import b
def f() -> int:
    return b.x
y = 0 + 0

# b.py
import a
def g() -> int:
    return a.y
x = 1 + 1

Here tweaking the processing order just changes the error message -- we either get a complaint about a.py:3 or about b.py:3. Yet at runtime there is no problem: once both modules are imported (regardless of import order!) calling either a.f() or b.g() produces a valid result.

But I believe that extracting the deferred queues and processing them after the first pass has run for both a and b will produce the correct result regardless of the import order.

UPDATE: I have a prototype.

gvanrossum · 2016-10-13T15:36:37Z

Results from the prototype: Remember how I withdrew #2167 when I found that it disturbed the processing order of some internal codebase so as to introduce new errors? Combining that PR with this prototype made those errors go away. So I'm very optimistic about the combination of these. We need both, because the example in #2015 is not solved by the prototype, because that deferred processing only happens for functions, not at the module level (this makes sense since functions are typically called after a module is imported, while module-level code runs immediately at import).

JukkaL · 2016-10-17T16:35:51Z

I haven't look at the prototype yet, but this all sounds good to me.

One thing to note is that I'm not 100% convinced that the current deferred processing is correct in every scenario (it started as a quick hack). However, we can solve these issues (if any) independently of generalizing deferred processing to import cycles.

gvanrossum · 2016-10-17T18:08:13Z

I have tried it on a fairly large test set (our "C" repo), combined with #2167 (adding a lower priority for imports inside "if MYPY"). The latter by itself caused a disturbance in module processing order that reported a bunch of new errors. Combining that with #2264 (my non-prototype implementation of this idea) silenced those errors. From other (smaller) tests I am confident that we still catch real errors due to import cycles (e.g. two modules importing each other and referencing class variables in each other during module loading). Would you mind reviewing both of those?

JukkaL · 2016-10-17T18:38:07Z

Excellent! I can review both the PRs, likely on Tuesday or Wednesday. I also have #1748, the big binder PR to review this week, which I'll likely do first.

gvanrossum · 2016-10-18T18:54:55Z

Where this doesn't help:

Deferred processing only happens for functions. Therefore cross-module references outside functions (either at the global level or in class bodies outside the methods) still require the modules to be processed in the right order.

This problem also exists at runtime. Mypy does not exactly emulate the import order at runtime, but it uses heuristics to approximate it. These heuristics work by giving import dependencies different priorities based on the shape and position of the import (e.g. "from X import Y" has a higher priority than "import X", and imports inside functions or inside "if MYPY" are even lower. To break a tie the order in which imports were encountered dynamically is used.

I expect that over time we'll refine these heuristics somewhat, but once #2167 is merged I think they are pretty good, and once we have deferred processing of functions the majority of cases will actually be taken care of already.

What if you need more than two passes? It's theoretically possible to construct worst-case examples that need an arbitrary number of passes. The deferred code can be easily extended to support it (it can just iterate until nothing gets taken out of the queue) but those cases are pretty rare and I'd rather not go there yet -- most likely when we see this it's a sign of overly complex code.

astrojuanlu · 2017-09-11T21:38:09Z

A type annotation introduces a circular module dependency where there was none before.

These are still extremely painful... Are they entirely fixed by following the advice in https://www.python.org/dev/peps/pep-0484/#forward-references ?

ethanhs · 2017-09-12T01:11:49Z

@Juanlu001 They are entirely fixed by that, but are still a bit painful. You may be interested in https://www.python.org/dev/peps/pep-0563/ which may make this even easier.

astrojuanlu · 2017-09-12T06:57:48Z

Thanks @ethanhs, will keep an eye on that.

About time, since hammer_vlsi.py was getting really large (2k+ LoC!). In addition, mypy has trouble with import cycles: python/mypy#481

JukkaL added the needs discussion label Oct 25, 2014

JukkaL mentioned this issue Mar 21, 2015

Reference to decorated function above its definition fails #613

Closed

JukkaL mentioned this issue Apr 10, 2015

General support for properties #220

Closed

JukkaL mentioned this issue Nov 27, 2015

Order of methods affects validity of instance variables #1016

Closed

JukkaL mentioned this issue Feb 4, 2016

Cannot determine type Issue with near import loops #1185

Closed

JukkaL added this to the 0.4.0 milestone Mar 17, 2016

JukkaL mentioned this issue Mar 18, 2016

Decorators are sensitive to import cycle ordering #1303

Closed

rwbarton mentioned this issue Apr 17, 2016

superclass information can be lost in import cycles #1394

Closed

JukkaL mentioned this issue May 13, 2016

Sort modules in dependency cycles based on dependencies #1530

Closed

gnprice added the topic-import-cycles label Aug 4, 2016

JukkaL mentioned this issue Sep 21, 2016

Deprioritize imports within a if TYPE_CHECKING: when processing import cycles #2166

Closed

This was referenced Oct 12, 2016

"Cannot determine type" on innocent but import-cycled code #2015

Closed

Bug on circular imports #1984

Closed

Imports from other modules not detected in modules #1935

Closed

gvanrossum self-assigned this Oct 13, 2016

gvanrossum mentioned this issue Oct 17, 2016

Check deferred nodes after all first passes in an import cycle #2264

Merged

JukkaL closed this as completed in #2264 Oct 24, 2016

edwardcwang added a commit to ucb-bar/hammer that referenced this issue Apr 14, 2018

Refactor hammer_vlsi into multiple files

6b6d107

About time, since hammer_vlsi.py was getting really large (2k+ LoC!). In addition, mypy has trouble with import cycles: python/mypy#481

edwardcwang added a commit to ucb-bar/hammer that referenced this issue Apr 21, 2018

Refactor hammer_vlsi into multiple files

733a09c

About time, since hammer_vlsi.py was getting really large (2k+ LoC!). In addition, mypy has trouble with import cycles: python/mypy#481

edwardcwang added a commit to ucb-bar/hammer that referenced this issue Apr 24, 2018

Refactor hammer_vlsi into multiple files

b9398d7

About time, since hammer_vlsi.py was getting really large (2k+ LoC!). In addition, mypy has trouble with import cycles: python/mypy#481

russelldavis mentioned this issue Feb 3, 2019

Simple module with harmless import cycle results in Cannot determine type error #6312

Closed

edwardcwang added a commit to ucb-bar/hammer that referenced this issue Feb 7, 2019

Refactor hammer_vlsi into multiple files

7d89045

About time, since hammer_vlsi.py was getting really large (2k+ LoC!). In addition, mypy has trouble with import cycles: python/mypy#481

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make it easier to work with circular module dependencies #481

Make it easier to work with circular module dependencies #481

JukkaL commented Oct 25, 2014

JukkaL commented Oct 25, 2014

JukkaL commented Mar 17, 2016

JukkaL commented Mar 18, 2016

JukkaL commented Mar 22, 2016

ddfisher commented Apr 5, 2016

JukkaL commented Oct 4, 2016

gvanrossum commented Oct 12, 2016 •

edited

Loading

gvanrossum commented Oct 13, 2016

JukkaL commented Oct 17, 2016

gvanrossum commented Oct 17, 2016 via email

JukkaL commented Oct 17, 2016

gvanrossum commented Oct 18, 2016

astrojuanlu commented Sep 11, 2017

ethanhs commented Sep 12, 2017

astrojuanlu commented Sep 12, 2017

Make it easier to work with circular module dependencies #481

Make it easier to work with circular module dependencies #481

Comments

JukkaL commented Oct 25, 2014

JukkaL commented Oct 25, 2014

JukkaL commented Mar 17, 2016

JukkaL commented Mar 18, 2016

JukkaL commented Mar 22, 2016

ddfisher commented Apr 5, 2016

JukkaL commented Oct 4, 2016

gvanrossum commented Oct 12, 2016 • edited Loading

gvanrossum commented Oct 13, 2016

JukkaL commented Oct 17, 2016

gvanrossum commented Oct 17, 2016 via email

JukkaL commented Oct 17, 2016

gvanrossum commented Oct 18, 2016

astrojuanlu commented Sep 11, 2017

ethanhs commented Sep 12, 2017

astrojuanlu commented Sep 12, 2017

gvanrossum commented Oct 12, 2016 •

edited

Loading