Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make it easier to work with circular module dependencies #481

Closed
JukkaL opened this issue Oct 25, 2014 · 15 comments · Fixed by #2264
Closed

Make it easier to work with circular module dependencies #481

JukkaL opened this issue Oct 25, 2014 · 15 comments · Fixed by #2264

Comments

@JukkaL
Copy link
Collaborator

JukkaL commented Oct 25, 2014

It's sometimes painful to work around issues related to circular module dependencies. Some examples:

  • Mypy complains about not being able to infer a type because of a circular dependency.
  • Mypy type checks okay but at runtime something is undefined because of a circular dependency.
  • A type annotation introduces a circular module dependency where there was none before.

Not sure how these should be resolved. Some ideas:

  • Add a flag that dumps information about any circular module dependencies in the target program. This should help with debugging.
  • Warn about 'from imports' that are part of a circular module dependency, as these could fail at runtime.
@JukkaL
Copy link
Collaborator Author

JukkaL commented Oct 25, 2014

Another idea is improving type inference when there are circular module dependencies. A multi-phase approach might work: if a function fails to type check because it depends on something that has no inferred type yet, don't report the errors but put the function in a work list. After a circular module group has been type checked, type check any functions in the work list again. This can be repeated until the work list is empty, up to a certain maximum number of iterations.

@JukkaL
Copy link
Collaborator Author

JukkaL commented Mar 17, 2016

I tentatively added this to the 0.4.0 milestone, as this could be a big issue as users start annotating code with big dependency cycles as we have at Dropbox.

Some additional things we could do to help this:

  • Have a stable ordering when type checking modules within import cycles, as different orders may cause mypy to complain about different things (such as not being able to infer a type for something).
  • Prioritize certain dependencies higher when determining type checking order (discussed at The mother of all pull requests: add --incremental #1292). For example, from m import y could be higher precedence than import m, and if a class in a module subclasses a class defined in another module, make that dependency a high priority one.
  • Infer types for (most) decorated functions before the final type checking pass. Either do this during semantic analysis (we already do some, but it's pretty limited) or introduce another type checking pass. Decorated functions often cause trouble as mypy sometimes only infers their types during the final pass, when it can be too late as modules must be type checked in some specific, linear order.
  • Make incremental type checking do something clever when there are cycles to avoid type checking the entire cycle if there is a local change and the cycle depends on a changed file (or the change is within the cycle).
  • If mypy hasn't inferred the type of something during type checking, give a more useful message such as pointing to the definition of the thing without an inferred type and asking the user to add an annotation. This doesn't help with decorated functions as they can't be annotated, but see below.
  • Make it possible to annotate the type of a decorated function (the external type). Currently only the type of the innermost function can be annotated, but the decorators can modify the type.

@JukkaL
Copy link
Collaborator Author

JukkaL commented Mar 18, 2016

A specific issue about decorators: #1303

@JukkaL
Copy link
Collaborator Author

JukkaL commented Mar 22, 2016

Here is more detailed idea for import prioritization:

  • We'd have two levels of module import priorities, high and low.
  • from m import x (when x is not a module) at module top level (not within a function) has high priority. Also import m or from mm import m has high priority if there is an expression that refers to m at the module top level, or if there is a type reference that contains m that is not in a comment and that is not string literal quoted.
  • All other imports have low priority. This includes imports within functions and import m where m is only used within a function.
  • If we detect an import cycle, use only the high priority imports to topo sort modules within the cycle. This approximates runtime import initialization order.

The intuition here is that runtime import initialization order is the "natural" order to type check modules, and it will ensure that base classes are checked before derived classes when they are defined in different modules. This should make many annotations to self.x = x assignments that I've seen in the code of our internal customers at Dropbox unnecessary.

@ddfisher
Copy link
Collaborator

ddfisher commented Apr 5, 2016

Some other thoughts:

  • It's probably worth taking a look at some common import cycle ordering errors to see if we can break down mypy's passes a bit more to fix them.
  • It could be nice for debugging/understanding problems to have a mode where mypy prints out a list of all import cycles in the code.
  • I'm not sure how hard this is, but we could try to process imports in the same order as Python itself, which might help significantly.

@JukkaL
Copy link
Collaborator Author

JukkaL commented Oct 4, 2016

Additional thoughts:

  • We could generalize two-pass type checking within a module to do multiple type checking passes over an import cycle. During each pass, we'd add functions which require types that haven't been inferred yet to a work list. In the next pass we'd only type check functions in the work list. We are finished when the work list is empty after a pass.
  • If it would be possible to annotate all module and class level definitions (we'd need support for arbitrary callable types with keyword arguments etc., and annotating types of decorated functions), we could perhaps ask users to fully annotate all (non-trivial) definitions within import cycles. This way type inference would be intra-module and it wouldn't matter in which order we process an import cycle.

Failure to infer types for decorated functions is a common problem. A decorator can arbitrarily modify the signature of the decorated function, but we can't declare the signature of the decorated function anywhere, so in order to call a decorated function, we must have first type checked the definition. Example of how this could work, using Python 3.6 variable annotation syntax:

decorated: Callable[[int], str]
@decorator
def decorated(...) -> ...: 
    ...

@gvanrossum
Copy link
Member

gvanrossum commented Oct 12, 2016

I want to propose a specific design: generalize the "deferred checking" implemented in checker.py to work within a cycle of imports. (I'm not claiming the idea is original, I'm just working out the details.)

How it currently works: During type checking, types are inferred for variables (a concept which includes attributes). When a function contains a variable reference and the type of the variable has not yet been inferred, the function is added to a queue of "deferred nodes" and checking the function is abandoned. Then, when all type checking for a module is done, the functions in the queue are re-checked, under the assumption that all variables have now been inferred, and the re-checks will succeed. If this hypothesis fails, the error "Cannot determine type of 'Thing'" is issued. (There is only one place where that specific error is issued; it is issued immediately when not occurring inside a function or during the last pass.)

The proposal: Lift the processing of the deferred queue out of TypeChecker.visit_file(). That function's logic is roughly:

  1. initialize a bunch of stuff specific to the file;
  2. type-check all definitions in the file;
  3. check the second pass;
  4. do some processing on __all__.

It looks like steps (3) and (4) commute, and then we could refactor things a bit so that the caller (process_stale_sccs() in build.py) has access to the queue and controls when the second pass runs. It then can be changed so that it runs the first pass for all nodes in a cycle, collecting deferred queues for each, and then runs all the second passes. (And possibly third passes, etc.)

How this helps: First, an example of how a forward reference within one file is handled by the current mechanism.

def f() -> int:
    return x
x = 1 + 1

In the first pass, f is checked before x, so when x is analyzed, it has no type yet, and f is deferred. At the end of the first pass, x has a type (inferred from 1 + 1 as int) and f is deferred. In the second pass, f is rechecked, and this time the reference to x poses no problem, so f checks clean.

But what if we have a function that references a variable in another module? And what if through import cycles that other module is checked after the one with f? Let's set it up:

# a.py
import b
def f() -> int:
    return b.x
# b.py
import a
x = 1 + 1

If we run mypy a.py b.py things check clean (mypy detects the import cycle and decides to process b.py before a.py). But if we run mypy b.py a.py the processing order is reversed (a.py goes before b.py) and we get errors:

a.py: note: In function "f":
a.py:3: error: Cannot determine type of 'x'

Now, in this case we might have solved this by better analysis of module dependencies, e.g. by noticing that b doesn't really use anything from a while a uses b.x, concluding that b should be processed before a. But what is there's a similar dependency going in the other direction?

# a.py
import b
def f() -> int:
    return b.x
y = 0 + 0
# b.py
import a
def g() -> int:
    return a.y
x = 1 + 1

Here tweaking the processing order just changes the error message -- we either get a complaint about a.py:3 or about b.py:3. Yet at runtime there is no problem: once both modules are imported (regardless of import order!) calling either a.f() or b.g() produces a valid result.

But I believe that extracting the deferred queues and processing them after the first pass has run for both a and b will produce the correct result regardless of the import order.

UPDATE: I have a prototype.

@gvanrossum
Copy link
Member

Results from the prototype: Remember how I withdrew #2167 when I found that it disturbed the processing order of some internal codebase so as to introduce new errors? Combining that PR with this prototype made those errors go away. So I'm very optimistic about the combination of these. We need both, because the example in #2015 is not solved by the prototype, because that deferred processing only happens for functions, not at the module level (this makes sense since functions are typically called after a module is imported, while module-level code runs immediately at import).

@gvanrossum gvanrossum self-assigned this Oct 13, 2016
@JukkaL
Copy link
Collaborator Author

JukkaL commented Oct 17, 2016

I haven't look at the prototype yet, but this all sounds good to me.

One thing to note is that I'm not 100% convinced that the current deferred processing is correct in every scenario (it started as a quick hack). However, we can solve these issues (if any) independently of generalizing deferred processing to import cycles.

@gvanrossum
Copy link
Member

gvanrossum commented Oct 17, 2016 via email

@JukkaL
Copy link
Collaborator Author

JukkaL commented Oct 17, 2016

Excellent! I can review both the PRs, likely on Tuesday or Wednesday. I also have #1748, the big binder PR to review this week, which I'll likely do first.

@gvanrossum
Copy link
Member

Where this doesn't help:

Deferred processing only happens for functions. Therefore cross-module references outside functions (either at the global level or in class bodies outside the methods) still require the modules to be processed in the right order.

This problem also exists at runtime. Mypy does not exactly emulate the import order at runtime, but it uses heuristics to approximate it. These heuristics work by giving import dependencies different priorities based on the shape and position of the import (e.g. "from X import Y" has a higher priority than "import X", and imports inside functions or inside "if MYPY" are even lower. To break a tie the order in which imports were encountered dynamically is used.

I expect that over time we'll refine these heuristics somewhat, but once #2167 is merged I think they are pretty good, and once we have deferred processing of functions the majority of cases will actually be taken care of already.

What if you need more than two passes? It's theoretically possible to construct worst-case examples that need an arbitrary number of passes. The deferred code can be easily extended to support it (it can just iterate until nothing gets taken out of the queue) but those cases are pretty rare and I'd rather not go there yet -- most likely when we see this it's a sign of overly complex code.

@astrojuanlu
Copy link
Contributor

A type annotation introduces a circular module dependency where there was none before.

These are still extremely painful... Are they entirely fixed by following the advice in https://www.python.org/dev/peps/pep-0484/#forward-references ?

@ethanhs
Copy link
Collaborator

ethanhs commented Sep 12, 2017

@Juanlu001 They are entirely fixed by that, but are still a bit painful. You may be interested in https://www.python.org/dev/peps/pep-0563/ which may make this even easier.

@astrojuanlu
Copy link
Contributor

Thanks @ethanhs, will keep an eye on that.

edwardcwang added a commit to ucb-bar/hammer that referenced this issue Apr 14, 2018
About time, since hammer_vlsi.py was getting really large (2k+ LoC!).

In addition, mypy has trouble with import cycles:
python/mypy#481
edwardcwang added a commit to ucb-bar/hammer that referenced this issue Apr 21, 2018
About time, since hammer_vlsi.py was getting really large (2k+ LoC!).

In addition, mypy has trouble with import cycles:
python/mypy#481
edwardcwang added a commit to ucb-bar/hammer that referenced this issue Apr 24, 2018
About time, since hammer_vlsi.py was getting really large (2k+ LoC!).

In addition, mypy has trouble with import cycles:
python/mypy#481
edwardcwang added a commit to ucb-bar/hammer that referenced this issue Feb 7, 2019
About time, since hammer_vlsi.py was getting really large (2k+ LoC!).

In addition, mypy has trouble with import cycles:
python/mypy#481
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants