Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add --warn-no-return, and major binder refactoring #1748

Merged
merged 13 commits into from
Oct 18, 2016

Conversation

rwbarton
Copy link
Contributor

No description provided.

@rwbarton
Copy link
Contributor Author

@ecprice, I'm putting this up so you know where I'm planning to go with this in case you intend to work more on the binder in the near future.

What I've done is have a Frame contain the information of, in addition to the types that various expressions have when a certain point in the code is reached, whether that point is reachable at all. This information is combined in update_from_options and this makes the breaking_out logic in if and try obsolete.

In this version I've made the semantics of frame_context too irregular. Maybe it would be better to re-add can_skip after all?

@ecprice
Copy link
Contributor

ecprice commented Jun 26, 2016

Yeah, adding can_skip back seems reasonable.

By the way, your branch has a bug with try/finally; it doesn't detect this:

def foo() -> int:
    try:
        return 0
    finally:
        1 + 'a'

(I detected this by rebasing my error-on-fall branch that warns on unreachable code above this, then noticing mistakes on the mypy codebase.)

@rwbarton
Copy link
Contributor Author

By the way, your branch has a bug with try/finally; it doesn't detect this:

Oh, thanks for the example. I think it's the same issue as a general bug with try statements that I hadn't filed an issue for yet, namely that the initial state after entering the body isn't captured as a possible state by the surrounding "try frame". For example

from typing import Union, List

x = 1 # type: Union[int, List[int]]
x = 1

try:
    x = [1]
finally:
    x[0]

should error on the last line but doesn't. (There could be an asynchronous exception, or you could add a function call that could raise an exception there and it makes no difference to the binder.)

@ecprice
Copy link
Contributor

ecprice commented Jun 27, 2016

Oh, yeah, that's right. Hackishly adding self.binder.allow_jump(-1) as the second line of visit_try_without_finally fixes it.

Why do you have the changed = changed or (old_unreachable != new_unreachable) line?
I've been thinking about how to get while True: working with this interface, and I think it's tricky to do without breaking abstractions; it may be worth considering at the same time as reworking the interface.

@rwbarton
Copy link
Contributor Author

I think I figured out something that had been confusing me here. In update_from_options, we only allow new states that come from options_on_return, which doesn't include the old state of the frame. However, if that list is empty, then nothing happens in options_on_return and so we do end up with the old state of the frame. This happens for example when a frame_context(2) block exits.

This inconsistency didn't cause any problems before unreachable, possibly due to the particular structure of frames that mypy generates. With unreachable of course there is a big difference between being reachable by no frames and being reachable by just the old frame, so I'm going back to adding the old frame to options when can_skip is set.

@ecprice
Copy link
Contributor

ecprice commented Jun 28, 2016

Seems reasonable.

I think the reason it worked before is that the times it wouldn't work are when the list is empty and you're unreachable, and there was separate logic for detecting unreachable states and breaking out.

@rwbarton rwbarton changed the title WIP: Replace the binder's breaking_out by an 'unreachable' property of Frame Replace the binder's breaking_out by an 'unreachable' property of Frame Jun 28, 2016
@rwbarton
Copy link
Contributor Author

@ecprice, what do you think about just handling while True as in the above commit? (Doesn't handle while True specifically yet, but it will when I teach find_isinstance_check about True.)

@ecprice
Copy link
Contributor

ecprice commented Jun 28, 2016

Don't you need to set unreachable() if else_map is None?

@rwbarton
Copy link
Contributor Author

Yes, I do. Since this pattern appears everywhere I'm going to make it into a method.

@ecprice
Copy link
Contributor

ecprice commented Jun 28, 2016

One issue with the assert False handling is that mypy doesn't parse assert False, "message" properly, and can't distinguish it from the buggy assert(False, "message")

@rwbarton rwbarton force-pushed the rwbarton-binder-2 branch 4 times, most recently from b68f747 to 6c7ceb0 Compare June 29, 2016 01:32
@rwbarton
Copy link
Contributor Author

One issue with the assert False handling is that mypy doesn't parse assert False, "message" properly, and can't distinguish it from the buggy assert(False, "message")

Yep... the fast parser does parse it correctly though so I used it in one of the tests.

I think I'm done with this for now, so if you have a chance, could you try rebasing your error-on-fall branch on top if it's not too much trouble? I think it's best if for now we make failing to return just a warning (if the flag is enabled) so that we don't have to update the tests and mypy itself to pass with the flag completely yet. (Some of the return Nones that would need to be added to the checker are not great for making mypy check under strict optional checking, anyways.)

@ecprice
Copy link
Contributor

ecprice commented Jun 29, 2016

OK, I've updated error-on-fall to give a note for missing return statements. The first commit is the real one; the later ones update the tests/mypy return statements/add a hack for assert False, "message" so that there aren't too many false positives.

I haven't rebased the warning for unreachable lines yet, because my old version wasn't quite right -- a decent approximation isn't too hard, but it gets tricky when you have code paths that are checked multiple times, to only warn on the lines that are unreachable under all paths.

@ecprice
Copy link
Contributor

ecprice commented Jun 29, 2016

Oh, and binder.ConditionalTypeBinder.unreachable should return None, not bool.

@rwbarton
Copy link
Contributor Author

rwbarton commented Jul 1, 2016

I've finished what I wanted to do on this branch.

The last two commits above remove DeletedType and replace it by a status that the type binder tracks. I like this because DeletedType is not really a type; it makes no sense to have a list of DeletedType or a function that returns DeletedType. It also means the logic around deleted types is concentrated in the binder, rather than popping up in every operation that visits types. (One could also imagine making some aspects of how the binder treats deleted variables configurable; a major pain if you have to pass flags into is_subtype.)

The code ended up a bit trickier than I would like, due to things like deleting exception handler variables and the fact that definitions that are assignments to a more specific type than the declared type don't set the initial state of the variable in the type binder. (Should this be fixed? Anyways, it would be too much for this PR.) It could probably use a future clean-up.

Overall I'm a lot happier with the new state of the binder! Thanks again to @ecprice for helping out with this little project. Particularly helpful were all the test cases!

@gvanrossum
Copy link
Member

What would it take (besides a rebase) to get this ready for merging? I would really like to use this option.

@gvanrossum gvanrossum changed the title Replace the binder's breaking_out by an 'unreachable' property of Frame [conflict] [WIP] Replace the binder's breaking_out by an 'unreachable' property of Frame Sep 29, 2016
@rwbarton rwbarton changed the title [conflict] [WIP] Replace the binder's breaking_out by an 'unreachable' property of Frame Add --warn-no-return, and major binder refactoring Oct 7, 2016
@rwbarton
Copy link
Contributor Author

rwbarton commented Oct 7, 2016

I managed to rebase this on master, so if one of you has a chance to review it that would be great. Hopefully I can even remember how it works :)

@JukkaL
Copy link
Collaborator

JukkaL commented Oct 18, 2016

I'm looking at this.

@JukkaL
Copy link
Collaborator

JukkaL commented Oct 18, 2016

Looks good, and I especially like how this makes binder-related code clearer and more readable. Thanks for working on this and continuing to update this -- finding missing returns has been a long-time wishlist item. When run against mypy, this seems to find some real bugs!

There are some test cases that I'd like to see added, but they don't block this PR.

@JukkaL JukkaL merged commit 3b92e79 into python:master Oct 18, 2016
@gvanrossum
Copy link
Member

I am suspecting that this PR broke something. I am seeing some new errors for out internal "S" and "C" builds (both Python 2). Several are new errors mentioning basestring, e.g. Argument 1 to "append" of "list" has incompatible type "basestring"; expected "unicode". Here's an example of something that passed before and is now deemed an error:

def f(a):
  # type: (unicode) -> None
  if not isinstance(a, basestring):
    a = ""
  z = []  # type: List[unicode]
  z.append(a)

The error with the latest commit is:

__tmp__.py: note: In function "f":
__tmp__.py:6: error: Argument 1 to "append" of "list" has incompatible type "basestring"; expected "unicode"

There's also another error that appeared at the same time which is even more disturbing.

(file).py:214: error: No overload variant of "type" matches argument types [<Deleted '(exception variable "exc", which we do not accept outside except: blocks even in python 2)'>]
(file).py:214: error: Trying to read deleted variable '(exception variable "exc", which we do not accept outside except: blocks even in python 2)'

That message is old but I've never seen it before and here it is occurring inside an except block.

@rwbarton
Copy link
Contributor Author

I found the cause of the first bug. The logic in conditional_type_map is slightly wrong, and the old checking for if statements happened not to use the wrong information in this situation. Note that this code is a bit strange: unicode is a subtype of basestring, so the body of the if will never execute. I should be able to put together a fix soon.

For the second error, it would really help to have a sanitized/minimized reproducer; this stuff with deleting exception variables was tricky, and I'm not surprised if the behavior changed, but I don't remember the details well.

@gvanrossum
Copy link
Member

Thanks for looking at the first bug. I agree that the reproducer looks odd, but these kinds of patterns really do happen in our code. I'm guessing the type annotation indicates what we hope for and the isinstance() check tries to deal with bad calls from unchecked code.

I'm still trying to come up with an isolated reproducer of the second bug.

@gvanrossum
Copy link
Member

Update: I've had no success yet finding a small reproducer for the deleted variable error. The code that triggers this is remarkably complex. :-( I'll keep looking into it.

@gvanrossum
Copy link
Member

gvanrossum commented Oct 19, 2016

OK, here's a repro. It's still a doozie!

def f() -> None:
    d = D()
    try:
        print()
    except Exception as exc:
        print(exc)
    else:
        try:
            d.e  # UPDATE: no with-statement needed here, just a forward ref
        except Exception as exc:
            print(type(exc))  # ERROR HERE
            print(exc.args)  # ERROR HERE

class D:
    def __init__(self) -> None:
        self.e = E()

class E:
    def __enter__(self):
        pass
    def __exit__(self, *args):
        pass

The errors are:

__tmp__.py: note: In function "f":
__tmp__.py:12: error: No overload variant of "type" matches argument types [<Deleted 'exc'>]
__tmp__.py:13: error: Trying to read deleted variable 'exc'

And yes, it does need two helper classes, and a try/except inside another try/except/else's else clause, and the forward reference d.e. And the function f() must precede the classes.

@gvanrossum
Copy link
Member

Updates: (1) the with-statement wasn't necessary, just the forward ref (I updated the example in place). (2) this is likely due to the deferred_nodes queue. The forward reference d.e causes f() to be processed twice; the first time it's abandoned because of the forward reference. Apparently the abandonment confuses the binder. Hopefully this makes it easier to track down. (You can check whether you're in this "abandoned" state by looking at current_node_deferred.)

@gvanrossum
Copy link
Member

Another update: the forward ref d.e can be anywhere in the function, e.g. right after d = D().

@gvanrossum
Copy link
Member

Much simpler repro:

def f() -> None:
    x  # forward ref
    try:
        pass
    except Exception as exc:
        pass

    try:
        pass
    except Exception as exc:
        print(type(exc))  # ERROR HERE
        print(exc.args)  # ERROR HERE

x = 1 + 1

It's clearly due to deferral, and reusing the caught variable. Since I've just gotten dug into the deferral queue I may or may not be able to come up with a quick fix.

@ecprice
Copy link
Contributor

ecprice commented Oct 19, 2016

Looks like deferral causes mypy to stop inferring types after some point; this is a problem for exception variables, which can change between incompatible types (as well as DeletedType) in different except: branches. The following patch seems to fix your examples:

diff --git a/mypy/checker.py b/mypy/checker.py
index dd26bde..059b668 100644
--- a/mypy/checker.py
+++ b/mypy/checker.py
@@ -1694,7 +1694,12 @@ class TypeChecker(NodeVisitor[Type]):
                                 # To support local variables, we make this a definition line,
                                 # causing assignment to set the variable's type.
                                 s.vars[i].is_def = True
+                                # We also temporarily set current_node_deferred to False to
+                                # make sure the inference happens.
+                                am_deferring = self.current_node_deferred
+                                self.current_node_deferred = False
                                 self.check_assignment(s.vars[i], self.temp_node(t, s.vars[i]))
+                                self.current_node_deferred = am_deferring
                         self.accept(s.handlers[i])
                         if s.vars[i]:
                             # Exception variables are deleted in python 3 but not python 2.

I don't understand deferral well enough to know if there are other problematic interactions, though.

@gvanrossum
Copy link
Member

Oh, that was a helpful hack! The real issue is actually that when current_node_deferred is set, we never assign to var.type. So here's a better (and simpler!) fix:

diff --git a/mypy/checker.py b/mypy/checker.py
index ee0e1b6..04b1a05 100644
--- a/mypy/checker.py
+++ b/mypy/checker.py
@@ -1712,7 +1712,8 @@ class TypeChecker(NodeVisitor[Type]):
                                           'accept outside except: blocks even in '
                                           'python 2)'.format(s.vars[i].name))
                             var = cast(Var, s.vars[i].node)
-                            var.type = DeletedType(source=source)
+                            if not self.current_node_deferred:
+                                var.type = DeletedType(source=source)
                             self.binder.cleanse(s.vars[i])
             if s.else_body:
                 self.accept(s.else_body)

I'll write a test and submit it as a PR.

@ecprice
Copy link
Contributor

ecprice commented Oct 20, 2016

No, that's what I tried to do at first, but it doesn't work. If you check, you'll find that it doesn't fix your first repro of the bug, only the second one.

In your first repro, the first except: clause appears before the forward reference. Hence the type will be set to DeletedType (because current_node_deferred isn't set yet), and then not updated in the except: clause after current_node_deferred is set.

@gvanrossum
Copy link
Member

gvanrossum commented Oct 20, 2016 via email

@JukkaL
Copy link
Collaborator

JukkaL commented Oct 20, 2016

I didn't look into this in detail, but here's another idea (that may not work). What if we created a separate Var node for each except block? Then we could set the type of these independently, even if they have the same name.

@gvanrossum
Copy link
Member

Alternatively, I wonder if we should more closely copy what del exc would do; from visit_del_stmt():

                    self.binder.assign_type(elt,
                                            DeletedType(source=elt.name),
                                            self.binder.get_declaration(elt),
                                            False)

@ecprice
Copy link
Contributor

ecprice commented Oct 20, 2016

Jukka's idea sounds good, however, I'm not sure I understand the motivation for the deferred execution logic in general. It has other issues, for example, the following code gives an error as is but not if the x = 1 + 1 is moved above the function definition.

from typing import *

def f(a: Union[int, str]) -> None:
    x
    a = x
    a + 1   # Error now

x = 1 + 1

Why not instead just evaluate all global statements (including function definitions) before checking all function bodies? It seems dangerous to typecheck functions in two different ways.

Guido, that won't work either: the problem with except clauses isn't really about deletion. One can construct examples where, even if you rip out the deletion code entirely, it still gives the wrong answer because the exception type changes; for example,

def f() -> None:
    try:
        pass
    except D as exc:
        pass
    x
    try:
        pass
    except E as exc:
        exc.e  # Would give an error without my hack, because it still thinks exc is D or deleted.

x = 1 + 1

class D(Exception):
    d = 1

class E(Exception):
    e = 1

@gvanrossum
Copy link
Member

OK, I'll have to try Jukka's idea of using a separate variable for each. Though I suspect it'll run into some other issue... (How would a use of the variable outside the handler clause even be processed?)

I don't know how the design of the deferred nodes queue came about, hopefully Jukka remembers. Perhaps it has to do with assignments to instance variables being processed to set their types for use by other methods? We haven't seen a ton of problems due to it yet (though that may change because I'm looking to reuse it to handle import cycles as well, see #2264).

@JukkaL
Copy link
Collaborator

JukkaL commented Oct 20, 2016

Type checking top-levels before function bodies doesn't resolve some of the more common issues we have, such as accessing an attribute before we've inferred a type for it from an assignment in a method. Example:

class A:
    def f(self, b: 'B') -> int:
        return b.x

class B:
    def g(self) -> None:
        self.x = 1 + 2

I'm not surprised that there are still bugs with deferred nodes, and they are somewhat of a hacky approach. I believe that we can get them to work correctly, though they clearly make it a little harder to reason about type checking. We often type check expressions multiple times anyway, so this mostly affects statements.

To fix the union type example, a non-definition assignment to a in a deferred function could perhaps store Any in the binder to shut up any errors related to a until the deferred processing takes place -- or maybe we could somehow just tell the binder that the type is unknown. Another option might be to silence errors within a deferred function, but this feels more dangerous to me.

gvanrossum added a commit that referenced this pull request Oct 25, 2016
* Fix bug with exception variable reuse in deferred node. The fix is actually by @ecprice.

See #1748 (comment)
@elazarg
Copy link
Contributor

elazarg commented Oct 25, 2016

Sorry for the interruption - is this just another incarnation of the course-grained vs. fine-grained incrementalism?

@JukkaL
Copy link
Collaborator

JukkaL commented Oct 28, 2016

Deferred processing is not currently really related to incremental checking, but fine-grained dependency tracking and incrementalism might be an alternative to deferred checking in the future, though we don't have a design or plan for that yet.

When we are type checking a function and it refers to something that we haven't processed yet (and that doesn't have an inferred type), we basically stop type checking the function and put it in a deferred list. After we've done a single pass on a strongly-connected set of modules, we do another pass and type check the functions in the deferred list again, and this time we generally have the types available from the first pass. (We don't actually stop processing the function when we defer it, but we stop inferring types for variables, since we are missing some type information and can't always infer the right types.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants