Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Union and Intersections #3

Closed
CarliJoy opened this issue Jul 24, 2023 · 21 comments
Closed

Union and Intersections #3

CarliJoy opened this issue Jul 24, 2023 · 21 comments

Comments

@CarliJoy
Copy link
Owner

CarliJoy commented Jul 24, 2023

Intersection A & B

Type: Intersection of A and B. If A and B are both concrete types and not protocols, this must be a subclass of both. If either is a protocol, the protocol must be satisfied. If A or B is already an intersection, the intersections are combined.
Interface: The interface of A and B are both available and must not conflict.

(A & B) & C is equivalent to A & B & C

Union A | B

Type: Union of A and B. If A or B are already Unions, The unions are combined

Interface: The attributes, methods, and properties shared between A and B which have compatible types. If A and B use an identifier in conflicting ways, the accessor of the object cannot know which way it is used without checking.

(A | B) | C is equivalent to A | B C

(The below may seem contrived, but I think including it can assure the definitions we end up on remain consistent)

The order of intersections and unions matters

Unions of Intersections (A & B) | (B & C)

Type: a union of the intersection A & B and the intersection B & C
Interface: The shared compatible interface of 1) the combined interfaces of A and of B which must not conflict with each other and 2) the combined interfaces of B and of C which must not conflict with each other

(A & B) | (B & C) is equivalent to (A | C) & B

Intersection of Unions (A | B) & (B | C)

Type: Intersection of the unions A | B and B | C
Interface: The interface of both the unions A | B and B | C, where each of those unions has the shared minimum nonconflicting interface, and no permutation of union elements results in a conflicting interface for the intersection*

(A | B) & (B | C) is not equivalent to B & (A | C), but to (A & C) | (B & (A | C))

* A contrived case, to be sure, but...

class A(Protocol):
    def this(self: Self) -> Self:
        ...

class B(Protocol):
    def that(self: Self, other: T) -> T:
       ...

class C(Protocol):
    def this(self: Self) -> None:
        ...

Problematic = (A | B) & (B | C)

There are two possibilities here

  1. This reduces to: (A & B) | B | (B & C) due to the incompatibility between A & C
  2. This is considered unsatisfiable as a set of types

I believe 2 to be the correct interpretation. 1 results in a different interface than may be expected.

Why this matters when considering Any

This is a set of ordering and satisfiability constraints that works with the proposed update to the definitions of Any and Never. Should these orderings and constraints be considered undesirable, we may be back to the drawing board on handling Any

Originally posted by @mikeshardmind in #1 (comment)

@NeilGirdhar
Copy link
Collaborator

NeilGirdhar commented Jul 24, 2023

This looks okay, but I don't think "The interface of A and B are both available and must not conflict." is going to be reasonable in general. You can always have two interfaces with functions that have different parameter lists, but an intersection is still possible because the parameter list is contravariant. So a derived type that inherits from the members of the intersection can simply resolve the problem by having overloads (for example) that accomodate all of its parent classes.

Also, it may be really expensive to test that entire interfaces are compatible. It would be a lot easier to wait until a call is performed before verifying the call.

I think the verification should be to consider only the set of any elements that support that call, and then use as a return value the intersection of return types (due to covariance).

@CarliJoy
Copy link
Owner Author

I would be very happy if someone could write the something in the specification
@tomasr8 already offered, so keep him in the loop.

@mikeshardmind
Copy link
Collaborator

mikeshardmind commented Jul 24, 2023

So a derived type that inherits from the members of the intersection can simply resolve the problem by having overloads (for example) that accomodate all of its parent classes.

overloads do not solve this. A & B says that this must satisfy both A and B. This is not possible in some cases by strict incompatibility. artificially adding overloads cannot bandaid over this suitably, and even detecting where it could theoretically would mean that it is not enough to just be a subclass of both A and B, but that you then must reimplement any interfaces which conflicted to accept the overload. This is an unreasonable ask for type checkers for something that won't even work all of the time. Consider these two interfaces:

class A:
    foo: int

    @staticmethod
    def more_complicated() -> int:
        ...

class B:
    foo: str

    def more_complicated(self, something: int) -> str:
        ...

There is no case for overloads here, and this violates requesting compatible interfaces with both A and with B by the intersection. I can't safely do something.foo and know the type anymore, and I can't even safely attempt to call something.more_complicated to then check the resulting type.

Requesting an impossible intersection should be treated as an error.

@CarliJoy
Copy link
Owner Author

@mikeshardmind this is a different issue, lets discuss this in #5

@NeilGirdhar
Copy link
Collaborator

NeilGirdhar commented Jul 24, 2023

I can't even safely attempt to call something.more_complicated to then check the resulting type.

I was thinking of having a non-empty intersection type if the return types of the function are compatible—even if the parameter lists are totally different:

from typing import overload, Any

class A:
    def more_complicated(self, something: A, cat: float = 1.0, dog: int = 14) -> int:
        ...

class B:
    def more_complicated(self, something: B) -> int:
        ...

class C(A, B):
    @overload
    def more_complicated(self, something: A, cat: float, dog: int) -> int:
        ...

    @overload
    def more_complicated(self, something: B) -> int:
        ...

    def more_complicated(self, something: A | B, cat: float = 0.0, dog: int = 1) -> int:
        return 1

Regardless of how permissive the parameter list is taken to be, I don't think it should be an error to intersect two types that have interface conflicts. It's too easy for this to happen especially when you don't control other people's code. And the intersections can also happen in other people's code, so you would end up with type errors that you can't correct.

I suggest you at least wait for the call or attribute access. For functions, I suggest that you do no more than try the arguments on each function that was defined for an element of the intersection. Don't try to verify that the functions are "compatible" since you may not be able to do so. What if one of the functions is a catch-all like "def f(self, *args, **kwargs)`? That should be fine, in my opinion—the hypothetical derived type could do the same after all. For the return type, choose the intersection of all return types of the matching functions (covariance).

I think one of the things that may be missing from this discussion is that the intersection may not be visible to the code producing the intersection. So there may be no way to provide it. It's very important for valid, reasonable code to pass type checking.

@mikeshardmind
Copy link
Collaborator

This makes an intersection no more useful than what already exists. I'd rather manually write a protocol that describes the intended behavior where there is incompatibility than have a declaration that I expect compatibility between types to exist include the type system not informing me when there isn't, and then inserting an overload that I then need to handle wherever that is used.

Furthermore, in the case of concrete types, it's already invalid to change the signature of a method of a subclass, so how would A & B ever end up with this being needed without prior issues?

@NeilGirdhar
Copy link
Collaborator

NeilGirdhar commented Jul 24, 2023

. I'd rather manually write a protocol that describes the intended behavior where there is incompatibility t

You can't always do this. The intersections can occur in generic code, and you may not be able to engineer some protocol for an intersection that doesn't exist anywhere in your code. And it's extremely tedious to do this in general, and especially in generic code.

than have a declaration that I expect compatibility between types to exist include the type system not informing me when there isn't, and then inserting an overload that I then need to handle wherever that is used.

This is a fair point. I withdraw my most permissive suggestion of imagining broad parameter lists, and suggest the trial approach of trying the parameters on each function with the same name. And then intersecting the returned types. Thus, the above example would work for:

x: A & B
x.f(x)  # Matches A.f, which returns int, and B.f, which returns int.  Thus, it returns int & int = int.

@DiscordLiz
Copy link
Collaborator

DiscordLiz commented Jul 24, 2023

I can't even safely attempt to call something.more_complicated to then check the resulting type.

I was thinking of having a non-empty intersection type if the return types of the function are compatible—even if the parameter lists are totally different:

from typing import overload, Any

class A:
    def more_complicated(self, something: A, cat: float = 1.0, dog: int = 14) -> int:
        ...

class B:
    def more_complicated(self, something: B) -> int:
        ...

class C(A, B):
    @overload
    def more_complicated(self, something: A, cat: float, dog: int) -> int:
        ...

    @overload
    def more_complicated(self, something: B) -> int:
        ...

    def more_complicated(self, something: A | B, cat: float = 0.0, dog: int = 1) -> int:
        return 1

This isn't a valid subclass of A or of B.

main.py:12: error: Signature of "more_complicated" incompatible with supertype "A" [override]

you can't override methods in subclasses like this. (@mikeshardmind also already told you this isn't valid)

The lack of being able to do this makes this impossible, and there are very good reasons not to remove that restriction as it breaks assumptions that sub classes are valid substitutes

It may be possible for protocols by constructing a virtual subclass, but I don't like that for other reasons as I brought up in the other thread

@NeilGirdhar
Copy link
Collaborator

NeilGirdhar commented Jul 25, 2023

This isn't a valid subclass of A or of B.

It is valid. Try it in PyRight; MyPy is wrong here. Apparently, MyPy wants defaults for some reason and then it passes in MyPy too.

Edit: It is valid with defaults only.

@mikeshardmind
Copy link
Collaborator

pyright is wrong about this being a valid subclass of both A and B. The fact that you wrote "MyPy wants defaults for some reason" without understanding the reason is part of the issue here.

Without the defaults, this fails the ability to substitute your subclass for the original and have it be a drop in replacement.

You also constructed a case where overloads can be made to bandage around it and did not address the case I provided where this isn't possible at all.

@erictraut
Copy link
Collaborator

erictraut commented Jul 25, 2023

I'm concerned about the complexity of what's being proposed here and where this discussion has led.

From a type theory standpoint, I think it is sufficient to simply document that unions distribute over intersections and intersections distribute over unions.

(A | B) & C <=> (A & C) | (B & C)
(A & B) | C <=> (A | C) & (B | C)

Other rules should follow from these. TypeScript follows these distributive rules, and I presume other languages that support intersections do as well. (If you haven't done a survey of other languages that support intersections, I'd encourage you to do so. I feel like some of the apparent debates could be avoided if such a survey existed.)

As I mentioned in another thread, I encourage the authors of the PEP to avoid calling out specific handling of protocols, concrete types, etc. The rules for intersections will be easier to understand, more consistent, and more composable with existing type features if the PEP simply describes what happens in terms of existing type rules.

Other rules that are relevant for unions and intersections:
Subtyping
C is a subtype of A | B if C is a subtype of A or B
C is a subtype of A & B if C is a subtype of A and B

Attribute type evaluation
If ab has the type A | B, the type of ab.foo is invalid if foo is missing from either A or B; if foo is present on both, then ab.foo is the union of the types of A.foo and B.foo.
If ab has the type A & B, the type of ab.foo is invalid if foo is missing from both A and B; if foo is present on one, then ab.foo is that type; if it's present on both, then it's the intersection of A.foo and B.foo.

Using these general rules, we can avoid bikeshedding about whether a particular overload should be considered a subtype of a particular function, etc. Those discussions may or may not be worth our time, but they should be separate from the discussion about intersections.

@NeilGirdhar
Copy link
Collaborator

NeilGirdhar commented Jul 25, 2023

pyright is wrong about this being a valid subclass of both A and B. The fact that you wrote "MyPy wants defaults for some reason" without understanding the reason is part of the issue here.

Ah, my mistake. Well, consider the overload to have defaults then.

You also constructed a case where overloads can be made to bandage around it and did not address the case I provided where this isn't possible at all.

Like I said, Python typing doesn't have good support for the descriptor protocol yet. You can provide theoretical subclasses that work around that too.

Using these general rules, we can avoid bikeshedding about whether a particular overload should be considered a subtype of a particular function, etc. Those discussions may or may not be worth our time, but they should be separate from the discussion about intersections.

I agree. My only point is that we shouldn't block intersection just because the parameter lists of various functions are different. Either leave this up to type checkers or specify the most permissive and correct thing.

@CarliJoy
Copy link
Owner Author

I agree. My only point is that we shouldn't block intersection just because the parameter lists of various functions are different. Either leave this up to type checkers or specify the most permissive and correct thing.

@NeilGirdhar what you mean by block? Block writing the PEP/introducing Intersections or block users from using them in this case?

@NeilGirdhar
Copy link
Collaborator

what you mean by block? B

I just mean that I don't want it to be an error to try to use A & B in a case like:

class A:
  def f(self, x: int) -> None: ...
class B:
  def f(self, x: str) -> None: ...

That's what I tried to explain here. Just because the methods are not "compatible", doesn't mean that a subclass of A and B might not be possible. I think forcing the user to use protocols is unreasonable.

@mikeshardmind
Copy link
Collaborator

mikeshardmind commented Jul 26, 2023

what you mean by block? B

I just mean that I don't want it to be an error to try to use A & B in a case like:

class A:
  def f(self, x: int) -> None: ...
class B:
  def f(self, x: str) -> None: ...

That's what I tried to explain here. Just because the methods are not "compatible", doesn't mean that a subclass of A and B might not be possible. I think forcing the user to use protocols is unreasonable.

I think what was said here, namely:

Possibly the type of ab["a"] should actually be int & str, not int | str

solves this without mandating anything additional.

If (A & B).foo = A.foo & B.foo, then we may not mandate how something satisfies both, but it doesn't preclude that something could. (The rule needs to be slightly more verbose than this, but this is how it would express for this case)

@randolf-scholz
Copy link

randolf-scholz commented Jul 27, 2023

@NeilGirdhar I'm not sure what you mean by the methods not being compatible. There is no incompatibility when specifying A&B in the example you gave. If you follow #3 (comment), then (A&B).foo would be any subtype of Callable[[int], None] & Callable[[str], None].

Now, function types Callable[[A], B] are contravariant in the input and covariant in the output, i.e. Callable[[C], D] is a subtype of Callable[[A], B] if and only if C≥A and D≤B. So, following the subtyping rules:

Callable[[X], Y] ≤ Callable[[int], None] & Callable[[str], None]
⟺ (X ≥ int and Y ≤ None) and (X ≥ str and Y ≤ None)
⟺ X ≥ int and X ≥ str and Y ≤ None

I.e. X must be a supertype of both int and str. Set-theoretically, we can assign the join, i.e. the least common supertype of int and str, i.e. X = ⋂_{Y: Y≥int and Y≥str} Y. In the actual class hierarchy, this is likely simply object, but note that structurally also int | str works, i.e.

class C(A, B):
    def f(self, x: int | str) -> None: ...

Would be a valid subclass of A & B.

@NeilGirdhar
Copy link
Collaborator

@NeilGirdhar I'm not sure what you mean by the methods not being compatible. There is no incompatibility when specifying A&B in the example you gave. If you follow #3 (comment), then (A&B).foo would be any subtype of Callable[[int], None] & Callable[[str], None].

I know that there's no compatability problem. That's what I was showing. If you look at some of the comments that I was responding to, you can see the implication that parameters lists might matter. What I said in #3 (comment) was that "because the parameter list is contravariant... a derived type that inherits from the members of the intersection can simply resolve the problem by having overloads (for example) that accomodate all of its parent classes." Which appears to be what you've repeated here?

@NeilGirdhar
Copy link
Collaborator

NeilGirdhar commented Aug 4, 2023

I'm trying to organize the issues. If this is a summary of #5, then this issue appears to be a special case of it? After all:

There are two possibilities here

  1. This reduces to: (A & B) | B | (B & C) due to the incompatibility between A & C
  2. This is considered unsatisfiable as a set of types

The first case here is equivalent to impossible intersection reducing to Never? The second is equivalent to it being some kind error?

@mikeshardmind
Copy link
Collaborator

@NeilGirdhar accurate, but the second option was ruled out for the same reason one of the others was in #5, can come up in a place the type checker may not be able to raise it, so eliminating a case in reduction is fine, erroring on it isn't.

Whether reduction needs to happen or not is still being worked through(?)

@NeilGirdhar
Copy link
Collaborator

so eliminating a case in reduction is fine, erroring on it isn't.

Yes, I agree with that.

I'm not trying to express an opinion though. I just want to know if we can close this issue as a special case of #5, and do all the discussion there?

@mikeshardmind
Copy link
Collaborator

I think so now that we got to that being the decision left here, closing this up

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants