[red-knot] simplify subtypes from unions #13401
This change makes |
I assume you mean But yes, I agree; it's now EDIT: oh, I'm guessing you actually meant |
If we switch to using a |
I don't think in general this is an operation we will need often, for the same reason -- the question will usually be about subtyping or assignability (or equivalence), not simple type equality. It's possible we will have the specific case of needing to know if Any/Unknown is in the union, but I think if that's an issue we could store an extra boolean flag on every union (and potentially even leave the actual Any/Unknown entry out of it) for less cost than the cost of the |
Yes sorry. I still think it is because we loop over |
Yes, I agree that this is accurate for I was correcting myself about |
Avoid quadratic time in subsumed elements when adding a super-type of existing union elements.

Reserve space in advance when adding multiple elements (from another union) to a union.

Make union elements a `Box<[Type]>` instead of an `FxOrderSet`; the set doesn't buy much since the rules of union uniqueness are defined in terms of supertype/subtype, not in terms of simple type identity.

Move sealed-boolean handling out of a separate `UnionBuilder::simplify` method and into `UnionBuilder::add`; now that `add` is iterating existing elements anyway, this is more efficient.

Remove `UnionType::contains`, since it's now `O(n)` and we shouldn't really need it; generally we care about subtype/supertype, not type identity. (Right now it's used for `Type::Unbound`, which shouldn't even be a type.)

Add support for `is_subtype_of` for the `object` type.

Addresses comments on #13401
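As a rough illustration of the approach described in this commit message, here is a minimal sketch of a union builder that simplifies by subtyping inside `add`. The `Ty` enum and its `is_subtype_of` rules are a toy stand-in invented for this example, not red-knot's actual `Type`, and this `UnionBuilder` is a simplified assumption of the shape, not the code in this PR.

```rust
/// Toy type lattice; a hypothetical stand-in for the real `Type`.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Ty {
    Never,
    Object,
    Int,
    Bool,
    Str,
    BoolLiteral(bool),
    IntLiteral(i64),
}

impl Ty {
    /// Simplified subtyping rules, assumed for this sketch only.
    fn is_subtype_of(&self, other: &Ty) -> bool {
        match (self, other) {
            (a, b) if a == b => true,
            (Ty::Never, _) | (_, Ty::Object) => true,
            (Ty::Bool, Ty::Int) => true,
            (Ty::BoolLiteral(_), Ty::Bool | Ty::Int) => true,
            (Ty::IntLiteral(_), Ty::Int) => true,
            _ => false,
        }
    }
}

#[derive(Default)]
struct UnionBuilder {
    elements: Vec<Ty>,
}

impl UnionBuilder {
    fn add(mut self, ty: Ty) -> Self {
        // `Never` contributes nothing to a union.
        if ty == Ty::Never {
            return self;
        }
        // Sealed-boolean handling: both bool literals together are just `bool`.
        let ty = match ty {
            Ty::BoolLiteral(b) if self.elements.contains(&Ty::BoolLiteral(!b)) => Ty::Bool,
            other => other,
        };
        // Redundant if some existing element is already a supertype.
        if self.elements.iter().any(|e| ty.is_subtype_of(e)) {
            return self;
        }
        // One linear pass drops every element subsumed by the new type.
        self.elements.retain(|e| !e.is_subtype_of(&ty));
        self.elements.push(ty);
        self
    }

    /// Freeze the elements into a boxed slice, mirroring `Box<[Type]>`.
    fn build(self) -> Box<[Ty]> {
        self.elements.into_boxed_slice()
    }
}
```

In this sketch each `add` makes at most two linear passes over the existing elements, so adding a supertype of many existing elements stays linear in the size of the union.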
}
Type::Never => {}
_ => {
    let mut remove = vec![];
In equivalent mypy code, I had to add a special fast path for literals. You can do better than quadratic for unions with lots of literals of the same type, which turns out to be a thing in the wild
Interesting, thanks for pointing this out! I took a look at that optimization in mypy.
I think the case this would optimize is where a union already contains e.g. `str`, and then we try to add lots of string literal types to it, every one of which is redundant because it's a subtype of `str`. Rather than going through all existing union members to check if each literal is a subtype of any of them, we can keep a hash-set of "types present in this union which have literal forms" and do an O(1) contains check against that set as the first step when adding a literal type to the union. Framed in more general terms, it's identifying that a certain set of common types have a single super-type that is most likely to rule them out of the union, and so we optimize checking for that most likely super-type by identity.
This makes sense; I'd prefer to wait to add this kind of optimization until we see it crop up in a real-world codebase and can evaluate the actual impact of the optimization in our case, but it's definitely a useful idea to keep in mind.
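For reference, the fast path as framed above could be sketched roughly like this. Everything here (`Fallback`, `Ty`, `fallbacks_present`, `slow_scans`) is a hypothetical name made up for illustration; it is neither mypy's implementation nor anything in this PR.

```rust
use std::collections::HashSet;

/// Nominal types that have literal forms (toy model, assumed for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Fallback {
    Int,
    Str,
}

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Ty {
    Instance(Fallback),
    IntLiteral(i64),
    StrLiteral(String),
}

impl Ty {
    /// The single most likely supertype of a literal, e.g. `str` for `Literal["a"]`.
    fn literal_fallback(&self) -> Option<Fallback> {
        match self {
            Ty::IntLiteral(_) => Some(Fallback::Int),
            Ty::StrLiteral(_) => Some(Fallback::Str),
            Ty::Instance(_) => None,
        }
    }

    fn is_subtype_of(&self, other: &Ty) -> bool {
        self == other
            || matches!(other, Ty::Instance(f) if self.literal_fallback() == Some(*f))
    }
}

#[derive(Default)]
struct UnionBuilder {
    elements: Vec<Ty>,
    /// Types present in the union which have literal forms.
    fallbacks_present: HashSet<Fallback>,
    /// Counts how often the linear scan ran (for demonstration only).
    slow_scans: usize,
}

impl UnionBuilder {
    fn add(&mut self, ty: Ty) {
        // Fast path: O(1) identity check against the likely supertype.
        if let Some(f) = ty.literal_fallback() {
            if self.fallbacks_present.contains(&f) {
                return;
            }
        }
        // Slow path: linear scan over the existing elements.
        self.slow_scans += 1;
        if self.elements.iter().any(|e| ty.is_subtype_of(e)) {
            return;
        }
        self.elements.retain(|e| !e.is_subtype_of(&ty));
        if let Ty::Instance(f) = &ty {
            self.fallbacks_present.insert(*f);
        }
        self.elements.push(ty);
    }
}
```

With `str` already in the union, each subsequent string literal is rejected by the set lookup without touching `elements`; only types that miss the fast path pay for the scan.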
I think what would be useful is if we added one (or more) benchmarks based on a real-world codebase that makes heavy use of large literals (e.g., pydantic).
That's not quite the right description; it's also useful when the union doesn't contain the supertype (i.e. `str`). For instance, if you were combining two unions that you knew consisted only of literal types, you could use a set union, which is linear. The mypy optimisation I added is basically that, but it also works when there are non-literal types thrown in as well. Fair enough on waiting though!
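A hedged sketch of that idea: merge two unions by deduplicating literals through a hash set, so the cost is linear in the number of literals even when non-literal types are mixed in. The `Ty` enum and `merge_unions` function are hypothetical and written for this example; this is not a port of the mypy code. The toy model has no subtyping between its non-literal types, so deduplicating those by equality is enough here.

```rust
use std::collections::HashSet;

/// Toy type model, assumed for this sketch only.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Ty {
    Int,
    Str,
    IntLiteral(i64),
    StrLiteral(String),
}

impl Ty {
    /// For a literal, the nominal type it falls back to; `None` for non-literals.
    fn fallback(&self) -> Option<Ty> {
        match self {
            Ty::IntLiteral(_) => Some(Ty::Int),
            Ty::StrLiteral(_) => Some(Ty::Str),
            Ty::Int | Ty::Str => None,
        }
    }
}

/// Merge two unions using hash lookups instead of pairwise subtype checks.
fn merge_unions(left: &[Ty], right: &[Ty]) -> Vec<Ty> {
    let mut seen: HashSet<Ty> = HashSet::new();
    let mut out = Vec::with_capacity(left.len() + right.len());

    // Pass 1: keep each non-literal type once.
    for ty in left.iter().chain(right) {
        if ty.fallback().is_none() && seen.insert(ty.clone()) {
            out.push(ty.clone());
        }
    }

    // Pass 2: keep a literal only if neither its fallback nor an identical
    // literal has been seen. Both checks are O(1) on average.
    for ty in left.iter().chain(right) {
        if let Some(fb) = ty.fallback() {
            if !seen.contains(&fb) && seen.insert(ty.clone()) {
                out.push(ty.clone());
            }
        }
    }
    out
}
```

Note that this sketch emits non-literals before literals, so it does not preserve the original element order; an order-preserving variant would need a bit more bookkeeping.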
Oh, thanks, yeah, I misread the code. The set is `unduplicated_literal_fallbacks`, not `duplicated_literal_fallbacks`. So it looks like it's optimizing only the case you described; the mirror image of the case I described.
I've created a new issue collating some of the perf issues mypy and pyright have encountered relating to unions: #13549
Nice!
I think there are some mypy PRs missing from the list, so if you're interested in code I'd make sure to look at main.
I'll also make it such that if you're interested in real world use cases you should only have to look at primer, looks like there are 1-2 things I never actually added.
Add `Type::is_subtype_of` method, and simplify subtypes out of unions.