[red-knot] simplify subtypes from unions #13401

carljm · 2024-09-18T23:22:38Z

Add Type::is_subtype_of method, and simplify subtypes out of unions.
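A minimal sketch of the idea, not the PR's actual code: the toy `Ty` enum, its subtype rules, and this `UnionBuilder` are simplified assumptions made up for illustration. Adding a type is a no-op when an existing element already covers it; otherwise every element the new type subsumes is dropped first.

```rust
/// Hypothetical toy type representation; the real `Type` is far richer.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Ty {
    Int,
    Str,
    IntLiteral(i64),
    StrLiteral(String),
}

impl Ty {
    /// Simplified subtype relation (assumed rules, for illustration only).
    fn is_subtype_of(&self, other: &Ty) -> bool {
        match (self, other) {
            (a, b) if a == b => true,
            (Ty::IntLiteral(_), Ty::Int) => true,
            (Ty::StrLiteral(_), Ty::Str) => true,
            _ => false,
        }
    }
}

#[derive(Debug, Default)]
struct UnionBuilder {
    elements: Vec<Ty>,
}

impl UnionBuilder {
    fn add(&mut self, ty: Ty) {
        // Indices of existing elements that the new type subsumes.
        let mut remove = vec![];
        for (index, element) in self.elements.iter().enumerate() {
            if ty.is_subtype_of(element) {
                // Already covered by an existing element: nothing to do.
                return;
            }
            if element.is_subtype_of(&ty) {
                remove.push(index);
            }
        }
        // Remove from the back so earlier indices stay valid.
        for index in remove.into_iter().rev() {
            self.elements.remove(index);
        }
        self.elements.push(ty);
    }
}
```

In this sketch, adding `Literal[1]`, `str`, `int`, then `Literal["a"]` leaves just `str | int`. Note that each `Vec::remove` shifts the tail of the vector, so removing many subsumed elements this way can itself be quadratic.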

github-actions · 2024-09-18T23:36:14Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

crates/red_knot_python_semantic/src/types/builder.rs

MichaReiser · 2024-09-19T11:46:00Z

This change makes UnionBuilder::map O(N^2) but I don't think there's a way to avoid that

carljm · 2024-09-19T14:47:46Z

> This change makes UnionBuilder::map O(N^2) but I don't think there's a way to avoid that

I assume you mean UnionBuilder::add? There is no UnionBuilder::map.

But yes, I agree; it's now O(n^2), and I don't think that's avoidable.

EDIT: oh, I'm guessing you actually meant UnionType::map. UnionBuilder::add is not really O(n^2); it is O(n*m) if you add a union to the builder.

AlexWaygood · 2024-09-19T14:50:29Z

If we switch to using a Vec for the elements of a union, that will make tests such as union.contains(Type::Any) O(n), right? Not sure how important that is to consider

carljm · 2024-09-19T14:53:12Z

> If we switch to using a Vec for the elements of a union, that will make tests such as union.contains(Type::Any) O(n), right? Not sure how important that is to consider

I don't think in general this is an operation we will need often, for the same reason -- the question will usually be about subtyping or assignability (or equivalence), not simple type equality.

It's possible we will have the specific case of needing to know if Any/Unknown is in the union, but I think if that's an issue we could store an extra boolean flag on every union (and potentially even leave the actual Any/Unknown entry out of it) for less cost than the cost of the FxOrderSet.
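The boolean-flag idea could look roughly like the sketch below. The `Union` struct, the `has_dynamic` field, and the toy `Ty` enum are hypothetical names chosen for illustration; nothing here is taken from the actual codebase.

```rust
/// Hypothetical toy type representation for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Ty {
    Any,
    Unknown,
    Int,
    Str,
}

/// A union that remembers whether it contains `Any`/`Unknown`,
/// so that question never requires scanning the elements.
struct Union {
    elements: Box<[Ty]>,
    has_dynamic: bool,
}

impl Union {
    fn from_elements(elements: Vec<Ty>) -> Self {
        // Computed once at construction time: O(n) here, O(1) on every later query.
        let has_dynamic = elements
            .iter()
            .any(|ty| matches!(ty, Ty::Any | Ty::Unknown));
        Self {
            elements: elements.into_boxed_slice(),
            has_dynamic,
        }
    }

    fn contains_dynamic(&self) -> bool {
        self.has_dynamic
    }
}
```

The trade-off is one extra byte (plus padding) per union in exchange for dropping the hash set entirely.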

MichaReiser · 2024-09-19T14:56:02Z

> EDIT: oh, I'm guessing you actually meant UnionType::map. UnionBuilder::add is not really O(n^2); it is O(n*m) if you add a union to the builder.

Yes, sorry. I still think it is, because we loop over n elements and, for each one, loop over all elements that have been added up to that point. So it's probably n(n-1)/2.

carljm · 2024-09-19T14:59:55Z

> I still think it is, because we loop over n elements and, for each one, loop over all elements that have been added up to that point. So it's probably n(n-1)/2.

Yes, I agree that this is accurate for UnionType::map.

I was correcting myself about UnionBuilder::add, which is itself only O(n), or O(n*m) if we are adding another union (which won't ever be the case from UnionType::map.)
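As a quick worked count of the bound discussed above, assuming that mapping over a union of n elements adds each result to a fresh builder and that the k-th `add` compares against the k-1 elements already present (the worst case, where nothing is simplified away):

$$\sum_{k=1}^{n} (k-1) = \frac{n(n-1)}{2} = O(n^2)$$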

- Avoid quadratic time in subsumed elements when adding a super-type of existing union elements.
- Reserve space in advance when adding multiple elements (from another union) to a union.
- Make union elements a `Box<[Type]>` instead of an `FxOrderSet`; the set doesn't buy much since the rules of union uniqueness are defined in terms of supertype/subtype, not in terms of simple type identity.
- Move sealed-boolean handling out of a separate `UnionBuilder::simplify` method and into `UnionBuilder::add`; now that `add` is iterating existing elements anyway, this is more efficient.
- Remove `UnionType::contains`, since it's now `O(n)` and we shouldn't really need it; generally we care about subtype/supertype, not type identity. (Right now it's used for `Type::Unbound`, which shouldn't even be a type.)
- Add support for `is_subtype_of` for the `object` type.

Addresses comments on #13401.
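A rough sketch of how these points might fit together, under stated assumptions: the toy `Ty` enum, `add_union`, and the use of `Vec::retain` are illustrative choices, not necessarily what the follow-up change does. `retain` compacts the vector in a single pass, space is reserved before bulk adds, and the finished union is frozen into a `Box<[Ty]>`.

```rust
/// Hypothetical toy type representation for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Ty {
    Object,
    Int,
    Bool,
    Str,
}

impl Ty {
    /// Simplified subtype relation (assumed rules): everything is a
    /// subtype of `Object`, and `Bool` is a subtype of `Int`.
    fn is_subtype_of(self, other: Ty) -> bool {
        self == other || other == Ty::Object || (self == Ty::Bool && other == Ty::Int)
    }
}

#[derive(Default)]
struct UnionBuilder {
    elements: Vec<Ty>,
}

impl UnionBuilder {
    fn add(&mut self, ty: Ty) {
        // Already covered by an existing element: nothing to do.
        if self.elements.iter().any(|&e| ty.is_subtype_of(e)) {
            return;
        }
        // Drop all subsumed elements in one pass instead of shifting
        // the tail of the vector once per removed element.
        self.elements.retain(|&e| !e.is_subtype_of(ty));
        self.elements.push(ty);
    }

    /// Add every element of another union, reserving space up front.
    fn add_union(&mut self, other: &[Ty]) {
        self.elements.reserve(other.len());
        for &ty in other {
            self.add(ty);
        }
    }

    /// Freeze the result into a boxed slice; no hash set is needed because
    /// uniqueness is defined by subtyping, not by identity.
    fn build(self) -> Box<[Ty]> {
        self.elements.into_boxed_slice()
    }
}
```

With this sketch, adding a single supertype such as `Object` to a large union costs two linear scans rather than one shift per subsumed element.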

hauntsaninja · 2024-09-25T08:29:15Z

crates/red_knot_python_semantic/src/types/builder.rs

            }
            Type::Never => {}
            _ => {
+                let mut remove = vec![];


In equivalent mypy code, I had to add a special fast path for literals. You can do better than quadratic for unions with lots of literals of the same type, which turns out to be a thing in the wild

carljm · 2024-09-25

Interesting, thanks for pointing this out! I took a look at that optimization in mypy.

I think the case this would optimize is where a union already contains e.g. str, and then we try to add lots of string literal types to it, every one of which is redundant because it's a subtype of str. Rather than going through all existing union members to check if each literal is a subtype of any of them, we can keep a hash-set of "types present in this union which have literal forms" and do an O(1) contains check against that set as the first step when adding a literal type to the union. Framed in more general terms, it's identifying that a certain set of common types have a single super-type that is most likely to rule them out of the union, and so we optimize checking for that most likely super-type by identity.

This makes sense; I'd prefer to wait to add this kind of optimization until we see it crop up in a real-world codebase and can evaluate the actual impact of the optimization in our case, but it's definitely a useful idea to keep in mind.
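The fast path described in this comment could be sketched as follows. This is an illustration under assumptions, not mypy's or red-knot's code: the toy `Ty` enum, `literal_fallback`, and the `fallbacks_present` set are hypothetical names. In this toy lattice nothing subsumes `Int` or `Str`, so the set never goes stale; a real implementation would have to keep it in sync when elements are removed.

```rust
use std::collections::HashSet;

/// Hypothetical toy type representation for illustration.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Ty {
    Int,
    Str,
    IntLiteral(i64),
    StrLiteral(String),
}

impl Ty {
    /// The non-literal type a literal falls back to, if any.
    fn literal_fallback(&self) -> Option<Ty> {
        match self {
            Ty::IntLiteral(_) => Some(Ty::Int),
            Ty::StrLiteral(_) => Some(Ty::Str),
            _ => None,
        }
    }

    fn is_subtype_of(&self, other: &Ty) -> bool {
        self == other || self.literal_fallback().as_ref() == Some(other)
    }
}

#[derive(Default)]
struct UnionBuilder {
    elements: Vec<Ty>,
    /// Types with literal forms (e.g. `str`) currently present in the union.
    fallbacks_present: HashSet<Ty>,
}

impl UnionBuilder {
    fn add(&mut self, ty: Ty) {
        // Fast path: a literal whose fallback type is already in the union
        // is redundant, and the check is an expected O(1) hash lookup.
        if let Some(fallback) = ty.literal_fallback() {
            if self.fallbacks_present.contains(&fallback) {
                return;
            }
        }
        // Slow path: the general linear subtype scan.
        if self.elements.iter().any(|e| ty.is_subtype_of(e)) {
            return;
        }
        self.elements.retain(|e| !e.is_subtype_of(&ty));
        if matches!(ty, Ty::Int | Ty::Str) {
            self.fallbacks_present.insert(ty.clone());
        }
        self.elements.push(ty);
    }
}
```

Once `str` is in the union, each further string literal is rejected without touching `elements` at all.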

AlexWaygood · 2024-09-25

I think what would be useful is if we added one (or more) benchmarks based on a real-world codebase that makes heavy use of large literals (e.g., pydantic).

hauntsaninja · 2024-09-25

That's not quite the right description; it's also useful when the union doesn't contain the supertype (i.e. str). For instance, if you were combining two unions that you knew consisted only of literal types, you could use a set union, which is linear. The mypy optimisation I added is basically that, but it also works when there are non-literal types thrown in as well. Fair enough on waiting though!
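The literal-only case mentioned here can be sketched with a plain hash set. The `Literal` enum and `union_literals` function are hypothetical, and the sketch assumes that two distinct literals are never subtypes of one another, so deduplicating by identity is enough.

```rust
use std::collections::HashSet;

/// Hypothetical toy literal types for illustration.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Literal {
    Int(i64),
    Str(String),
}

/// Combine two literal-only unions in expected linear time, preserving
/// first-seen order. No pairwise subtype checks are needed under the
/// assumption that distinct literals never subsume each other.
fn union_literals(left: &[Literal], right: &[Literal]) -> Vec<Literal> {
    let mut seen = HashSet::with_capacity(left.len() + right.len());
    let mut out = Vec::with_capacity(left.len() + right.len());
    for literal in left.iter().chain(right) {
        // `insert` returns false when the literal was already seen.
        if seen.insert(literal) {
            out.push(literal.clone());
        }
    }
    out
}
```

Compared with adding each element through a general builder, this replaces the n*m subtype comparisons with n+m hash insertions.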

carljm · 2024-09-25

Oh, thanks, yeah, I misread the code. The set is `unduplicated_literal_fallbacks`, not `duplicated_literal_fallbacks`. So it looks like it's optimizing only the case you described; the mirror image of the case I described.

AlexWaygood · 2024-09-29

I've created a new issue collating some of the perf issues mypy and pyright have encountered relating to unions: #13549

hauntsaninja · 2024-09-29

Nice!
I think there are some mypy PRs missing from the list, so if you're interested in code I'd make sure to look at main.
I'll also make it such that if you're interested in real world use cases you should only have to look at primer, looks like there are 1-2 things I never actually added.

carljm added the red-knot Multi-file analysis & type inference label Sep 18, 2024

carljm requested review from MichaReiser and AlexWaygood as code owners September 18, 2024 23:22

AlexWaygood approved these changes Sep 19, 2024


carljm force-pushed the cjm/declared-vs-non branch from 51d8d50 to 7eeefad Compare September 19, 2024 04:05

carljm force-pushed the cjm/simplify-union-subtype branch from 29db6b8 to 65732ee Compare September 19, 2024 04:08

carljm force-pushed the cjm/declared-vs-non branch from 7eeefad to f4e2b7a Compare September 19, 2024 04:39

carljm force-pushed the cjm/simplify-union-subtype branch from 65732ee to 44ea4ea Compare September 19, 2024 04:40

Base automatically changed from cjm/declared-vs-non to main September 19, 2024 04:47

carljm added 2 commits September 18, 2024 21:48

[red-knot] simplify subtypes from unions

22648ea

review comments

37c695f

carljm force-pushed the cjm/simplify-union-subtype branch from 44ea4ea to 37c695f Compare September 19, 2024 04:49

carljm merged commit cf1e91b into main Sep 19, 2024
20 checks passed

carljm deleted the cjm/simplify-union-subtype branch September 19, 2024 05:06

MichaReiser reviewed Sep 19, 2024


carljm mentioned this pull request Sep 19, 2024

[red-knot] more efficient UnionBuilder::add #13411

Merged

hauntsaninja reviewed Sep 25, 2024
