-
-
Notifications
You must be signed in to change notification settings - Fork 638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
wip: unions using protocol classes. #12577
Conversation
Signed-off-by: Andreas Stenius <[email protected]> # Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust]
Signed-off-by: Andreas Stenius <[email protected]> # Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for prototyping (...pun intended)! I think that this looks like a really promising angle.
The
runtime_checkable
would need to be optional, in order to allow attributes on the union protocol. But without it, we can not useissubclass()
to check if the union member satisfies the Protocol. Also, I didn't manage to get mypy to help with this either, if you find a way to get mypy to understand this, it would be great:
Interesting... I think that by definition, the engine never uses issubclass
to check whether something is a union member (because the only accurate way to determine that is "it's in the UnionMembership
").
So... I think that this is fine? Users could (as you say) optionally add @runtime_checkable
if they wanted to, but otherwise the primary benefit of using Protocol
is one of mypy-level type safety and a way to declare the interface.
@runtime_checkable | ||
class Base2(Protocol): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that in order to replace unions, this will require one more explicit opt-in: we need to be able to detect that something is a union base for:
pants/src/python/pants/engine/internals/scheduler.py
Lines 658 to 666 in 731c855
for the_get in rule.input_gets: | |
if union.is_instance(the_get.input_type): | |
# If the registered subject type is a union, add Get edges to all registered | |
# union members. | |
for union_member in union_membership.get(the_get.input_type): | |
add_get_edge(the_get.output_type, union_member) | |
else: | |
# Otherwise, the Get subject is a "concrete" type, so add a single Get edge. | |
add_get_edge(the_get.output_type, the_get.input_type) |
...and I'm not sure that extending Protocol
is sufficient to indicate that you're not using a concrete class in that position? ... maybe it is though, if there is a convention to never actually ever extend Protocol
? But I don't think it's prohibited...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't line 659 just be switched (today - independent of this WIP) to using the exact the_get.input_type
as a key into union_membership
instead of union.is_instance
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. Duh. Yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This location needs patching too..
pants/src/python/pants/engine/internals/selectors.py
Lines 202 to 211 in 731c855
# If the input_type is not annotated with `@union`, then we validate that the input is | |
# exactly the same type as the input_type. (Why not check unions? We don't have access to | |
# `UnionMembership` to know if it's a valid union member. The engine will check that.) | |
if not union.is_instance(self.input_type) and type(input_) != self.input_type: | |
# We can assume we're using the longhand form because the shorthand form guarantees | |
# that the `input_type` is the same as `input`. | |
raise TypeError( | |
f"Invalid Get. The third argument `{input_}` must have the exact same type as the " | |
f"second argument, {self.input_type}, but had the type {type(input_)}." | |
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't line 659 just be switched (today - independent of this WIP) to using the exact the_get.input_type as a key into union_membership instead of union.is_instance?
No, it does not work, as it changes the semantics of when add_get_edge
is called. Consider a union input_type
but that has not been registered, then we end up calling add_get_edge(output_type, input_type)
rather than not calling add_get_edge
at all.
I think a better approach is to move the union test into a dedicated function in unions.py
. Oh, wait, there already is such a function, we should only use it :P
Edit:
Not 100% sure my conclusion about why it doesn't work is right, but the tests agree that its problematic:
src/python/pants/engine/internals/scheduler.py:209: ValueError
=========================== short test summary info ============================
FAILED src/python/pants/engine/internals/scheduler_test.py::test_use_params
FAILED src/python/pants/engine/internals/scheduler_test.py::test_strict_equals
FAILED src/python/pants/engine/internals/scheduler_test.py::test_union_rules
FAILED src/python/pants/engine/internals/scheduler_test.py::test_outlined_get
FAILED src/python/pants/engine/internals/scheduler_test.py::test_trace_includes_rule_exception_traceback
FAILED src/python/pants/engine/internals/scheduler_test.py::test_unhashable_types_failure
ERROR src/python/pants/engine/internals/scheduler_test.py::test_transitive_params
ERROR src/python/pants/engine/internals/scheduler_test.py::test_consumed_types
========================= 6 failed, 2 errors in 1.68s ==========================
Yes, there is no check today, so leaving off runtime_checkable is status quo. However, we do not get the mypy-level of type safety when registering Union members with |
# Base2 and Base3 are both explicitly Protocols, so this is OK | ||
b: Base2 = B() | ||
c: Base3 = C() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, we do not get the mypy-level of type safety when registering Union members with UnionRule solely using Protocols, that was my caveat at the end, with the foo and bar assignment examples.
Assuming that the lines I'm commenting on here are actually correctly typechecked, then this seems like a good approach to me: IMO, if we're able to completely remove the "union" concept in favor of extending Protocol
, and then renaming UnionRule
to ProtocolInstance
(strawname, but you get the idea), then that's great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These lines are typechecked correctly, so that's a win. However, I fail to incorporate this syntax into something to include in the list of rules.
I've posted a question on SO for this: https://stackoverflow.com/questions/68822297/python-protocol-implementation-class-type-check
Aahh.. unless you suggest that we use a few more lines while registering, which would accommodate mypy to do its thing..?
class MyProto (Protocol):
a: str
class Member:
a: str
def rules():
m: MyProto = Member()
return [ProtocolInstance(MyProto, m), ...]]
I guess we could work with that.. maybe even be able to hide the assignment to be done generically in a dataclass.. reducing boilerplate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. https://stackoverflow.com/questions/68822297/protocol-implementation-class-type-check clarifies the issue, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I managed to find a way around it, finally :D
more details on the SO answer I posted, but it's workable, and I think we can still translate that into a UnionRule
as a first step, if we want, to make this less impact (and also be backwards compatible so we can use both in a transitional period)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so there seems to be a bug with how mypy treats Protocol class types. python/mypy#10988
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Circling back to renaming UnionRule
.. I think we should be conservative with renaming too much, unless we "have" to. The concept of a union rule is well documented. Hopefully, it will continue to work referring to them as "unions" even if implemented using protocols, if we can keep the two compatible enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for being cautious about renaming! I think we can actually do this fairly risk-free by aliasing UnionRule and @union and having a deprecation period. Or we can just stick with the term "union" as you suggest. Either way, no need to decide right now... :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not particularly attached to the term union
(especially since it has a different meaning in Python typing), so in the medium term (post deprecation) actually switching to Protocol
makes sense to me. It gives us better alignment with Python typing (only with the addition of registration).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And, FWIW: the entire plugin API is currently marked experimental: we're not going to stabilize it until one or both of Go or JVM are consuming it happily (Go is getting close, although there is some friction around the current implementation of file Targets and a potential need for Target generation: #7022 (comment)).
…ork for pantsbuild#12577 Signed-off-by: Andreas Stenius <[email protected]> # Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
prep work for #12577 Signed-off-by: Andreas Stenius <[email protected]> [ci skip-rust] [ci skip-build-wheels]
Signed-off-by: Andreas Stenius <[email protected]> # Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
OK, as it turns out. Protocol classes where not suitable for runtime use in a dynamic capacity, as we're going for here. So a better fit is to use abstract base classes instead. I've updated my POC to show case this, and it works like a charm, IMHO. ;) See the tests for another example. |
Updated PR description, but there's a open question in there, I think is easy to overlook... so adding it here too
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With using ABC, I think that's similar to how most unions are implemented now: many of ours already do that. It makes sense, but still suffers from the problem that subclasses will have transitively subclassed UnionBase so look like they are bases themselves too.
I realize now maybe the solution is right in front of us. Change our Rust code to determine if it's a union base by inspecting the registered UnionRule types and looking for exact type matches with the base argument, rather than inspecting for an @union director or subclassing ABC. I think that solves this subclassing conundrum.
As always, I've learned a lot about the Python type system, and Protocol classes nonetheless. So even if we're not going to use any of this, it's been time well spent, on my part. |
And it's prompting revisiting one of our clunkier APIs - time well spent from my point of view too :) thanks for looking into this all. --
A challenge here is it might be easy to unintentionally make something a union base when you didn't want to? Because you mess up the ordering of But we can probably Keep It Simple at first and see how that does. -- Oh, hm....another challenge is when you define a Union but not any plugins for it. We do this sometimes with Pants when defining plugin hooks for people, which we ourselves don't consume. In this case, the rule graph code wouldn't know the UnionBase exists, which could cause rule graph issues. This leads me to the proposal: add Wdyt? |
I think having a dedicated base class that you inherit from is useful, regardless. And we can even turn the Also, to make the mistake go away, the need for providing the union base when registering should not be needed, if it is already provided when you declare your union member class. class MyUnion (UnionBase): ...
class MyUnionMember (MyUnion): ....
def rules(): return [UnionMember(MyUnionMember), ...] The def rules(): return [MyUnionMember.rule, ...] I think it makes sense to register union bases where you declare them, so they always exist in the UnionMembership, even if they don't have any members registered. Something like: class MyUnion (UnionBase): ...
def rules(): return [
MyUnion.rule, # the UnionBase needs to be clever here, to see the difference between a class directly inheriting UnionBase, vs. one indirectly inherting it from e.g. MyUnion.
# or
UnionType(MyUnion), # UnionBase is taken, unless we call it something else...
...
] Edit: OR... |
Using an ABC would be a positive step, but the need to subclass rather than only having the |
Ok. There are more details in the linked mypy issue. |
#13301 changed `TffmtRequest`'s inheritance of `StyleRequest`, but didn't add an abstract field. `mypy` didn't flag this, likely because our use of `@union` obscures the constructor call that might have triggered the error: see https://mypy.readthedocs.io/en/stable/class_basics.html#abstract-base-classes-and-multiple-inheritance. It's possible that #12577 would improve the situation here, but it doesn't seem worth adding heavier integration tests (which would invoke the entire `lint` and `fmt` goals) just to catch this kind of case. [ci skip-rust] [ci skip-build-wheels]
pantsbuild#13301 changed `TffmtRequest`'s inheritance of `StyleRequest`, but didn't add an abstract field. `mypy` didn't flag this, likely because our use of `@union` obscures the constructor call that might have triggered the error: see https://mypy.readthedocs.io/en/stable/class_basics.html#abstract-base-classes-and-multiple-inheritance. It's possible that pantsbuild#12577 would improve the situation here, but it doesn't seem worth adding heavier integration tests (which would invoke the entire `lint` and `fmt` goals) just to catch this kind of case. [ci skip-rust] [ci skip-build-wheels]
#13301 changed `TffmtRequest`'s inheritance of `StyleRequest`, but didn't add an abstract field. `mypy` didn't flag this, likely because our use of `@union` obscures the constructor call that might have triggered the error: see https://mypy.readthedocs.io/en/stable/class_basics.html#abstract-base-classes-and-multiple-inheritance. It's possible that #12577 would improve the situation here, but it doesn't seem worth adding heavier integration tests (which would invoke the entire `lint` and `fmt` goals) just to catch this kind of case. [ci skip-rust] [ci skip-build-wheels]
Closing this POC, as any future work in this direction is better served using a fresh PR. |
POC to re-design union rules.
From the explorations done (as shown in the current changes in this PR),
I've landed in that going with Protocols is a feasible way forward, when used up front without any facades.No, Protocol classes did not really fit well when used at runtime. Attempt #2 using ABCs instead.
This could look like something like this:
N.B. I'm not sure if there's a downside to that it is hard to distinguish which classes are union base types, and which are union members, that will be up to the UnioRule calls to settle. Perhaps a naming scheme here could help.
Also, what other current union quirks are there, that we'd like to resolve with a change like this..
Theruntime_checkable
would need to be optional, in order to allow attributes on the union protocol. But without it, we can not useissubclass()
to check if the union member satisfies the Protocol. Also, I didn't manage to get mypy to help with this either, if you find a way to get mypy to understand this, it would be great: