Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wip: unions using protocol classes. #12577

Closed
wants to merge 3 commits into from

Conversation

kaos
Copy link
Member

@kaos kaos commented Aug 16, 2021

POC to re-design union rules.

From the explorations done (as shown in the current changes in this PR), I've landed in that going with Protocols is a feasible way forward, when used up front without any facades.
No, Protocol classes did not really fit well when used at runtime. Attempt #2 using ABCs instead.

This could look like something like this:

from pants.engine.unions import UnionBase, UnionRule

class Vehicle(UnionBase):
    """Union base for vehicle types."""
    def model(self) -> str: ...

class Truck(Vehicle):
    """Another union base, derived from the first one."""
    wheel_count(self) -> int: ...

class Volvo(Vehicle):
    """Union member."""
    def model(self) -> str: return "XC90"

class Peterbilt(Truck):
    """Another union member, for trucks."""
    def model(self) -> str: return "359"
    def wheel_count(self) -> int: return 6

def rules() ->
    return [
        UnionRule(Vehicle, Volvo),
        UnionRule(Truck, Peterbilt),
    ]

N.B. I'm not sure if there's a downside to that it is hard to distinguish which classes are union base types, and which are union members, that will be up to the UnioRule calls to settle. Perhaps a naming scheme here could help.

Also, what other current union quirks are there, that we'd like to resolve with a change like this..

The runtime_checkable would need to be optional, in order to allow attributes on the union protocol. But without it, we can not use issubclass() to check if the union member satisfies the Protocol. Also, I didn't manage to get mypy to help with this either, if you find a way to get mypy to understand this, it would be great:

class Proto(Protocol):
    attr: str

class Impl:
    pass

@dataclass
class Foo(Generic[T]):
   attr1: T
   attr2: Type[T]

foo = Foo(Proto, Impl)   # mypy accepts this, as I've not found a way to force mypy to test the Protocol conformance here..
bar: Type[Proto] = Impl  # but not this, as Impl is missing the str attr.

Signed-off-by: Andreas Stenius <[email protected]>

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]
Signed-off-by: Andreas Stenius <[email protected]>

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
Copy link
Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for prototyping (...pun intended)! I think that this looks like a really promising angle.

The runtime_checkable would need to be optional, in order to allow attributes on the union protocol. But without it, we can not use issubclass() to check if the union member satisfies the Protocol. Also, I didn't manage to get mypy to help with this either, if you find a way to get mypy to understand this, it would be great:

Interesting... I think that by definition, the engine never uses issubclass to check whether something is a union member (because the only accurate way to determine that is "it's in the UnionMembership").

So... I think that this is fine? Users could (as you say) optionally add @runtime_checkable if they wanted to, but otherwise the primary benefit of using Protocol is one of mypy-level type safety and a way to declare the interface.

Comment on lines 40 to 41
@runtime_checkable
class Base2(Protocol):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that in order to replace unions, this will require one more explicit opt-in: we need to be able to detect that something is a union base for:

for the_get in rule.input_gets:
if union.is_instance(the_get.input_type):
# If the registered subject type is a union, add Get edges to all registered
# union members.
for union_member in union_membership.get(the_get.input_type):
add_get_edge(the_get.output_type, union_member)
else:
# Otherwise, the Get subject is a "concrete" type, so add a single Get edge.
add_get_edge(the_get.output_type, the_get.input_type)

...and I'm not sure that extending Protocol is sufficient to indicate that you're not using a concrete class in that position? ... maybe it is though, if there is a convention to never actually ever extend Protocol? But I don't think it's prohibited...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't line 659 just be switched (today - independent of this WIP) to using the exact the_get.input_type as a key into union_membership instead of union.is_instance?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. Duh. Yes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This location needs patching too..

# If the input_type is not annotated with `@union`, then we validate that the input is
# exactly the same type as the input_type. (Why not check unions? We don't have access to
# `UnionMembership` to know if it's a valid union member. The engine will check that.)
if not union.is_instance(self.input_type) and type(input_) != self.input_type:
# We can assume we're using the longhand form because the shorthand form guarantees
# that the `input_type` is the same as `input`.
raise TypeError(
f"Invalid Get. The third argument `{input_}` must have the exact same type as the "
f"second argument, {self.input_type}, but had the type {type(input_)}."
)

Copy link
Member Author

@kaos kaos Aug 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't line 659 just be switched (today - independent of this WIP) to using the exact the_get.input_type as a key into union_membership instead of union.is_instance?

No, it does not work, as it changes the semantics of when add_get_edge is called. Consider a union input_type but that has not been registered, then we end up calling add_get_edge(output_type, input_type) rather than not calling add_get_edge at all.

I think a better approach is to move the union test into a dedicated function in unions.py. Oh, wait, there already is such a function, we should only use it :P

Edit:
Not 100% sure my conclusion about why it doesn't work is right, but the tests agree that its problematic:

src/python/pants/engine/internals/scheduler.py:209: ValueError
=========================== short test summary info ============================
FAILED src/python/pants/engine/internals/scheduler_test.py::test_use_params
FAILED src/python/pants/engine/internals/scheduler_test.py::test_strict_equals
FAILED src/python/pants/engine/internals/scheduler_test.py::test_union_rules
FAILED src/python/pants/engine/internals/scheduler_test.py::test_outlined_get
FAILED src/python/pants/engine/internals/scheduler_test.py::test_trace_includes_rule_exception_traceback
FAILED src/python/pants/engine/internals/scheduler_test.py::test_unhashable_types_failure
ERROR src/python/pants/engine/internals/scheduler_test.py::test_transitive_params
ERROR src/python/pants/engine/internals/scheduler_test.py::test_consumed_types
========================= 6 failed, 2 errors in 1.68s ==========================

@kaos
Copy link
Member Author

kaos commented Aug 17, 2021

So... I think that this is fine? Users could (as you say) optionally add @runtime_checkable if they wanted to, but otherwise the primary benefit of using Protocol is one of mypy-level type safety and a way to declare the interface.

Yes, there is no check today, so leaving off runtime_checkable is status quo.

However, we do not get the mypy-level of type safety when registering Union members with UnionRule solely using Protocols, that was my caveat at the end, with the foo and bar assignment examples. (we'd use the foo form for unions in pants, unless we can figure out how to make mypy check the args as assignable, perhaps with a mypy plugin, will explore that a bit..)

Comment on lines 123 to 125
# Base2 and Base3 are both explicitly Protocols, so this is OK
b: Base2 = B()
c: Base3 = C()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, we do not get the mypy-level of type safety when registering Union members with UnionRule solely using Protocols, that was my caveat at the end, with the foo and bar assignment examples.

Assuming that the lines I'm commenting on here are actually correctly typechecked, then this seems like a good approach to me: IMO, if we're able to completely remove the "union" concept in favor of extending Protocol, and then renaming UnionRule to ProtocolInstance (strawname, but you get the idea), then that's great.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines are typechecked correctly, so that's a win. However, I fail to incorporate this syntax into something to include in the list of rules.

I've posted a question on SO for this: https://stackoverflow.com/questions/68822297/python-protocol-implementation-class-type-check

Aahh.. unless you suggest that we use a few more lines while registering, which would accommodate mypy to do its thing..?

class MyProto (Protocol):
  a: str

class Member:
  a: str

def rules():
  m: MyProto = Member()
  return [ProtocolInstance(MyProto, m), ...]]

I guess we could work with that.. maybe even be able to hide the assignment to be done generically in a dataclass.. reducing boilerplate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I managed to find a way around it, finally :D
more details on the SO answer I posted, but it's workable, and I think we can still translate that into a UnionRule as a first step, if we want, to make this less impact (and also be backwards compatible so we can use both in a transitional period)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so there seems to be a bug with how mypy treats Protocol class types. python/mypy#10988

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Circling back to renaming UnionRule.. I think we should be conservative with renaming too much, unless we "have" to. The concept of a union rule is well documented. Hopefully, it will continue to work referring to them as "unions" even if implemented using protocols, if we can keep the two compatible enough.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for being cautious about renaming! I think we can actually do this fairly risk-free by aliasing UnionRule and @union and having a deprecation period. Or we can just stick with the term "union" as you suggest. Either way, no need to decide right now... :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not particularly attached to the term union (especially since it has a different meaning in Python typing), so in the medium term (post deprecation) actually switching to Protocol makes sense to me. It gives us better alignment with Python typing (only with the addition of registration).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And, FWIW: the entire plugin API is currently marked experimental: we're not going to stabilize it until one or both of Go or JVM are consuming it happily (Go is getting close, although there is some friction around the current implementation of file Targets and a potential need for Target generation: #7022 (comment)).

kaos added a commit to kaos/pants that referenced this pull request Aug 18, 2021
…ork for pantsbuild#12577

Signed-off-by: Andreas Stenius <[email protected]>

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
stuhood pushed a commit that referenced this pull request Aug 18, 2021
prep work for #12577

Signed-off-by: Andreas Stenius <[email protected]>

[ci skip-rust]
[ci skip-build-wheels]
Signed-off-by: Andreas Stenius <[email protected]>

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@kaos
Copy link
Member Author

kaos commented Aug 27, 2021

OK, as it turns out. Protocol classes where not suitable for runtime use in a dynamic capacity, as we're going for here. So a better fit is to use abstract base classes instead. I've updated my POC to show case this, and it works like a charm, IMHO. ;)

See the tests for another example.

@kaos
Copy link
Member Author

kaos commented Aug 27, 2021

Updated PR description, but there's a open question in there, I think is easy to overlook... so adding it here too

Also, what other current union quirks are there, that we'd like to resolve with a change like this..

Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With using ABC, I think that's similar to how most unions are implemented now: many of ours already do that. It makes sense, but still suffers from the problem that subclasses will have transitively subclassed UnionBase so look like they are bases themselves too.

I realize now maybe the solution is right in front of us. Change our Rust code to determine if it's a union base by inspecting the registered UnionRule types and looking for exact type matches with the base argument, rather than inspecting for an @union director or subclassing ABC. I think that solves this subclassing conundrum.

@kaos
Copy link
Member Author

kaos commented Aug 27, 2021

As always, I've learned a lot about the Python type system, and Protocol classes nonetheless. So even if we're not going to use any of this, it's been time well spent, on my part.

@Eric-Arellano
Copy link
Contributor

As always, I've learned a lot about the Python type system, and Protocol classes nonetheless. So even if we're not going to use any of this, it's been time well spent, on my part.

And it's prompting revisiting one of our clunkier APIs - time well spent from my point of view too :) thanks for looking into this all.

--

Change our Rust code to determine if it's a union base by inspecting the registered UnionRule types and looking for exact type matches with the base argument, rather than inspecting for an @union director or subclassing ABC.

A challenge here is it might be easy to unintentionally make something a union base when you didn't want to? Because you mess up the ordering of UnionRule(Base, Member), for example. It may be worth keeping a requirement that you also set @union on the base, or your UnionBase ABC proposal?

But we can probably Keep It Simple at first and see how that does.

--

Oh, hm....another challenge is when you define a Union but not any plugins for it. We do this sometimes with Pants when defining plugin hooks for people, which we ourselves don't consume. In this case, the rule graph code wouldn't know the UnionBase exists, which could cause rule graph issues.

This leads me to the proposal: add UnionBase(Vehicle) to the @rules section, rather than annotating @union or requiring ABC. Then, you still do UnionRule(Base, Member). Still a little wonky, but solves the subclassing problem.

Wdyt?

@kaos
Copy link
Member Author

kaos commented Aug 27, 2021

A challenge here is it might be easy to unintentionally make something a union base when you didn't want to?

I think having a dedicated base class that you inherit from is useful, regardless. And we can even turn the union decorator enforcing this, for backwards compatibility, if needed.

Also, to make the mistake go away, the need for providing the union base when registering should not be needed, if it is already provided when you declare your union member class.

class MyUnion (UnionBase): ...

class MyUnionMember (MyUnion): ....


def rules(): return [UnionMember(MyUnionMember), ...]

The UnionMember rule registrator should be able to pull out MyUnion from MyUnionMember.
Could even consider (but perhaps is a bit too magical?):

def rules(): return [MyUnionMember.rule, ...]

I think it makes sense to register union bases where you declare them, so they always exist in the UnionMembership, even if they don't have any members registered.

Something like:

class MyUnion (UnionBase): ...


def rules(): return [
        MyUnion.rule,  # the UnionBase needs to be clever here, to see the difference between a class directly inheriting UnionBase, vs. one indirectly inherting it from e.g. MyUnion.
        # or
        UnionType(MyUnion),  # UnionBase is taken, unless we call it something else...
        ...
    ]

Edit: OR... collect_rules() could pick this up automagically, like it does for @rule methods... ;) perhaps with an opt-out flag, or some such... then you wouldn't have to worry about registering them any more than you do with your ordinary rules.

@stuhood
Copy link
Member

stuhood commented Aug 27, 2021

Using an ABC would be a positive step, but the need to subclass rather than only having the Protocol marked is unfortunate. I'd like to read more about why Protocol won't work: is there a particular comment/blurb that explains that? If it's possible to check at runtime that a Protocol is @runtime_checkable, then even that would probably be preferable to the ABC.

@kaos
Copy link
Member Author

kaos commented Aug 27, 2021

Ok. There are more details in the linked mypy issue.

stuhood added a commit that referenced this pull request Oct 26, 2021
#13301 changed `TffmtRequest`'s inheritance of `StyleRequest`, but didn't add an abstract field. `mypy` didn't flag this, likely because our use of `@union` obscures the constructor call that might have triggered the error: see https://mypy.readthedocs.io/en/stable/class_basics.html#abstract-base-classes-and-multiple-inheritance. 

It's possible that #12577 would improve the situation here, but it doesn't seem worth adding heavier integration tests (which would invoke the entire `lint` and `fmt` goals) just to catch this kind of case.

[ci skip-rust]
[ci skip-build-wheels]
stuhood added a commit to stuhood/pants that referenced this pull request Oct 26, 2021
pantsbuild#13301 changed `TffmtRequest`'s inheritance of `StyleRequest`, but didn't add an abstract field. `mypy` didn't flag this, likely because our use of `@union` obscures the constructor call that might have triggered the error: see https://mypy.readthedocs.io/en/stable/class_basics.html#abstract-base-classes-and-multiple-inheritance. 

It's possible that pantsbuild#12577 would improve the situation here, but it doesn't seem worth adding heavier integration tests (which would invoke the entire `lint` and `fmt` goals) just to catch this kind of case.

[ci skip-rust]
[ci skip-build-wheels]
stuhood added a commit that referenced this pull request Oct 26, 2021
#13301 changed `TffmtRequest`'s inheritance of `StyleRequest`, but didn't add an abstract field. `mypy` didn't flag this, likely because our use of `@union` obscures the constructor call that might have triggered the error: see https://mypy.readthedocs.io/en/stable/class_basics.html#abstract-base-classes-and-multiple-inheritance. 

It's possible that #12577 would improve the situation here, but it doesn't seem worth adding heavier integration tests (which would invoke the entire `lint` and `fmt` goals) just to catch this kind of case.

[ci skip-rust]
[ci skip-build-wheels]
@kaos
Copy link
Member Author

kaos commented Nov 12, 2021

Closing this POC, as any future work in this direction is better served using a fresh PR.

@kaos kaos deleted the issue/11430_unions branch November 24, 2022 00:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants