Add generic mechanism to codegen sources in V2 #9634

Eric-Arellano · 2020-04-25T17:45:17Z

Goals of design

See https://docs.google.com/document/d/1tJ1SL3URSXUWlrN-GJ1fA1M4jm8zqcaodBWghBWrRWM/edit?ts=5ea310fd for more info.

tl;dr:

1. Protocols now only have one generic target, like avro_library. This means that call sites must declare which language should be generated from that protocol.
   - Must be declarative.
2. You can still get the original protocol sources, e.g. for ./pants filedeps.
3. Must work with subclassing of fields.
4. Must be extensible.
   - Example: Pants only implements Thrift -> Python. A plugin author should be able to add Thrift -> Java.

Implementation

Normally, to hydrate sources, we call await Get[HydratedSources](HydrateSourcesRequest(my_sources_field)). We always use the exact same rule to do this because all sources fields are hydrated identically.

Here, each codegen rule is unique. So, we need to use unions. This means that we also need a uniform product for each codegen rule for the union to work properly. This leads to:

await Get[GeneratedSources](GenerateSourcesRequest, GeneratePythonFromAvroRequest(..))
await Get[GeneratedSources](GenerateSourcesRequest, GenerateJavaFromThriftRequest(..))

Each GenerateSourcesRequest subclass gets registered as a union rule. This achieves goal #4 of extensibility.
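To make the union idea concrete, here is a minimal sketch in plain Python, without the Pants engine. The registry dict, the `register_generator` decorator, and the file-renaming "generators" are all hypothetical stand-ins for union-rule registration; only the request and product class names come from the description above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple, Type


@dataclass(frozen=True)
class GeneratedSources:
    """Uniform product that every code generator returns."""

    files: Tuple[str, ...]


@dataclass(frozen=True)
class GenerateSourcesRequest:
    """Union base: each generator defines its own subclass."""

    protocol_files: Tuple[str, ...]


class GeneratePythonFromAvroRequest(GenerateSourcesRequest):
    pass


class GenerateJavaFromThriftRequest(GenerateSourcesRequest):
    pass


# Hypothetical stand-in for union registration: request type -> generator.
Generator = Callable[[GenerateSourcesRequest], GeneratedSources]
_GENERATORS: Dict[Type[GenerateSourcesRequest], Generator] = {}


def register_generator(request_type: Type[GenerateSourcesRequest]):
    def decorator(func: Generator) -> Generator:
        _GENERATORS[request_type] = func
        return func

    return decorator


@register_generator(GeneratePythonFromAvroRequest)
def avro_to_python(request: GenerateSourcesRequest) -> GeneratedSources:
    # Fake "codegen": only renames files, to keep the sketch runnable.
    return GeneratedSources(tuple(f.replace(".avsc", ".py") for f in request.protocol_files))


@register_generator(GenerateJavaFromThriftRequest)
def thrift_to_java(request: GenerateSourcesRequest) -> GeneratedSources:
    return GeneratedSources(tuple(f.replace(".thrift", ".java") for f in request.protocol_files))


def generate_sources(request: GenerateSourcesRequest) -> GeneratedSources:
    """Dispatch on the concrete request type; every branch returns the same product."""
    return _GENERATORS[type(request)](request)
```

A plugin adds a new language by defining one more request subclass and registering a generator for it; no existing code has to change.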

--

To still work with subclassing of fields (goal #3), each GenerateSourcesRequest declares the input type and output type, which then allows us to use isinstance() to accommodate subclasses:

class GenerateFortranFromAvroRequest(GenerateSourcesRequest):
    input = AvroSources
    output = FortranSources
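As a rough illustration of why declaring `input` and `output` helps with subclassing, the sketch below selects generators with `isinstance()` and `issubclass()`. `CustomAvroSources` and `find_generators` are hypothetical names, and the matching logic is an assumption about how the lookup could work, not a copy of the code in this PR.

```python
from typing import List, Sequence, Type


class Sources:
    pass


class AvroSources(Sources):
    pass


class CustomAvroSources(AvroSources):
    """Hypothetical plugin subclass of the Avro sources field."""


class FortranSources(Sources):
    pass


class GenerateSourcesRequest:
    input: Type[Sources]
    output: Type[Sources]


class GenerateFortranFromAvroRequest(GenerateSourcesRequest):
    input = AvroSources
    output = FortranSources


def find_generators(
    sources_field: Sources,
    wanted_output: Type[Sources],
    registered: Sequence[Type[GenerateSourcesRequest]],
) -> List[Type[GenerateSourcesRequest]]:
    """Return every registered request type that accepts this field and yields the wanted type."""
    return [
        request_type
        for request_type in registered
        # isinstance() also accepts subclasses of the declared input type.
        if isinstance(sources_field, request_type.input)
        and issubclass(request_type.output, wanted_output)
    ]
```

Because the check uses `isinstance()`, a field of type `CustomAvroSources` matches a generator declared for `AvroSources` without any extra registration.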

--

To achieve goals #1 and #2 of allowing call sites to declaratively either get the original protocol sources or generated sources, we hook up codegen to the hydrate_sources rule and HydrateSourcesRequest type:

protocol_sources = await Get[HydratedSources](HydrateSourcesRequest(avro_sources, for_sources_types=[FortranSources], codegen_enabled=True))
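The decision this request encodes can be sketched without the engine as follows. This is a simplified model under stated assumptions: `hydrate`, the `GENERATORS` table, and the file-renaming generator are hypothetical, and `output_type` follows the field name used in the #9641 commit message quoted later on this page.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple, Type


class Sources:
    def __init__(self, files: Tuple[str, ...]) -> None:
        self.files = files


class AvroSources(Sources):
    pass


class FortranSources(Sources):
    pass


@dataclass(frozen=True)
class HydratedSources:
    files: Tuple[str, ...]
    output_type: Optional[Type[Sources]]


def _avro_to_fortran(files: Tuple[str, ...]) -> Tuple[str, ...]:
    # Fake "codegen": only renames files, to keep the sketch runnable.
    return tuple(f.replace(".avsc", ".f90") for f in files)


# Hypothetical generator table: (input type, output type) -> generator function.
GENERATORS = {(AvroSources, FortranSources): _avro_to_fortran}


def hydrate(
    field: Sources,
    for_sources_types: Sequence[Type[Sources]] = (Sources,),
    codegen_enabled: bool = False,
) -> HydratedSources:
    # 1. The field is already one of the requested types: return it unchanged.
    for wanted in for_sources_types:
        if isinstance(field, wanted):
            return HydratedSources(field.files, wanted)
    # 2. Otherwise, try to generate a requested type, but only if the caller opted in.
    if codegen_enabled:
        for (input_type, output_type), generate in GENERATORS.items():
            if isinstance(field, input_type) and issubclass(output_type, tuple(for_sources_types)):
                return HydratedSources(generate(field.files), output_type)
    # 3. Not a valid field for this call site: empty result.
    return HydratedSources((), None)
```

With the defaults, a call site such as filedeps still gets the original protocol files; passing `for_sources_types=[FortranSources]` together with `codegen_enabled=True` switches the same field over to generated files.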

[ci skip-rust-tests]
[ci skip-jvm-tests]

Eric-Arellano

Thank you Stu, Danny, and Benjy for your help this past week!

src/python/pants/engine/target.py

Eric-Arellano · 2020-04-25T17:54:07Z

src/python/pants/engine/target_test.py

+
+class GenerateFortranFromAvroRequest(GenerateSourcesRequest):
+    input = AvroSources
+    output = FortranSources


NB: this is technically a lie. We are not returning FortranSources. Rather, we are returning GeneratedSources with a Snapshot of Fortran files.

I originally added a new type called CodegenTargetLanguage so that this would be output = Fortran. I got rid of it because that is still a lie, and it led to too much boilerplate.

stuhood

Thanks @Eric-Arellano: this looks great!

I expect that more change is likely to be necessary, because I expect that codegenerators are going to need to consume other fields of a Target in order to do their work (for example: it's common to have some generator-specific config on the input Target). But this is a great incremental step.

src/python/pants/engine/target.py

stuhood · 2020-04-26T21:54:21Z

src/python/pants/engine/target.py

+class GeneratedSources:
+    snapshot: Snapshot
+
+
 @rule
 async def hydrate_sources(


This rule probably warrants a docstring that incorporates some of the description from this PR?

Eric-Arellano · 2020-04-28T20:55:24Z

There's already a lot of documentation in the rule body and in the docstrings of the relevant classes. I think adding a docstring to the rule might be noisy.

src/python/pants/engine/target.py

stuhood · 2020-04-26T22:10:05Z

src/python/pants/engine/target.py

@@ -1444,5 +1536,6 @@ def rules():
        find_valid_configurations,
        hydrate_sources,
        RootRule(TargetsToValidConfigurationsRequest),
+        RootRule(GenerateSourcesRequest),


This should probably not be a RootRule: the conventions are nowhere near solid, but I think of them as being a bit like a public API, and in this case GenerateSourcesRequest is an implementation detail of HydrateSourcesRequest.

If a test wants to poke at those rules more directly, it can add a RootRule itself to do so. But I think that we should think of the roots that we expose in rulesets as their external inputs for general use.

Eric-Arellano · 2020-04-27T18:04:41Z

Hm, my understanding of RootRules is that you should use them whenever you directly inject a type into the graph, rather than deriving that type from other rules in the graph? Meaning, almost every Request class should be a RootRule because they are almost always directly created through a Python constructor, rather than being derived from some other rule.

Is this mental model the wrong way of understanding RootRule?

stuhood · 2020-04-27

Yes, that is the wrong way to think about RootRule. See https://github.com/pantsbuild/pants/blob/master/src/python/pants/engine/README.md#gets-and-rootrules

In short: a Param enters the graph either via a Get or via a RootRule. Params that enter via a Get do not need to be declared as roots. This is where the "root" in the name comes from: you only need to declare something a RootRule if it might come in at the "root" of a graph, i.e. via scheduler.product_request.

Eric-Arellano · 2020-04-27

Oh, huh. We're declaring way too many RootRules then, I think, in part from bad advice I gave Benjy. I'll clean that up.

cc @benjyw we shouldn't be using RootRule as much as I thought.

Eric-Arellano

stuhood wrote: "because I expect that codegenerators are going to need to consume other fields of a Target in order to do their work"

Benjy and I talked about this use case a couple of minutes ago. I realized this is indeed possible with the current design :) Every AsyncField, like Sources, has an address property. This means that it's possible to do:

await Get[WrappedTarget](Address, sources_field.address)

src/python/pants/engine/target.py

Stu and Benjy pointed out that it's common for a target to have fields on it that influence how the compiler runs. So, it will be common for the codegen rules to access the original target. While we could expect rules to call `await Get[WrappedTarget](Address)` directly in the rules, this makes life simpler for them.

…pected (#9641)

Soon, we will add codegen. With this, we need a way to signal which language should be generated, if any.

@stuhood proposed in #9634 (comment) that we can extend this setting to indicate more generally which Sources fields are valid, e.g. that we expect to work with `PythonSources` and `FilesSources` (or subclasses), but nothing else. All invalid fields would return an empty snapshot and indicate via a new `HydratedSources.output_type` field that it was an invalid sources field.

This means that call sites can still pre-filter sources fields like they typically do via `tgt.has_field()` (and configurations), but they can also use this new sugar. If they want to use codegen in the upcoming PR, they must use this new mechanism.

Further, when getting back the `HydratedSources`, call sites can switch on the type. Previously, they could do this by zipping the original `Sources` with the resulting `HydratedSources`, but this won't work once we have codegen, as the original `Sources` will be, for example, `ThriftSources`.

```python
if hydrated_sources.output_type == PythonSources:
    ...
elif hydrated_sources.output_type == FilesSources:
    ...
```

Less of a misnomer. This is not the actual output. It's rather a description of the sources. This also fits in better with `HydrateSourcesRequest.for_sources_types`.

Eric-Arellano

Now ready for re-review.

Eric-Arellano · 2020-04-28T20:51:52Z

src/python/pants/engine/target.py

+                f"Multiple of the registered code generators can generate {output} from {input}. "
+                "It is ambiguous which implementation to use.\n\nPossible implementations:"
+                f"{bulleted_list_sep}{bulleted_list_sep.join(possible_generators)}"


This reads:

Multiple of the registered code generators can generate FortranSources from AvroSources. It is ambiguous which implementation to use.

Possible implementations:
  * FortranGenerator1
  * FortranGenerator2

Eric-Arellano · 2020-04-28T20:52:31Z

src/python/pants/engine/target.py

+            super().__init__(
+                f"Multiple of the registered code generators can generate one of "
+                f"{possible_output_types} from {input}. It is ambiguous which implementation to "
+                f"use. This can happen when the call site requests too many different output types "
+                f"from the same original protocol sources.\n\nPossible implementations with their "
+                f"output type: {bulleted_list_sep}"
+                f"{bulleted_list_sep.join(possible_generators_with_output)}"


This reads:

Multiple of the registered code generators can generate one of ['FortranSources', 'SmalltalkSources'] from AvroSources. It is ambiguous which implementation to use. This can happen when the call site requests too many different output types from the same original protocol sources.

Possible implementations with their output type:
  * FortranGenerator1 -> FortranSources
  * SmalltalkGenerator -> SmalltalkSources
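For readers who want to see how the two messages above could come from one place, here is a self-contained sketch. The class name `AmbiguousCodegenError`, its constructor signature, and the value of `bulleted_list_sep` are assumptions made for illustration; the diff excerpts above only show the f-strings themselves.

```python
from typing import Sequence, Tuple


class AmbiguousCodegenError(Exception):
    """Hypothetical exception raised when several generators match one request."""

    def __init__(self, input_name: str, generators: Sequence[Tuple[str, str]]) -> None:
        # `generators` holds (generator name, output type name) pairs.
        bulleted_list_sep = "\n  * "  # assumed value; the diff only shows the variable name
        output_names = sorted({output for _, output in generators})
        if len(output_names) == 1:
            # Several generators exist for the same input -> output pair.
            names = sorted(name for name, _ in generators)
            message = (
                f"Multiple of the registered code generators can generate "
                f"{output_names[0]} from {input_name}. It is ambiguous which "
                "implementation to use.\n\nPossible implementations:"
                f"{bulleted_list_sep}{bulleted_list_sep.join(names)}"
            )
        else:
            # The call site asked for several output types and more than one matched.
            with_output = sorted(f"{name} -> {output}" for name, output in generators)
            message = (
                f"Multiple of the registered code generators can generate one of "
                f"{output_names} from {input_name}. It is ambiguous which "
                "implementation to use. This can happen when the call site requests too "
                "many different output types from the same original protocol sources."
                "\n\nPossible implementations with their output type:"
                f"{bulleted_list_sep}{bulleted_list_sep.join(with_output)}"
            )
        super().__init__(message)
```

Keeping both variants in one exception means the call site only has to collect the matching generators and raise; the wording stays consistent across the two cases.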

Eric-Arellano · 2020-04-28T20:54:23Z

src/python/pants/engine/target.py

+    @final
+    @classmethod
+    def can_generate(cls, output_type: Type["Sources"], union_membership: UnionMembership) -> bool:
+        """Can this Sources field be used to generate the output_type?"""


Adding this method is important. While we generally expect people to rely on HydrateSourcesRequest.for_sources_types to know if codegen is possible, there are some cases where we need to pre-filter without hydrating the field. For example, run_setup_py.py needs this.

This method allows us to fully support the pre-filtering idiom that we normally use with the Target API via tgt.has_field().

stuhood · 2020-04-28T21:27:30Z

The docstring here could maybe indicate that most callers won't need to call this, and recommend HydrateSourcesRequest.
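A rough sketch of what such a pre-filter could look like in plain Python follows. `UnionMembership` here is a toy stand-in with a made-up `get()` method, not the engine's real class, and the body of `can_generate` is an assumption based only on the signature and docstring shown in the diff.

```python
from typing import Dict, Tuple, Type


class UnionMembership:
    """Toy stand-in for the engine's union registry (not the real class)."""

    def __init__(self, members: Dict[type, Tuple[type, ...]]) -> None:
        self._members = members

    def get(self, union_base: type) -> Tuple[type, ...]:
        return self._members.get(union_base, ())


class Sources:
    @classmethod
    def can_generate(cls, output_type: Type["Sources"], union_membership: UnionMembership) -> bool:
        """Can this Sources field be used to generate the output_type?"""
        # Works on the class alone, so nothing needs to be hydrated first.
        return any(
            issubclass(cls, request_type.input) and issubclass(request_type.output, output_type)
            for request_type in union_membership.get(GenerateSourcesRequest)
        )


class AvroSources(Sources):
    pass


class FortranSources(Sources):
    pass


class GenerateSourcesRequest:
    input: Type[Sources]
    output: Type[Sources]


class GenerateFortranFromAvroRequest(GenerateSourcesRequest):
    input = AvroSources
    output = FortranSources


union_membership = UnionMembership({GenerateSourcesRequest: (GenerateFortranFromAvroRequest,)})

# Pre-filter field types before doing any hydration work.
field_types = [AvroSources, FortranSources]
convertible = [f for f in field_types if f.can_generate(FortranSources, union_membership)]
```

Here `convertible` contains only `AvroSources`: `FortranSources` is already the target type, so no generator applies to it.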

stuhood

Thanks a lot Eric! This looks awesome.

This creates a generic `protobuf_library()` to be used across all languages. From this, if `pants.backend.codegen.protobuf.python` is activated, then V2 call sites which signal they want to use codegen (via #9634) will automatically convert any `ProtobufSources` into generated Python sources.

### Runtime Protobuf dependency

To function properly, generated Protobuf Python files must depend on the `protobuf` Python wheel. Normally, in V1, we would inject this dependency dynamically. For now, the user must explicitly specify the dependency in the BUILD file like they do for any normal Python requirement.

### No `gen` goal

Users do not have a direct way to inspect the generated files. For now, they would need to have a `python_binary()` depend on the `protobuf_library`s, then inspect the built PEX after `./v2 binary`.

### How we handle source roots

Protoc needs to generate files with the source roots stripped. `src/protobuf/example/f.proto` becomes `example/f_pb2.py`. But, Protoc won't natively understand the input file `src/protobuf/example/f.proto` because it doesn't know how to natively handle source roots.

In V1, we used the flag `--proto_path` to teach Protoc how to understand source roots. Instead, here, we take the approach of V2 Python of stripping off the source roots for these reasons:

1) We have lots of utility rules for stripping source roots already.
2) Those utilities have important performance optimizations.
   * Specifically, we only inspect 1 `sources` file for the target to determine the source root, rather than naively inspecting every `sources` file.
3) Parity with v2 Python implementation.
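The source-root handling described above can be sketched as follows. This is only an illustration: `SOURCE_ROOTS` and all three function names are hypothetical, and the sketch assumes (as the optimization above does) that every file in one target shares the same source root.

```python
from pathlib import PurePosixPath
from typing import List, Sequence

# Hypothetical source roots for this example only.
SOURCE_ROOTS = ("src/protobuf", "src/python")


def find_source_root(path: str) -> str:
    """Return the configured source root that contains `path`."""
    for root in SOURCE_ROOTS:
        if path.startswith(root + "/"):
            return root
    raise ValueError(f"No source root found for {path}")


def strip_source_roots(paths: Sequence[str]) -> List[str]:
    """Strip the source root from every file, looking it up only once per target."""
    if not paths:
        return []
    # Inspect a single file rather than every file in the target.
    root = find_source_root(paths[0])
    return [str(PurePosixPath(p).relative_to(root)) for p in paths]


def generated_python_name(stripped_proto_path: str) -> str:
    """Map a stripped .proto path to the `_pb2.py` module name Protoc emits for it."""
    path = PurePosixPath(stripped_proto_path)
    return str(path.with_name(path.stem + "_pb2.py"))
```

Stripping before invoking Protoc means the tool only ever sees paths like `example/f.proto`, so its output paths already line up with Python import paths.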

Add generic mechanism to codegen sources in V2

6e7e1b7

Eric-Arellano requested review from stuhood, benjyw and cosmicexplorer April 25, 2020 17:45

Eric-Arellano commented Apr 25, 2020

Add proper exception message for ambiguous codegen implementations

7877e89

stuhood approved these changes Apr 26, 2020

Eric-Arellano commented Apr 27, 2020

Eric-Arellano added 2 commits April 27, 2020 14:55

Simplify excluding _python_requirements_library() from built PEXes

5340f74

Allow HydratedSourcesRequest to indicate which Sources types are valid

ff97ac4

Eric-Arellano mentioned this pull request Apr 27, 2020

Allow HydratedSourcesRequest to indicate which Sources types are expected #9641

Merged

Eric-Arellano added 5 commits April 27, 2020 17:11

Move around some of target.py to better organize this new code

9fe244a

Merge branch 'master' of github.com:pantsbuild/pants into output-type

d76cbfc

Merge branch 'output-type' into codegen-hydrate-sources

3e64493

Add Sources.can_generate() to allow pre-filtering before hydration

3596a9c

We need this for run_setup_py.py

Eric-Arellano added 4 commits April 28, 2020 11:00

Merge branch 'master' of github.com:pantsbuild/pants into codegen-hydrate-sources

27af0c8

Update call sites to use codegen

b29c30c

Improve error message for ambiguous codegen implementations

4138f38

Eric-Arellano commented Apr 28, 2020

stuhood approved these changes Apr 28, 2020

Expand docstring for Sources.can_generate()

7aabe19

Eric-Arellano merged commit 9945df1 into pantsbuild:master Apr 28, 2020

Eric-Arellano deleted the codegen-hydrate-sources branch April 28, 2020 23:01

Eric-Arellano mentioned this pull request Apr 29, 2020

Add Protobuf Python support to V2 #9651

Merged

Eric-Arellano mentioned this pull request Apr 30, 2020

WIP: Don't register RootRule when not necessary #9663

Closed
