-
-
Notifications
You must be signed in to change notification settings - Fork 638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add generic mechanism to codegen sources in V2 #9634
Add generic mechanism to codegen sources in V2 #9634
Conversation
# Delete this line to force CI to run Clippy and the Rust tests. [ci skip-rust-tests] # Delete this line to force CI to run the JVM tests. [ci skip-jvm-tests]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you Stu, Danny, and Benjy for your help this past week!
|
||
class GenerateFortranFromAvroRequest(GenerateSourcesRequest): | ||
input = AvroSources | ||
output = FortranSources |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NB: this is technically a lie. We are not returning FortranSources
. Rather, we are returning GeneratedSources
with a Snapshot of Fortran files.
I originally added a new type called CodegenTargetLanguage
so that this would be output = Fortran
. I got rid of it because that is still a lie, and it led to too much boilerplate.
# Delete this line to force CI to run Clippy and the Rust tests. [ci skip-rust-tests] # Delete this line to force CI to run the JVM tests. [ci skip-jvm-tests]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Eric-Arellano : this looks great!
I expect that more change is likely to be necessary, because I expect that codegenerators are going to need to consume other fields of a Target in order to do their work (for example: it's common to have some generator-specific config on the input Target). But this is a great incremental step.
class GeneratedSources: | ||
snapshot: Snapshot | ||
|
||
|
||
@rule | ||
async def hydrate_sources( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This rule probably warrants a docstring that incorporates some of the description from this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's already a lot of documentation now in the rule body and in the docstring of the relevant classes. I think adding docstring to the rule might be noisy.
src/python/pants/engine/target.py
Outdated
@@ -1444,5 +1536,6 @@ def rules(): | |||
find_valid_configurations, | |||
hydrate_sources, | |||
RootRule(TargetsToValidConfigurationsRequest), | |||
RootRule(GenerateSourcesRequest), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably not be a RootRule
: the conventions are nowhere near solid, but I think of them as being a bit like a public API, and in this case GenerateSourcesRequest
is an implementation detail of HydrateSourcesRequest
.
If a test wants to poke at those rules more directly, it can add a RootRule
itself to do so. But I think that we should think of the roots that we expose in rulesets as their external inputs for general use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, my understanding of RootRules is that you should use them whenever you directly inject a type into the graph, rather than deriving that type from other rules in the graph? Meaning, almost every Request
class should be a RootRule
because they are almost always directly created through a Python constructor, rather than being derived from some other rule.
Is this mental model the wrong way of understanding RootRule
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this mental model the wrong way of understanding RootRule?
Yes, that is the wrong way to think about RootRule
. See https://github.com/pantsbuild/pants/blob/master/src/python/pants/engine/README.md#gets-and-rootrules
In short: a Param enters the graph either via a Get
or via a RootRule
. Things that enter as Get
s do not need to be declared as roots. This is where the "root" in the name comes from: you only need to declare something a RootRule
if it might come in at the "root" of a graph: ie, scheduler.product_request
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, huh. We're declaring way too many RootRules
then, I think, in part from bad advice I gave Benjy. I'll clean that up.
cc @benjyw we shouldn't be using RootRule
as much as I thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because I expect that codegenerators are going to need to consume other fields of a Target in order to do their work
Benjy and I talked about this use case a couple of minutes ago. I realized this is indeed possible with the current design :) Every AsyncField
, like Sources
, has an address
property. This means that it's possible to do:
await Get[WrappedTarget](Address, sources_field.address)
src/python/pants/engine/target.py
Outdated
@@ -1444,5 +1536,6 @@ def rules(): | |||
find_valid_configurations, | |||
hydrate_sources, | |||
RootRule(TargetsToValidConfigurationsRequest), | |||
RootRule(GenerateSourcesRequest), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, my understanding of RootRules is that you should use them whenever you directly inject a type into the graph, rather than deriving that type from other rules in the graph? Meaning, almost every Request
class should be a RootRule
because they are almost always directly created through a Python constructor, rather than being derived from some other rule.
Is this mental model the wrong way of understanding RootRule
?
[ci skip-jvm-tests] [ci skip-rust-tests]
# Delete this line to force CI to run the JVM tests. [ci skip-jvm-tests] [ci skip-rust-tests]
# Delete this line to force CI to run Clippy and the Rust tests. [ci skip-rust-tests] # Delete this line to force CI to run the JVM tests. [ci skip-jvm-tests]
[ci skip-rust-tests] [ci skip-jvm-tests] [ci skip-rust-tests] [ci skip-jvm-tests]
Stu and Benjy pointed out that it's common for a target to have fields on it that influence how the compiler runs. So, it will be common for the codegen rules to access the original target. While we could expect rules to call `await Get[WrappedTarget](Address)` directly in the rules, this makes life simpler for them. # Delete this line to force CI to run Clippy and the Rust tests. [ci skip-rust-tests] # Delete this line to force CI to run the JVM tests. [ci skip-jvm-tests]
# Conflicts: # src/python/pants/engine/target.py [ci skip-rust-tests] [ci skip-jvm-tests]
We need this for run_setup_py.py # Delete this line to force CI to run the JVM tests. [ci skip-jvm-tests]
…pected (#9641) Soon, we will add codegen. With this, we need a way to signal which language should be generated, if any. @stuhood proposed in #9634 (comment) that we can extend this setting to indicate more generally which Sources fields are valid, e.g. that we expect to work with `PythonSources` and `FilesSources` (or subclasses), but nothing else. All invalid fields would return an empty snapshot and indicate via a new `HydratedSources.output_type` field that it was an invalid sources field. This means that call sites can still pre-filter sources fields like they typically do via `tgt.has_field()` (and configurations), but they can also use this new sugar. If they want to use codegen in the upcoming PR, they must use this new mechanism. Further, when getting back the `HydratedSources`, call sites can switch on the type. Previously, they could do this by zipping the original `Sources` with the resulting `HydratedSources`, but this won't work once we have codegen, as the original `Sources` will be, for example, `ThriftSources`. ```python if hydrated_sources.output_type == PythonSources: ... elif hydrated_sources.output_type == FilesSources: ... ``` [ci skip-rust-tests] [ci skip-jvm-tests]
…rate-sources [ci skip-jvm-tests] [ci skip-rust-tests]
# Delete this line to force CI to run the JVM tests. [ci skip-jvm-tests]
# Delete this line to force CI to run the JVM tests. [ci skip-jvm-tests]
Less of a misnomer. This is not the actual output. It's rather a description of the sources. This also fits in better with `HydrateSourcesRequest.for_sources_types`. # Delete this line to force CI to run Clippy and the Rust tests. [ci skip-rust-tests] # Delete this line to force CI to run the JVM tests. [ci skip-jvm-tests]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now ready for re-review.
f"Multiple of the registered code generators can generate {output} from {input}. " | ||
"It is ambiguous which implementation to use.\n\nPossible implementations:" | ||
f"{bulleted_list_sep}{bulleted_list_sep.join(possible_generators)}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reads:
Multiple of the registered code generators can generate FortranSources from AvroSources. It is ambiguous which implementation to use.
Possible implementations:
* FortranGenerator1
* FortranGenerator2
super().__init__( | ||
f"Multiple of the registered code generators can generate one of " | ||
f"{possible_output_types} from {input}. It is ambiguous which implementation to " | ||
f"use. This can happen when the call site requests too many different output types " | ||
f"from the same original protocol sources.\n\nPossible implementations with their " | ||
f"output type: {bulleted_list_sep}" | ||
f"{bulleted_list_sep.join(possible_generators_with_output)}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reads:
Multiple of the registered code generators can generate one of ['FortranSources', 'SmalltalkSources'] from AvroSources. It is ambiguous which implementation to use. This can happen when the call site requests too many different output types from the same original protocol sources.
Possible implementations with their output type:
* FortranGenerator1 -> FortranSources
* SmalltalkGenerator -> SmalltalkSources
src/python/pants/engine/target.py
Outdated
@final | ||
@classmethod | ||
def can_generate(cls, output_type: Type["Sources"], union_membership: UnionMembership) -> bool: | ||
"""Can this Sources field be used to generate the output_type?""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding this method is important. While we generally expect people to rely on HydratedSourcesRequest.for_sources_types
to know if codegen is possible, there are some cases where we need to pre-filter without hydrating the field. For example, run_setup_py.py
needs this.
This method allows us to fully support the pre-filtering idiom that we normally use with the Target API via tgt.has_field()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring here could maybe indicate that most callers won't need to call this, and recommend HydratedSourcesRequest
.
class GeneratedSources: | ||
snapshot: Snapshot | ||
|
||
|
||
@rule | ||
async def hydrate_sources( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's already a lot of documentation now in the rule body and in the docstring of the relevant classes. I think adding docstring to the rule might be noisy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot Eric! This looks awesome.
src/python/pants/engine/target.py
Outdated
@final | ||
@classmethod | ||
def can_generate(cls, output_type: Type["Sources"], union_membership: UnionMembership) -> bool: | ||
"""Can this Sources field be used to generate the output_type?""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring here could maybe indicate that most callers won't need to call this, and recommend HydratedSourcesRequest
.
# Delete this line to force CI to run Clippy and the Rust tests. [ci skip-rust-tests] # Delete this line to force CI to run the JVM tests. [ci skip-jvm-tests]
This creates a generic `protobuf_library()` to be used across all languages. From this, if `pants.backend.codegen.protobuf.python` is activated, then V2 call sites which signal they want to use codegen (via #9634) will automatically convert any `ProtobufSources` into generated Python sources. ### Runtime Protobuf dependency To function properly, generated Protobuf Python files must depend on the `protobuf` Python wheel. Normally, in V1, we would inject this dependency dynamically. For now, the user must explicitly specify the dependency in the BUILD file like they do for any normal Python requirement. ### No `gen` goal Users do not have a direct way to inspect the generated files. For now, they would need to have a `python_binary()` depend on the `protobuf_library`s, then inspect the built PEX after `./v2 binary`. ### How we handle source roots Protoc needs to generate files with the source roots stripped. `src/protobuf/example/f.proto` becomes `example/f_pb2.py`. But, Protoc won't naively understand the input file `src/protobuf/example/f.proto` because it doesn't know how to naively handle source roots. In V1, we used the flag `--proto_path` to teach Protoc how to understand source roots. Instead, here, we take the approach of V2 Python of stripping off the source roots for these reasons: 1) We have lots of utility rules for stripping source roots already. 2) Those utilities have important performance optimizations. * Specifically, we only inspect 1 `sources` file for the target to determine the source root, rather than naively inspecting every `sources` file. 3) Parity with v2 Python implementation. [ci skip-rust-tests] [ci skip-jvm-tests]
Goals of design
See https://docs.google.com/document/d/1tJ1SL3URSXUWlrN-GJ1fA1M4jm8zqcaodBWghBWrRWM/edit?ts=5ea310fd for more info.
tl;dr:
avro_library
. This means that call sites must declare which language should be generated from that protocol../pants filedeps
.Implementation
Normally, to hydrate sources, we call
await Get[HydratedSources](HydrateSourcesRequest(my_sources_field))
. We always use the exact same rule to do this because allsources
fields are hydrated identically.Here, each codegen rule is unique. So, we need to use unions. This means that we also need a uniform product for each codegen rule for the union to work properly. This leads to:
Each
GenerateSourcesRequest
subclass gets registered as a union rule. This achieves goal #4 of extensibility.--
To still work with subclassing of fields (goal #3), each
GenerateSourcesRequest
declares the input type and output type, which then allows us to useisinstance()
to accommodate subclasses:--
To achieve goals #1 and #2 of allowing call sites to declaratively either get the original protocol sources or generated sources, we hook up codegen to the
hydrate_sources
rule andHydrateSourcesRequest
type:[ci skip-rust-tests]
[ci skip-jvm-tests]