teach codegen rules to accommodate compiled languages #14258
Comments
Using Scala and Protobuf as an example, one solution could leverage target generation as follows:
More open questions:
The choice of how to specify what backends to run impacts #14041.
For the purposes of the JVM at least, I think that this can be accomplished by making pants/src/python/pants/jvm/compile.py Lines 76 to 88 in 4494749
To do that, I think that that method should call into some non-rule code extracted from the existing This would allow for inner nodes in the graph (i.e., things which are depended on by non-codegen'd sources) to be codegen'd, which is 95% of what we need. The last 5% would be allowing codegen targets to be roots in, say, @Eric-Arellano : Does the above sound right?
…release (#14351) As described in #14258, codegen backends for compiled languages do not currently work because the core codegen rules only support generating sources, not generating the targets that would allow the backends for compiled languages to actually compile those sources. For now, remove the registration code for those backends so users do not try to use them.
As explained below, I propose we model codegen for compiled languages by introducing target types for each language and codegen technology; for example, Let's consider alternate solutions:
Proposed solution:
I'm not quite convinced of the need for per-language target types, as plugin fields have already been demonstrated to work for At a fundamental level, the target API is "field" centric... the extra target type to encapsulate slightly different fields doesn't seem like it adds much here. Additionally, we had a "multiple target types" implementation in v1, and the impact of that was a proliferation of macros to avoid the boilerplate involved in having X copies of a target. At Twitter it was
Adding a "generator" field should only actually be necessary to remove ambiguity in the case where multiple conflicting instances are installed. I expect that if/when we need to add a field like this, the solution can be laser-focused within this method: pants/src/python/pants/engine/internals/graph.py Lines 904 to 907 in 1f314a2
When a
This is the fundamental issue that needs to be resolved by this ticket, so it should probably have more than a footnote. It's not clear how having per-language targets makes this any easier (the API is field centric rather than target-type centric, after all).
Yes, and it is actually one of the fundamental points that I am still stuck on. Even if
In practice, generated code should never actually need dependency inference, because it cannot depend on non-third-party code which hasn't itself been generated. So it should always (?) be possible to declare the dependencies of one of these targets before generation.
For Go, some form of analysis will be necessary (whether that means the existing "dependency inference" rules is still an open question for this design, but running our Go package analyzer will be necessary regardless). The generated code contains import statements for specific third-party runtime libraries and (this is the annoying part) for other generated code included in the file being generated. See https://github.com/golang/protobuf/blob/5d5e8c018a13017f9d5b8bf4fad64aaa42a87308/protoc-gen-go/generator/generator.go#L1268. For the latter, though, those dependencies should be just the "generated" form of the dependency already inferred for the protobuf source file.
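A rough sketch of that last point, under stated assumptions: every name here (`ProtoFile`, `go_deps_for_generated`, `RUNTIME_IMPORTS`) is hypothetical and is not a Pants API. It only illustrates deriving the dependencies of a generated Go package from the dependencies already known for its protobuf source, plus a fixed set of runtime libraries assumed to be known up front.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumption: the runtime import paths needed by generated code are known up front.
RUNTIME_IMPORTS = ("google.golang.org/protobuf/reflect/protoreflect",)


@dataclass(frozen=True)
class ProtoFile:
    """Hypothetical model of a protobuf source and what is already known about it."""

    path: str
    go_package: str  # value of the `go_package` option in the .proto file
    proto_deps: tuple[ProtoFile, ...] = ()


def go_deps_for_generated(proto: ProtoFile) -> tuple[str, ...]:
    """Return the import paths the generated Go package for `proto` depends on."""
    generated: list[str] = []
    for dep in proto.proto_deps:
        # Files that share a Go package are compiled together, not imported.
        if dep.go_package != proto.go_package and dep.go_package not in generated:
            generated.append(dep.go_package)
    return (*RUNTIME_IMPORTS, *generated)
```

Whether a real implementation could skip the package analyzer this way is exactly the open question above; the sketch only shows the shape of the mapping.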
A note for the future:
This assumes a third-party code generator, I think. A bespoke in-tree code generator can certainly depend on first-party code for a runtime support library. I think one of the motivating cases that stressed the v0 / v1 was 4sq's bespoke in-tree code generator, of which I cannot recall the name.
Yep, makes sense. Discussed with @tdyas a bit offline, but: the core
Add support for generating Go from protobuf sources using the [protoc-gen-go](https://pkg.go.dev/google.golang.org/protobuf/cmd/protoc-gen-go) and [protoc-gen-go-grpc](https://pkg.go.dev/google.golang.org/grpc/cmd/protoc-gen-go-grpc) plugins to `protoc`. This is not actually wired up yet to Go because we need to solve #14258, but it does give us the technology to generate the `.go` files. Note that this adds a new technique for us to install a Go tool deterministically. For now, the `go.mod` and `go.sum` are hardcoded, but we can choose to expose this through the options system in the future if need be. [ci skip-rust]
Closes #14258. As described there, codegen for compiled languages is more complex because the generated code must be _compiled_, unlike Python where the code only needs to be present. We still use the `GenerateSourcesRequest` plugin hook to generate the raw `.go` files, so that integrations like the `export-codegen` goal still work. But that alone is not powerful enough to know how to compile the Go code. So, we add a new Go-specific plugin hook. Plugin implementations return the standardized type `BuildGoPackageRequest`, which holds all the information needed to compile a particular package, including compiling its transitive dependencies. That allows for complex codegen modeling, such as Protobuf needing the Protobuf third-party Go package compiled first, or Protobufs depending on other Protobufs. Rule authors can then directly tell Pants to compile that codegen (#14705), or it can be loaded via a `dependency` on a normal `go_package`. [ci skip-rust]
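To make the idea of a request that carries its own transitive dependencies concrete, here is a minimal standalone sketch. `BuildPackageRequest` and `compile_order` are hypothetical names invented for illustration; the fields are assumptions and do not reflect the real `BuildGoPackageRequest`. The sketch shows only that such a request is enough to derive a dependencies-first compile order.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BuildPackageRequest:
    """Hypothetical stand-in; not the real fields of `BuildGoPackageRequest`."""

    import_path: str
    sources: tuple[str, ...]
    direct_dependencies: tuple[BuildPackageRequest, ...] = ()


def compile_order(root: BuildPackageRequest) -> list[str]:
    """Return import paths so that every package comes after its dependencies."""
    order: list[str] = []
    seen: set[str] = set()

    def visit(request: BuildPackageRequest) -> None:
        if request.import_path in seen:
            return
        seen.add(request.import_path)
        for dep in request.direct_dependencies:
            visit(dep)
        order.append(request.import_path)  # post-order: dependencies first

    visit(root)
    return order
```

With a runtime package, a generated `common` package that imports it, and a generated `api` package that imports both, `compile_order` lists the runtime first and `api` last, which mirrors the "Protobuf third-party Go package compiled first" case described above.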
Problem
The existing codegen rules do not work for compiled languages because those core rules treat generated sources as loose files and provide no way to wrap that generated code in the target types needed by the backends for compiled languages to actually build that code.
This is not a problem for Python (and other scripting languages) because mere existence of the generated sources is enough for the Python interpreter to consume those files. Contrast that with the Go backend which needs to build those sources into a package archive that is then linked into Go binaries. The Go backend tracks each package by using the `go_package` and `go_third_party_package` target types. (The `go_third_party_package` targets are generated automatically by processing the applicable `go.mod` file.) Loose source files cannot be processed by the Go backend.

The same applies for the Java and Scala backends.
Potential Solutions
Pants will need either a modification of the core codegen rules or a new codegen API to support compiled languages. The new API could use target generation to generate language-specific target types to wrap generated code. Further discussion in the comments.
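One way to picture the target-generation option is the standalone sketch below. It is not Pants code: `GeneratedPackageTarget` and `wrap_generated_sources` are hypothetical names, and the `address#directory` naming is an illustrative assumption. The sketch groups loose generated files by directory and wraps each group in a package-like target that a compiled-language backend could then build.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class GeneratedPackageTarget:
    """Hypothetical wrapper target: one per directory of generated sources."""

    address: str
    sources: tuple[str, ...]


def wrap_generated_sources(
    codegen_address: str, generated_files: list[str]
) -> list[GeneratedPackageTarget]:
    """Group loose generated files by directory and wrap each group in a target."""
    by_dir: dict[str, list[str]] = defaultdict(list)
    for path in generated_files:
        by_dir[str(PurePosixPath(path).parent)].append(path)
    return [
        GeneratedPackageTarget(f"{codegen_address}#{directory}", tuple(sorted(files)))
        for directory, files in sorted(by_dir.items())
    ]
```

Grouping by directory matches how Go treats a directory as a package; a Java or Scala variant would presumably need a different grouping rule.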