Generate FileDescriptorProto accessors #55
The combination does get a bit messy from an API perspective. Could you generate a descriptor file with protoc --descriptor_set_out=.desc and load it as a resource at runtime?
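On the Java side, the suggestion above could look roughly like this. It is a minimal sketch that assumes the build copies the protoc output onto the classpath. The resource path `/descriptors/app.desc` and the class name `DescriptorSetResource` are hypothetical and are not part of QuickBuffers.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper: reads a descriptor set that the build wrote as a classpath resource.
final class DescriptorSetResource {
    private DescriptorSetResource() {}

    /** A leading "/" makes the path absolute; otherwise it is resolved relative to this class's package. */
    static byte[] load(String resourcePath) {
        // try-with-resources skips close() when the stream is null
        try (InputStream in = DescriptorSetResource.class.getResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new IllegalStateException("descriptor set not found on classpath: " + resourcePath);
            }
            return in.readAllBytes(); // Java 9+
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Usage would be something like `byte[] set = DescriptorSetResource.load("/descriptors/app.desc");`. The bytes can then be published as-is, because they are already a serialized FileDescriptorSet.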
To give a bit more detail... my use case is the following:
So the problem is how to publish the descriptor data from the applications over the network (using the API provided by the base library) so that tools can introspect it dynamically.

We could have each application's build system generate a file descriptor set containing the descriptors of every library it uses (plus the application's own), and then publish that to a unique per-application location on the network. It can't publish only the descriptors it actually uses, because there's no way to get that information at build time or at runtime (with QB). This results in duplication (for example, application 1 and application 2 will both publish file descriptor sets that contain the base library descriptors). I'm also not sure exactly how big the complete descriptor set will get, and that size is multiplied by the number of running applications. Tools will need to maintain a separate descriptor database per application and work out which database to use for a given type string, but that's relatively easy to do.

If each generated file provided access to its filename, file descriptor proto, and dependencies, we could walk this tree at runtime and either publish the file descriptors individually or build and publish a file descriptor set (although that may not be possible given just a byte[] per file descriptor). Either way, only the file descriptors that the particular application actually uses at runtime would be published.
My original thought was to publish file descriptors to a common location, indexed uniquely by file name only. The downside is that different applications could publish different versions of the same file and conflict with each other in a way that tools can't detect. For debugging purposes, the tools don't know which application published a particular typed value, so they have to pick one. It may therefore be better to make this publishing unique per application as well (effectively a per-application file descriptor set). With that approach, the main gain is avoiding duplication of all the library file descriptors, most of which go unused because most applications use only a tiny subset. It also makes conflicts between file descriptors substantially less likely, because each application publishes only the files it actually uses.
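The parenthetical doubt above about only having a byte[] per file can be checked against the wire format. FileDescriptorSet is just `repeated FileDescriptorProto file = 1;`, so each already-serialized FileDescriptorProto can be wrapped as a length-delimited field 1 entry, and the concatenation of those entries is a valid serialized set. This is a minimal hand-rolled sketch that does not use any QuickBuffers API. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

// Hypothetical helper: assembles a serialized FileDescriptorSet from serialized FileDescriptorProtos.
final class DescriptorSetWriter {
    private DescriptorSetWriter() {}

    // Field number 1, wire type 2 (length-delimited) -> tag byte 0x0A.
    private static final int FILE_FIELD_TAG = (1 << 3) | 2;

    static byte[] toFileDescriptorSet(List<byte[]> fileDescriptorProtos) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] proto : fileDescriptorProtos) {
            writeVarint(out, FILE_FIELD_TAG);
            writeVarint(out, proto.length);
            out.write(proto, 0, proto.length); // payload is copied verbatim
        }
        return out.toByteArray();
    }

    // Base-128 varint: 7 bits per byte, high bit set on every byte except the last.
    private static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }
}
```

Because nothing inside each FileDescriptorProto is decoded, this works the same whether the bytes came from generated code, a resource file, or the network.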
Thinking about this more, I had an idea: generate the FileDescriptorSet at build time, load it at runtime, and manipulate it at runtime by extracting only the FileDescriptorProtos we care about. To avoid pulling in the entire upstream Google descriptor definitions, I could define "lightweight" versions of those messages (e.g. my own versions with only some of the fields defined). As long as `store_unknown_fields=true`, I think I can use QB to parse the "lightweight" FileDescriptorSet/FileDescriptorProto and write them back out intact, either individually or as a new FileDescriptorSet. Is that right? I still need to think through how the generation process will work in the multiple-library-and-user-builds scenario, and whether for that reason it might still be better to embed the descriptors in the generated code instead.
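Here is a sketch of the "extract only the files we care about" step. It swaps in a different technique: instead of QB-generated lightweight messages with `store_unknown_fields=true`, it reads the wire format directly. The idea is the same. It decodes only the fields it needs (each file's `name`, field 1) and copies every selected file's bytes through unmodified. The class and method names are hypothetical, and there is no bounds validation beyond normal array indexing.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Set;

// Hypothetical helper standing in for "lightweight" FileDescriptorSet/FileDescriptorProto messages.
final class DescriptorSetFilter {
    private DescriptorSetFilter() {}

    // Field 1, wire type 2. FileDescriptorSet uses it for `repeated FileDescriptorProto file`,
    // and FileDescriptorProto uses it for `optional string name`.
    private static final long FIELD_1_LEN = (1 << 3) | 2;

    /** Returns a new FileDescriptorSet containing only the files whose name is in keep. */
    static byte[] filter(byte[] set, Set<String> keep) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int[] pos = {0};
        while (pos[0] < set.length) {
            int start = pos[0];
            long tag = readVarint(set, pos);
            if (tag != FIELD_1_LEN) {
                skipField(set, pos, (int) (tag & 7)); // not a `file` entry
                continue;
            }
            int len = (int) readVarint(set, pos);
            String name = fileName(set, pos[0], pos[0] + len);
            pos[0] += len;
            if (keep.contains(name)) {
                out.write(set, start, pos[0] - start); // tag + length + payload, verbatim
            }
        }
        return out.toByteArray();
    }

    /** Reads `name` (field 1) from the FileDescriptorProto stored in buf[from, to). */
    static String fileName(byte[] buf, int from, int to) {
        int[] pos = {from};
        while (pos[0] < to) {
            long tag = readVarint(buf, pos);
            if (tag == FIELD_1_LEN) {
                int len = (int) readVarint(buf, pos);
                return new String(buf, pos[0], len, StandardCharsets.UTF_8);
            }
            skipField(buf, pos, (int) (tag & 7));
        }
        return ""; // name not set
    }

    private static long readVarint(byte[] buf, int[] pos) {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            byte b = buf[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalArgumentException("malformed varint");
    }

    private static void skipField(byte[] buf, int[] pos, int wireType) {
        if (wireType == 0) {
            readVarint(buf, pos);
        } else if (wireType == 1) {
            pos[0] += 8;
        } else if (wireType == 2) {
            int len = (int) readVarint(buf, pos);
            pos[0] += len;
        } else if (wireType == 5) {
            pos[0] += 4;
        } else {
            throw new IllegalArgumentException("unsupported wire type " + wireType);
        }
    }
}
```

Copying the selected entries byte for byte has the same effect as re-serializing parsed messages with unknown fields preserved. No field the filter doesn't understand is lost.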
Sorry, I forgot to reply earlier. I was thinking of generating … The FileDescriptor could potentially also provide a List or Map of all identifiers. Creating a reduced version of the Google descriptors with unknown-field storage enabled should work as well.
FYI, I just finished a large project I was working on. I need to take care of a few smaller high-priority items first, but I should be able to get to this early next week.
Any update on this?
Sorry, it got delayed a lot, and I'm still a bit confused about the requirements. From what I saw in the protobuf-java API, I think you are looking for equivalents of these two calls:

```java
String fullName = MyProtoType.getDescriptor().getFullName();
byte[] fileDescriptor = MyProtoType.getDescriptor().getFile().toProto().toByteArray();
```

But how does that work with protos that are defined in other files? I saw a …
What I'm currently doing in C++ is the following. Note that I'm not actually publishing a FileDescriptorSet as such. Instead, I'm publishing (via the callback function `fn`) the serialized FileDescriptorProto of every file in the dependency tree, starting from the message's file descriptor. Dependencies are published before the files that import them.

```cpp
static void ForEachProtobufDescriptorImpl(
    const FileDescriptor* desc,
    function_ref<bool(std::string_view filename)> wants,
    function_ref<void(std::string_view filename,
                      std::span<const uint8_t> descriptor)>
        fn,
    Arena* arena) {
  // Returning false for files that were already published also prevents
  // repeat visits when imports form a diamond.
  if (!wants(desc->name())) {
    return;
  }
  // Publish dependencies first (post-order).
  for (int i = 0, ndep = desc->dependency_count(); i < ndep; ++i) {
    ForEachProtobufDescriptorImpl(desc->dependency(i), wants, fn, arena);
  }
  // A freshly created message is already empty, so no Clear() is needed.
  FileDescriptorProto* descproto =
      Arena::CreateMessage<FileDescriptorProto>(arena);
  desc->CopyTo(descproto);
  std::vector<uint8_t> buf;
  detail::SerializeProtobuf(buf, *descproto);
  // Only heap-allocated messages may be deleted; arena-owned messages are
  // freed together with the arena.
  if (arena == nullptr) {
    delete descproto;
  }
  fn(fmt::format("proto:{}", desc->name()), buf);
}

void detail::ForEachProtobufSchema(
    const google::protobuf::Message& msg,
    function_ref<bool(std::string_view filename)> wants,
    function_ref<void(std::string_view filename,
                      std::span<const uint8_t> descriptor)>
        fn) {
  ForEachProtobufDescriptorImpl(msg.GetDescriptor()->file(), wants, fn,
                                msg.GetArena());
}
```
Thanks. Would an API like the one below (reduced from protobuf-java) work?

```java
class SomeGeneratedMessage extends ProtoMessage {
    public static Descriptor getDescriptor();
}

interface Descriptor {
    FileDescriptor getFile();
    String getName();
    String getFullName();
    byte[] toProtoBytes();
}

interface FileDescriptor {
    String getName();
    String getFullName();
    String getPackage();
    byte[] toProtoBytes();
    List<FileDescriptor> getDependencies();
}
```
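Here is how a Java equivalent of the C++ walk might look against the proposed interface. It is a sketch only. The `FileDescriptor` below re-declares just the subset of the proposed interface that the walk needs, so it compiles on its own. `DescriptorWalker` and `collect` are hypothetical names. Files reachable through several imports are collected once, with dependencies first.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Subset of the FileDescriptor interface proposed above, re-declared for this sketch.
interface FileDescriptor {
    String getName();
    byte[] toProtoBytes();
    List<FileDescriptor> getDependencies();
}

final class DescriptorWalker {
    private DescriptorWalker() {}

    /**
     * Returns the file and all of its transitive dependencies, keyed by file name,
     * in post-order (a dependency always precedes the files that import it).
     */
    static Map<String, byte[]> collect(FileDescriptor root) {
        Map<String, byte[]> out = new LinkedHashMap<>(); // preserves insertion order
        visit(root, out, new HashSet<>());
        return out;
    }

    private static void visit(FileDescriptor file, Map<String, byte[]> out, Set<String> seen) {
        if (!seen.add(file.getName())) {
            return; // already reached through another import path
        }
        for (FileDescriptor dep : file.getDependencies()) {
            visit(dep, out, seen);
        }
        out.put(file.getName(), file.toProtoBytes());
    }
}
```

The resulting values can be published one by one, which mirrors the C++ callback, or wrapped into a single FileDescriptorSet. Because the map is keyed by file name, an application publishes each file at most once.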
That looks great!

I implemented an initial version of the API above. The generated code currently looks like https://gist.github.com/ennerf/222e68f6b6ac5fb2600c58ec35804457, with each message generating a `getDescriptor()` method such as:

```java
public static class MessageSetCorrectExtension2 {
    // ...
    public static Descriptors.Descriptor getDescriptor() {
        return AllTypesOuterClass.internal_static_quickbuf_unittest_TestAllTypes_MessageSetCorrectExtension2_descriptor;
    }
    // ...
}
```
I also added … I removed the gpg requirement, so you should be able to run …
Thanks! I'll try it out this weekend.
Never mind, figured it out.
Sorry, I just got back from a conference. Do you have any more suggestions, or is the PR good as is?
It’s good as is for what I need. Thanks!
Thanks for verifying. I'll get a release out soon.
Version 1.3.2 is on Maven Central.
Follow-up to #47. It turns out that all of the standard protobuf implementations of reflection operate at the file level (FileDescriptorProto) rather than at the level of individual messages. This has propagated into other file formats. For example, MCAP stores protobuf schemas as FileDescriptorSets (sets of FileDescriptorProtos). Supporting this would involve generating, at a minimum, a `public static byte[] getFileDescriptorProto()` method at the generated-file level.

I say "at a minimum" because this alone does not give direct visibility into the .proto file's dependencies. Exporting a complete set therefore requires either external (non-generated) information or parsing of the FileDescriptorProto. The dependencies are visible through the FileDescriptorProto's `dependency` repeated string field, so they could be exposed that way (as a generated `String[]` getter). Alternatively, they could be exposed as something more Java-like (e.g. a `Class<?>[]`), although that has some potential issues and may be more annoying than useful.
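The "parsing of the FileDescriptorProto" option doesn't need the full descriptor classes. In descriptor.proto, the import list is `repeated string dependency = 3;`, so it can be read directly from the serialized bytes that a generated `getFileDescriptorProto()` would return. This is a minimal sketch. The class name is hypothetical, and the code skips the fields it doesn't need rather than validating the whole message.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the imports recorded in a serialized FileDescriptorProto.
final class FileDescriptorProtoDeps {
    private FileDescriptorProtoDeps() {}

    // Field 3, wire type 2 (length-delimited) -> tag byte 0x1A.
    private static final long DEPENDENCY_TAG = (3 << 3) | 2;

    static List<String> dependencies(byte[] buf) {
        List<String> deps = new ArrayList<>();
        int[] pos = {0};
        while (pos[0] < buf.length) {
            long tag = readVarint(buf, pos);
            int wireType = (int) (tag & 7);
            if (tag == DEPENDENCY_TAG) {
                int len = (int) readVarint(buf, pos);
                deps.add(new String(buf, pos[0], len, StandardCharsets.UTF_8));
                pos[0] += len;
            } else if (wireType == 0) {
                readVarint(buf, pos);
            } else if (wireType == 1) {
                pos[0] += 8;
            } else if (wireType == 2) {
                int len = (int) readVarint(buf, pos);
                pos[0] += len; // skip other length-delimited fields (name, package, messages, ...)
            } else if (wireType == 5) {
                pos[0] += 4;
            } else {
                throw new IllegalArgumentException("unsupported wire type " + wireType);
            }
        }
        return deps;
    }

    private static long readVarint(byte[] buf, int[] pos) {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            byte b = buf[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalArgumentException("malformed varint");
    }
}
```

This only yields file names. Turning those names back into generated classes still needs a name-to-class mapping, which is the external information mentioned above.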