docs: ADR-051 Arbitrary Protobuf IPLD Support Scheme #11186

Closed
wants to merge 2 commits into from

i-norden
Contributor

@i-norden i-norden commented Feb 14, 2022

This is the initial draft of ADR-051 Arbitrary Protobuf IPLD Support Scheme. This proposal is an extension of the existing Tendermint/Cosmos IPLD work (e.g. https://github.com/vulcanize/go-codec-dagcosmos). This proposal would standardize an approach for the generic IPLD support of arbitrary protobuf types stored in state.

This proposal does not propose any implementation within the SDK directly; it only proposes an abstract model for representing Cosmos state as IPLD. The IPLD codecs proposed here would be implemented and tested in an external repository (the one linked above). A subsequent ADR will propose IPLD middleware that implements features in the SDK for leveraging this data model when doing state streaming.

Some pending questions that require further exploration:

  1. Is it actually better to use a self-describing message vs referencing .proto files?
  2. What additional tooling is required if we use self-describing messages vs referencing .proto files directly (some of it is already mentioned herein, but may be missing something)?
  3. In both cases, what is the best way of handling proto types that have a lot of dependencies? An IPLD compatible protobuf compiler is proposed here but would be a significant undertaking.

For 1 we are leaning towards the self-describing messages (which is why they are proposed in the draft), in part because they provide a simple way to canonicalize the proto definitions, whereas .proto files can differ in formatting/whitespace. Such differences would cause otherwise identical content to produce different IPLD blocks and hash references, muddying the IPLD DAG representation of protobuf dependency trees.
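
To illustrate the whitespace concern, here is a minimal sketch. The two `.proto` sources are made up for this example, and SHA-256 is only an assumed stand-in for whatever hash the IPLD content identifiers would actually use. Both sources contain the same tokens, yet they hash differently:

```python
import hashlib

# Two hypothetical .proto sources: same definition, different whitespace.
proto_a = b'syntax = "proto3";\nmessage Coin {\n  string denom = 1;\n  string amount = 2;\n}\n'
proto_b = b'syntax = "proto3";\nmessage Coin { string denom = 1; string amount = 2; }\n'


def content_hash(data: bytes) -> str:
    """Hex SHA-256 digest (assumed stand-in for the hash in a content identifier)."""
    return hashlib.sha256(data).hexdigest()


hash_a = content_hash(proto_a)
hash_b = content_hash(proto_b)

# Splitting on whitespace yields the same token sequence for both sources...
same_tokens = proto_a.split() == proto_b.split()
# ...but the raw bytes differ, so the content hashes differ too.
same_hash = hash_a == hash_b

print(same_tokens, same_hash)
```

A serialized descriptor avoids this particular source of divergence because it carries no source formatting, although deterministic serialization would still need to be pinned down.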

Renamed to ADR-051; I missed the existing draft PR for an ADR-050.


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • included the correct type prefix in the PR title
  • added ! to the type prefix if API or client breaking change (NA)
  • targeted the correct branch (see PR Targeting)
  • provided a link to the relevant issue or specification (NA)
  • followed the guidelines for building modules (NA)
  • included the necessary unit and integration tests (NA)
  • added a changelog entry to CHANGELOG.md (NA)
  • included comments for documenting Go code (NA)
  • updated the relevant documentation or specification
  • reviewed "Files changed" and left comments if necessary
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed ! in the type prefix if API or client breaking change
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic
  • reviewed API design and naming
  • reviewed documentation is accurate
  • reviewed tests and test coverage
  • manually tested (if applicable)

@github-actions github-actions bot added the T: ADR An issue or PR relating to an architectural decision record label Feb 14, 2022
@i-norden i-norden changed the title docs: ADR-50 Arbitrary Protobuf IPLD Support Scheme docs: ADR-050 Arbitrary Protobuf IPLD Support Scheme Feb 14, 2022
@i-norden i-norden force-pushed the adr_050 branch 2 times, most recently from f596718 to 871c99c February 16, 2022 04:36
@i-norden i-norden changed the title docs: ADR-050 Arbitrary Protobuf IPLD Support Scheme docs: ADR-051 Arbitrary Protobuf IPLD Support Scheme Mar 1, 2022
Member

@tac0turtle tac0turtle left a comment

Did a quick read and left two comments.

I'm not sure this should be added to the SDK. I haven't talked to a team in Cosmos that is asking for IPLD. It would be nice to hear from a wider range of teams whether this is needed.

types to and from the binary format that is persisted to disk. As such, an indefinite/unbounded number of protobuf types
need to be supported within the Cosmos ecosystem at large.

Rather than needing to register new content types and implement custom codecs for every Cosmos protobuf type we
Member

You don't need to do this now either. This is already possible with gRPC reflection.

Contributor Author

Hey @marbar3778, I closed this PR as requested, but if you get a chance to read through my response to this comment (#11186 (comment)), let me know what you think. If I am mistaken and missed some gRPC reflection feature that can fully substitute for what is proposed here, it would be great to know! Thanks!

wish to support as IPLD it would be useful to have a generic means of supporting arbitrary protobuf types. This would
open the doors for some interesting features and tools.
For example: a universal (and richly typed) IPLD block explorer for all/any blockchains in the Cosmos ecosystem that
doesn't require custom integrations for every type it explores and represents.
Member

This should also be possible with gRPC reflection.

@i-norden
Contributor Author

i-norden commented Mar 15, 2022

Hey @marbar3778, thanks for the review.

If this doesn’t belong in the SDK that's understandable, although it would be useful to canonicalize it somewhere official. Perhaps another repo in the Cosmos org?

When you say gRPC reflection I think of the server reflection protocol. Are you referring to the mechanism underpinning argument reflection there? If so, it's a very similar mechanism to the one proposed here. It relies on exporting FileDescriptorProtos for the server method argument types from the server's DescriptorDatabase, so it depends on the types being known by the server ahead of time. That of course isn't a problem for a gRPC server, as the types are necessarily compiled into the runtime in order to use and expose them in the first place.

The problem for us remains that, given raw protobuf message bytes without any additional context, it is (afaik) very difficult to extract useful information from those bytes, let alone reconstitute the type in full. The way we propose to handle this in the context of IPLD is to prefix the message bytes with a 32-byte content-hash reference to the set of FileDescriptorProtos (itself an IPLD) that describes the message's type.
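
A rough sketch of the prefixing idea follows. The names `wrap`, `unwrap`, and `store` are hypothetical, and SHA-256 is an assumption, since only a 32-byte content hash is specified above. The descriptor bytes are a placeholder rather than real serialized FileDescriptorProtos, and the dict stands in for a content-addressed block store:

```python
import hashlib
from typing import Dict, Tuple

HASH_LEN = 32  # length of a SHA-256 digest in bytes (hash choice is an assumption)


def wrap(descriptor_set: bytes, message: bytes) -> bytes:
    """Prefix the message bytes with the hash of the descriptor set that types them."""
    return hashlib.sha256(descriptor_set).digest() + message


def unwrap(block: bytes) -> Tuple[bytes, bytes]:
    """Split a wrapped block into (descriptor reference, raw message bytes)."""
    if len(block) < HASH_LEN:
        raise ValueError("block is shorter than the hash prefix")
    return block[:HASH_LEN], block[HASH_LEN:]


# Placeholder; a real value would be the serialized set of FileDescriptorProtos.
descriptor_set = b"<serialized FileDescriptorProtos>"
# Hand-encoded protobuf message: field 1 = "uatom", field 2 = "100".
message = b"\x0a\x05uatom\x12\x03100"

# Hypothetical content-addressed store keyed by hash.
store: Dict[bytes, bytes] = {hashlib.sha256(descriptor_set).digest(): descriptor_set}

block = wrap(descriptor_set, message)
ref, payload = unwrap(block)
resolved = store[ref]  # the type description is found from the block alone
```

The point of the layout is that a reader holding only `block` can follow `ref` to the type description without knowing a type name in advance.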

The goal here is to be able to recover the type of some arbitrary protobuf we come across while traversing the Tendermint/Cosmos IPLD DAG from within the context of that DAG without having to reach out to some external DescriptorDatabase or other registry or use pre-known .proto definitions.

Additionally, in the case of the gRPC server reflection protocol, the client needs to know a .proto file name or a full symbol name in order to request the corresponding descriptor(s) from the server. It can first retrieve this information by requesting a list of all the methods the server supports (although, apparently, there is no obligation in the protocol to return a full list). But in our context of considering only the raw message binary we don't even have this information available, so even if we did have access to some external DescriptorDatabase or registry we wouldn't know what descriptors or .proto files to ask for.
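
To make the "raw message binary" limitation concrete, here is a small sketch of a schema-less scan of the protobuf wire format. The helper names are made up, and it handles only wire types 0, 1, 2, and 5. Without a descriptor, the scan recovers field numbers, wire types, and raw values, but no message name, field names, or precise field types:

```python
from typing import Iterator, Tuple, Union


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def scan_fields(buf: bytes) -> Iterator[Tuple[int, int, Union[int, bytes]]]:
    """Yield (field number, wire type, raw value) for each top-level field."""
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited: string, bytes, or nested message
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field_number, wire_type, value


# Hand-encoded message: field 1 = "uatom", field 2 = "100" (both length-delimited).
raw = b"\x0a\x05uatom\x12\x03100"
fields = list(scan_fields(raw))
print(fields)
```

Nothing in `fields` says which message this is or what fields 1 and 2 are called, so there is no file name or symbol name to hand to a reflection service.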

Comment on lines +22 to +25
The SDK stores values in state storage in a protobuf encoded format. Each module defines their own custom types in
.proto definitions and these types are registered with a ProtoCodec that manages the marshalling and unmarshalling of these
types to and from the binary format that is persisted to disk. As such, an indefinite/unbounded number of protobuf types
need to be supported within the Cosmos ecosystem at large.

Also mentioned here, but is this true? Modules get access to e.g. the KVStore API, which deals in raw []byte and AFAIK doesn't require protobuf encoding. Are you referring to something else?

Contributor Author

Thanks for the question/feedback! That is correct: nowhere in the SDK is protobuf encoding of KVStore values strictly enforced/required. But proto and amino (legacy support) are the only codecs supported in https://github.com/cosmos/cosmos-sdk/blob/master/codec/codec.go and, in general, the goal is to move all encoding to protobuf: https://github.com/cosmos/cosmos-sdk/blob/master/docs/core/encoding.md. So it is not a strict requirement, but it is loosely prescribed/standardized.

In any case, it has been decided that this ADR is not appropriate for the SDK so I'm going to close this PR. Instead I will be finalizing the proposal here as part of the ipld schemas for tendermint/cosmos: ipld/ipld#111.

Gotcha. But I guess as long as KVStore interface provides Get/Set that operate on []byte there's no way to assert an encoding 😬
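
A toy illustration of that point, using a hypothetical in-memory store rather than the SDK's actual KVStore interface: a byte-oriented Get/Set API accepts any encoding, so nothing at the store layer can tell protobuf from JSON.

```python
import json
from typing import Dict, Optional


class KVStore:
    """Hypothetical in-memory stand-in for a byte-oriented key/value store."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def set(self, key: bytes, value: bytes) -> None:
        # The store sees only bytes; it cannot check how the value was encoded.
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)


store = KVStore()
# One module writes a hand-encoded protobuf value...
store.set(b"balance/alice", b"\x0a\x05uatom\x12\x03100")
# ...another writes JSON under the same API, and both are accepted.
store.set(b"balance/bob", json.dumps({"denom": "uatom", "amount": "100"}).encode())
```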

3 participants