Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialization Concerns #197

Closed
ebuchman opened this issue Mar 29, 2020 · 15 comments
Closed

Serialization Concerns #197

ebuchman opened this issue Mar 29, 2020 · 15 comments
Assignees
Milestone

Comments

@ebuchman
Copy link
Member

Serialization issues have been a constant plague upon us.

At a high level, virtually all our data structures must serialize to/from both Amino and JSON.

The main data structures target JSON, while copies of the data structures in the amino_types module target Amino.

A lot of the difficulties come from peculiarities in Tendermint JSON encoding, especially around empty/missing/nil values. In particular:

  • Sometimes Tendermint JSON uses null
  • Sometimes it uses default values
  • Sometimes it omits values completely (omitempty)

In certain cases, multiple fields in a struct are all present or all absent together. We should consider making such types enums with Present and Absent variants - the Present variant requires all the fields (except the genuinely Optional ones), and the Absent variant omits all the fields that are omitted together. This is for instance the case with the Block struct - if it's the first block, all the data pertaining to previous blocks is omitted, but if its any other block, it all must be there #101 .

We don't seem to currently have a coherent/standard way to deal with all this complexity. In some cases we use Option in structs. In some cases we use custom serialize/deserialize functions. In some cases we use custom types that implement their own serialize/deserialize methods. It would be worth attempting some kind of inventory of all this special casing and try to achieve more consistency and simplification.

While we are aiming for compliance with Tendermint Core v0.33, there may be cases where its worth recommending a change to Tendermint Core to make this more manageable.

Related:

@liamsi
Copy link
Member

liamsi commented Mar 29, 2020

Serialization issues have been a constant plague upon us.

So true, independent of this repo 😆

@tarcieri
Copy link
Contributor

A common technique I've seen elsewhere for dealing with encoding/decoding of complex types (for example, in Cargo) is separating the "domain" types (i.e. the ideal ones you want to use) from the ones which handle encoding, then having From/TryFrom conversions between the two sets of types.

@liamsi
Copy link
Member

liamsi commented Apr 6, 2020

In this context: instead of having currently two versions of a type (one for JSON another for amino/binary encoding), we could merge these two into one.

Pros:

  • easier to maintain (if sth changes only one serialzation type needs to be adapted and not two)
  • no serde/amino/encoding annotation in "ideal" type used for any core logic
  • less duplication

Cons:

  • not as clear separation of domains (one type/module handles JSON and binary encoding) and related to this:
  • Froms / Intos might be different depending on the underlying encoding (?)
  • type annotation might make the serialization type more difficult to read (?)
  • might be not feasible if we switched to using generated code from proto files (protobuf mapping might be different from what we need: https://developers.google.com/protocol-buffers/docs/proto3#json (?))

I know we discussed this a while ago but I can't really remember if we were in favour of "merging" serialization types or not.

@ebuchman
Copy link
Member Author

ebuchman commented Apr 7, 2020

I know we discussed this a while ago but I can't really remember if we were in favour of "merging" serialization types or not.

I think we were against it for now given the idiosyncracies of the two formats. If we manage to consolidate everything to a standard proto/proto-JSON format for which there is already robust library support, then maybe we can revisit consolidating.

Another note, in my understanding, a single Rust struct cannot be encoded two different ways depending on the encoding. I could be wrong but I thought it mapped down to a common intermediate representation so if you have strings encoded one way for proto and another way for JSON you have no choice but to use distinct types ...

@liamsi
Copy link
Member

liamsi commented Apr 7, 2020

in my understanding, a single Rust struct cannot be encoded two different ways depending on the encoding

Well if both encodings implement Serialize, Deserialize that is true. prost and or amino generate different methods (basically for each struct Foo prost generates Foo_MESSAGE which implements the prost::Message trait (which does have a decode / encode method).

A quick try confirms that this at least compiles:
#[derive(Clone, PartialEq, Message, Serialize, Deserialize)]

I'm not in favour of consolidating the different structs though.

@ancazamfir
Copy link
Contributor

For IBC, as we move to proto3 for all queries and messages, I would be strongly in favor of using a single types/ structs derived from the .proto and making sure the canonical encodings in JSON (as per https://developers.google.com/protocol-buffers/docs/proto3#json) are accepted by both Rust and Go impl.
But I couldn't find any rust crate that could help. Prost doesn't support this.
protobuf crate had JSON support in 2.9.0 (https://docs.rs/protobuf/2.9.0/protobuf/json/index.html) but was recently removed due to some backwards compatibility issues.

This is of course supported in Go.

@liamsi
Copy link
Member

liamsi commented Apr 10, 2020

@ebuchman brought up that we don't yet have a full story for JSON decoding. It is clear that we can and already use serde for JSON. But there is also the amino style type/value-style encoding for registered types (see this snippet for an example). The closest to this is the Any encoding in proto3 JSON mappings but it has minor differences (e.g. the @ in type).

It should be really easy (no really) to write a decoder/encoder for these though which we can use for all cases (without code duplication). I will draft a little parametrizable struct.

@marbar3778 was the plan to replace the amino JSON with vanilla proto JSON mapping? I only familiar on how the binary encoding changes with look like but I didn't look into how JSON will be handled.

@liamsi
Copy link
Member

liamsi commented Apr 11, 2020

@ebuchman in any case we could use sth like this:

// TODO: deserves a better name
#[derive(Serialize, Deserialize, Clone, PartialEq, Debug)]
struct AminoJSON<T: Serialize + DeserializeOwned> {
#[serde(alias = "type")]
pub type_name: Option<String>,
#[serde(bound(deserialize = "Option<T>: Deserialize<'de>"))]
pub value: Option<T>,
}

Or without Options:

#[derive(Serialize, Deserialize, Clone, PartialEq, Debug)]
struct AminoJSON<T: Serialize + DeserializeOwned> {
#[serde(alias = "type")]
pub type_name: String,
#[serde(bound(deserialize = ""))]
pub value: T,
}

Actually much simpler and more idiomatic would be using serde's enum representations: e.g., https://serde.rs/enum-representations.html#adjacently-tagged

#[derive(Serialize, Deserialize, Clone, PartialEq, Debug)]
#[serde(tag = "type", content = "value")]
enum SomeRegisteredInterface {
#[serde(alias = "cosmos-sdk/MsgUnjail")]
Unjail(MsgUnjail),
#[serde(alias = "Foo")]
OtherMadeUp(MsgOtherMadeUp),
}

@liamsi
Copy link
Member

liamsi commented May 4, 2020

Assigning greg here as he is already working on serialization improvements (#248 ).

@liamsi
Copy link
Member

liamsi commented May 13, 2020

Note that in tendermint amino JSON encoding will go full "proto mapping" encoding

tendermint issue: tendermint/tendermint#4828

This might be relevant to some of the work done in #247 too.

cc @marbar3778

@liamsi
Copy link
Member

liamsi commented May 13, 2020

Particularly relevant for the current JSON work: tendermint/tendermint#4828 (comment)

@ebuchman
Copy link
Member Author

I think we've been making progress on this with the recent upgrade to v0.33 and the upcoming upgrade to v0.34 of Tendermint. Maybe we can close this? @greg-szabo ?

@greg-szabo
Copy link
Member

Let's finish the JSON part of 0.34 (sometime next week) and then we can close this. I just want to make sure. ;)

@ebuchman ebuchman added this to the v0.17.0 milestone Sep 11, 2020
@ebuchman
Copy link
Member Author

@greg-szabo should we close this now? Note I opened #567 to track brining back JSON tests. There might be more here but we have a pretty comprehensive domain type system and can follow up with more concrete things as we need them.

@ebuchman
Copy link
Member Author

Lets close this for #567

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants