Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Genesis Protobuf Migration #5917

Closed
15 of 16 tasks
aaronc opened this issue Apr 2, 2020 · 17 comments · Fixed by #7000
Closed
15 of 16 tasks

Genesis Protobuf Migration #5917

aaronc opened this issue Apr 2, 2020 · 17 comments · Fixed by #7000
Assignees
Labels
C:Encoding C:genesis relating to chain genesis

Comments

@aaronc
Copy link
Member

aaronc commented Apr 2, 2020

Context

  • with protobuf, we now have a new JSON format which we want to support
  • streaming genesis is desirable because of large imports and exports from mature zones
  • with the migration to Any, the same structs can support both amino and protobuf JSON

Roadmap

/cc @alexanderbez @clevinson

@alpe
Copy link
Contributor

alpe commented Apr 7, 2020

GenTX decoding from genesis is very critical as well as it is required to run a node. DeliverGenTxs is still amino.

@clevinson clevinson added this to the v0.39 milestone Apr 30, 2020
@clevinson
Copy link
Contributor

@fedekunze fedekunze added C:Encoding C:genesis relating to chain genesis labels Apr 30, 2020
@tac0turtle
Copy link
Member

will part of this also depend on tendermint genesis work or is it orthogonal? I am starting to delve into the tendermint side of JSON encoding

@aaronc aaronc mentioned this issue May 15, 2020
11 tasks
@aaronc
Copy link
Member Author

aaronc commented May 15, 2020

will part of this also depend on tendermint genesis work or is it orthogonal? I am starting to delve into the tendermint side of JSON encoding

@marbar3778 Can you share some details/link about this?

@aaronc
Copy link
Member Author

aaronc commented May 15, 2020

With #6147, we now have an easy pathway to support both Amino and Proto JSON, so I've updated this issue to include a task in this issue for adding flags to start and export to choose which JSON format to use.

@aaronc
Copy link
Member Author

aaronc commented May 15, 2020

Also I've thought about how to approach streaming JSON based on some things @alpe has shared and just want to document the approach I have in mind. (I don't think we need to get to this quite yet.)

Here are the constraints I see:

  • streaming JSON should be opt-in on a module by module basis and isn't blocking for v0.39 (unless there are resource issues doing the hub export)
  • looking at our current genesis code, the order of operations is pretty important
  • in order to support streaming properly, we need to be able to choose which elements in an object get parsed first

I think we can solve this with the following approach:

  • allow either a genesis.json file as now, or a genesis/ folder
  • arrays which are large and intended to be streamed can like in their own JSON files, i.e. staking.delegations could live in genesis/staking/delegations.json
  • root.json in the genesis/ folder contains all definitions that are not in the separate files. Anything which is in a separate file should be omitted from root.json

The API for this might look something like:

type ObjectReader interface {
    // reads the named field from the object into the ptr
    ReadObject(name string, ptr interface{}) error
    // opens a stream reader for the array at the named field
    // this will open a separate file from the disk if it exists to support massive arrays
    // (i.e. `delegations.json` for the field `delegations`)
    StreamArray(name string) ArrayStreamer
}

type ArrayStreamer interface {
    ReadNext(ptr interface{}) error
    HaveMore() bool
}

Using gogo jsonpb, there is already an UnmarshalNext method we can leverage: https://godoc.org/github.com/gogo/protobuf/jsonpb#Unmarshaler.UnmarshalNext

@tac0turtle
Copy link
Member

tac0turtle commented May 15, 2020

Can you share some details/link about this?

we are discussing and started a small doc here: https://docs.google.com/document/d/1RzF9qvr-a6oYbmKdhHJ_lYLIUbxozTFfsJCwJzAqXkY/edit#heading=h.n4jqb5mvv9mq

and have a issue here: tendermint/tendermint#4828

cc @aaronc

@github-actions
Copy link
Contributor

github-actions bot commented Jul 5, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jul 5, 2020
@aaronc aaronc added pinned and removed stale labels Jul 6, 2020
@fedekunze
Copy link
Collaborator

should we get started with the .proto files? or is it blocked by something else?

@aaronc
Copy link
Member Author

aaronc commented Jul 8, 2020

There is still a question in my mind of whether we need streaming genesis for stargate. For ibc it’s likely not needed yet (because there isn’t a huge amount of live data on the hub). So migrating the current GenesisState to proto files is probably okay. However, the streaming genesis is l believe the future proof way to go for large import/exports. This is likely something we should discuss in our SDK call Friday.

@aaronc
Copy link
Member Author

aaronc commented Jul 9, 2020

@anilcse and @sahith-narahari are going to export data from Cosmos Hub 3 so that we have a sense of how large the genesis import/export is. Then we can assess the best path forward.

@dokwon
Copy link

dokwon commented Jul 10, 2020

What are the performance benefits from streaming JSON?

We've explored storing genesis state as a binary (we have 1.7M accounts and state is getting bloated) and only present genesis.json for readability purposes.

@yun-yeo
Copy link
Contributor

yun-yeo commented Jul 10, 2020

Terra will release cosmwasm before Stargate, so the state size is going much lager. In this aspect, we need streaming feature to avoid out of memory situation.

As @dokwon mentioned, It can be level db store like binary format.

@anilcse
Copy link
Collaborator

anilcse commented Jul 15, 2020

@anilcse and @sahith-narahari are going to export data from Cosmos Hub 3 so that we have a sense of how large the genesis import/export is. Then we can assess the best path forward.

Hub's exported state is ~52M in size, we might not need streaming genesis for stargate. We can address that later may be.

@aaronc
Copy link
Member Author

aaronc commented Jul 15, 2020

Thanks @anilcse.

So sounds like it's more of an issue for Terra possibly. @YunSuk-Yeo do you know the actual size of the exported Terra genesis.json file?

@dokwon
Copy link

dokwon commented Jul 16, 2020

@aaronc Our exported genstate for Columbus-3 (December last year) was around 500M. Assuming filesize growth is linear to account number growth that would imply a Columbus-4 genstate fillesize of > 1GB. The first block took a good 2 minutes for the fastest validators to process, and a lot longer for most others.

While I understand that the Hub would not have similar issues, think the SDK development should reflect needs of client chains that see sizable use.

@aaronc
Copy link
Member Author

aaronc commented Aug 4, 2020

@dokwon @YunSuk-Yeo unfortunately we won't have the time to complete streaming genesis for the 0.40 release - there is just too much on our backlog. I have, however, created an issue specifically for this (#6936) and tagged it for the 0.41 release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C:Encoding C:genesis relating to chain genesis
Projects
None yet
Development

Successfully merging a pull request may close this issue.