Streaming Genesis #6936

Closed
aaronc opened this issue Aug 4, 2020 · 5 comments
Labels
C:genesis relating to chain genesis

Comments

@aaronc
Member

aaronc commented Aug 4, 2020

Streaming genesis is useful when app state is large. Projects such as Terra have expressed concern around memory usage. It was not tackled in #5917 due to time constraints.

/cc @YunSuk-Yeo @dokwon


Here is my previous proposal from #5917 :

Also I've thought about how to approach streaming JSON based on some things @alpe has shared and just want to document the approach I have in mind. (I don't think we need to get to this quite yet.)

Here are the constraints I see:

  • streaming JSON should be opt-in on a module by module basis and isn't blocking for v0.39 (unless there are resource issues doing the hub export)
  • looking at our current genesis code, the order of operations is pretty important
  • in order to support streaming properly, we need to be able to choose which elements in an object get parsed first

I think we can solve this with the following approach:

  • allow either a genesis.json file as now, or a genesis/ folder
  • arrays which are large and intended to be streamed can live in their own JSON files, e.g. staking.delegations could live in genesis/staking/delegations.json
  • root.json in the genesis/ folder contains all definitions that are not in the separate files. Anything which is in a separate file should be omitted from root.json

The API for this might look something like:

type ObjectReader interface {
    // ReadObject reads the named field from the object into ptr.
    ReadObject(name string, ptr interface{}) error
    // StreamArray opens a stream reader for the array at the named field.
    // If a separate file exists on disk for that field (e.g. `delegations.json`
    // for the field `delegations`), it is opened instead, to support massive arrays.
    StreamArray(name string) ArrayStreamer
}

type ArrayStreamer interface {
    // ReadNext reads the next array element into ptr.
    ReadNext(ptr interface{}) error
    // HaveMore reports whether any elements remain.
    HaveMore() bool
}

Using gogo jsonpb, there is already an UnmarshalNext method we can leverage: https://godoc.org/github.com/gogo/protobuf/jsonpb#Unmarshaler.UnmarshalNext

@aaronc aaronc added the C:genesis relating to chain genesis label Aug 4, 2020
@aaronc aaronc added this to the v0.41 milestone Aug 4, 2020
@alexanderbez
Contributor

I'm not sure how I feel about the proposal yet...I need to think on it more. But it just doesn't feel quite right with me. How are the files constructed? Is there a tool that breaks them up?

In general, I would rather opt for a more idiomatic approach where we can construct a clean API for retrieving app data, say by module name and fields within a module's app data by key, where the actual streaming happens via a JSON scanner either via stdlib (NewDecoder) or via our own implementation or using a third-party lib.
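The scanner idea above, using the stdlib decoder on a single file, might look roughly like the sketch below. It is an illustration under assumptions, not a worked-out design: `skipValue`, `seekField`, and `countArrayField` are hypothetical helpers, and plain `encoding/json` is used, so proto JSON is not covered. Note that the scan is forward-only: fields are visited in file order, and reading a field that appears earlier than the current position would require a second pass over the file.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// skipValue consumes exactly one JSON value token by token, without
// materializing it in memory.
func skipValue(dec *json.Decoder) error {
	depth := 0
	for {
		tok, err := dec.Token()
		if err != nil {
			return err
		}
		if d, ok := tok.(json.Delim); ok {
			switch d {
			case '{', '[':
				depth++
			case '}', ']':
				depth--
			}
		}
		if depth == 0 {
			return nil // a scalar, or the close of the outermost container
		}
	}
}

// seekField advances dec to the value of the named top-level field,
// skipping every field that comes before it in the file.
func seekField(dec *json.Decoder, name string) error {
	tok, err := dec.Token()
	if err != nil {
		return err
	}
	if d, ok := tok.(json.Delim); !ok || d != '{' {
		return errors.New("expected a JSON object")
	}
	for dec.More() {
		keyTok, err := dec.Token()
		if err != nil {
			return err
		}
		if key, _ := keyTok.(string); key == name {
			return nil
		}
		if err := skipValue(dec); err != nil {
			return err
		}
	}
	return fmt.Errorf("field %q not found", name)
}

// countArrayField seeks to a top-level array field and decodes its elements
// one at a time, returning how many there were.
func countArrayField(src, name string) (int, error) {
	dec := json.NewDecoder(strings.NewReader(src))
	if err := seekField(dec, name); err != nil {
		return 0, err
	}
	tok, err := dec.Token()
	if err != nil {
		return 0, err
	}
	if d, ok := tok.(json.Delim); !ok || d != '[' {
		return 0, fmt.Errorf("field %q is not an array", name)
	}
	n := 0
	for dec.More() {
		var elem json.RawMessage // only one element is held at a time
		if err := dec.Decode(&elem); err != nil {
			return 0, err
		}
		n++
	}
	return n, nil
}
```

A path such as `gov/proposals` would need the same seek applied once per path segment; that extension is left out here to keep the sketch short.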

@aaronc
Member Author

aaronc commented Aug 10, 2020

Okay, if it were just a single file but with a similar API, where items are retrieved via a path (e.g. gov/proposals) rather than GenesisState, would that feel better @alexanderbez?

I just want to note that the proposal of multiple files avoids needing to write custom JSON scanners. I couldn't figure out how to do that sort of random access with the current JSON APIs. The APIs that exist support streaming on the array level but not object level and so the whole JSON file would need to be loaded into memory if we don't split arrays out into separate files.

@alexanderbez
Contributor

alexanderbez commented Aug 10, 2020

Have we looked into things like https://github.com/lloyd/goj or https://github.com/mailru/easyjson (which looks very promising)? Otherwise, we might have to write something custom.

Edit: Also, https://github.com/buger/jsonparser which claims we can load only the keys we need.

@aaronc
Member Author

aaronc commented Aug 10, 2020

I tried looking for streaming JSON parsers; the challenge is integrating with proto JSON. Maybe we can make it work. I'm just not quite sure how.

@aaronc
Member Author

aaronc commented Apr 11, 2022

Replacing this with #11601 which has a concrete solution.

@aaronc aaronc closed this as completed Apr 11, 2022