-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store plain historical states to reduce the db size of archive node #13317
Comments
Big fan of changing how we do writing and handling of historical state. Right now we are writing a specification and better testing for the store package to better understand the semantics and then we plan on making changes. There are a ton of changes we can do to bring huge improvements in performance. |
If there are some nesserary hooks in SDK, it might be possible to develop this feature outside of SDK:
|
this is adr-038, no? |
yeah, didn't know that, looks like exactly what we need. |
it'd be great if adr-038 could be backported to as far as v0.44 though, so we can stream the chain since the genesis. |
|
Implementation Plan// VersionStore is a versioned storage of a flat key-value pairs.
// it don't need to support merkle proof, so could be implemented in a much more efficient way.
// `nil` version means the latest version.
type VersionStore interface {
GetAtVersion(storeKey string, key []byte, version *int64) ([]byte, error)
HasAtVersion(storeKey string, key []byte, version *int64) (bool, error)
IteratorAtVersion(storeKey string, start, end []byte, version *int64) types.Iterator
ReverseIteratorAtVersion(storeKey string, start, end []byte, version *int64) types.Iterator
GetLatestVersion() int64
// Persist the change set of a block,
// the `changeSet` should be ordered by (storeKey, key),
// the version should be latest version plus one.
PutAtVersion(version int64, changeSet []types.StoreKVPair) error
} Integrate
|
in this approach are you assuming iavl does not handle versions anymore? I think if we fix iavl ky format queries will improve dramatically |
Sounds like you're describing a dedicated SS layer. I'm definitely in favor of splitting SS + SC responsibilities from the IAVL! I would really really really love for all this to be consolidated into a single EPIC. Which we have here -> #12986 |
yes, only keep the recent ones necessary for the consensus state machine,
We are mainly focus on db size right now, in local test, the new db format is 10x smaller than the iavl db. |
The difference with the SS described in v2 store is it also support versioning, so it can support an archived grpc query service.
happy to move the discussion there. |
personally the design of v2 was poor. the idea that proof support for historical versions means you need to reconstruct the tree is a bad design. Proof support while not widely used today is at the core of blockchain and decentralising data access. Secondly, after talking to dev from osmosis, there are designs where proof support will want to exist for previous heights. We should allow an alternative for chains that don't care, but I think we should not write it off entirely |
Agree, that's why I try to keep the implementation outside of sdk for now. But will need to change the API a little bit to hook the db with the grpc query. |
@yihuang do you have a concrete proposal? I do agree that SS should not concern itself with versioning -- it has no need to. Seems like we're on the same page. |
We have a prototype implementation already crypto-org-chain/cronos#722 . |
Amazing. Do you have an ADR, spec or doc detailing the idea? |
|
I see. We can consider and evaluate that approach as we refactor the store. I'm not sure we'll go exactly with that approach though. |
@yihuang so I am going to write up an ADR this week for a formal design proposal of concrete changes to state management based primarily on the work and approach designed here. Am I correct in understanding that merkle proofs on historical data is not supported in the approach you've outlined? What if that is required or a desired property, e.g. relayers or IBC clients. What is your proposal or ideas around proof support for historical state? |
yes, it don't support proof generation, for that case, we can probably keep an archive iavl tree with more compression for that special case, for example, omitting the |
@alexanderbez recently I'm also thinking about a design choice regarding what value to store in changeset, there are two options:
|
On first glance, I think I prefer the current approach. Having the latest state + some subset (or all) of historical state seems like a pretty normal expectation to have. |
Another small issue compared with erigon is, we can't do chunking of the bitmap index, in erigon the chunking is done by append chunk id to the key, but since our keys have variable length, we can't do that, otherwise, it'll interleave with other keys. |
So I don't know what any of this is, so when I write the ADR, it would be helpful if you could link me to all relevant specification and documentation on the historicalDB approach (Sorry in advance if you've already linked it :p ) |
The most complete description is still this doc1, I just made some complements, describing the bitmap chunking issue, and more advantages of the alternative design to store the after value. Footnotes |
I'm trying the rocksdb user-defined timestamp1 to implement versiondb, it seems pretty good. Footnotes |
Something we'll discuss in the storage WG! |
closing this because of version db and the work that will commence with store v2 |
Plain Historical State
Context
The archive node's db size increase very fast, akthough pruning is good, but many users still need to access the complete chain history, to keep it sustainable, we need a way to reduce the disk space requirement of archive nodes.
One observation is that most of the time, we don't need to generate Merkle proofs for historical states, so we can prune the iavl tree aggressively and keep the historical states in a compact format.
Proposal
At the end of the block, before committing the iavl tree, also save the change sets in another archive storage, the storage format is explained below.
Support querying from the archive storage transparently.
Nodes can use much more aggressive pruning settings for iavl tree, and still be able to query full chain history.
The total db size is expected to be much lower than an archived iavl tree.
Write Amplification
The write amplification is increased, but it should be ok if writing speed is not the bottleneck for the workload (need to verify), after all, we only do a batch of writes per several seconds.
Archive Format
Option 1
Store in kv db (inspired by erigon 1).
History Index,
key -> set of block numbers
The set of block numbers that modified the key, is encoded as a bitmap.
Change Sets,
(block number, key) -> value
Record the prior value for each modification.
Option 2
Store the history index in kv db.
Store the change sets in plain files in an append-only way, together with some index files to record the offsets of blocks.
When querying, first locate the block number through the history index, then locate the index into the change set file, then find the key in the change set of the block with linear search.
If the change set of a block is big, it could be encoded as a mmap-able b-tree, for more efficient querying.
Footnotes
https://github.com/ledgerwatch/erigon/blob/devel/docs/programmers_guide/db_walkthrough.MD#table-history-of-accounts ↩
The text was updated successfully, but these errors were encountered: