Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the guarantee that the iavl tree is inserted with ordered keys in each version #13837

Open
yihuang opened this issue Nov 11, 2022 · 11 comments
Assignees
Labels

Comments

@yihuang
Copy link
Collaborator

yihuang commented Nov 11, 2022

Summary

IAVL root hash depends on the insertion order.
When we think about some alternative format for chain state like #13317, the change set in each version will be always sorted.
And in practice, the IAVL store is always written at commit event by cache store of deliverState, the cache store always sort the keys before writing, this is good because it means we can recover the IAVL tree by replay the change sets recorded in the above format.

Problem Definition

The problem is can we make this guarantee and don't break it in the future, maybe change some interfaces as well.

Proposal

Make the guarantee that iavl tree is always written with keys in order for each version, and change some interfaces to reflect that, like remove the direct Set/Delete methods from uncached store, always go through a cache store which will sort the keys before writing. This is already the behavior in action, just want to set it in stone.

@alexanderbez
Copy link
Contributor

I'll mention this here again since it seems to be relevant, but I personally would love to explore and argue the case that the entire CacheKV store isn't necessary at all. And we can push this invariant requirement directly down into IAVL itself.

@yihuang
Copy link
Collaborator Author

yihuang commented Nov 14, 2022

I'll mention this here again since it seems to be relevant, but I personally would love to explore and argue the case that the entire CacheKV store isn't necessary at all. And we can push this invariant requirement directly down into IAVL itself.

Do you mean we cache the writes inside iavl and delays the tree re-balancing until the commit time?

@yihuang
Copy link
Collaborator Author

yihuang commented Nov 14, 2022

CacheKV serves another purpose to create a snapshot for a scope and discard the writes if the scope failed.

@alexanderbez
Copy link
Contributor

I would like to see IAVl essentially take care of this. I.e. cache/batch writes where nothing happens until Commit/Write is called.

@tac0turtle
Copy link
Member

@alexanderbez could you create a proposal doc on how you see it working. Heard you mention remove cachkv store before but it's hard to imagine where this logic moves without a doc or walkthrough of your proposal.

@alexanderbez
Copy link
Contributor

Yeah for sure! I have a rough idea in my head. It will involve breaking APIs, but I think it can be done. I'll draft up a proposal.

@peterbourgon
Copy link

peterbourgon commented Nov 16, 2022

+1 to @alexanderbez basically. IMO caching belongs at the KVStore layer rather than at the IAVL layer, but that's a detail I guess. The main point is that caching should be an implicit implementation detail, not a separate type or concept that needs to be explicitly known to callers. You want a single common interface interface e.g.

type KVStore interface {
    Set(key string, val []byte) error
    Get(key string) ([]byte, error)
    Del(key string) error
    Commit() error // maybe?
}

with a base implementation e.g.

type CoreKVStore struct {
    tree iavl.Tree // or whatever
}

func NewCoreKVStore(...) (*CoreKVStore, error) { ... }

and stuff like caching implemented as decorators e.g.

type CachingKVStore struct {
    base  KVStore
    cache map[string]whatever
}

func NewCachingKVStore(base KVStore, ...) (*CachingKVStore, error) { ... }

func (s *CachingKVStore) Set(key string, val []byte) error { 
    // set key=val in s.base
    // if that worked, set key=val in s.cache, with whatever invariants
}

func (s *CachingKVStore) Get(key string) ([]byte, error) { 
    // check cache first, with whatever invariants
    // if not cached, call s.base.Get and cache response
}

// etc.

so that code that uses a KVStore can just take a KVStore directly e.g.

func NewApp(kvs KVStore, ...) (*App, error) { ... }

and can safely assume that caching, or any other value-add behavior, is already satisfied by the value given to them.

@alexanderbez
Copy link
Contributor

This is more or less how it works now. A caller never explicitly asks for a cached KVStore, they do however, ask to cache-wrap. I haven't thought about this in great detail, so I could totally be off the mark here. But the current implementation is a hoge-podge of random responsbilities and concerns that it warrants a serious consideration of it's intended goals and features.

@peterbourgon
Copy link

peterbourgon commented Nov 16, 2022

This is more or less how it works now. A caller never explicitly asks for a cached KVStore, they do however, ask to cache-wrap.

That's true at a high level, but the devil is in the details. The problem is that there isn't a single canonical KVStore interface which is widely satisfied, instead there is a hierarchy of

  • Store which is a union of CacheWrapper + a method returning a type, probably shouldn't exist
  • BasicKVStore which unfortunately doesn't return errors and AFAICT isn't meaningfully used and should be deleted
  • KVStore as a union of Store + BasicKVStore + iterator methods which should probably be the only defined interface
  • CacheWrapper which defines self-referential methods and so is basically unusable and should probably be removed
  • Committer which might makes sense as a distinct interface but probably shouldn't be optional
  • CommitStore as a union of interfaces that likely make no sense to use separately (and so probably should be removed)
  • CacheMultiStore which defines a Write method that should likely be implicit in any Commit method (and so rm'd)
  • CommitMultiStore as a union of many arbitrary concerns that probably shouldn't exist as an interface at all
  • many others

These interfaces are used basically arbitrarily, and callers can't make any simplifying assumptions about received values, in practice you have to do type assertions and interface upgrades to use the package effectively. Which is exactly as you say:

I haven't thought about this in great detail, so I could totally be off the mark here. But the current implementation is a hoge-podge of random responsbilities and concerns that it warrants a serious consideration of it's intended goals and features.

tl;dr: definitely can and should be improved

@tac0turtle
Copy link
Member

could we open a new issue that encapsulates the dsicssion on store design?in the meantime could we look into this issue to see the scope of work needed, if its small, then lets solve this is.

cc @facundomedica if you want to pick it up

@kocubinski
Copy link
Member

If IAVL is to support replay, it must record the change sets it applied per version in the order that it did. Whether those keys are lexically sorted or not does not seem relevant to me. It must be a feature of IAVL, not defined by it's consumer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants