
store benchmarks #11444

Closed
robert-zaremba opened this issue Mar 23, 2022 · 11 comments

robert-zaremba commented Mar 23, 2022

Summary

We need a set of benchmarks to accurately measure performance and detect inefficiencies in the store/v2 design and implementation.
This is a meta issue collecting the benchmark tasks we want to implement.

Proposal

General approach

  • To make a fair test, we should use the DB that will be used in "production".
  • IAVL has a caching layer, but the SMT does not. It may be worth working around this by either disabling IAVL's caching layer or using hashdb (with hashdb, all DB lookups are O(1); memdb uses an in-memory B-tree, so lookups are O(log n)).

Get-oriented benchmarks

In the setup we should store millions of records. We should deterministically query the store with keys we expect to be present and keys we don't. We should have the following setups:

  • tree-based (bypassing the multistore) and full (including the multistore)
  • using Badger and using hashdb
  • with the IAVL cache disabled, if possible
  • keys should be ~32 bytes, because the SMT creates a key hash for every operation and uses that hash to reference an object
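As an illustrative sketch (the `genKey` helper is hypothetical, not from the codebase), ~32-byte keys can be generated deterministically as the SHA-256 sum of a varint-encoded counter:

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// genKey derives a deterministic 32-byte key as the SHA-256 sum of a
// varint-encoded sequence number (a hypothetical helper for illustration).
func genKey(i uint64) []byte {
	var buf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(buf[:], i)
	h := sha256.Sum256(buf[:n])
	return h[:]
}

func main() {
	fmt.Println(len(genKey(0)))                         // 32
	fmt.Println(string(genKey(1)) == string(genKey(1))) // true: deterministic
}
```

Generating keys this way keeps runs reproducible while still spreading keys uniformly across the SMT's hashed key space.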

Expectation:

  • The Get-oriented scenario should be much faster in store/v2, because reads don't need to traverse the SMT; a direct DB query suffices.
  • Using hashdb should give the best results.

Set-oriented benchmark

Create a deterministic scenario with only Set operations.
Set performs 2 operations in store/v2: a write to the SMT (which requires traversing the tree and updating the Merkle path) and one direct update to the State Store. IAVL does a tree rebalance and a Merkle path update, so a similar amount of work to store/v2.
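A minimal sketch of the two writes described above, with a toy `storeV2` type in which plain maps stand in for the SMT and the state store (all names here are illustrative, not from the codebase):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// storeV2 models the two buckets written on every Set: the SMT, keyed
// by the hash of the key, and the state store, keyed by the raw key.
type storeV2 struct {
	smt   map[[32]byte][]byte // stand-in for the Sparse Merkle Tree
	state map[string][]byte   // direct key -> value state store
}

func newStoreV2() *storeV2 {
	return &storeV2{smt: map[[32]byte][]byte{}, state: map[string][]byte{}}
}

func (s *storeV2) Set(key, value []byte) {
	kh := sha256.Sum256(key)
	vh := sha256.Sum256(value)
	s.smt[kh] = vh[:]            // write 1: SMT update via hashed key
	s.state[string(key)] = value // write 2: direct state-store update
}

func main() {
	s := newStoreV2()
	s.Set([]byte("key"), []byte("value"))
	fmt.Println(len(s.smt), len(s.state)) // 1 1
}
```

In the real store/v2, the SMT write also updates every node along the Merkle path, which is why Set is more expensive than the single map write modeled here.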

DB transactions

  • Simulate block changes: Commit changes every 20k operations
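That simulation loop can be sketched as follows (the 20k figure comes from the bullet above; the `store.Set`/`store.Commit` calls are left as comments since the store type is not specified here):

```go
package main

import "fmt"

// opsPerBlock models a block boundary: commit once every 20k writes.
const opsPerBlock = 20_000

func main() {
	commits := 0
	const totalOps = 100_000
	for i := 1; i <= totalOps; i++ {
		// store.Set(key(i), value) would go here
		if i%opsPerBlock == 0 {
			// store.Commit() would go here
			commits++
		}
	}
	fmt.Println(commits) // 5 commits for 100k ops
}
```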

For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned

faddat commented Mar 25, 2022

@robert-zaremba instead of doing benchmarks without the iavl cache, we should benchmark them against each other directly.

Otherwise, we hobble iavl performance and the comparison is invalid

robert-zaremba (author) replied

@faddat , the goal is to test without a cache. store/v2 doesn't have a cache layer yet, so we would like to see a raw performance comparison of store/v1 vs store/v2 without a cache.


adu-web3 commented Apr 4, 2022

@robert-zaremba @roysc Can we add benchmarks for database size?
This is another major constraint on chain scalability that we care about.
I can help if needed.
By the way, is ADR-040 designed to help reduce database size?


i-norden commented Apr 8, 2022

@roysc we may want to move #11328 (comment) here to keep things in one place.


roysc commented Apr 20, 2022

To simplify things I will summarize here the results from the same data used in #11328 (comment), which contains analysis.

Each case is run on a store pre-filled with 100,000 entries. Keys are all 32 random bytes (generated as SHA256 sum of varint-encoded sequential integers):

  • get/set-present: time to read/update 1000 keys that are present
  • get/set-absent: time to try to read 1000 keys that are not present / insert new keys
  • getpast: time to read 1000 keys from a past version (latest - 1)
  • memdb is used for all cases to be consistent, since it is the fastest DB implementation
  • each benchstat shows comparison of v1 (IAVL) with v2 (SMT) store

n indicates the store is tested directly after being pre-filled. In IAVL's case this means the full tree is cached.

$ benchstat v1_n_100kv.results v2_n_100kv.results 
name                            old time/op  new time/op    delta
KVStore/memdb/get-present-4     1.52ms ±12%    2.75ms ± 8%    +81.36%  (p=0.000 n=18+18)
KVStore/memdb/get-absent-4      1.65ms ± 4%    4.15ms ±67%   +151.91%  (p=0.000 n=16+20)
KVStore/memdb/set-present-4     3.41ms ± 3%  174.08ms ±43%  +5001.40%  (p=0.000 n=18+20)
KVStore/memdb/set-absent-4      4.54ms ±24%  169.74ms ±14%  +3638.43%  (p=0.000 n=18+20)
MultiStore/memdb-getpast-4      14.6ms ± 6%     5.2ms ±63%    -64.44%  (p=0.000 n=17+19)
MultiStore/memdb/get-present-4  1.55ms ± 3%    2.66ms ± 6%    +71.80%  (p=0.000 n=18+18)
MultiStore/memdb/get-absent-4   1.73ms ± 4%    2.79ms ±19%    +61.17%  (p=0.000 n=18+20)
MultiStore/memdb/set-present-4  3.69ms ±12%  201.79ms ±21%  +5362.11%  (p=0.000 n=17+20)
MultiStore/memdb/set-absent-4   4.56ms ± 7%  203.86ms ±12%  +4372.38%  (p=0.000 n=18+20)

r indicates the store is pre-filled, then reloaded so the cache starts empty.

$ benchstat v1_r_100kv.results v2_r_100kv.results 
name                            old time/op  new time/op    delta
KVStore/memdb/get-present-4     37.9ms ± 8%     2.7ms ±12%    -92.81%  (p=0.000 n=20+18)
KVStore/memdb/get-absent-4      37.6ms ± 7%     3.0ms ± 7%    -92.03%  (p=0.000 n=19+18)
KVStore/memdb/set-present-4     11.9ms ±46%   163.2ms ±17%  +1273.73%  (p=0.000 n=20+19)
KVStore/memdb/set-absent-4      8.44ms ±27%  169.65ms ±15%  +1910.00%  (p=0.000 n=20+20)
MultiStore/memdb-getpast-4      16.0ms ±36%     4.3ms ±46%    -73.35%  (p=0.000 n=17+20)
MultiStore/memdb/get-present-4  12.8ms ±46%     2.6ms ±10%    -79.79%  (p=0.000 n=19+19)
MultiStore/memdb/get-absent-4   6.73ms ±33%    2.72ms ±24%    -59.57%  (p=0.000 n=20+19)
MultiStore/memdb/set-present-4  7.13ms ±24%  202.49ms ±19%  +2741.86%  (p=0.000 n=18+20)
MultiStore/memdb/set-absent-4   6.34ms ±25%  203.51ms ±16%  +3108.22%  (p=0.000 n=20+20)

Flame chart for v1, showing that both Get and Set consist of recursive calls to access the tree:
[image: v1_kv_prof]

And for v2, showing that the majority of time is spent accessing the DB:
[image: v2_kv_prof]


roysc commented Apr 20, 2022

We have a new SMT implementation in progress at vulcanize/smt#5. Preliminary benchmark results show a significant improvement.

Like the IAVL tree, this SMT only writes to the DB on commit, and caches the current state as an actual node tree, hence inserts are much faster. Reads are faster mainly because value storage has been moved out of the SMT object, allowing us to index by raw key (rather than hashed).
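The deferred-write pattern described here can be sketched with a toy map-backed store (illustrative only, and assuming names of my own; the real implementation caches an actual node tree, not a flat map):

```go
package main

import "fmt"

// cachedStore models "write only on Commit": mutations accumulate in an
// in-memory map and are flushed to the backing DB in one batch.
type cachedStore struct {
	dirty map[string][]byte // uncommitted writes
	db    map[string][]byte // stand-in for the persistent DB
}

func newCachedStore() *cachedStore {
	return &cachedStore{dirty: map[string][]byte{}, db: map[string][]byte{}}
}

func (s *cachedStore) Set(k, v []byte) { s.dirty[string(k)] = v }

func (s *cachedStore) Get(k []byte) []byte {
	if v, ok := s.dirty[string(k)]; ok {
		return v
	}
	return s.db[string(k)]
}

// Commit flushes all buffered writes to the DB and clears the cache.
func (s *cachedStore) Commit() {
	for k, v := range s.dirty {
		s.db[k] = v
	}
	s.dirty = map[string][]byte{}
}

func main() {
	s := newCachedStore()
	s.Set([]byte("a"), []byte("1"))
	fmt.Println(len(s.db)) // 0: nothing persisted yet
	s.Commit()
	fmt.Println(len(s.db)) // 1: flushed in one batch
	fmt.Println(string(s.Get([]byte("a"))))
}
```

Batching writes this way amortizes DB access over a whole block of operations, which is consistent with the large Set-time improvement in the benchstat output below.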

v1 vs. new v2:

$ benchstat v1_n_100kv.results v2_n_100kv.results 
name                            old time/op  new time/op   delta
KVStore/memdb/get-present-4     1.83ms ± 4%   2.49ms ± 2%   +36.21%  (p=0.008 n=5+5)
KVStore/memdb/get-absent-4      1.97ms ± 4%   2.48ms ± 1%   +25.93%  (p=0.008 n=5+5)
KVStore/memdb/set-present-4     4.68ms ± 2%  14.23ms ± 7%  +203.92%  (p=0.008 n=5+5)
KVStore/memdb/set-absent-4      5.59ms ± 7%  17.72ms ±15%  +217.22%  (p=0.008 n=5+5)
MultiStore/memdb/getpast-4      10.8ms ±70%    3.0ms ± 3%   -71.87%  (p=0.008 n=5+5)
MultiStore/memdb/get-present-4  1.75ms ± 7%   2.41ms ± 1%   +37.79%  (p=0.008 n=5+5)
MultiStore/memdb/get-absent-4   2.30ms ±18%   2.47ms ± 2%      ~     (p=0.421 n=5+5)
MultiStore/memdb/set-present-4  6.54ms ±32%  15.24ms ± 2%  +132.95%  (p=0.008 n=5+5)
MultiStore/memdb/set-absent-4   6.98ms ±50%  18.05ms ±12%  +158.72%  (p=0.008 n=5+5)

Old v2 vs new v2:

$ benchstat ../2022-04-04/v2_n_100kv.results v2_n_100kv.results 
name                            old time/op  new time/op  delta
KVStore/memdb/get-present-4     2.75ms ± 8%  2.49ms ± 2%   -9.52%  (p=0.000 n=18+5)
KVStore/memdb/get-absent-4      4.15ms ±67%  2.48ms ± 1%  -40.25%  (p=0.000 n=20+5)
KVStore/memdb/set-present-4      174ms ±43%    14ms ± 7%  -91.82%  (p=0.000 n=20+5)
KVStore/memdb/set-absent-4       170ms ±14%    18ms ±15%  -89.56%  (p=0.000 n=20+5)
MultiStore/memdb/get-present-4  2.66ms ± 6%  2.41ms ± 1%   -9.38%  (p=0.000 n=18+5)
MultiStore/memdb/get-absent-4   2.79ms ±19%  2.47ms ± 2%  -11.29%  (p=0.000 n=20+5)
MultiStore/memdb/set-present-4   202ms ±21%    15ms ± 2%  -92.45%  (p=0.000 n=20+5)
MultiStore/memdb/set-absent-4    204ms ±12%    18ms ±12%  -91.14%  (p=0.000 n=20+5)


musalbas commented Apr 28, 2022

Out of curiosity, does this new implementation produce the same trees as the old SMT (e.g. are all subtrees containing only the empty value still replaced by a placeholder)?


roysc commented Apr 28, 2022

@musalbas yes, identical hashes


musalbas commented Apr 28, 2022

Nice. Regarding the DeepSubTree feature, I assume you removed that for the time being as it required extra work in order to make it compatible with the new lazy implementation, and this wasn't used in cosmos-sdk? This is needed to enable state transition fraud proofs for Cosmos SDK rollups that we're implementing. If there's no inherent reason why it wouldn't be possible to re-implement that with the new lazy implementation, then we (Celestia) would be interested in pursuing that, if we upstream the efficiency improvements in the lazy implementation to our fork.


roysc commented Apr 29, 2022

Right, it just wasn't relevant for this scope of work.

tac0turtle (Member) commented

Closing this as it's being handled through other issues.

ref #17580
