
store benchmarks #11444

Closed
robert-zaremba opened this issue Mar 23, 2022 · 11 comments

robert-zaremba commented Mar 23, 2022

Summary

We need a set of benchmarks to accurately measure performance and detect inefficiencies in the store/v2 design and implementation.
This is a meta issue collecting the benchmark tasks we want to implement.

Proposal

General approach

  • To make a fair test, we should use the DB that will be used in "production".
  • IAVL has a caching layer, but the SMT does not. It may be worth working around this by either disabling IAVL's caching layer or using hashdb (with hashdb, all DB lookups are O(1); memdb uses an in-memory B-tree, so lookups are O(log n)).

Get-oriented benchmarks

In the setup we should store millions of records. We should deterministically query the store with keys we expect to be present and keys we don't. We should have the following setups:

  • tree-based (bypassing the multistore) and full (including the multistore)
  • using Badger and using hashdb
  • with the IAVL cache disabled, if possible
  • keys should be ~32 bytes, because the SMT creates a key hash for every operation and uses that hash to reference an object
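As an illustrative sketch (the `genKey` helper is hypothetical, not from the codebase), ~32-byte keys can be generated deterministically as the SHA-256 sum of a varint-encoded counter:

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// genKey derives a deterministic 32-byte key as the SHA-256 sum of a
// varint-encoded sequence number (a hypothetical helper for illustration).
func genKey(i uint64) []byte {
	var buf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(buf[:], i)
	h := sha256.Sum256(buf[:n])
	return h[:]
}

func main() {
	fmt.Println(len(genKey(0)))                         // 32
	fmt.Println(string(genKey(1)) == string(genKey(1))) // true: deterministic
}
```

Generating keys this way keeps runs reproducible while still spreading keys uniformly across the SMT's hashed key space.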

Expectation:

  • The Get-oriented scenario should be much faster in store/v2, because reads don't need to traverse the SMT; a direct DB query suffices.
  • Using hashdb should give the best results.

Set-oriented benchmark

Create a deterministic scenario with only Set operations.
Set performs 2 operations in store/v2: a write to the SMT (which requires traversing the tree and updating the Merkle path) and one direct update to the State Store. IAVL does a tree rebalance and a Merkle path update, so a similar amount of work to store/v2.
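A minimal sketch of the two writes described above, with a toy `storeV2` type in which plain maps stand in for the SMT and the state store (all names here are illustrative, not from the codebase):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// storeV2 models the two buckets written on every Set: the SMT, keyed
// by the hash of the key, and the state store, keyed by the raw key.
type storeV2 struct {
	smt   map[[32]byte][]byte // stand-in for the Sparse Merkle Tree
	state map[string][]byte   // direct key -> value state store
}

func newStoreV2() *storeV2 {
	return &storeV2{smt: map[[32]byte][]byte{}, state: map[string][]byte{}}
}

func (s *storeV2) Set(key, value []byte) {
	kh := sha256.Sum256(key)
	vh := sha256.Sum256(value)
	s.smt[kh] = vh[:]            // write 1: SMT update via hashed key
	s.state[string(key)] = value // write 2: direct state-store update
}

func main() {
	s := newStoreV2()
	s.Set([]byte("key"), []byte("value"))
	fmt.Println(len(s.smt), len(s.state)) // 1 1
}
```

In the real store/v2, the SMT write also updates every node along the Merkle path, which is why Set is more expensive than the single map write modeled here.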

DB transactions

  • Simulate block changes: Commit changes every 20k operations
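That simulation loop can be sketched as follows (the 20k figure comes from the bullet above; the `store.Set`/`store.Commit` calls are left as comments since the store type is not specified here):

```go
package main

import "fmt"

// opsPerBlock models a block boundary: commit once every 20k writes.
const opsPerBlock = 20_000

func main() {
	commits := 0
	const totalOps = 100_000
	for i := 1; i <= totalOps; i++ {
		// store.Set(key(i), value) would go here
		if i%opsPerBlock == 0 {
			// store.Commit() would go here
			commits++
		}
	}
	fmt.Println(commits) // 5 commits for 100k ops
}
```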

For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned

faddat commented Mar 25, 2022

@robert-zaremba instead of doing benchmarks without the iavl cache, we should benchmark them against each other directly.

Otherwise, we hobble iavl performance and the comparison is invalid

robert-zaremba (author) replied

@faddat , the goal is to test without a cache. store/v2 doesn't have a cache layer yet, so we would like to see a raw performance comparison of store/v1 vs store/v2 without a cache.


adu-web3 commented Apr 4, 2022

@robert-zaremba @roysc Can we add benchmarks for database size?
This is another major constraint on chain scalability that we care about.
I can help if needed.
By the way, is ADR-040 designed to help reduce database size?


i-norden commented Apr 8, 2022

@roysc we may want to move #11328 (comment) here to keep things in one place.


roysc commented Apr 20, 2022

To simplify things I will summarize here the results from the same data used in #11328 (comment), which contains analysis.

Each case is run on a store pre-filled with 100,000 entries. Keys are all 32 random bytes (generated as SHA256 sum of varint-encoded sequential integers):

  • get/set-present: time to read/update 1000 keys that are present
  • get/set-absent: time to try to read 1000 keys that are not present / insert new keys
  • getpast: time to read 1000 keys from a past version (latest - 1)
  • memdb is used for all cases to be consistent, since it is the fastest DB implementation
  • each benchstat shows comparison of v1 (IAVL) with v2 (SMT) store

n indicates the store is tested directly after being pre-filled. In IAVL's case this means the full tree is cached.

$ benchstat v1_n_100kv.results v2_n_100kv.results 
name                            old time/op  new time/op    delta
KVStore/memdb/get-present-4     1.52ms ±12%    2.75ms ± 8%    +81.36%  (p=0.000 n=18+18)
KVStore/memdb/get-absent-4      1.65ms ± 4%    4.15ms ±67%   +151.91%  (p=0.000 n=16+20)
KVStore/memdb/set-present-4     3.41ms ± 3%  174.08ms ±43%  +5001.40%  (p=0.000 n=18+20)
KVStore/memdb/set-absent-4      4.54ms ±24%  169.74ms ±14%  +3638.43%  (p=0.000 n=18+20)
MultiStore/memdb-getpast-4      14.6ms ± 6%     5.2ms ±63%    -64.44%  (p=0.000 n=17+19)
MultiStore/memdb/get-present-4  1.55ms ± 3%    2.66ms ± 6%    +71.80%  (p=0.000 n=18+18)
MultiStore/memdb/get-absent-4   1.73ms ± 4%    2.79ms ±19%    +61.17%  (p=0.000 n=18+20)
MultiStore/memdb/set-present-4  3.69ms ±12%  201.79ms ±21%  +5362.11%  (p=0.000 n=17+20)
MultiStore/memdb/set-absent-4   4.56ms ± 7%  203.86ms ±12%  +4372.38%  (p=0.000 n=18+20)

r indicates the store is pre-filled, then reloaded so the cache starts empty.

$ benchstat v1_r_100kv.results v2_r_100kv.results 
name                            old time/op  new time/op    delta
KVStore/memdb/get-present-4     37.9ms ± 8%     2.7ms ±12%    -92.81%  (p=0.000 n=20+18)
KVStore/memdb/get-absent-4      37.6ms ± 7%     3.0ms ± 7%    -92.03%  (p=0.000 n=19+18)
KVStore/memdb/set-present-4     11.9ms ±46%   163.2ms ±17%  +1273.73%  (p=0.000 n=20+19)
KVStore/memdb/set-absent-4      8.44ms ±27%  169.65ms ±15%  +1910.00%  (p=0.000 n=20+20)
MultiStore/memdb-getpast-4      16.0ms ±36%     4.3ms ±46%    -73.35%  (p=0.000 n=17+20)
MultiStore/memdb/get-present-4  12.8ms ±46%     2.6ms ±10%    -79.79%  (p=0.000 n=19+19)
MultiStore/memdb/get-absent-4   6.73ms ±33%    2.72ms ±24%    -59.57%  (p=0.000 n=20+19)
MultiStore/memdb/set-present-4  7.13ms ±24%  202.49ms ±19%  +2741.86%  (p=0.000 n=18+20)
MultiStore/memdb/set-absent-4   6.34ms ±25%  203.51ms ±16%  +3108.22%  (p=0.000 n=20+20)

Flame chart for v1, showing that both Get and Set consist of recursive calls to access the tree:
[image: v1_kv_prof]

And for v2, showing that the majority of time is spent accessing the DB:
[image: v2_kv_prof]


roysc commented Apr 20, 2022

We have a new SMT implementation in progress at vulcanize/smt#5. Preliminary benchmark results show a significant improvement.

Like the IAVL tree, this SMT only writes to the DB on commit, and caches the current state as an actual node tree, hence inserts are much faster. Reads are faster mainly because value storage has been moved out of the SMT object, allowing us to index by raw key (rather than hashed).
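The deferred-write pattern described here can be sketched with a toy map-backed store (illustrative only, and assuming names of my own; the real implementation caches an actual node tree, not a flat map):

```go
package main

import "fmt"

// cachedStore models "write only on Commit": mutations accumulate in an
// in-memory map and are flushed to the backing DB in one batch.
type cachedStore struct {
	dirty map[string][]byte // uncommitted writes
	db    map[string][]byte // stand-in for the persistent DB
}

func newCachedStore() *cachedStore {
	return &cachedStore{dirty: map[string][]byte{}, db: map[string][]byte{}}
}

func (s *cachedStore) Set(k, v []byte) { s.dirty[string(k)] = v }

func (s *cachedStore) Get(k []byte) []byte {
	if v, ok := s.dirty[string(k)]; ok {
		return v
	}
	return s.db[string(k)]
}

// Commit flushes all buffered writes to the DB and clears the cache.
func (s *cachedStore) Commit() {
	for k, v := range s.dirty {
		s.db[k] = v
	}
	s.dirty = map[string][]byte{}
}

func main() {
	s := newCachedStore()
	s.Set([]byte("a"), []byte("1"))
	fmt.Println(len(s.db)) // 0: nothing persisted yet
	s.Commit()
	fmt.Println(len(s.db)) // 1: flushed in one batch
	fmt.Println(string(s.Get([]byte("a"))))
}
```

Batching writes this way amortizes DB access over a whole block of operations, which is consistent with the large Set-time improvement in the benchstat output below.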

v1 vs. new v2:

$ benchstat v1_n_100kv.results v2_n_100kv.results 
name                            old time/op  new time/op   delta
KVStore/memdb/get-present-4     1.83ms ± 4%   2.49ms ± 2%   +36.21%  (p=0.008 n=5+5)
KVStore/memdb/get-absent-4      1.97ms ± 4%   2.48ms ± 1%   +25.93%  (p=0.008 n=5+5)
KVStore/memdb/set-present-4     4.68ms ± 2%  14.23ms ± 7%  +203.92%  (p=0.008 n=5+5)
KVStore/memdb/set-absent-4      5.59ms ± 7%  17.72ms ±15%  +217.22%  (p=0.008 n=5+5)
MultiStore/memdb/getpast-4      10.8ms ±70%    3.0ms ± 3%   -71.87%  (p=0.008 n=5+5)
MultiStore/memdb/get-present-4  1.75ms ± 7%   2.41ms ± 1%   +37.79%  (p=0.008 n=5+5)
MultiStore/memdb/get-absent-4   2.30ms ±18%   2.47ms ± 2%      ~     (p=0.421 n=5+5)
MultiStore/memdb/set-present-4  6.54ms ±32%  15.24ms ± 2%  +132.95%  (p=0.008 n=5+5)
MultiStore/memdb/set-absent-4   6.98ms ±50%  18.05ms ±12%  +158.72%  (p=0.008 n=5+5)

Old v2 vs new v2:

$ benchstat ../2022-04-04/v2_n_100kv.results v2_n_100kv.results 
name                            old time/op  new time/op  delta
KVStore/memdb/get-present-4     2.75ms ± 8%  2.49ms ± 2%   -9.52%  (p=0.000 n=18+5)
KVStore/memdb/get-absent-4      4.15ms ±67%  2.48ms ± 1%  -40.25%  (p=0.000 n=20+5)
KVStore/memdb/set-present-4      174ms ±43%    14ms ± 7%  -91.82%  (p=0.000 n=20+5)
KVStore/memdb/set-absent-4       170ms ±14%    18ms ±15%  -89.56%  (p=0.000 n=20+5)
MultiStore/memdb/get-present-4  2.66ms ± 6%  2.41ms ± 1%   -9.38%  (p=0.000 n=18+5)
MultiStore/memdb/get-absent-4   2.79ms ±19%  2.47ms ± 2%  -11.29%  (p=0.000 n=20+5)
MultiStore/memdb/set-present-4   202ms ±21%    15ms ± 2%  -92.45%  (p=0.000 n=20+5)
MultiStore/memdb/set-absent-4    204ms ±12%    18ms ±12%  -91.14%  (p=0.000 n=20+5)


musalbas commented Apr 28, 2022

Out of curiosity, does this new implementation produce the same trees as the old SMT (e.g. are all subtrees containing only the empty value still replaced by a placeholder)?


roysc commented Apr 28, 2022

@musalbas yes, identical hashes


musalbas commented Apr 28, 2022

Nice. Regarding the DeepSubTree feature, I assume you removed that for the time being as it required extra work in order to make it compatible with the new lazy implementation, and this wasn't used in cosmos-sdk? This is needed to enable state transition fraud proofs for Cosmos SDK rollups that we're implementing. If there's no inherent reason why it wouldn't be possible to re-implement that with the new lazy implementation, then we (Celestia) would be interested in pursuing that, if we upstream the efficiency improvements in the lazy implementation to our fork.


roysc commented Apr 29, 2022

Right, it just wasn't relevant for this scope of work.

tac0turtle (Member) commented

Closing this as it's being handled through other issues.

ref #17580
