This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

add erasure-coding benches #6308

Merged
merged 5 commits into from
Jan 16, 2023
135 changes: 124 additions & 11 deletions Cargo.lock

7 changes: 7 additions & 0 deletions erasure-coding/Cargo.toml
@@ -12,3 +12,10 @@ parity-scale-codec = { version = "3.1.5", default-features = false, features = [
sp-core = { git = "https://github.com/paritytech/substrate", branch = "master" }
sp-trie = { git = "https://github.com/paritytech/substrate", branch = "master" }
thiserror = "1.0.31"

[dev-dependencies]
criterion = { version = "0.4.0", default-features = false, features = ["cargo_bench_support"] }

[[bench]]
name = "scaling_with_validators"
harness = false
39 changes: 39 additions & 0 deletions erasure-coding/benches/README.md
@@ -0,0 +1,39 @@
### Run benches
```
$ cd erasure-coding # ensure you are in the right directory
$ cargo bench
```

### `scaling_with_validators`

This benchmark evaluates the performance of constructing the chunks and the erasure root from PoV and
reconstructing the PoV from chunks. You can see the results of running this bench on a 5950x below.
Interestingly, with `10_000` chunks (validators) it's slower than with `50_000` for both construction
and reconstruction.
```
construct/200 time: [93.924 ms 94.525 ms 95.214 ms]
thrpt: [52.513 MiB/s 52.896 MiB/s 53.234 MiB/s]
construct/500 time: [111.25 ms 111.52 ms 111.80 ms]
thrpt: [44.721 MiB/s 44.837 MiB/s 44.946 MiB/s]
construct/1000 time: [117.37 ms 118.28 ms 119.21 ms]
thrpt: [41.941 MiB/s 42.273 MiB/s 42.601 MiB/s]
construct/2000 time: [125.05 ms 125.72 ms 126.38 ms]
thrpt: [39.564 MiB/s 39.772 MiB/s 39.983 MiB/s]
construct/10000 time: [270.46 ms 275.11 ms 279.81 ms]
thrpt: [17.869 MiB/s 18.174 MiB/s 18.487 MiB/s]
@sandreim sandreim Nov 21, 2022

This is quite interesting. It would be even more interesting to see at which number of validators the throughput reverses trend.

@burdges burdges Nov 21, 2022

The AFFT scales like blocksize * log(validators) but with the log being discrete, so it'll never reverse trend. Its arithmetic cannot handle more than 16k validators, so we're missing asserts that should kill it long before 50k. I'd expect 50k = 2k here, at least for cache pressure, but there is some counter running up the rest.

We'll collapse from all those validators^2 gossip messages long before this poses any issues. We just need enough validators per relay chain to reach 2nd layer scaling, which is only 600 by Bryan Ford's estimates in OmniLedger, but is more like 1000 by Alfonso's estimates. We'd go to 3/4 of a power of 2 to optimize the erasure coding if we can really make 769 work or whatever, but maybe 1500 is a bit large for the gossip.

@ordian ordian Nov 21, 2022

> Its arithmetic cannot handle more than 16k validators, so we're missing asserts that should kill it long before 50k.

Are you sure it's not 2^16, which is 64k? I added an assertion in reconstruct that it decodes to the original PoV and it passes with 50k validators, but panics with TooManyValidators error with 70k shards.

// we are limited to the field order of GF(2^16), which is 65536

> We'll collapse from all those validators^2 gossip messages long before this poses any issues.

I agree, at this point (10k+ validators) networking would be the bottleneck, not CPU cost of erasure-coding.

@burdges burdges Nov 21, 2022

oops yes you're right, 2^16, not sure why 50k goes faster then. lol

It's a log factor, so I guess some unrelated artifact of the benchmark is just overriding this somehow. The erasure coding cannot actually get faster with more validators.

construct/50000 time: [205.86 ms 209.66 ms 213.64 ms]
thrpt: [23.404 MiB/s 23.848 MiB/s 24.288 MiB/s]

reconstruct/200 time: [180.73 ms 184.09 ms 187.73 ms]
thrpt: [26.634 MiB/s 27.160 MiB/s 27.666 MiB/s]
reconstruct/500 time: [195.59 ms 198.58 ms 201.76 ms]
thrpt: [24.781 MiB/s 25.179 MiB/s 25.564 MiB/s]
reconstruct/1000 time: [207.92 ms 211.57 ms 215.57 ms]
thrpt: [23.195 MiB/s 23.633 MiB/s 24.048 MiB/s]
reconstruct/2000 time: [218.59 ms 223.68 ms 229.18 ms]
thrpt: [21.817 MiB/s 22.354 MiB/s 22.874 MiB/s]
reconstruct/10000 time: [496.35 ms 505.17 ms 515.42 ms]
thrpt: [9.7008 MiB/s 9.8977 MiB/s 10.074 MiB/s]
reconstruct/50000 time: [276.56 ms 277.53 ms 278.58 ms]
thrpt: [17.948 MiB/s 18.016 MiB/s 18.079 MiB/s]
```
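
As a rough cross-check of the numbers and of the review discussion, the sketch below does two things. First, it multiplies each median time by its median throughput to recover the payload size the benchmark implies, which comes out at roughly 5 MiB for every row. Second, it models the "discrete log" factor from the thread as `ceil(log2(validators))`, capped at the GF(2^16) field order of 65536. That reading of "discrete log" is an assumption, and the names `implied_size_mib`, `discrete_log_steps` and `FIELD_ORDER` are hypothetical helpers for illustration, not part of the erasure-coding crate.

```rust
/// Field order of GF(2^16), the shard limit mentioned in the review thread.
const FIELD_ORDER: usize = 1 << 16; // 65536

/// Payload size in MiB implied by one Criterion row:
/// time (ms, converted to seconds) multiplied by throughput (MiB/s).
fn implied_size_mib(time_ms: f64, thrpt_mib_s: f64) -> f64 {
    time_ms / 1000.0 * thrpt_mib_s
}

/// Assumed model of the discrete log factor: ceil(log2(n)),
/// i.e. the exponent of the next power of two at or above `n`.
/// Returns `None` for zero shards or for more shards than the field order.
fn discrete_log_steps(n: usize) -> Option<u32> {
    if n == 0 || n > FIELD_ORDER {
        return None;
    }
    Some(n.next_power_of_two().trailing_zeros())
}

fn main() {
    // (validators, median time in ms, median throughput in MiB/s),
    // copied from the `construct` rows of the results above.
    let construct = [
        (200usize, 94.525f64, 52.896f64),
        (500, 111.52, 44.837),
        (1000, 118.28, 42.273),
        (2000, 125.72, 39.772),
        (10000, 275.11, 18.174),
        (50000, 209.66, 23.848),
    ];

    for (n, time_ms, thrpt) in construct {
        println!(
            "construct/{}: implied payload {:.2} MiB, log steps {:?}",
            n,
            implied_size_mib(time_ms, thrpt),
            discrete_log_steps(n)
        );
    }

    // 70k shards exceeds the field order, matching the reported failure.
    println!("70000 shards: {:?}", discrete_log_steps(70_000));
}
```

Under this model `10_000` validators need 14 steps and `50_000` need 16, so the model alone does not predict the faster `50_000` result; that fits the thread's conclusion that something unrelated in the benchmark is responsible.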