
availability-recovery: bump chunk fetch threshold to 1MB for Polkadot and 4MB for Kusama + testnets #4399

Merged: 4 commits into master on May 24, 2024

Conversation

@sandreim (Contributor) commented May 7, 2024

This change minimizes the CPU usage we spend in reed-solomon by doing the re-encoding into chunks only if the PoV size is less than 4MB (which currently means all PoVs).

Based on subsystem benchmark results we concluded that it is safe to bump this number higher. In the worst-case scenario, the network pressure for a backing group of 5 is around 25% of the network bandwidth in the hardware specs.

Assuming 6s block times (max_candidate_depth 3) and needed_approvals 30, the bandwidth usage of a backing group would hover around `30 * 4 * 3 = 360MB` per relay chain block. Given a backing group of 5, that gives 72MB per block per validator -> 12 MB/s.
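The arithmetic above can be double-checked with a quick back-of-the-envelope sketch. This is illustrative only, not code from this PR; the parameter names are taken from the description:

```rust
// Back-of-the-envelope check of the worst-case bandwidth figures quoted
// in the PR description. All numbers come from that description, not
// from the polkadot-sdk codebase.
fn main() {
    let needed_approvals = 30u64; // approval checkers fetching each PoV
    let pov_size_mb = 4u64; // proposed chunk-fetch threshold (MB)
    let max_candidate_depth = 3u64; // candidates per relay chain block
    let backing_group_size = 5u64;
    let block_time_s = 6u64;

    // Total data served by the backing group per relay chain block.
    let per_block_mb = needed_approvals * pov_size_mb * max_candidate_depth;
    assert_eq!(per_block_mb, 360);

    // Spread evenly across the backing group.
    let per_validator_mb = per_block_mb / backing_group_size;
    assert_eq!(per_validator_mb, 72);

    // Sustained upload rate per backer at 6s block times.
    let mb_per_s = per_validator_mb / block_time_s;
    assert_eq!(mb_per_s, 12);

    println!("{per_block_mb} MB/block, {per_validator_mb} MB/validator, {mb_per_s} MB/s");
}
```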

Reality check on Kusama PoV sizes (see chart: Screenshot 2024-05-07 at 14 30 38)

Signed-off-by: Andrei Sandu <[email protected]>
@sandreim sandreim added R0-silent Changes should not be mentioned in any release notes T0-node This PR/Issue is related to the topic “node”. labels May 7, 2024
@ordian (Member) commented May 8, 2024

> Based on subsystem benchmark results we concluded that it is safe to bump this number higher. In the worst-case scenario, the network pressure for a backing group of 5 is around 25% of the network bandwidth in the hardware specs.

Currently, the network up(load) requirements are quite low in this regard and making everyone download from backers in all the cases would change that quite a bit (+160MBit/s with backing group size of 3 assuming all 3 have the PoV and not just 2). I would also like to see this tested in the presence of disputes, when every paravalidator needs to download the PoV, not just 30.

To be on the safer side, would it be possible to gate this change to be Kusama only? Or start with a lower limit, e.g. 1MB.

@sandreim (Contributor, author) commented May 8, 2024

> > Based on subsystem benchmark results we concluded that it is safe to bump this number higher. In the worst-case scenario, the network pressure for a backing group of 5 is around 25% of the network bandwidth in the hardware specs.
>
> Currently, the network up(load) requirements are quite low in this regard and making everyone download from backers

Are these asymmetric upload/download specs documented anywhere? The specs I am looking at say:

> The minimum symmetric networking speed is set to 500 Mbit/s (= 62.5 MB/s). This is required to support a large number of parachains and allow for proper congestion control in busy network situations.

> in all the cases would change that quite a bit (+160MBit/s with backing group size of 3 assuming all 3 have the PoV and not just 2). I would also like to see this tested in the presence of disputes, when every paravalidator needs to download the PoV, not just 30.

Yeah, with a backing group of 3, the worst-case load is 50% of this network bandwidth. However, in the case of disputes the backers would be hammered, so PoV downloads would fall back to chunks; we should be fine, as this is an exceptional situation. We can definitely try this scenario with subsystem benchmarks, or are you suggesting we do a Versi test?

> To be on the safer side, would it be possible to gate this change to be Kusama only? Or start with a lower limit, e.g. 1MB.

This is doable per chain. I'd go for that rather than 1MB.
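A per-chain gate could look roughly like the following sketch. The `Chain` enum, constant, and function names here are hypothetical illustrations, not the identifiers used in polkadot-sdk:

```rust
// Hypothetical sketch of gating the chunk-fetch threshold per chain.
// Below the threshold, recovery fetches the full PoV from the backing
// group; at or above it, recovery fetches erasure chunks instead.
#[derive(Clone, Copy)]
enum Chain {
    Polkadot,
    Kusama,
    Testnet,
}

const MIB: usize = 1024 * 1024;

fn fetch_chunks_threshold(chain: Chain) -> usize {
    match chain {
        // Conservative limit on Polkadot, per review feedback.
        Chain::Polkadot => MIB, // 1 MiB
        // More aggressive limit on Kusama and all testnets.
        Chain::Kusama | Chain::Testnet => 4 * MIB,
    }
}

fn should_fetch_chunks(chain: Chain, pov_size: usize) -> bool {
    pov_size >= fetch_chunks_threshold(chain)
}

fn main() {
    // A 2 MiB PoV triggers chunk recovery on Polkadot but not on Kusama.
    assert!(should_fetch_chunks(Chain::Polkadot, 2 * MIB));
    assert!(!should_fetch_chunks(Chain::Kusama, 2 * MIB));
}
```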

@burdges commented May 8, 2024

> However, in the case of disputes the backers would be hammered

Yes, we're worried about variance here, since the averages stay the same. We'll presumably need the tit-for-tat game in availability rewards eventually, since you'd save lots by just not helping others check.

@sandreim (Contributor, author) commented

Switched to 1MB on Polkadot and 4MB on Kusama + all testnets. @ordian PTAL.

@sandreim sandreim requested review from ordian and alindima May 24, 2024 10:07
@paritytech-cicd-pr commented
The CI pipeline was cancelled due to the failure of one of the required jobs.
Job name: cargo-clippy
Logs: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6281150

Signed-off-by: Andrei Sandu <[email protected]>
@alindima (Contributor) left a comment


Looks reasonable! Don't forget to update the PR description and title (they will be used for the commit message and are outdated).

@sandreim sandreim changed the title availability-recovery: bump chunk fetch threshold to 4MB availability-recovery: bump chunk fetch threshold to 1MB for Polkadot and 4MB for Kusama + testnets May 24, 2024
@sandreim sandreim added this pull request to the merge queue May 24, 2024
Merged via the queue into master with commit f469fbf May 24, 2024
149 of 152 checks passed
@sandreim sandreim deleted the sandreim/bump_chunks_fetch_threshold branch May 24, 2024 14:40
hitchhooker pushed a commit to ibp-network/polkadot-sdk that referenced this pull request Jun 5, 2024
… and 4MB for Kusama + testnets (paritytech#4399)

TarekkMA pushed a commit to moonbeam-foundation/polkadot-sdk that referenced this pull request Aug 2, 2024
… and 4MB for Kusama + testnets (paritytech#4399)

5 participants