Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Elastic scaling: runtime dependency tracking and enactment #3479

Merged
merged 53 commits into from
Mar 21, 2024

Conversation

alindima
Copy link
Contributor

@alindima alindima commented Feb 26, 2024

Changes needed to implement the runtime part of elastic scaling: #3131, #3132, #3202

Also fixes #3675

TODOs:

Relies on the changes made in #3233 in terms of the inclusion policy and the candidate ordering

@alindima alindima marked this pull request as draft February 26, 2024 10:44
@alindima alindima added T8-polkadot This PR/Issue is related to/affects the Polkadot network. I5-enhancement An additional feature request. labels Feb 26, 2024
Copy link
Contributor

@sandreim sandreim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic looks good at first pass, but readability can certainly be improved.

I think we should also see if we can adjust apply_weight_limit candidate selection to account for elastic scaling.

polkadot/runtime/parachains/src/paras_inherent/mod.rs Outdated Show resolved Hide resolved
polkadot/runtime/parachains/src/paras_inherent/mod.rs Outdated Show resolved Hide resolved
polkadot/runtime/parachains/src/paras_inherent/mod.rs Outdated Show resolved Hide resolved
polkadot/runtime/parachains/src/paras_inherent/mod.rs Outdated Show resolved Hide resolved
polkadot/runtime/parachains/src/paras_inherent/mod.rs Outdated Show resolved Hide resolved
polkadot/runtime/parachains/src/paras_inherent/mod.rs Outdated Show resolved Hide resolved
@sandreim
Copy link
Contributor

sandreim commented Mar 5, 2024

candidate_pending_availability runtime API is needed by collators. At first glance we might need to return all of them. or maybe just the last one.

Copy link
Member

@ordian ordian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looking good, left a couple of questions. Happy to approve once this is tested/burned-in.

Comment on lines -645 to -650
// In `Enter` context (invoked during execution) there should be no backing votes from
// disabled validators because they should have been filtered out during inherent data
// preparation (`ProvideInherent` context). Abort in such cases.
if context == ProcessInherentDataContext::Enter {
ensure!(!votes_from_disabled_were_dropped, Error::<T>::BackedByDisabled);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was this error removed? is it because it was merged into a generic CandidatesFilteredDuringExecution error? i liked the specificity of the previous errors more

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the original intention was to trade the specific errors for simplicity by using a catch all approach. I will look and see if we can keep it simple and have these errors specific or maybe logging these errors instead of returning them might achieve same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's right. I generally added debug logs to the filtering functions called in sanitize_backed_candidates whenever a candidate is filtered and the reason why it was dropped.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, checking that filtering filtered nothing at the outer most level is the most robust way to check.

polkadot/runtime/parachains/src/inclusion/mod.rs Outdated Show resolved Hide resolved
polkadot/runtime/parachains/src/paras_inherent/mod.rs Outdated Show resolved Hide resolved
polkadot/runtime/parachains/src/paras_inherent/mod.rs Outdated Show resolved Hide resolved
polkadot/runtime/parachains/src/inclusion/mod.rs Outdated Show resolved Hide resolved
@alindima
Copy link
Contributor Author

Fixed the runtime API panic caused by #64 and reran benchmarks for westend and rococo

Copy link
Member

@eskimor eskimor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rococo weights seem off. Westend look good.

@alindima
Copy link
Contributor Author

yeah, the weights for rococo are way off when comparing to the previous values. I'm betting that's because they were last updated in 2021. The difference for westend shows that there isn't a significant change

Copy link
Member

@eskimor eskimor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @alindima ! I couldn't help it and still had a few nits, but it is good to go!

let freed = freed_concluded
.into_iter()
.map(|(c, _hash)| (c, FreedReason::Concluded))
.chain(freed_disputed.into_iter().map(|core| (core, FreedReason::Concluded)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not introduced here, but a third enum variant Disputed would have done no harm 😶‍🌫️ (and also no need to fix it here)

// Cores 1, 2 and 3 are being made available in this block. Propose 6 more candidates (one
// for each core) and check that the right ones are successfully backed and the old ones
// enacted.
let config = default_config();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we are not even sharing initialization, why is this not a separate test case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to avoid long and dubious test names :D that's arguably a bad reason but I didn't think too much about it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair.

polkadot/runtime/parachains/src/paras_inherent/tests.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@sandreim sandreim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work @alindima

@eskimor eskimor enabled auto-merge March 21, 2024 09:47
@eskimor eskimor added this pull request to the merge queue Mar 21, 2024
Merged via the queue into master with commit 4842faf Mar 21, 2024
127 of 132 checks passed
@eskimor eskimor deleted the alindima/elastic-scaling-runtime branch March 21, 2024 10:44
dharjeezy pushed a commit to dharjeezy/polkadot-sdk that referenced this pull request Mar 24, 2024
…h#3479)

Changes needed to implement the runtime part of elastic scaling:
paritytech#3131,
paritytech#3132,
paritytech#3202

Also fixes paritytech#3675

TODOs:

- [x] storage migration
- [x] optimise process_candidates from O(N^2)
- [x] drop backable candidates which form cycles
- [x] fix unit tests
- [x] add more unit tests
- [x] check the runtime APIs which use the pending availability storage.
We need to expose all of them, see
paritytech#3576
- [x] optimise the candidate selection. we're currently picking randomly
until we satisfy the weight limit. we need to be smart about not
breaking candidate chains while being fair to all paras -
paritytech#3573

Relies on the changes made in
paritytech#3233 in terms of the
inclusion policy and the candidate ordering

---------

Signed-off-by: alindima <[email protected]>
Co-authored-by: command-bot <>
Co-authored-by: eskimor <[email protected]>
github-merge-queue bot pushed a commit that referenced this pull request Aug 29, 2024
On top of #5082.

## Background

Previously, before #3479, we would
[include](https://github.com/paritytech/polkadot-sdk/blame/75074952a859f90213ea25257b71ec2189dbcfc1/polkadot/runtime/parachains/src/builder.rs#L508C12-L508C44)
the cost enacting the candidate into the cost of processing a single
bitfield.
[Now](https://github.com/paritytech/polkadot-sdk/blame/dd48544a573dd02da2082cec1dda7ce735e2e719/polkadot/runtime/parachains/src/builder.rs#L529)
it is different, although the benchmarks seems to be not-up-to date.
Including the cost of enacting a candidate into a processing a single
bitfield cost was incorrect, since we multiple that by the number of
bitfields we have. Instead, we should separate calculate the cost of
processing a single bitfield without enactment, and multiple the cost of
enactment by the actual number of processed candidates (which is limited
by the number cores, not validators).

## Bench

Previously, the weight of `enact_candidate` was calculated manually
(without a benchmark) and then neglected:
https://github.com/paritytech/polkadot-sdk/blob/dd48544a573dd02da2082cec1dda7ce735e2e719/polkadot/runtime/parachains/src/inclusion/mod.rs#L584

In this PR, we have a benchmark for it and it's based on the number of
ump and sent hrmp messages as well as whether the candidate has a
runtime upgrade (new_validation_code).
The differences from the previous attempt
paritytech/polkadot#6929 are that
* we don't include the cost of enactment into the cost of processing a
backed candidate.
The reason for it is that enactment happens not in the same block as
backing (typically the next one), since we process bitfields before
backing votes.
* we don't take into account the size of the runtime upgrade, the
benchmark weight doesn't seem to depend much on it, but rather whether
there was one or not.

Similarly to the previous attempt, we don't account for dmp messages
(fixed cost). Also we don't account properly for received hrmp messages
(hrmp_watermark) because the cost of it depends on the runtime state and
can't be statically deduced in the benchmark (unless we pass the
information about channels as benchmark u32 arguments).

The total weight cost of processing a parainherent now includes the cost
of enactment of each candidate, but we don't do filtering based on that
(because we enact after processing bitfields and making other changes to
the storage).

## Numbers

```
Reads = 7 + (0 * u) + (3 * h) + (8 * c)
Writes = 10 + (1 * u) + (3 * h) + (7 * c)
```
In addition, there is a fixed cost of a few of ms (!) per candidate. 

This might result a full block slightly overflowing its weight with 200
enacted candidates, which in turn could prevent non-mandatory
transactions from being included in a block.

Given our modest limits on max ump and hrmp messages:
```
  maxUpwardMessageNumPerCandidate: 16
  hrmpMaxMessageNumPerCandidate: 10
```
and the fact that runtime upgrades are can't happen very frequently
(`validation_upgrade_cooldown`), we might only go over the limits in
case of many disputes.

TODOs:
- [x] Fix the overweight test
- [x] Generate the weights for Westend and Rococo
- [x] PRDoc

---------

Co-authored-by: command-bot <>
Co-authored-by: Alin Dima <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I5-enhancement An additional feature request. T8-polkadot This PR/Issue is related to/affects the Polkadot network.
Projects
4 participants