prospective-parachains rework #4035
Conversation
Awesome work here and on the Versi testing, @alindima!
I just have some more nits and questions, but it's good to go anyway.
polkadot/node/core/prospective-parachains/src/fragment_chain/mod.rs
```rust
if (self.chain.len() + unconnected) < self.scope.max_depth {
    PotentialAddition::Anyhow
} else if (self.chain.len() + unconnected) == self.scope.max_depth {
```
Why can we add one more if we are already at max_depth?
That's a good question.
Even prior to this PR, we were accepting max_depth + 1 unincluded candidates at all times.
I'm not sure why it was designed this way. I assume it's because a max_candidate_depth of 0 is equivalent to synchronous backing and would still allow one candidate to be backed (therefore the +1 is needed).
I see, that makes some sense indeed.
The question is: now that we have async backing enabled, do we still want to keep this?
> The question is: now that we have async backing enabled, do we still want to keep this?

I don't have a good reason for changing this. Moreover, we seem to have discovered that we need a max_depth + 1 unincluded segment size on the collator as well, so maybe it's here for a reason.
Nevertheless, I'd leave it as is unless we find a good reason to change it.
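To make the `+1` semantics concrete, here is a minimal standalone sketch of the depth check (the variant names follow the snippet above; the free function and the `main` demo are illustrative assumptions, not the actual subsystem code):

```rust
/// Illustrative sketch of the depth check discussed above, not the actual
/// subsystem code. With `max_depth = 0` (equivalent to synchronous backing),
/// an empty chain still hits the `== max_depth` branch, so one candidate can
/// be added, i.e. `max_depth + 1` unincluded candidates are accepted overall.
#[derive(Debug, PartialEq)]
enum PotentialAddition {
    Anyhow,
    IfConnected,
    None,
}

fn potential_addition(chain_len: usize, unconnected: usize, max_depth: usize) -> PotentialAddition {
    if chain_len + unconnected < max_depth {
        PotentialAddition::Anyhow
    } else if chain_len + unconnected == max_depth {
        PotentialAddition::IfConnected
    } else {
        PotentialAddition::None
    }
}

fn main() {
    // max_depth = 0: the chain is empty, yet one (connected) candidate is still allowed.
    assert_eq!(potential_addition(0, 0, 0), PotentialAddition::IfConnected);
    // Once that candidate is in the chain, nothing more fits.
    assert_eq!(potential_addition(1, 0, 0), PotentialAddition::None);
}
```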
```rust
let Some(relay_parent) = self.scope.ancestor(relay_parent) else { return false };

if relay_parent.number < earliest_rp.number {
    return false // relay parent moved backwards.
```
Does it make sense to log a trace here, just in case, for debugging?
I don't think it helps, because this function is also called when pruning the candidate storage on a new leaf update.
Therefore, it's bound to happen most of the time under normal circumstances.
```rust
/// access to the parent's HeadData. Can be retried once the candidate outputting this head data is
/// seconded.
#[derive(Debug, Clone, Eq, PartialEq, Hash)]
pub struct BlockedCollationId {
```
Is this really an ID? Maybe just `BlockedCollation`?
It's only the key used in the HashMap, hence it's an ID. It maps to a fetched collation.
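For illustration, a minimal sketch of how such an ID is used as a map key; the placeholder types, field names and the `blocked_from_seconding` map are assumptions, not necessarily the exact subsystem types:

```rust
use std::collections::HashMap;

// Placeholder types for illustration; the real ones live in polkadot-primitives.
type Hash = [u8; 32];
type ParaId = u32;
struct PendingCollationFetch; // stand-in for the fetched collation data

/// Key identifying a collation that is blocked on its parent's head data.
#[derive(Debug, Clone, Eq, PartialEq, Hash)]
struct BlockedCollationId {
    para_id: ParaId,
    parent_head_data_hash: Hash,
}

/// The ID maps to the collation(s) waiting to be retried once the parent
/// head data becomes available.
struct State {
    blocked_from_seconding: HashMap<BlockedCollationId, Vec<PendingCollationFetch>>,
}
```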
prdoc/pr_4035.prdoc
```yaml
- audience: Node Dev
  description: |
    Reworks prospective-parachains so that we allow a number of unconnected candidates (for which we don't yet know
    the parent candidate). Needed for elastic scaling. Also simplifies it to not allow parachain forks and cycles.
```
I'd first mention that this fundamentally changes what information the subsystem stores and operates on: from a tree to just a chain and a bunch of unconnected candidates.
Reworked the prdoc, let me know if it sounds better now.
Good job! Left some nits, but couldn't find any problems with it.
```rust
if self.is_fork_or_cycle(
    candidate.parent_head_data_hash(),
    Some(candidate.output_head_data_hash()),
) {
    continue
```
We don't expect this to happen, right? So let's log a debug message if it happens.
Well, it can happen if there's a parachain fork or cycle on another relay chain fork. The candidate storage holds all candidates across all relay parents, so when populating the chain, we may attempt to introduce a candidate that is present on another relay chain fork (but those candidates will probably not pass these sanity checks). It doesn't necessarily mean something's wrong.
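For reference, a minimal sketch of what such a fork/cycle check can look like (the method name comes from the snippet above; the internal state and logic are simplified assumptions, not the subsystem's actual implementation):

```rust
use std::collections::HashSet;

// Placeholder hash type for illustration.
type Hash = [u8; 32];

/// Simplified fragment-chain state: which parent head-data hashes already have
/// a candidate built on them, and which output head-data hashes already exist
/// in the chain.
struct FragmentChainSketch {
    parents_in_chain: HashSet<Hash>,
    outputs_in_chain: HashSet<Hash>,
}

impl FragmentChainSketch {
    /// A fork: some candidate in the chain already builds on this parent head data.
    /// A cycle: the candidate's output head data already appears in the chain (or
    /// the candidate maps a state onto itself), so adding it would loop back.
    fn is_fork_or_cycle(&self, parent_head_hash: Hash, output_head_hash: Option<Hash>) -> bool {
        if self.parents_in_chain.contains(&parent_head_hash) {
            return true; // fork
        }
        if let Some(output) = output_head_hash {
            if output == parent_head_hash || self.outputs_in_chain.contains(&output) {
                return true; // cycle
            }
        }
        false
    }
}
```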
Excellent work @alindima
Are fragment chains backwards compatible with fragment trees? Should we do the corresponding updates in KAGOME ASAP?
Fragment trees would maintain parachain forks of backable candidates. In the end, the provisioner subsystem on the block author would only choose the first one to back. Now, fragment chains enable validators to not maintain parachain forks. If the block author gets two backable candidates that have the same parent, it will not even fetch/store the second one. We make the assumption that parachains shouldn't be creating forks (often or ever) and, if they do, they won't utilise the full throughput.

This shouldn't break compatibility with nodes that still hold fragment trees. We don't expect every validator to use this implementation right away. That being said, this PR is more than a switch from fragment trees to fragment chains. It enables parallel validation of candidates of the same para across different backing groups (needed for elastic scaling).
Reworks prospective-parachains so that we allow a number of unconnected candidates (for which we don't know the parent candidate yet). Needed for elastic scaling: paritytech#3541. Without this, candidate B will not be validated and backed until candidate A (its parent) is validated and a backing statement reaches the validator.

Due to the high complexity of the subsystem, I rewrote parts of it so that we don't concern ourselves with candidates which form cycles or which form parachain forks. We now have "Fragment chains" instead of "Fragment trees". This greatly simplifies some of the code and is a compromise we can make. We just need to make sure that cycle-producing parachains don't brick the relay chain and that fork-producing parachains can still make some progress (on one core at least). The only forks that are allowed are those on the relay chain, obviously.

Unconnected candidates are kept in the `CandidateStorage` and whenever a new candidate is introduced, we try to repopulate the chain with as many candidates as we can.

Also fixes paritytech#3219

Guide changes will be done as part of: paritytech#3699

TODOs:
- [x] see if we can replace the `Cow` over the candidate commitments with an `Arc` over the entire `ProspectiveCandidate`. It's only being overwritten in unit tests. We can work around that.
- [x] finish fragment_chain unit tests
- [x] add more prospective-parachains subsystem tests
- [x] test with zombienet what happens if a parachain is creating cycles (it should not brick the relay chain).
- [x] test with zombienet a parachain that is creating forks. it should keep producing blocks from time to time (one bad collator should not DOS the parachain, even if throughput decreases)
- [x] add some more logs and metrics
- [x] add prdoc and remove the "silent" label

---------

Signed-off-by: Andrei Sandu <[email protected]>
Co-authored-by: Andrei Sandu <[email protected]>
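As a rough illustration of the unconnected-candidate flow described above (stash new candidates, then repopulate the chain from storage), here is a simplified sketch; the types and method names are placeholders, not the subsystem's actual API:

```rust
use std::collections::HashMap;

// Placeholder hash type for illustration.
type Hash = [u8; 32];

/// Simplified candidate: consumes one parent head state and produces a new one.
#[derive(Clone)]
struct Candidate {
    parent_head_hash: Hash,
    output_head_hash: Hash,
}

/// Candidates we know about but cannot yet connect to the chain, keyed by the
/// head data they build on.
#[derive(Default)]
struct CandidateStorageSketch {
    by_parent_head: HashMap<Hash, Candidate>,
}

/// The current best chain, growing from the latest included head data.
struct FragmentChainSketch {
    chain: Vec<Candidate>,
    latest_output: Hash,
}

impl FragmentChainSketch {
    /// Stash the new candidate in the unconnected storage, then repopulate the
    /// chain: keep pulling candidates whose parent matches the chain's tip.
    fn introduce_and_repopulate(
        &mut self,
        storage: &mut CandidateStorageSketch,
        candidate: Candidate,
    ) {
        storage
            .by_parent_head
            .insert(candidate.parent_head_hash, candidate);
        while let Some(next) = storage.by_parent_head.remove(&self.latest_output) {
            self.latest_output = next.output_head_hash;
            self.chain.push(next);
        }
    }
}
```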
Good update
Makes paritytech#4035 easier to review
Resolves #4800

# Problem

In #4035, we removed support for parachain forks and cycles and added support for backing unconnected candidates (candidates for which we don't yet know the full path to the latest included block), which is useful for elastic scaling (parachains using multiple cores).

Removing support for backing forks turned out to be a bad idea, as there are legitimate cases for a parachain to fork (if they have another consensus mechanism, for example, like BABE or PoW). This leads to validators getting lower backing rewards (depending on whether they back the winning fork or not) and higher pressure on only half of the backing group (during availability-distribution, for example). Since we don't yet have approval voting rewards, backing rewards are a pretty big deal (which may change in the future).

# Description

A backing group is now allowed to back forks. Once a candidate becomes backed (has the minimum backing votes), we don't accept new forks unless they adhere to the new fork selection rule (have a lower candidate hash). This helps with keeping the implementation simpler, since forks will only be taken into account for candidates which are not backed yet (only seconded).

Having this fork selection rule also helps with reducing the work backing validators need to do, since they have a shared way of picking the winning fork. Once they see a candidate backed, they can all decide to back a fork and not accept new ones. But they still accept new ones during the seconding phase (until the backing quorum is reached). Therefore, a block author which is not part of the backing group will likely not even see the forks (only the winning one).

Just as before, a parachain producing forks will still not be able to leverage elastic scaling but will still work with a single core. Also, cycles are still not accepted.

## Some implementation details

`CandidateStorage` is no longer a subsystem-wide construct. It was previously holding candidates from all relay chain forks and complicated the code. Each fragment chain now holds its candidate chain and its potential candidates. This should not increase the storage consumption since the heavy candidate data is already wrapped in an Arc and shared. It does, however, allow for great simplifications and increases readability.

`FragmentChain`s are now only creating a chain with backed candidates and the fork selection rule. As said before, `FragmentChain`s are now also responsible for maintaining their own potential candidate storage.

Since we no longer have the subsystem-wide `CandidateStorage`, when getting a new leaf update, we use the storage of our latest ancestor, which may contain candidates seconded/backed that are still in scope.

When a candidate is backed, the fragment chains which hold it are recreated (due to the fork selection rule, it could trigger a "reorg" of the fragment chain).

I generally tried to simplify the subsystem and not introduce unnecessary optimisations that would otherwise complicate the code and not gain us much (fragment chains wouldn't realistically ever hold many candidates).

TODO:
- [x] update metrics
- [x] update docs and comments
- [x] fix and add unit tests
- [x] tested with fork-producing parachain
- [x] tested with cycle-producing parachain
- [x] versi test
- [x] prdoc
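A minimal sketch of the fork selection rule described above, i.e. deterministically preferring the candidate with the lower hash; the type and function here are illustrative placeholders, not the subsystem's actual API:

```rust
/// Placeholder candidate hash type; the real one is a 32-byte hash.
type CandidateHash = [u8; 32];

/// Fork selection rule sketch: among candidates building on the same parent
/// head data, the one with the lower candidate hash wins. Because every
/// validator applies the same deterministic rule, the backing group converges
/// on a single fork without extra coordination.
fn preferred_fork(current: CandidateHash, incoming: CandidateHash) -> CandidateHash {
    if incoming < current {
        incoming
    } else {
        current
    }
}

fn main() {
    let a = [1u8; 32];
    let b = [2u8; 32];
    // The fork with the lower hash is kept, regardless of arrival order.
    assert_eq!(preferred_fork(a, b), a);
    assert_eq!(preferred_fork(b, a), a);
}
```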