This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Refactoring Parachain Consensus in Cumulus #2301

Closed
rphmeier opened this issue Mar 9, 2023 · 13 comments · Fixed by #2382
Labels
I7-refactor Code needs refactoring.

Comments


rphmeier commented Mar 9, 2023

Proposing a refactor for Cumulus' consensus code.

Requirements

The main issue with Cumulus’ current architecture is the ownership relationship between the collator and the parachain consensus code. The architecture currently has a Collator own a T: ParachainConsensus, which it invokes internally with a build_collation function. This is backwards: consensus should be at the top, as it may need to be highly specialized per parachain - especially as blockspace offerings such as on-demand parachains and elastic scaling evolve.

Collating takes vastly different inputs, as it’s only required for liveness and needs to interface with external blockspace markets. Examples of these inputs (see the sketch after this list):

  • Notifications of new relay-chain heads.
  • New pending transactions.
  • Time.
  • New parachain blocks.
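
A minimal sketch of what a worker loop consuming these inputs could look like, using the futures crate; the event type and handler below are hypothetical names, not existing Cumulus types:

```rust
use futures::{Stream, StreamExt};

/// Hypothetical inputs the collation worker would select over; these names
/// are illustrative, not existing Cumulus types.
enum CollationInput<Block, Transaction> {
    /// A new relay-chain head was imported.
    RelayChainHead([u8; 32]),
    /// A new transaction entered the pool.
    PendingTransaction(Transaction),
    /// A timer fired, e.g. at a slot boundary.
    Tick,
    /// A new parachain block was imported, e.g. authored by another collator.
    ParachainBlock(Block),
}

/// The worker is just a loop over a stream of such inputs; consensus-specific
/// logic decides whether a given input should trigger building a collation.
async fn collation_worker<B, T>(
    mut inputs: impl Stream<Item = CollationInput<B, T>> + Unpin,
    mut handle_input: impl FnMut(CollationInput<B, T>),
) {
    while let Some(input) = inputs.next().await {
        handle_input(input);
    }
}
```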

Things we should intend to support:

Proposal

The general idea is that each collator should spawn a background worker whose responsibility it is to actually create collations and share them in the network. We also need an import queue, just like Substrate, although we have a requirement for parachains which doesn’t exist in vanilla Substrate: the parachain runtime is responsible for verifying blocks’ seals. This is required in parachains, because the Polkadot validators need to check the seal as well.

The worker should have access to an import_and_share_collation function to import the block locally and share the collation with the network. Separating the creation of a collation from sharing it with the network is important, because a collation may require a quorum of collators. In that case, we need to create the collation, then collect a seal through a network protocol, then amend the collation, and only then share it.
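
As a rough sketch of the shape this could take (all names below are hypothetical, not an existing Cumulus API):

```rust
/// Hypothetical handle given to consensus-specific collation workers; all
/// names here are illustrative only.
struct Collation {
    /// Encoded parachain block plus the compact storage proof.
    receipt: Vec<u8>,
    /// Seal digest(s), possibly amended after a collator quorum signs off.
    seal: Option<Vec<u8>>,
}

trait CollatorService {
    /// Build a candidate collation on top of `parent_hash` without
    /// announcing it.
    fn build_collation(&self, parent_hash: [u8; 32]) -> Option<Collation>;

    /// Import the (possibly amended) collation locally and announce it to
    /// the network. Kept separate from building so that a quorum protocol
    /// can collect and attach a seal in between.
    fn import_and_share_collation(&self, collation: Collation);
}

/// Example flow for a quorum-based collator.
fn produce_block(service: &impl CollatorService, parent: [u8; 32]) {
    if let Some(collation) = service.build_collation(parent) {
        // A quorum protocol would gather signatures here and amend
        // `collation.seal` before sharing.
        service.import_and_share_collation(collation);
    }
}
```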

Submitting blockspace orders or bids can be done in the same worker or in a separate background task. The general framework doesn't need to support this yet, but we should write this code (#2154).

We should also not remove any current client/collator or aura code, as downstream users depend on it. Instead, this should be built alongside the existing code, in a backwards-compatible way, giving users time to adjust their code.

Aura

The new framework should not be merged until a fully backwards-compatible "aura" implementation is done alongside it, which can be swapped in by downstream users. This would come with a deprecation, not a deletion, of the old code for doing this.

Actually rewriting the Aura logic is not ideal, so it’d be better to keep wrapping the SlotWorker as is currently done, even though it was not originally designed for this use-case. To do this, we need to modularize the simple_slot_worker a bit more. fn on_slot currently does these things:
1. Calculate remaining proposal duration
2. Slot claiming (including skipping if certain criteria are not met)
3. Creating a proposal + storage proof
4. Sealing the block with a digest
5. Importing the block

To be clear, the slot worker already has a lot of these things as individual functions, but on_slot still does a bunch of the work that we'd like to split out, especially (3), (4), and (5).

These should be split out into separate helper functions, to the extent that they aren’t already. For basic Aura, for instance, the worker logic should detect a slot (not based on real time, as is currently done) and then compute (1) outside, call into the Aura slot worker for (2), (3), and (4), and handle (5) itself alongside sharing to the network.
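
Sketched below is how such a Cumulus-side worker could drive the split-out pieces; the trait and method names are assumptions about the post-refactor shape, not existing simple_slot_worker functions:

```rust
use std::time::Duration;

/// Hypothetical, split-out pieces of today's `on_slot`. The method names are
/// placeholders for what the refactored slot worker could expose.
trait SlotWorkerParts {
    type Claim;
    type Proposal;
    type SealedBlock;

    /// (2) Try to claim the slot; `None` means skip it.
    fn claim_slot(&self, slot: u64) -> Option<Self::Claim>;
    /// (3) Build the proposal and storage proof within `max_duration`.
    fn propose(&self, claim: &Self::Claim, max_duration: Duration) -> Self::Proposal;
    /// (4) Seal the proposal with the consensus digest.
    fn seal(&self, claim: Self::Claim, proposal: Self::Proposal) -> Self::SealedBlock;
}

/// The Cumulus worker detects the slot (e.g. from the relay-chain block it is
/// building on, not wall-clock time), computes (1) itself, calls into the
/// worker for (2)-(4), and handles (5) import plus announcement itself.
fn on_parachain_slot<W: SlotWorkerParts>(
    worker: &W,
    slot: u64,
    remaining: Duration,                           // (1) computed by the caller
    import_and_share: impl FnOnce(W::SealedBlock), // (5) handled by the caller
) {
    if let Some(claim) = worker.claim_slot(slot) {        // (2)
        let proposal = worker.propose(&claim, remaining); // (3)
        let sealed = worker.seal(claim, proposal);        // (4)
        import_and_share(sealed);                         // (5)
    }
}
```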

As for import queues: because of the “runtime needs to check seals” property, we can get away with simpler import queues that only do basic checks, like making sure a parablock’s slot isn’t in the future. These too should be customizable at the top level of Cumulus. For migrations to new consensus algorithms, the verifier should also be able to check the runtime API or the version of the block, in order to delegate verification to different internal verifiers.
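
As an illustration of that kind of delegation, a verifier could do only the future-slot sanity check and then dispatch on a version reported by the runtime; everything below is a hypothetical sketch, not the existing import-queue API:

```rust
/// Hypothetical pre-import checks for a parablock. All types and names here
/// are illustrative.
struct BlockToVerify {
    slot: u64,
    /// A consensus version the runtime reports (e.g. via a runtime API),
    /// used to pick the right internal verifier during migrations.
    consensus_version: u32,
}

trait InternalVerifier {
    fn verify(&self, block: &BlockToVerify) -> Result<(), String>;
}

struct DelegatingVerifier {
    aura_v1: Box<dyn InternalVerifier>,
    aura_v2: Box<dyn InternalVerifier>,
}

impl DelegatingVerifier {
    fn verify(&self, block: &BlockToVerify, current_slot: u64) -> Result<(), String> {
        // Basic sanity check only: the runtime is the authority on the seal,
        // so the client just rejects blocks claiming a future slot.
        if block.slot > current_slot {
            return Err("parablock slot is in the future".into());
        }
        // Delegate to the verifier matching the runtime-reported version.
        match block.consensus_version {
            1 => self.aura_v1.verify(block),
            _ => self.aura_v2.verify(block),
        }
    }
}
```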

@rphmeier rphmeier added the I7-refactor Code needs refactoring. label Mar 9, 2023

rphmeier commented Mar 9, 2023

BTW, I think that eventually we should look into getting an Orchestra implementation in for Cumulus, as it'd help coordinate stuff like collating, ordering blockspace, submitting collations to the network, etc.


bkchr commented Mar 10, 2023

Thank you for writing this! Much better than mine :D

We also need an import queue, just like Substrate, although we have a requirement for parachains which doesn’t exist in vanilla Substrate: the parachain runtime is responsible for verifying blocks’ seals. This is required in parachains, because the Polkadot validators need to check the seal as well.

I created an issue for this some time ago in Substrate: paritytech/polkadot-sdk#63. While implementing this stuff in Cumulus, the Seal checking on the native side didn't make any sense to me. We have also run into the problem multiple times where people don't know that they first need to remove the Seal before executing the block. Let's move this stuff to the runtime!

We should also not remove any current client/collator

This is mainly an internal implementation detail. We can probably get rid of it without making anyone's life harder.

Actually rewriting Aura logic is not ideal, so it’d be better to still wrap the SlotWorker as is currently done, even though it was not originally designed for this use-case. To do this, we need to modularize the simple_slot_worker a bit more. fn on_slot currently does these things

TBH, the current implementation is too complicated and too opinionated. The slot worker is doing too much "magic" and should be simplified. Maybe we could provide some sort of "generic consensus implementation" that you can stick together as you need it. Nevertheless, we could provide some opinionated, easy-to-use implementation that then uses the generic implementation internally. I think @davxy and @andresilva can help here. Stuff like "force authoring", for example, is something that we should get rid of!

The new framework should not be merged until a fully backwards-compatible "aura" implementation is done alongside it, which can be swapped in by downstream users. This would come with a deprecation, not a deletion, of the old code for doing this.

All in all I would be very much in favor of creating a generic consensus implementation, especially in the runtime, where you can plug in stuff like author selection etc., similar to how nimbus does it. In the end all of the consensus implementations share a lot of common patterns and only differ in small details like "how to select the next author". We could then phase out the old AURA/BABE implementations once this is finished.
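
To illustrate the kind of pluggability meant here, roughly along the lines of what nimbus does, the runtime-side hook could look something like the hypothetical trait below (none of these names exist today):

```rust
/// Hypothetical "pluggable author selection" hook: the generic consensus
/// logic checks slots, seals and equivocations, and only this trait differs
/// between Aura-like, Babe-like, etc. selection rules.
trait AuthorSelection {
    type AuthorId: PartialEq;

    /// Who is allowed to author at this slot?
    fn eligible_author(slot: u64) -> Option<Self::AuthorId>;

    /// Is `claimed` allowed to author at `slot`?
    fn is_eligible(claimed: &Self::AuthorId, slot: u64) -> bool {
        Self::eligible_author(slot).as_ref() == Some(claimed)
    }
}

/// An Aura-style round-robin selection over a fixed authority list.
struct RoundRobin;

impl AuthorSelection for RoundRobin {
    // Index into the authority set, for illustration.
    type AuthorId = u32;

    fn eligible_author(slot: u64) -> Option<u32> {
        const NUM_AUTHORITIES: u64 = 4; // would come from storage in practice
        Some((slot % NUM_AUTHORITIES) as u32)
    }
}
```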

BTW, I think that eventually we should look into getting an Orchestra implementation in for Cumulus, as it'd help coordinate stuff like collating, ordering blockspace, submitting collations to the network, etc.

Sounds good, but we should please ensure that we don't block any subsystem "main thread" waiting on messages from other subsystems etc. I know that this makes it more complicated to write certain things, but it also ensures that we don't run into deadlocks.

All in all this sounds like it will take quite a lot of effort to achieve. Personally I would propose that we start with a reduced set of requirements to get something out for asynchronous backing. Stuff like elastic scaling will still take quite some time to get working properly. We should first start small, but already keep the "grand goal" in mind.


rphmeier commented Mar 10, 2023

TBH, the current implementation is too complicated and too opinionated. The slot worker is doing too much "magic" and should be simplified. Maybe we could provide some sort of "generic consensus implementation" that you can stick together as you need it. Similar to how nimbus does it. In the end all of the consensus implementations share a lot of common patterns and only differ in small details like "how to select the next author". We could then phase out the old AURA/BABE implementations once this is finished.

After trying to work on pluggable or modular consensus implementations a few times since 2016 (parity-ethereum, Substrate), I don't believe this is something that can really be done. Consensus protocols have so many different properties that the best we can do, I believe, is to simplify them down to background workers. They often require specific networking or topologies or other tooling that needs to be built alongside them, at least to make them fast. They have different definitions of when something is finalized, timeouts when authors don't show up, even block formats, fork-choice rules, etc.

So if I understand the rest of your comment correctly, it sounds like you actually suggest not to use the slot worker but instead to reimplement Aura from scratch in a Cumulus-specific implementation, because the slot worker does too many things. I think that's reasonable to do, but I'm not convinced that Aura is actually a good fit for parachains, given all of the things we want to support in the future, and as a result I don't want to spend too much time on Aura until we have a clear direction for how it fits into the future picture.

That's why I propose this refactoring, which focuses on the control flow of Cumulus and porting the same Aura implementation over in a backwards-compatible way, with hopefully minimal changes in Substrate.


rphmeier commented Mar 10, 2023

@burdges

Responding to a comment in paritytech/substrate#2267 which is now closed.

What does "multiple slots per author" mean?

I mean authors being assigned multiple slots in a row deliberately (or alternatively, being allowed to produce multiple blocks per slot), which is necessary to fully utilize the throughput of asynchronous backing as well as elastic scaling, in the future. Sometimes there is an opportunity to build 2 or 3 blocks instead of just one, and to build a bit of a backlog. This gets complex quickly, but maybe you have some ideas. Aura or slot-based consensus in general don't seem highly compatible with this approach. The main issue is that if you have a single leader, they will tend to make as many blocks as they can in order to get block and transaction rewards, causing unnecessary forks when there are opportunities for multiple validators to make blocks at the same time. But if you have a quorum, then the majority will outvote the greedy leader.

At present, we tolerate a colluding majority of collators, so long as they do not freeze out other collators, but in principle we could tolerate a malicious majority if honest ones run reconstruction. It'll keep our lives simpler if we preserve this.

Yes, that's true, and it is an important weakness of quorum-based consensus. It has some upsides too - fast "finality" on parachains with low economic security (some use-cases care about this - it's nice to know that someone will get slashed, even for $20k, when you're selling coffee for tokens). I also think it'd be easier to design a quorum-based mechanism for things like asynchronous backing / elastic scaling that repeatedly decides both who the leader is at the moment and how many parablocks they should make (0-N).

IMO in Cumulus we should support both use-cases.


burdges commented Mar 10, 2023

It's multiple slots, but it's almost like one slot with multiple cores then, so your remark about 2D slot numbers now makes sense. We have no predefined map between slots and block heights right now, so you could have an actual major/slot number for aura or sassafras, but also a minor/sequence number, and then also the block height.

As an aside, it'll be amusing when someone makes a flash loan that operates within one slot but across sequence numbers, given they come from the same collator, and then someone else figures out how to fork out the last sequence number and steal the loan. ;)

Now what is this quorum? Do you want multiple collators to collaborate to build compatible blocks within the same slot?

We could be weakly optimistic that other collators behave as desired, like maybe they commit to their transaction sequence before they finish building the block. This is still sequential multi-core.

We could divide the state instead of the time among the collators: instead of sequencing the blocks, the collators negotiate about who includes which transactions, and they divide the state into global-read and local-write portions. It's vaguely like ownership in Rust or the multi-threading memory-model discussion. This is parallel multi-core.

At first blush, these are less consensus questions and more chain data model concerns, so they'll impact what specific parachains do well. We should figure out if DAG folks know anything cool.


rphmeier commented Mar 24, 2023

Now what is this quorum? Do you want multiple collators to collaborate to build compatible blocks within the same slot?

Yeah, I think this is worth supporting. The benefit is for chains that want to have some level of "finality" for parachains, i.e. to know that a particular block will eventually get included in the relay chain, or not at all, at some low level of security. It's a use-case we hear about fairly often, and we should support it.

There are maybe other collator protocols worth implementing, such as HotShot from Espresso Systems. This scales to a large number of collators, but uses a cryptographic sortition like Algorand's to select a subcommittee at any given point. It uses the pipelining approach from Chained HotStuff to be more efficient. I think this would match well with elastic scaling approaches in the future, and I'd recommend an implementation in Cumulus, as it should be compatible with large open collator sets, Substrate's Orchestra extensions, asynchronous backing, and elastic scaling.


burdges commented Mar 25, 2023

If the parachain runs sassafras, then you could've multiple tickets assigned to each slot, so then those collators would collaborate over tor or something. We'll avoid memepools in this way, which matters eventually.


rphmeier commented Mar 25, 2023

@burdges Can you write something up to describe how sassafras could be adapted to support elastic scaling & dynamically change the block rate?

I think that any slot-based algorithm has a weakness in incentives, which maybe should be researched by @AlfonsoCev. The issue I see is that the relay-chain doesn't enforce any particular fork-choice rule for parachains, so while the best thing for the parachain is to build on the longest chain, individual block authors can decide to "pre-empt" the longest chain by forking against an earlier block, and then this fork is likely to get included. When you have a quorum to determine how many blocks are meant to be built at any point in time and against which headers, the incentives of individual proposers are outmatched by the incentives of the entire collator set.


burdges commented Mar 27, 2023

Why a quorum though? Instead, your parachain could have some on-chain rule that determines its upcoming block rate, no? The rule might simply be: someone paid for us to make more blocks. Yes?

That's probably a good idea. At a high level..

Relay chain sassafras is largely about both avoiding forks and improving randomness, so we want it to be hard to have a slot at the beginning or end of an epoch.

Parachain sassafras is about avoiding forks and avoiding memepools, but not really about improving randomness, so it cares less about the sort order at the ends of epochs. You could therefore just schedule a secondary, tertiary, etc. tier of block production slots, which still might all form a chain, a la .. <- (n-1,3rd) <- (n,1st) <- (n,2nd) <- (n,3rd) <- (n+1,1st) <- ..

In this, we're assuming parachains can get good randomness from the relay chain. In fact, async backing weakens the natural randomness slightly, but we know a zoo of fixes for when parachains want better randomness.

As an aside, we often have small enough collator sets for parachains that, if they do not care about features which benefit from avoiding memepools, they could employ an anonymized-shuffle-based scheme. These also produce a first, second, etc. tier of block producers, but you don't know your assignment until shortly before making your block.


rphmeier commented Mar 27, 2023

Right, we could also add on-chain code for the parachain to look up scheduling for itself in the relay-chain state and allow for slot acceleration as a result. @eskimor I suppose we could look up the claim queue in order to do that, but it'd be nice for it to be cheap.

The main argument for quorum-based approaches is that parachain fork-choice rules are nebulous at best - the relay chain has no good way of interpreting competing branches of a parachain, so 'worse' forks are able to pre-empt better ones unless we either find a generic and cheap way for relay-chain validators to evaluate parachain fork-choice rules or sidestep the issue. This weakness in enforcing fork-choice makes the incentives, or even the expected behavior, harder to reason about.


rphmeier commented Apr 1, 2023

About InherentData and check_inherents:

InherentData is used for two purposes: the first is to author blocks with pluggable inherent extrinsics created by user-supplied code. The second is to check that newly imported blocks have inherents that match the local node's view of things.

The second purpose makes more sense within vanilla Substrate than in Cumulus. Full nodes rejecting blocks means that those blocks truly won't be built upon. In Cumulus, however, all these checks need to be performed within the runtime: if the PVF accepts a block that full nodes reject, this will cause issues. execute_block for Cumulus runtimes should perform all necessary verification of inherents. This limits the types of inherents that can be supported, yes, but only because supporting them is impossible in this execution environment. The PVF is the ultimate authority on whether blocks are good or not.
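
For illustration, a parachain-flavoured execute_block along these lines would pop and verify the seal and check inherents itself, so the same logic runs inside the PVF; every name in this sketch is hypothetical, and the helpers are placeholders rather than real Cumulus or FRAME APIs:

```rust
/// Hypothetical runtime-side block execution for a parachain: the seal and
/// inherents are checked here, in code that also runs inside the PVF, so the
/// client-side import queue has nothing consensus-critical left to do.
struct Header {
    digest: Vec<Digest>,
}

enum Digest {
    Seal(Vec<u8>),
    Other(Vec<u8>),
}

struct Block {
    header: Header,
    extrinsics: Vec<Vec<u8>>,
}

fn execute_block(mut block: Block) {
    // 1. Pop the seal from the header; the pre-seal header is what the
    //    author actually signed.
    let seal = match block.header.digest.pop() {
        Some(Digest::Seal(seal)) => seal,
        _ => panic!("block is missing a seal"),
    };

    // 2. Verify the seal against the expected author for this slot
    //    (hypothetical helper).
    assert!(verify_seal(&block.header, &seal), "invalid seal");

    // 3. Check inherents with no outside context: the PVF has no local
    //    InherentData to compare against.
    assert!(check_inherents(&block.extrinsics), "invalid inherents");

    // 4. Apply the extrinsics as usual...
}

fn verify_seal(_pre_header: &Header, _seal: &[u8]) -> bool { true /* placeholder */ }
fn check_inherents(_extrinsics: &[Vec<u8>]) -> bool { true /* placeholder */ }
```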

With that in mind, there is no reason for Cumulus full nodes to do any checking of inherents against InherentData beyond what is performed by the runtime. This is currently done explicitly in `.check_inherents(*block.header().parent_hash(), block.clone(), inherent_data)` and implicitly within the Aura import queue.

I propose that we remove all usages of InherentData outside of block building itself, and limit even that to the InherentData related to the user's pallets, as opposed to the ParachainInherentData, which needs to be constructed with more contextual information by the collator engine implementation.

Specifically, we should have just one ImportQueue implementation for all Cumulus chains, regardless of consensus algorithm: the runtime needs to check that consensus is done correctly, so the client-side code should be exactly the same. This ImportQueue should do ~nothing; the most it should reasonably do is issue a runtime API call to check_inherents_heres_the_block_and_no_other_context, in case vanilla execute_block doesn't check inherents but the PVF logic does.


rphmeier commented Apr 6, 2023

Related: it doesn't seem that any import queues currently invoke ParachainsInherentData::create_at. I guess it's not necessary?


This issue has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/cumulus-consensus-modules/2896/1
