Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Implementer's guide: Approval Voting Subsystem #1691

Merged
merged 31 commits into from
Oct 31, 2020
Merged

Conversation

rphmeier
Copy link
Contributor

@rphmeier rphmeier commented Sep 9, 2020

Closes #1601

This subsystem is responsible for 3 things:

  • Issuing and tracking assignments to participate in approval checks for different candidates in blocks. It cleanly handles forks, as the same candidate may appear in two different competing relay-chain blocks but the assignment selection criteria are based on the relay-chain block hash.
  • Issuing and tracking approval votes on different candidates in blocks. This involves invoking the availability recovery subsystem and the candidate validation subsystem
  • Providing an API to a custom GRANDPA voting rule that allows it to select the best block which has all candidates approved. This will allow us to have fast finality after approval checks complete.

@rphmeier rphmeier added the S0-design PR/Issue is in the design stage label Sep 9, 2020
@rphmeier rphmeier changed the title Implementer's guide: Approvals Runtime Implementer's guide: Approvals Subsystems Oct 23, 2020
@rphmeier rphmeier changed the title Implementer's guide: Approvals Subsystems Implementer's guide: Approval Voting Subsystem Oct 28, 2020
Copy link
Contributor

@drahnr drahnr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few nits, a few minor elaboration requests, but overall this looks very nice!

@rphmeier rphmeier merged commit 12b3027 into master Oct 31, 2020
@rphmeier rphmeier deleted the rh-approval-subsystems branch October 31, 2020 03:07
@@ -10,7 +10,14 @@ The architecture of the node-side behavior aims to embody the Rust principles of

Many operations that need to be carried out involve the network, which is asynchronous. This asynchrony affects all core subsystems that rely on the network as well. The approach of hierarchical state machines is well-suited to this kind of environment.

We introduce a hierarchy of state machines consisting of an overseer supervising subsystems, where Subsystems can contain their own internal hierarchy of jobs. This is elaborated on in the next section on Subsystems.
We introduce
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sentence?

tranche: DelayTranche,
// assigned validators who have not yet approved, and the instant we received
// their assignment.
assignments: Vec<(ValidatorIndex, Tick)>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good. :) It took me quite a while to figure out that we needed this Tick, which corresponds to AssignmentSignature::recieved in https://github.com/paritytech/polkadot/pull/1558/files#diff-16fadbdcb87a7ecb8febad865d8ca383f4c913f4bf272307bdd93b7a0a3ab26cR229


## AssignmentCert

An `AssignmentCert`, short for Assignment Certificate, is a piece of data provided by a validator to prove that they have been selected to perform secondary approval checks on an included candidate.
Copy link
Contributor

@burdges burdges Nov 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually okay with the name certificate here.. It's initially confusing since a certificate is normally a signature by one key on another key, but in this case it's a VRF by a node's own key that delegates them the authority to sign as an approval checker. I'll let you know if I think of anything better, but I was obviously struggling with naming this before, so let's keep the name cert for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know what you come up with! I couldn't think of anything good either.

kind: AssignmentCertKind,
// The VRF showing the criterion is met.
vrf: VRFInOut,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've the vrf_inout living in two places here.

I'd normally have a vrf_preout and vrf_proof together ala https://github.com/paritytech/polkadot/pull/1558/files#diff-16fadbdcb87a7ecb8febad865d8ca383f4c913f4bf272307bdd93b7a0a3ab26cR233 but a vrf_inout would live in the type either before signing or after verification ala https://github.com/paritytech/polkadot/pull/1558/files#diff-16fadbdcb87a7ecb8febad865d8ca383f4c913f4bf272307bdd93b7a0a3ab26cR161

I jumped through polymorphism hoops to unify the notion of an assignment before signing or after verification with the idea that I'd unify some data structures and make fewer mistakes. It did not go as well as I'd envisioned though, so I ended up with the expected three final types largely separate:

  • our assignment with vrf_inout computed to find delay tranche, but not yet signed because maybe never used
  • actually signed assignments, and
  • verified assignments by anyone, ala your TrancheEntry.

Copy link
Contributor Author

@rphmeier rphmeier Nov 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify what you would expect to have for the unverified type, which is this AssignmentCert? I admit that I just slapped VRFInOut on there, but I probably do need a VRFProof somewhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An unverified would contain a VRFPreOut and VRFProof because although you could compute the VRFInOut earlier it'll make you code robust if you never compute the VRFInOut until you call vrf_verify/dleq_verifiy.

An assignment of our own that is not yet announced would contain a VRFInOut for which we did not yet bother to compute the VRFProof. I'd originally meant to unify this into our own unannounced being handled in parallel with validated ones from others, but that did not go perfectly.

I'm unsure how you want to buffer the unsent gossip messages. I ended up holding them in the same data structures as things containing both a VRFProof and a VRFInOut, from which one can easily compute the VRFPreOut required in gossip. That kinda centralized all the timing/tranche accounting, but I suspect you guys have separate modules for such things.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'd envisioned using the VRF as the gossip message signature, which saves doing another signature, but complicates the layers since we'd verify the VRF even to know if the message makes sense. We could avoid this verification triggering any chain historical lookups by incluidng the VRFPreOut along with the message, and then validating that later. It's too much code subtelty to do this maybe..

validator: ValidatorIndex,
signature: ApprovalSignature,
}
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the signature is always over ApprovalVote, but we handle it with block_hash and candidate_index sometimes. Would we gossip in the SignedApprovalVote form? We could use an abbreviation of IndirectSignedApprovalVote when placing on-chain perhaps, although maybe not initially?.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect we gossip in the Indirect form, because it allows us to gauge politeness more easily. That's the main purpose of this struct, actually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have a good way of knowing if a peer has a candidate, except for knowing if they have the block that caused inclusion of the candidate.

ordian added a commit that referenced this pull request Nov 2, 2020
* master:
  Adder collator improvements (#1896)
  Fixes bug that collator wasn't sending `Declare` message (#1895)
  fix service build: enable notifications protocol only under real overseer (#1894)
  Adds test parachain adder collator  (#1864)
  A real overseer feature (#1892)
  Implementer's guide: Approval Voting Subsystem (#1691)
  Companion for #6912 (#1784)
// 16-bit unsigned integer with 8 bits before and after the point.
ratio: FixedU16,
}
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes exactly. ref #1656

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This struct isn't yet used, but we'll get around to it at some point.

/// Should not be sent unless the block hash is known.
CheckAndImportAssignment(
Hash,
AssignmentCert,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've already checked the assignment cert here? I'd expect the opposite

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We haven't checked it at this point, no.

///
/// Should not be sent unless the block hash within the indirect vote is known.
CheckAndImportApproval(
IndirectSignedApprovalVote,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we gossiping this form or the other?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably gossiping this form, but that might change as I work on the network protocol.


#### `OverseerSignal::BlockFinalized`

On receiving an `OverseerSignal::BlockFinalized(h)`, we fetch the block number `b` of that block from the ChainApi subsystem. We update our `StoredBlockRange` to begin at `b+1`. Additionally, we remove all block entries and candidates referenced by them up to and including `b`. Lastly, we prune out all descendents of `h` transitively: when we remove a `BlockEntry` with number `b` that is not equal to `h`, we recursively delete all the `BlockEntry`s referenced as children. We remove the `block_assignments` entry for the block hash and if `block_assignments` is now empty, remove the `CandidateEntry`.
Copy link
Contributor

@burdges burdges Nov 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just need to ensure any slashable or payable votes we want remembered makes it on-chain by this point, but I think that's done here somewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haven't done the on-chain reporting stuff yet. Disputes will track a DB of all votes we've seen by validators.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd agree disputes provides a better venue for tracking this, thanks.

* Count no-shows in tranches `k..l` and for each of those, take tranches until all no-shows are covered. Repeat so on until either
* We run out of tranches to take, having not received any assignments past a certain point. In this case we set `n_tranches` to a special value `ALL` which indicates that new assignments are needed.
* All no-shows are covered. Set `n_tranches` to the number of tranches taken
* return `n_tranches`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've written the algorithm as I defined it, but we should probably change this to add one tranche per no-show.


#### `import_checked_assignment`
* Load the candidate in question and access the `approval_entry` for the block hash the cert references.
* Ensure the validator index is not part of the backing group for the candidate.
Copy link
Contributor

@burdges burdges Nov 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I recall, we have backers vote with their original backing announcement message since we do not distinguish the first backer.


#### `check_approval(block_entry, approval_entry, n_tranches) -> bool`
* If `n_tranches` is ALL, return false
* Otherwise, if all validators in `n_tranches` have approved, return `true`. If any validator in these tranches has not yet approved but is not yet considered a no-show, return `false`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed tranches_to_approve and check_approval to be one method because check_approval knows about no-shows.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. S0-design PR/Issue is in the design stage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Guide: Approval Voting Subsystem
5 participants