Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Receiving 2f+1 Prevotes before Proposal might affect termination #1745

Closed
milosevic opened this issue Jun 14, 2018 · 13 comments
Closed

Receiving 2f+1 Prevotes before Proposal might affect termination #1745

milosevic opened this issue Jun 14, 2018 · 13 comments
Assignees
Labels
C:consensus Component: Consensus T:bug Type Bug (Confirmed)
Milestone

Comments

@milosevic
Copy link
Contributor

Assume that we are in initial round with a correct proposer (or with a proposer who has locked value in the highest round). Furthermore, assume that we are in GST period and that timeouts are large enough so communication among correct processes is reliable and timely. If a correct process p receives 2f+1 prevotes (any set of prevotes) before receiving Proposal he will prevote for nil. If faulty processes also prevote for nil (or stay silent), no correct process will precommit v, and we will not terminate in this round. This is not good as this is scenario in which we should always ensure termination.

Similar reasoning applies also for Precommit messages. If a correct process receives 2f+1 Precommit messages, before Polka and Proposal, he will Precommit nil, and again, we will not be able to terminate.

@milosevic milosevic added the C:consensus Component: Consensus label Jun 14, 2018
@ebuchman
Copy link
Contributor

If a correct process p receives 2f+1 prevotes (any set of prevotes) before receiving Proposal he will prevote for nil. If faulty processes also prevote for nil (or stay silent), no correct process will precommit v,

How does this follow ? If 2f+1 prevote for B, then surely some correct process might see that and precommit for B.

@milosevic
Copy link
Contributor Author

There are no 2f+1 prevotes for B as faulty processes prevote nil.

@ebuchman
Copy link
Contributor

Right. From discussion, what we need is the following:

  • on receive +2/3-any prevotes while in propose step, do nothing. this is so we wait out the rest of timeout propose and then we can prevote. if there's still only +2/3-any, we start timeout prevote then.

Note we want a similar mechanism on +2/3-any while in propose step, but it's more involved - we'd have to detect it, when we go to prevote, that actually we already received +2/3-any and can go to precommit.

@jaekwon
Copy link
Contributor

jaekwon commented Jun 25, 2018

I don't follow.

If a correct process p receives 2f+1 prevotes (any set of prevotes) before receiving Proposal he will prevote for nil. If faulty processes also prevote for nil (or stay silent), no correct process will precommit v, and we will not terminate in this round.

If at most f are byzantine, with 3f+1 total voting power, even with all faulty processes prevoting nil or staying silent, 2f+1 will receive the proposal assuming timeouts are sufficiently long.

@ebuchman
Copy link
Contributor

ebuchman commented Jun 25, 2018

2f+1 will receive the proposal assuming timeouts are sufficiently long

Maybe in a later round, but the idea here is to try and terminate as soon as we have some synchronous communication available, rather than wasting time with more rounds.

@milosevic is that right ? Or is this something that you think could be abused by an adversary to delay termination for ever?

@milosevic
Copy link
Contributor Author

milosevic commented Jun 25, 2018

2f+1 correct might not Prevote for B even if timeouts are sufficiently long in the following case:
f+1 correct receives Proposal and Prevote B, f faulty prevote Nil, the other f correct processes receives 2f+1 Prevote messages before receiving Proposal, so they Prevote Nil (in the current implementation upon receiving any set of 2f+1 Prevote messages, we enter Prevote step, and as we haven't received Proposal yet, we prevote Nil). They will receive Proposal later, but as they already Prevote Nil, the correct processes will not decide in this round. This can repeat forever, so faulty processes can prevent termination forever if they manage to reach some correct process before it receives Proposal. We have addressed this issue in the latest version of spec by preventing processes to process Prevote messages before receiving Proposal.

@jaekwon
Copy link
Contributor

jaekwon commented Jun 26, 2018

Yeah, got it.

So, to deal with byzantine-prevotes from progressing prevotes prematurely, we need to wait for the proposal or block (e.g. no forwarding).

Currently, for byzantine-precommits, we wait the precommit_wait duration for stragglers in the case of +2/3 any precommits. But the same attack works in the precommit step, so we should to delay the precommit vote until after prevote_wait.

e.g.

  • see any +2/3 prevotes (but not a polka) -> goto propose (if not already) and set prevote_wait timer.
  • see any +2/3 precommits (but not a commit) -> goto prevote (if not already), set prevote_wait timer, AND precommit_wait timer sequentially (close to each other). The prevote_wait timeout will expire first, and get us to precommit. We will then either see a commit, or, precommit_wait will expire immediate after, getting us to the next round.

@milosevic
Copy link
Contributor Author

Yes the same attack applies for precommits. But we can deal witt it in a slightly different way for precommits. In the latest version of the spec, we trigger precommit_wait timer unconditionally upon receiving any 2f+1 precommits. But we don't precomit nil in this case. If Precommit_wait timer expires and we are still in that round, we start next round. If in the meantine we commit we will start next height. It is very important to trigger precommit_wait timer and whole this logic unconditionally upon receiving any set of 2f+1 precommits due to round skipping. This logic let us quickly move to a higher round if a process lag behind, where we pay only precommit_wait timeout in some cases. Otherwise, round synchronisation could be very, very slow (if for example we would have step condition for precommit_wait like we have for prevotes).
This is a bit harder to explain here, but it is supported by the proofs in the spec.

It turns out that triggering prevote_wait only after process prevoted (received proposal or proposal_timeout expires), and precommit_wait unconditionally (upon receiving 2f+1 precommits) can guarantee termination (and fast round skipping) with proper timeout values (timeoutPropose must be greated than timeoutPrecommit). There is some math behind this claim in the spec.

@jaekwon
Copy link
Contributor

jaekwon commented Jun 27, 2018

But we don't precomit nil in this case

I think we can also precommit nil right before we go to the next round. Code snippet below.

I think we want the nil precommit because the validator is still signaling that they're online. Even if our protocol says not to precommit nil, validators will modify the logic because it makes it less likely for them to get unbonded due to network issues.

@jaekwon
Copy link
Contributor

jaekwon commented Jun 27, 2018

Here's a suggested consensus state machine change...

When we have +2/3 prevote any (but not a polka): (enterNewRound is implicit here)

tendermint/consensus/state.go

Lines 1592 to 1593 in f5d0b88

cs.enterPropose(height, vote.Round) // we can't prevote until we wait for the proposal.
cs.enterPrevoteWait(height, vote.Round)

When we have +2/3 precommit any (but not a commit):

tendermint/consensus/state.go

Lines 1625 to 1628 in b51ed13

cs.enterNewRound(height, vote.Round)
cs.enterPropose(height, vote.Round)
cs.enterPrevoteWait(height, vote.Round)
cs.enterPrecommitWait(height, vote.Round)

It's still not safe to prevote in the on-2/3-any-precommit snippet above, but we enterPropose in case so that we do prevote. If we're locked on a block we should probably prevote that as well.

The snippet sets prevote_wait and precommit_wait to expire sequentially at the same time, because we would precommit nil after prevote_wait is done, enter precommit, then enter finalizeCommit due to the precommit_wait, and immediately go to the next height.

@ebuchman ebuchman added T:bug Type Bug (Confirmed) prelaunch labels Jun 28, 2018
@ebuchman ebuchman added this to the launch milestone Jun 28, 2018
@xla xla removed the prelaunch label Jul 18, 2018
@ticktackim
Copy link

@jaekwon @ebuchman @milosevic Hey guys, just chew the Tendermint thesis launched on Cornell site. It's pretty awesome for your work to achieve scalability based on PBFT and SBFT. But I have a question about the security. Since you mentioned in the paper the system parameters, the value of GST & timeout Delta are assumed to be unknown for the safety of algorithm. Do you think there will be an extreme circumstance that these parameter values are being acknowledged by faulty processes, and these Byzantine nodes will just remain silence and wait for a threshold number 2f, then proceed prevote/precommit in conspiracy in the next (r+1) round to take over the system?

@jaekwon
Copy link
Contributor

jaekwon commented Sep 1, 2018

@ticktackim GST/Delta are assumed unknown but some finite fixed number for all non-byzantine actors. So as long as < 1/3 Byzantine, the Byzantine actors can do whatever they want including remaining silent for some amount of time greater than Delta. If there exist >= 1/3 Byzantine, then liveness isn't guaranteed. One of the attack scenarios would involve >= 1/3 Byzantine actors delaying their votes. But they wouldn't be able to compromise safety without ultimately becoming punisheable, assuming that we require sharing justification (see the tendermint wiki BFT algo for "Proof of Accountability").

@ebuchman
Copy link
Contributor

Addressed in #2540

See some minor follow up in #2625

iKapitonau pushed a commit to scrtlabs/tendermint that referenced this issue Jul 10, 2024
…int#1730) (tendermint#1745)

* fix: increase abci socket message size limit to 2GB (tendermint#1730)

* fix: increase abci socket message size to 2GB

* fix: added .changelog

* Update .changelog/unreleased/improvements/1730-increase-abci-socket-message-size-limit

Co-authored-by: Anton Kaliaev <[email protected]>

* fix: use MaxInt32 as message size for 32-bit systems

* Update .changelog/unreleased/improvements/1730-increase-abci-socket-message-size-limit

---------

Co-authored-by: Anton Kaliaev <[email protected]>
Co-authored-by: Thane Thomson <[email protected]>
(cherry picked from commit 092b918)

* Rename 1730-increase-abci-socket-message-size-limit to 1730-increase-abci-socket-message-size-limit.md

---------

Co-authored-by: Troy Kessler <[email protected]>
Co-authored-by: Thane Thomson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C:consensus Component: Consensus T:bug Type Bug (Confirmed)
Projects
None yet
Development

No branches or pull requests

5 participants