We have three nodes in a DLedger cluster: n0, n1, and n2. n0 is the preferred leader.
Initially, n0 was the leader. The machine hosting n0 then ran into a problem, so n2 was elected as the new leader.
When n0 recovered, n2 tried to transfer leadership back to n0.
But n0 did not respond to n2's transfer request in time:
2022-10-26 08:21:20 INFO NettyServerPublicExecutor_3 - [n0] [ChangeRoleToCandidate] from term: 56 and currTerm: 55
2022-10-26 08:22:15 INFO StateMaintainer - n0_[INCREASE_TERM] from 55 to 56
n0 received the transfer request at 08:21:20 but did not initiate the election until 08:22:15. In the meantime the transfer request failed and n2 became writable again. At that point, however, n0 was a candidate, so data could not be synchronized to it. As a result, n0's lag grew beyond 1000, and n2 no longer initiated a transfer request.
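The lag condition described above can be sketched as a simple check. This is only an illustration: the class name, method name, and constant are hypothetical and are not taken from DLedger's code, and the threshold is assumed from the "greater than 1000" figure in this report.

```java
/** Hypothetical sketch of the transfer precondition; not DLedger's actual API. */
final class TransferEligibility {
    /** Assumed threshold, taken from the "greater than 1000" figure in this report. */
    static final long MAX_LAG = 1000L;

    /**
     * Returns true while the leader would still attempt a transfer.
     *
     * @param leaderEndIndex    last index in the leader's log
     * @param preferredEndIndex last index replicated to the preferred leader
     */
    static boolean canTransfer(long leaderEndIndex, long preferredEndIndex) {
        long lag = leaderEndIndex - preferredEndIndex;
        // Once the preferred leader lags by more than the threshold, no transfer is attempted.
        return lag <= MAX_LAG;
    }
}
```

Under this assumption, a candidate that cannot receive appends only falls further behind, so the check never passes again without outside intervention.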
Solution
We have two options:
1. n0 actively rolls back to follower and rolls back its term.
   Drawback: a term may only increase, never decrease, so this is not in line with the Raft paper, which defines currentTerm as:
   "latest term server has seen (initialized to 0 on first boot, increases monotonically)"
   The paper also says that a candidate receiving an append request from the leader becomes a follower only if its currentTerm <= the leader's term:
   "While waiting for votes, a candidate may receive an AppendEntries RPC from another server claiming to be leader. If the leader’s term (included in its RPC) is at least as large as the candidate’s current term, then the candidate recognizes the leader as legitimate and returns to follower state. If the term in the RPC is smaller than the candidate’s current term, then the candidate rejects the RPC and continues in candidate state"
   Here n0's term (56) is larger than n2's term (55), so n0 rejects n2's append requests and stays a candidate.
2. The leader increases its term and becomes a candidate to initiate an election. n0 then takes part in the vote normally and the cluster returns to normal.
   Section 5.1 of the paper states:
   "if one server’s current term is smaller than the other’s, then it updates its current term to the larger value."
Therefore, we can fix the problem with Method 2.
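The two rules quoted from the paper, and the effect of Method 2, can be sketched with a minimal model. This is not DLedger code: `RaftNodeSketch`, `onAppendEntries`, and `onHigherTermObserved` are hypothetical names used only for illustration. Note that the Raft paper has a leader with a stale term revert to follower; the sketch follows the wording of Method 2 above, where the leader becomes a candidate.

```java
/** Minimal, hypothetical model of Raft term handling; not DLedger code. */
final class RaftNodeSketch {
    enum Role { FOLLOWER, CANDIDATE, LEADER }

    Role role;
    long currentTerm;

    RaftNodeSketch(Role role, long currentTerm) {
        this.role = role;
        this.currentTerm = currentTerm;
    }

    /**
     * Receiver side of an append request, as seen by a follower or candidate.
     * Returns true if the request is accepted.
     */
    boolean onAppendEntries(long leaderTerm) {
        if (leaderTerm < currentTerm) {
            // Stale leader: reject and keep the current role and term.
            return false;
        }
        // leaderTerm >= currentTerm, so the term never decreases here.
        currentTerm = leaderTerm;
        role = Role.FOLLOWER;
        return true;
    }

    /** Method 2: a node that observes a larger term adopts it and starts an election. */
    void onHigherTermObserved(long observedTerm) {
        if (observedTerm > currentTerm) {
            currentTerm = observedTerm;
            role = Role.CANDIDATE;
        }
    }
}
```

With n0 as a candidate at term 56 and n2 as leader at term 55, `n0.onAppendEntries(55)` returns false and n0 stays a candidate, which is the stuck state reported above. After `n2.onHigherTermObserved(56)`, both nodes are at term 56 and a new election can proceed; an append from whichever node wins at a later term is then accepted by n0.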