Preferred Leader leads to node unavailability problem #85
Comments
Optimization ideas
@vongosling Can you help me review this revision?
@RongtongJin This change may not be a good choice; can you help review it? This problem makes a node of the broker group unavailable.
I think that if the preferred leader is too far behind, it should be downgraded to follower.
Even after downgrading to follower, the node's voting term is still larger than that of the other nodes. Therefore, the term is rolled back first, and then the node is downgraded to follower.
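The two-step idea in this comment (roll the term back, then downgrade) can be sketched as below. This is only an illustration under assumptions: `Node`, `downgrade_if_behind`, and all the numbers are hypothetical and are not taken from the DLedger code base, and the sketch ignores persistence and any safety checks a real implementation would need.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical, simplified view of a member's state.
    node_id: str
    role: str              # "LEADER", "FOLLOWER" or "CANDIDATE"
    term: int
    ledger_end_index: int


def downgrade_if_behind(candidate: Node, leader: Node) -> bool:
    """Roll the term back, then downgrade a lagging candidate.

    Returns True if the downgrade was applied.
    """
    if candidate.role != "CANDIDATE":
        return False
    if candidate.ledger_end_index >= leader.ledger_end_index:
        # The candidate is not behind, so it may still win an election.
        return False
    candidate.term = leader.term   # step 1: roll the inflated term back
    candidate.role = "FOLLOWER"    # step 2: downgrade to follower
    return True


# Illustrative values only: n0 has inflated its term to 7 while lagging.
n0 = Node("n0", "CANDIDATE", term=7, ledger_end_index=100)
n1 = Node("n1", "LEADER", term=4, ledger_end_index=500)
applied = downgrade_if_behind(n0, n1)
```

The order matters in this sketch: if the role were changed without resetting the term, n0 would still hold a term larger than the leader's.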
I ran this test before, and the result seems to be in line with expectations. This is my log.
leader => FOLLOWER
2021-10-20 10:42:09 INFO DLedgerServer-ScheduledExecutor - transferee fall behind index : 274935
FOLLOWER => leader
2021-10-20 10:41:51 INFO QuorumAckChecker-n0 - [n0][FOLLOWER] term=4 ledgerBegin=0 ledgerEnd=26559198 committed=26559198 watermarks={}
When the preferredLeader is too far behind,
The following is an example of triggering an exception:
First:
Second:
In the above scenario, n0 is always in the Candidate state and continuously initiates votes, which leaves node n0 unavailable.
Thanks, I will test it.
@RongtongJin Please help review it. I am also very much looking forward to discussing other, better optimization methods.
Signed-off-by: zhangyang21 <[email protected]>
Merged
The initial state of the broker group
n0 is the master and the preferredLeader
n1 and n2 are slaves
The status after the re-election is as follows
n0 and n2 are slaves
n1 is the master
Problem Description
The n0 node cannot synchronize data.
If the cluster restarts, the brokerId is always -1.
Problem Analysis
When the preferredLeader is triggered to re-initiate the election, the vote fails if its ledger entry index lags behind that of the leader node.
Because each attempt increases the node's term by 1, its term is always larger than the terms of the other nodes, so it cannot fall back to being a follower. Therefore, the preferredLeader stays in the candidate state indefinitely, and the node becomes unavailable.
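The loop described above can be reproduced with a small simulation. It is a simplified model under assumptions, not DLedger's actual election code: `Node`, `grants_vote`, `accepts_heartbeat`, `run_election_round`, and the term and index values are all hypothetical, and voters are assumed to reject a lagging candidate without adopting its term, as in the analysis.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical, simplified view of a member's state.
    node_id: str
    role: str              # "LEADER", "FOLLOWER" or "CANDIDATE"
    term: int
    ledger_end_index: int


def grants_vote(voter: Node, candidate: Node) -> bool:
    # Simplified rule: reject a candidate whose ledger is behind the voter's.
    return candidate.ledger_end_index >= voter.ledger_end_index


def accepts_heartbeat(node: Node, leader: Node) -> bool:
    # A node ignores a leader whose term is smaller than its own.
    return leader.term >= node.term


def run_election_round(candidate: Node, peers: list) -> int:
    """One election attempt: bump the term, collect votes, return the count."""
    candidate.role = "CANDIDATE"
    candidate.term += 1
    votes = 1 + sum(grants_vote(p, candidate) for p in peers)  # self vote + peers
    quorum = (len(peers) + 1) // 2 + 1
    if votes >= quorum:
        candidate.role = "LEADER"
    return votes


# Illustrative values only: n0 is the lagging preferredLeader.
n0 = Node("n0", "FOLLOWER", term=4, ledger_end_index=100)
n1 = Node("n1", "LEADER", term=4, ledger_end_index=500)
n2 = Node("n2", "FOLLOWER", term=4, ledger_end_index=500)

for _ in range(3):
    run_election_round(n0, [n1, n2])
```

In this model n0 gains one term per failed round while n1 keeps its term, so n0 never accepts n1 as leader and never gets the chance to catch up on the ledger.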
The log of the n0 node is as follows
The log of the n1 node is as follows