rafttest: TestRestart is flaky #181
Comments
This statement needs checking. After a quick scan of the code, I see that it might be false. The |
But the problem is in these lines:
Node 3 is the leader, but it was stopped. However, the test code does not intend to stop the leader, which means we might have detected the leader incorrectly.
Hi @pav-kv, based on the log you provided, I think I may have figured out why the test stops a leader.
Another thing is, it is also possible that |
Hi @pav-kv , I would like to submit a pull request to address this issue. My proposed solution involves using -1 as the initial value of 'lindex' to prevent confusion between 'not found' and 'first element'. The code would look something like this. Would this solution be acceptable to you?
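A minimal sketch of the -1 sentinel idea follows. It is not the snippet originally posted in this comment, and it does not use the real rafttest types: `nodeView` and `leaderIndex` are hypothetical names, and each node is reduced to its own ID plus the leader ID it currently reports (0 meaning "no leader known").

```go
package main

// nodeView is a hypothetical stand-in for what the test reads from each
// node: its own ID and the leader ID it currently reports (0 = none).
type nodeView struct {
	id   uint64
	lead uint64
}

// leaderIndex returns the index of the node that all reporting nodes agree
// on as leader. It returns -1 when the reports disagree, when nobody
// reports a leader, or when the agreed leader does not (yet) report itself.
func leaderIndex(ns []nodeView) int {
	seen := make(map[uint64]struct{})
	lindex := -1 // -1 means "not found"; 0 is a valid slice index
	for i, n := range ns {
		if n.lead == 0 {
			continue // this node does not know a leader yet
		}
		seen[n.lead] = struct{}{}
		if n.id == n.lead {
			lindex = i
		}
	}
	if len(seen) == 1 && lindex != -1 {
		return lindex
	}
	return -1
}
```

With a zero-initialized index, the case where nodes 1 and 2 report leader 3 while node 3 reports nothing would return index 0, which points at node 1. With the sentinel, the same input yields -1 and the caller can keep polling.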
Hi @pav-kv, whenever you get a chance, would you mind taking a look at this proposal? No pressure at all, I'd just really appreciate your input and ideas. Thanks!
Fixes cockroachdb#127413. This commit bypasses the larger rebase in cockroachdb#122133 to pick up the test flake fix in etcd-io/raft#188. There was some discussion in etcd-io/raft#181 about alternatives for fixing this test. For now, we stick with a direct cherry-pick. Release note: None
130084: raft: fix flaky leader index in waitLeader function r=pav-kv a=nvanbenschoten Fixes #127413. This commit bypasses the larger rebase in #122133 to pick up the test flake fix in etcd-io/raft#188. There was some discussion in etcd-io/raft#181 about alternatives for fixing this test. For now, we stick with a direct cherry-pick. Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]>
`TestRestart` failed (when run many times) at commit ed26e90 with the following log:

Looking at the test implementation, I suspect the problem is in the `waitLeader` function. It waits for a wrong signal, and erroneously reports some non-leader node to be the leader. The `n.Status().SoftState.Lead` that it collects from all the nodes only represents votes, but the actual elected leader can end up different.

A fix would be to wait for a node that is in `StateLeader`.
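The idea of waiting on node state rather than on reported leader IDs can be sketched as below. This is an illustration under assumptions, not the test's actual code: `nodeStatus`, `stateType` and `electedLeader` are hypothetical stand-ins, and in the real library the equivalent check would presumably compare each node's reported state against `StateLeader`.

```go
package main

// stateType is a hypothetical stand-in for the raft node state.
type stateType int

const (
	stateFollower stateType = iota
	stateCandidate
	stateLeader
)

// nodeStatus is a hypothetical snapshot of one node: its ID, the leader ID
// it reports (0 = none), and its own state.
type nodeStatus struct {
	id    uint64
	lead  uint64 // kept only to contrast with state; not used for detection
	state stateType
}

// electedLeader returns the index of the single node that is in
// stateLeader, or -1 if no node (or more than one node) is in that state.
func electedLeader(ns []nodeStatus) int {
	idx := -1
	for i, n := range ns {
		if n.state != stateLeader {
			continue
		}
		if idx != -1 {
			return -1 // two nodes claim leadership; not settled yet
		}
		idx = i
	}
	return idx
}
```

A caller would poll `electedLeader` until it returns a non-negative index. Note that followers pointing at node 3 are not enough on their own: the function reports a leader only once node 3 itself is in the leader state.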