server: fix the leader cannot election after pd leader lost while etcd leader intact #6447
…d leader intact Signed-off-by: nolouch <[email protected]>
LGTM, but CI failed.
	zap.String("current-leader-member-id", types.ID(etcdLeader).String()),
	zap.String("transferee-member-id", types.ID(s.member.ID()).String()),
)
s.member.MoveEtcdLeader(s.ctx, etcdLeader, s.member.ID())
Should we consider the leader's priority?
In most cases, no priority is configured. If priorities do exist, this change does not affect them, because the higher-priority member will move the leader again. So I think we can keep this implementation simple.
tests/server/member/member_test.go
Outdated
re.NoError(failpoint.Enable("github.com/tikv/pd/server/exitCampaignLeader", fmt.Sprintf("return(\"%d\")", memberID)))
re.NoError(failpoint.Enable("github.com/tikv/pd/server/timeoutWaitPDLeader", `return(true)`))
leader2 := waitLeaderChange(re, cluster, leader1)
t.Log("leader2:", leader2)
t.Log("leader2:", leader2)
re.NoError(failpoint.Enable("github.com/tikv/pd/server/timeoutWaitPDLeader", `return(true)`))
leader2 := waitLeaderChange(re, cluster, leader1)
t.Log("leader2:", leader2)
re.NotEqual(leader1, leader2)
Do we need to check that the leader change doesn't take too long?
Without this fix, this test fails after the 30s timeout in waitLeaderChange.
Signed-off-by: nolouch <[email protected]>
Codecov Report
Additional details and impacted files
@@            Coverage Diff            @@
##             master    #6447   +/-   ##
=========================================
  Coverage          ?   74.85%
=========================================
  Files             ?      410
  Lines             ?    41718
  Branches          ?        0
=========================================
  Hits              ?    31227
  Misses            ?     7782
  Partials          ?     2709
/merge
@JmPotato: It seems you want to merge this PR, I will help you trigger all the tests: /run-all-tests Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.
This pull request has been accepted and is ready to merge. Commit hash: 9eaa004
/merge
@nolouch: It seems you want to merge this PR, I will help you trigger all the tests: /run-all-tests
This pull request has been accepted and is ready to merge. Commit hash: 14e9f16
@@ -1432,6 +1436,14 @@ func (s *Server) leaderLoop() {
		}

		leader, checkAgain := s.member.CheckLeader()
		// add failpoint to test leader check go to stuck.
The file permission has been changed.
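As background on the failpoint comment in the leaderLoop hunk above, the following is a small sketch of how an injected switch can force one member's leader check to report "check again" so that member never campaigns. It uses a hand-rolled registry instead of a real failpoint library; `registry`, `leaderCheck`, and the name `forceCheckAgain` are hypothetical and do not come from this PR.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a tiny stand-in for a failpoint library (hypothetical).
type registry struct {
	mu      sync.Mutex
	targets map[string]uint64
}

func newRegistry() *registry {
	return &registry{targets: map[string]uint64{}}
}

// enable arms the named injection for a single member ID.
func (r *registry) enable(name string, memberID uint64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.targets[name] = memberID
}

// disable removes the named injection.
func (r *registry) disable(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.targets, name)
}

// target returns the member ID the injection is armed for, if any.
func (r *registry) target(name string) (uint64, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	id, ok := r.targets[name]
	return id, ok
}

// leaderCheck returns checkAgain unchanged, unless the injection targets
// this member, in which case it is forced to true.
func leaderCheck(r *registry, memberID uint64, checkAgain bool) bool {
	if id, ok := r.target("forceCheckAgain"); ok && id == memberID {
		return true
	}
	return checkAgain
}

func main() {
	r := newRegistry()
	fmt.Println(leaderCheck(r, 1, false)) // false
	r.enable("forceCheckAgain", 1)
	fmt.Println(leaderCheck(r, 1, false)) // true
	fmt.Println(leaderCheck(r, 2, false)) // false
	r.disable("forceCheckAgain")
	fmt.Println(leaderCheck(r, 1, false)) // false
}
```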
/hold
Signed-off-by: nolouch <[email protected]>
/hold cancel
/merge
@rleungx: It seems you want to merge this PR, I will help you trigger all the tests: /run-all-tests
This pull request has been accepted and is ready to merge. Commit hash: 6d810b4
@nolouch: Your PR was out of date, I have automatically updated it for you. If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.
In response to a cherrypick label: new pull request created to branch
close tikv#6403 Signed-off-by: ti-chi-bot <[email protected]>
close tikv#6403 Signed-off-by: ti-chi-bot <[email protected]>
In response to a cherrypick label: new pull request created to branch
…d leader intact (#6447) (#6461) close #6403, ref #6447 server: fix the leader cannot election after pd leader lost while etcd leader intact Signed-off-by: ti-chi-bot <[email protected]> Signed-off-by: nolouch <[email protected]> Co-authored-by: ShuNing <[email protected]> Co-authored-by: nolouch <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…d leader intact (#6447) (#6460) close #6403, ref #6447 server: fix the leader cannot election after pd leader lost while etcd leader intact Signed-off-by: ti-chi-bot <[email protected]> Signed-off-by: nolouch <[email protected]> Co-authored-by: ShuNing <[email protected]> Co-authored-by: nolouch <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
What problem does this PR solve?
Issue Number: Close #6403
What is changed and how does it work?
Check List
Tests
Release note