etcdserver: adjust election timeout on restart #9364
Codecov Report
@@ Coverage Diff @@
## master #9364 +/- ##
==========================================
+ Coverage 72.36% 72.75% +0.39%
==========================================
Files 362 362
Lines 30795 30846 +51
==========================================
+ Hits 22285 22443 +158
+ Misses 6869 6787 -82
+ Partials 1641 1616 -25
/subscribe cc @mborsz |
The approach seems fine. Can someone reproduce the observed problem with and without this patch to confirm that the patch fixes it?
@@ -417,7 +407,6 @@ func startNode(cfg ServerConfig, cl *membership.RaftCluster, ids []types.ID) (id
raftStatusMu.Lock()
raftStatus = n.Status
raftStatusMu.Unlock()
advanceTicksForElection(n, c.ElectionTick) |
We should still advance ticks for a newly started node. Is there a reason not to?
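For context, "advancing ticks" means calling the raft node's Tick method several times up front so that the first election timeout fires sooner. A minimal sketch of that idea follows. The `ticker` interface, `countingNode`, and `advanceTicks` are hypothetical names used only for illustration, not the code in this PR, and leaving exactly one tick is an assumption based on the "last tick left" wording in this thread.

```go
package main

import "fmt"

// ticker is a hypothetical stand-in for the single raft.Node method used here.
type ticker interface {
	Tick()
}

// countingNode is a hypothetical fake that records how many ticks it received.
type countingNode struct{ ticks int }

func (c *countingNode) Tick() { c.ticks++ }

// advanceTicks calls Tick n times; it does nothing when n <= 0.
func advanceTicks(t ticker, n int) {
	for i := 0; i < n; i++ {
		t.Tick()
	}
}

func main() {
	electionTick := 10
	node := &countingNode{}
	// Assumption: leave one tick before the election timeout would fire.
	advanceTicks(node, electionTick-1)
	fmt.Println("ticks advanced:", node.ticks)
}
```

Using a small interface instead of the real node type keeps the sketch self-contained and easy to exercise with a fake.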
etcdserver/server.go
Outdated
@@ -521,9 +523,54 @@ func NewServer(cfg ServerConfig) (srv *EtcdServer, err error) {
}
srv.r.transport = tr

activePeers := 0
for _, m := range cl.Members() { |
Establishing a connection can take time. We probably need some delay here.
etcdserver/server.go
Outdated
plog.Infof("%s is advancing %d ticks for faster election (election tick %d)", srv.ID(), tick, cfg.ElectionTicks)
advanceTicksForElection(n, tick)
} else {
// on restart, there is likely an active peer already |
Even in the restart case, we should still consider the number of active members. If there are none, we can still advance ticks, no?
Yeah, to do that we would need to wait until the local node finds its peers (cl.Members() > 0). I will experiment with it to address https://github.com/coreos/etcd/pull/9364/files#r171727961.
The last four commits add more detailed logging and a better estimate of active peers. For a 3-node cluster:
Case 1. All 3 nodes start fresh (bootstrapping). In this case, fast-forward election ticks, leaving only the last tick.
Case 2. Only 2 nodes are up and 1 node is down, then the down node restarts. In this case, do not advance election ticks.
Case 3. All 3 nodes are down. The third node restarts with no active peer.
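The three cases above can be sketched as one small decision function. This is only an illustration under stated assumptions: `ticksForCase` and its parameters are hypothetical names, and the outcome for Case 3 (advance ticks when no peer is active) is not stated in the comment above; it is assumed here from the reviewer's earlier remark that ticks can still be advanced when there is no active member.

```go
package main

import "fmt"

// ticksForCase is a hypothetical helper that returns how many election ticks
// to fast-forward for the three cases described in the thread.
func ticksForCase(freshStart bool, activePeers, electionTicks int) int {
	if electionTicks <= 1 {
		return 0 // nothing to fast-forward
	}
	if freshStart {
		// Case 1: bootstrapping, fast-forward and leave the last tick.
		return electionTicks - 1
	}
	if activePeers > 0 {
		// Case 2: restart while other members are up, do not advance.
		return 0
	}
	// Case 3: restart with no active peer.
	// Assumption: advance as in Case 1, since no leader can be disrupted.
	return electionTicks - 1
}

func main() {
	fmt.Println("case 1:", ticksForCase(true, 0, 10))
	fmt.Println("case 2:", ticksForCase(false, 2, 10))
	fmt.Println("case 3:", ticksForCase(false, 0, 10))
}
```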
Why do we need to differentiate restart vs fresh start? The strategy should be as simple as this.
The only change we should make in this PR is to wait for the peer to be connected, or for the first connection attempt to fail, before advancing ticks.
The most serious problem before was simply that we failed to wait for the connection status before advancing ticks.
@xiang90 I differentiated the fresh cluster to avoid waiting, since its member list is already populated on start. But you are right, we can simplify this (since we also have discovery services on a fresh cluster).
Will make the server wait up to 5 seconds, which is
You mean advancing with adjusted ticks, right? A node rejoining an existing cluster can still have >1 active peers after the 5-second wait. If we have only one tick left, it can still be disruptive when the last tick elapses before the leader heartbeat. Fast-forwarding with Will clean this up.
Blindly waiting for 5 seconds is bad. A peer might be connected well before 5 seconds.
Yeah, I was thinking of adding a notify routine from rafthttp, so that we discover the connectivity earlier.
Then forward it until there are two ticks left. The leader should send a heartbeat within one tick. Giving it one more tick should be enough.
Sounds good. Will work on it. Thanks!
Now:
fresh node (to 3-node cluster)
restarted node (to 3-node cluster)
restarted single-node cluster
So we do not fast-forward ticks for this case?
If we follow this policy, it should be fast-forwarded, no?
I left it as a TODO for now. Let me see if we can handle the single-node case as well.
Signed-off-by: Gyuho Lee <[email protected]>
We've made the adjustment logic more fine-grained so that it can handle a restarting 1-node cluster. It would be great if we could confirm with the latest commits as well. Thanks.
@@ -527,6 +539,62 @@ func NewServer(cfg ServerConfig) (srv *EtcdServer, err error) {
}
srv.r.transport = tr

// fresh start
if !haveWAL { |
Why do we need to care about restart vs fresh start?
See #9364 (comment).
It is just easier, so that a fresh start does not need to synchronize with peer connection reports. But as you suggested, let me simplify the logic (#9364 (comment)).
srv.goAttach(func() {
select {
case <-cl.InitialAddNotify(): |
This is pretty complicated. Let us just get the peer list from the existing snapshot. We do not need to ensure that all the configuration changes in the WAL file are executed.
The reason is that reconfiguration is infrequent, and moving from a one-node to an N-node cluster is even more infrequent. The snapshot will contain the correct information 99% of the time.
I was trying to cover all cases where there's no snapshot (which requires populating the member list from the WAL). But I agree that this should be simplified by loading members from the snapshot. Will rework this.
Xiang has a good point. This is a bit too complicated. I will create a separate PR with a simpler solution.
This will be replaced by #9415.
Still advance ticks when bootstrapping a fresh cluster.
But on restart, only advance 1/10 of the original election ticks.
Addresses #9333.
Manually tested that it adjusts election ticks.
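The rule in this description could look roughly like the sketch below. It is not the PR's implementation: `ticksOnStart` is a hypothetical helper, the fresh-start amount (all but the last tick) is taken from the discussion in this thread, and rounding the 1/10 share down with integer division is an assumption.

```go
package main

import "fmt"

// ticksOnStart is a hypothetical helper that picks how many election ticks
// to advance, depending on whether a WAL already exists (i.e. a restart).
func ticksOnStart(haveWAL bool, electionTicks int) int {
	if electionTicks <= 1 {
		return 0
	}
	if !haveWAL {
		// Fresh start: fast-forward and leave only the last tick.
		return electionTicks - 1
	}
	// Restart: advance 1/10 of the election ticks.
	// Assumption: integer division, so values below 10 advance nothing.
	return electionTicks / 10
}

func main() {
	fmt.Println("fresh start:", ticksOnStart(false, 10))
	fmt.Println("restart:", ticksOnStart(true, 10))
}
```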
/cc @xiang90 @jpbetz