TestRejectUnhealthyRemove: should reject quorum breaking remove #7609
Comments
Based on that it took 20+ seconds, reproducible with this diff:

diff --git a/etcdserver/server.go b/etcdserver/server.go
index 70e14924..a62ae79a 100644
--- a/etcdserver/server.go
+++ b/etcdserver/server.go
@@ -1110,7 +1110,7 @@ func (s *EtcdServer) mayRemoveMember(id types.ID) error {
 	// protect quorum if some members are down
 	m := s.cluster.Members()
-	active := numConnectedSince(s.r.transport, time.Now().Add(-HealthInterval), s.ID(), m)
+	active := numConnectedSince(s.r.transport, time.Now().Add(-time.Nanosecond), s.ID(), m)
 	if (active - 1) < 1+((len(m)-1)/2) {
 		plog.Warningf("reconfigure breaks active quorum, rejecting remove member %s", id)
 		return ErrUnhealthy

Which means the slow semaphore machine took more than [...]. Should we just [...]
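The quorum condition in the diff above can be worked through with small numbers. The following is only a sketch: `removeBreaksQuorum` is a hypothetical helper, not etcd API. It assumes `active` counts both the local member and the member being removed, which is how the `active - 1` term in the diff reads.

```go
package main

import "fmt"

// removeBreaksQuorum mirrors the arithmetic of the check in mayRemoveMember:
//
//	(active - 1) < 1 + ((len(m) - 1) / 2)
//
// active  - members currently counted as active (assumed to include the
//           local member and the member being removed)
// members - cluster size before the removal
//
// Hypothetical helper for illustration only; not part of etcd.
func removeBreaksQuorum(active, members int) bool {
	// Majority needed in the cluster once one member is gone.
	quorumAfter := 1 + (members-1)/2
	// Active members left once the removed one is subtracted.
	activeAfter := active - 1
	return activeAfter < quorumAfter
}

func main() {
	// 5-member cluster: after a removal the 4-member cluster needs 3 votes.
	fmt.Println(removeBreaksQuorum(5, 5)) // 4 active left, 3 needed
	fmt.Println(removeBreaksQuorum(4, 5)) // 3 active left, 3 needed
	fmt.Println(removeBreaksQuorum(3, 5)) // 2 active left, 3 needed
}
```

Under these assumptions, a 5-member cluster tolerates the removal with 4 or 5 active members and rejects it with 3. The outcome of the test therefore depends on how many peers are counted as active at the moment of the call.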
Except [...]. Would like to avoid unconditional sleeps if possible. I was hoping the number of connected peers would be available from [...]
Fix etcd-io#7609. Signed-off-by: Gyu-Ho Lee <[email protected]>
Think this is fixed. Here we restart members[0], https://github.com/coreos/etcd/blob/master/integration/cluster_test.go#L434

// bring cluster to (4,1)
c.Members[0].Restart(t)

And I found out the following logs show the same pattern.
Logs can be found at https://jenkins-etcd-public.prod.coreos.systems/job/etcd-proxy/1212/consoleFull.
OK, can reopen if it turns out that the timing problem is causing it to break as well.
Seen a few times recently on semaphore.
edits from gyuho
logs
https://jenkins-etcd-public.prod.coreos.systems/job/etcd-proxy/1212/consoleFull