Always maintain etcd quorum, ensure wait signals are sufficient #411
Comments
@c-knowles Thanks for bringing this up 👍
@mumoshu sorry for my late reply. I've upgraded our clusters to the latest kube-aws, so I'm awaiting an opportunity to retest this. If I am on one of the etcd nodes, do you have a recommendation on how to access etcd via etcdctl? The version on the instance seems to be the older v2, which confused me for a while. v3 is running inside rkt, and I also had some trouble trying to get into the container (I'm new to rkt; I think the container may not have a shell installed).
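For reference, one possible way in, sketched under assumptions: the app name `etcd`, the in-image `etcdctl` path, and the availability of `env` inside the image all vary by kube-aws/etcd version, so verify them on the node.

```sh
# Find the UUID of the running etcd rkt pod:
sudo rkt list

# The etcd image bundles a matching etcdctl, so invoke it via `rkt enter`
# rather than opening a shell (the image may not ship one). The app name
# and binary path below are assumptions to verify:
sudo rkt enter --app=etcd <pod-uuid> /usr/local/bin/etcdctl cluster-health

# For the v3 API, etcdctl needs ETCDCTL_API=3 in its environment (assuming
# an `env` binary exists inside the image):
sudo rkt enter --app=etcd <pod-uuid> /usr/bin/env ETCDCTL_API=3 \
  /usr/local/bin/etcdctl --endpoints=http://127.0.0.1:2379 endpoint health
```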
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Extracted from #332.
Background
@c-knowles did a rolling update of etcd, intended to preserve uptime, using v0.9.5-rc.3. It seems there was a slight pause in etcd responsiveness in a 3-node cluster when the state was:
The pause was circa 20 seconds, and various processes including kubectl and the dashboard became momentarily unresponsive. I just wanted to check whether anyone has seen anything similar before trying to diagnose further. Each of the wait signals passed after around 5 minutes, so it looks like this was etcd-related somehow.
Details
From @mumoshu: each etcd2 member (i.e. the etcd2 process inside a rkt pod) doesn't wait on startup until it is connected and ready to serve requests, and there's no way to know when the member is actually ready.
For example, running

```
etcdctl --peers <first etcd member's advertised peer url> cluster-health
```

against the first member would block until enough of the remaining etcd members have joined to meet quorum (2 for a 3-node cluster). Using that as a per-member readiness check therefore hits a chicken-and-egg problem and breaks the wait signals, which is why kube-aws does not wait for an etcd2 member to become ready; the goal was to avoid downtime completely. For @mumoshu the downtime was under 1 second on a first attempt, but the result is suspected to vary from run to run, hence @c-knowles' case.
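A minimal sketch of that deadlock, with invented names (`etcd0.internal`, the gating loop itself); this is illustrative, not actual kube-aws code:

```sh
# Suppose each member's cloud-init gated its CloudFormation wait signal on the
# member reporting healthy. On the first member of a fresh 3-node cluster:
until etcdctl --peers "http://etcd0.internal:2379" cluster-health; do
  sleep 5   # blocks indefinitely: cluster-health needs quorum (2 members),
done        # but member 2 only launches after member 1 signals success
cfn-signal --success true   # never reached (stack/resource args omitted)
```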
Implementation
@mumoshu mentioned etcd3 seems to signal systemd for readiness when its systemd unit is set to Type=notify. So this may be covered by #381.
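A sketch of what that could look like on a node, assuming the CoreOS `etcd-member.service` unit name (the actual unit kube-aws generates may differ):

```sh
# etcd v3 can call sd_notify, so with Type=notify systemd only marks the unit
# "active" (and lets dependents and wait signals proceed) once the member is
# actually serving:
sudo mkdir -p /etc/systemd/system/etcd-member.service.d
cat <<'EOF' | sudo tee /etc/systemd/system/etcd-member.service.d/10-notify.conf
[Service]
Type=notify
EOF
sudo systemctl daemon-reload
sudo systemctl restart etcd-member.service
```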
@redbaron mentioned an idea about drawing dependencies between the ASGs, so that CloudFormation rolls them one by one. That should allow quorum to be maintained at all times; a sketch follows.
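A hedged CloudFormation sketch of that idea; the resource names are invented, and kube-aws would generate its own:

```yaml
# Chaining one-member ASGs with DependsOn makes CloudFormation create and
# replace them sequentially, so at most one etcd member is down at a time
# and quorum (2 of 3) holds throughout a roll.
Resources:
  Etcd0Asg:
    Type: AWS::AutoScaling::AutoScalingGroup
    # launch configuration, update policy, wait signals omitted
  Etcd1Asg:
    Type: AWS::AutoScaling::AutoScalingGroup
    DependsOn: Etcd0Asg
  Etcd2Asg:
    Type: AWS::AutoScaling::AutoScalingGroup
    DependsOn: Etcd1Asg
```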