
Implement Ready status condition #24

Closed
sircthulhu opened this issue Mar 15, 2024 · 12 comments · Fixed by #26

@sircthulhu
Member

In general

We need to implement checking the quorum status of an initialized cluster and updating the Ready status condition accordingly. After initializing the cluster and making sure the pods have found each other and formed a cluster, the controller must update the cluster state ConfigMap to set the cluster state to `existing` (from `new`).

Design proposal

When the cluster is initialized, we should check the following:

  1. The StatefulSet is Ready
  2. Connect to any etcd member and check whether quorum is healthy
  • If everything is fine, set the Ready status condition and update the value in the cluster state ConfigMap to `existing`.
  • If some pods in the StatefulSet aren't ready but quorum is fine, set Ready, but mark the cluster as degraded in the Reason field.
  • If pods in the StatefulSet aren't ready and quorum is lost, mark the cluster as not Ready.

If the ConfigMap already has the `existing` state, do not change it anymore, as the cluster should already be initialized.
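
Below is a minimal sketch of that decision table as a controller helper; the function name, condition type, and reason values are illustrative, not the project's actual API:

```go
package controller

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// computeReadyCondition maps the observed StatefulSet and quorum state onto
// the proposed Ready condition. Message and LastTransitionTime are omitted
// for brevity; the reason names are only illustrative.
func computeReadyCondition(stsReady, hasQuorum bool) metav1.Condition {
	switch {
	case stsReady && hasQuorum:
		// Everything is fine: Ready=True, and the cluster state ConfigMap
		// can be flipped from "new" to "existing".
		return metav1.Condition{Type: "Ready", Status: metav1.ConditionTrue, Reason: "ClusterReady"}
	case !stsReady && hasQuorum:
		// Some pods are not ready but quorum holds: still Ready, with the
		// degradation surfaced through the Reason field.
		return metav1.Condition{Type: "Ready", Status: metav1.ConditionTrue, Reason: "ClusterDegraded"}
	default:
		// Quorum is lost: the cluster is not Ready.
		return metav1.Condition{Type: "Ready", Status: metav1.ConditionFalse, Reason: "QuorumLost"}
	}
}
```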

@aobort
Collaborator

aobort commented Mar 15, 2024

Any option like "pods & sts are ready but quorum is lost"? Shouldn't we check the health of the cluster throughout its entire lifecycle? I mean, the etcd cluster controller will react to changes of the sts state and act accordingly since the controller reference is set, but if a loss of quorum or anything else happens to the etcd app that does not affect the pod/sts state, the operator will not be informed. Thinking out loud.

@sircthulhu
Member Author

@aobort
Thanks for your opinion, I really appreciate it :)

If I got it right, the case you mentioned is covered by the first check. We should check that the sts is ready AND that the cluster has quorum at all with the desired number of members. If not, update the cluster state accordingly.
In the future, the controller will perform corrective actions where possible.

@Kirill-Garbar
Collaborator

@sircthulhu I didn't get from the description and the answer whether this status should be continuously updated or just once. I see the etcd-operator (controller) as a "watchdog" that updates the status on certain events/regularly. I will try to dive deeper into the docs and code to get such answers, but currently it is not clear whether this logic should be implemented while working on this issue or not.

@aobort
Collaborator

aobort commented Mar 15, 2024

@Kirill-Garbar if I got things right, the goal at the moment is to implement such a check on etcd cluster creation:

  1. create sts and related stuff
  2. when sts becomes ready, check etcd state

Therefore, from my perspective, the workflow should be like the following:

  1. check conditions; if the conditions list length is 0 -> put all possible conditions with status "False" into the status
  2. check the sts and related stuff; if nothing is found -> create it and set the "Initialized" condition to "True"
  3. update the status and exit

The next reconciliation run should be triggered by the update of the sts state (NotReady -> Ready):

  1. check conditions (already filled)
  2. check sts and related stuff (already created) -> check sts state
  3. check etcd state
  4. update condition "Ready" to True

Continuous state tracking should definitely be implemented. But from my perspective it should be implemented separately from the controller's code, maybe as another service which implements the Runnable interface, so it could be added to the manager with an mgr.Add() call. It could just check the etcd cluster's health and, in case something happens, update an sts annotation, which would trigger reconciliation of the etcd cluster object.
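
For illustration, a rough sketch of such a separate service under controller-runtime's Runnable contract; the EtcdHealthWatcher type, its fields, and the CheckClusters callback are hypothetical:

```go
package controller

import (
	"context"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// EtcdHealthWatcher periodically checks etcd cluster health outside of the
// regular reconcile loop. CheckClusters is a placeholder for whatever probing
// and annotation-bumping logic the operator would use.
type EtcdHealthWatcher struct {
	Interval      time.Duration
	CheckClusters func(ctx context.Context) error
}

// Start implements manager.Runnable; the manager cancels ctx on shutdown.
func (w *EtcdHealthWatcher) Start(ctx context.Context) error {
	ticker := time.NewTicker(w.Interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return nil
		case <-ticker.C:
			if err := w.CheckClusters(ctx); err != nil {
				// A failed probe should not stop the watcher; just log it.
				ctrl.Log.WithName("etcd-health-watcher").Error(err, "health check failed")
			}
		}
	}
}

var _ manager.Runnable = &EtcdHealthWatcher{}
```

It would then be registered with the manager via mgr.Add(&EtcdHealthWatcher{Interval: 30 * time.Second, CheckClusters: ...}).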

@sircthulhu
Member Author

@aobort
You're definitely right about the workflow.

I'd like to make it more precise:

The next reconciliation run should be triggered by the update of the sts state (NotReady -> Ready):

  1. check conditions (already filled)

We do not rely on what's written in the status to get the current state; the Reconcile function should be idempotent.

I agree with the point that continuous tracking should be implemented via the Runnable interface, as it is better for the architecture. For cluster bootstrap it is enough to get an update when the StatefulSet changes somehow.

@aobort
Collaborator

aobort commented Mar 15, 2024

@sircthulhu

We do not rely on what's written in the status to get the current state; the Reconcile function should be idempotent.

Absolutely. I meant that an update of the sts object (it does not matter whether it is a regular status update, a manual one, or whatnot) will trigger reconciliation of the etcd cluster object, because the controller reference is applied to the sts.

@kvaps
Member

kvaps commented Mar 15, 2024

If some pods in the StatefulSet aren't ready but quorum is fine, set Ready, but mark the cluster as degraded in the Reason field.

This requires an additional condition, e.g. Quorum: True|False with reason ClusterHasQuorum.
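
A sketch of what that could look like with the apimachinery condition helpers; where the Conditions slice lives on the EtcdCluster status is assumed here:

```go
package controller

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setQuorumCondition records the suggested Quorum condition alongside Ready.
func setQuorumCondition(conditions *[]metav1.Condition, hasQuorum bool) {
	cond := metav1.Condition{
		Type:   "Quorum",
		Status: metav1.ConditionFalse,
		Reason: "QuorumLost",
	}
	if hasQuorum {
		cond.Status = metav1.ConditionTrue
		cond.Reason = "ClusterHasQuorum"
	}
	// SetStatusCondition bumps LastTransitionTime only when the condition's
	// status actually changes.
	meta.SetStatusCondition(conditions, cond)
}
```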

@kvaps
Member

kvaps commented Mar 15, 2024

Internally we agreed to implement just a single condition for now: Ready: True|False

  • EtcdCluster should be considered Ready when StatefulSet is Ready
  • EtcdCluster should be considered NotReady when StatefulSet is NotReady

Every pod in the StatefulSet should have healthchecks defined.

We can borrow the logic from Kubeadm, for example:

https://github.com/kubernetes/kubernetes/blob/89b1db79d75e367f567ea06c0528ef4b8f3fecb4/cmd/kubeadm/app/phases/etcd/local.go#L227-L228

Here is the latest issue where they were updated:
kubernetes/kubeadm#2567

However, @Uburro noted that:

Since v3.5.12, two new endpoints /livez and /readyz are added.

the /livez endpoint reflects whether the process is alive or if it needs a restart.
the /readyz endpoint reflects whether the process is ready to serve traffic.
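
For illustration, a sketch of probes targeting those endpoints; the port (2381, assuming a plain-HTTP health/metrics listener) and the timings are assumptions that depend on how the operator configures etcd's listen URLs:

```go
package controller

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// etcdProbes sketches liveness/readiness probes against the /livez and
// /readyz endpoints available since etcd v3.5.12. Port and timings are
// placeholders, not the operator's actual defaults.
func etcdProbes() (liveness, readiness *corev1.Probe) {
	liveness = &corev1.Probe{
		ProbeHandler: corev1.ProbeHandler{
			HTTPGet: &corev1.HTTPGetAction{Path: "/livez", Port: intstr.FromInt(2381)},
		},
		InitialDelaySeconds: 10,
		PeriodSeconds:       10,
		FailureThreshold:    8,
	}
	readiness = &corev1.Probe{
		ProbeHandler: corev1.ProbeHandler{
			HTTPGet: &corev1.HTTPGetAction{Path: "/readyz", Port: intstr.FromInt(2381)},
		},
		PeriodSeconds:    5,
		FailureThreshold: 3,
	}
	return liveness, readiness
}
```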

@gaalw

gaalw commented Mar 15, 2024

Any option like "pods & sts are ready but quorum is lost"? Shouldn't we check the health of the cluster throughout its entire lifecycle?

The operator is not a monitoring solution. Also, if you implement it, the data in the CR will always be stale.

@aobort
Collaborator

aobort commented Mar 15, 2024

@gaalw

operator is not monitoring solution

Agree. However, given the plans to implement cluster maintenance, monitoring of its state becomes a mandatory capability.

Also if you would implement it, the data in CR will be always stale.

It depends. If the cluster health has not changed, there is no reason for the watchdog to update the CR.

@kvaps added this to the v0.0.1 milestone Mar 15, 2024
@Uburro
Collaborator

Uburro commented Mar 15, 2024

According to this KEP, etcd introduced full-fledged readiness and liveness checks. For us, /readyz signals conditions such as no raft leader, a raft loop deadlock, or data corruption. In that case the pod will disappear from the endpoints, and for our controller this will be the signal that everything is ok or not ok with the cluster. We also subscribe to events from the shared informer, so there is no need for the controller to go through the clusters and poll their status; Kubernetes does this natively via the kubelet. /livez will be a signal about a problem with the database (app) itself and a trigger for its restart. This way we avoid unnecessarily restarting etcd, which is useful in cases where etcd is used outside Kubernetes.
In this case we have to start supporting etcd from versions 3.5.12 and 3.4.29.

@Uburro
Collaborator

Uburro commented Mar 17, 2024

etcd-io/etcd#16666: we can think about --experimental-wait-cluster-ready-timeout with a startup probe in the future.

sircthulhu added a commit that referenced this issue Mar 18, 2024
Update cluster Ready status according to StatefulSet status and update
ConfigMap cluster state to `existing` after first time STS is ready
fixes #24