
Implement Ready status condition #24

Closed
sircthulhu opened this issue Mar 15, 2024 · 12 comments · Fixed by #26

@sircthulhu
Member

In general

We need to implement checking the quorum status of an initialized cluster and updating the Ready status condition accordingly. After initializing the cluster and making sure the pods have found each other and formed a cluster, the controller must update the cluster state ConfigMap to set the cluster state to `existing` (from `new`).

Design proposal

When the cluster is initialized, we should check the following:

  1. The StatefulSet is Ready
  2. Connect to any etcd member and check whether quorum is healthy
  • If everything is fine, set the Ready status condition and update the value in the cluster state ConfigMap to `existing`.
  • If some pods in the StatefulSet aren't ready but quorum is fine, set Ready, but mark the cluster as degraded in the Reason field.
  • If pods in the StatefulSet aren't ready and quorum is lost, mark the cluster as not Ready.

If the ConfigMap already has the `existing` state, do not change it anymore, as the cluster should already be initialized.
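
Below is a minimal sketch of that decision table as a controller helper; the function name, condition type, and reason values are illustrative, not the project's actual API:

```go
package controller

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// computeReadyCondition maps the observed StatefulSet and quorum state onto
// the proposed Ready condition. Message and LastTransitionTime are omitted
// for brevity; the reason names are only illustrative.
func computeReadyCondition(stsReady, hasQuorum bool) metav1.Condition {
	switch {
	case stsReady && hasQuorum:
		// Everything is fine: Ready=True, and the cluster state ConfigMap
		// can be flipped from "new" to "existing".
		return metav1.Condition{Type: "Ready", Status: metav1.ConditionTrue, Reason: "ClusterReady"}
	case !stsReady && hasQuorum:
		// Some pods are not ready but quorum holds: still Ready, with the
		// degradation surfaced through the Reason field.
		return metav1.Condition{Type: "Ready", Status: metav1.ConditionTrue, Reason: "ClusterDegraded"}
	default:
		// Quorum is lost: the cluster is not Ready.
		return metav1.Condition{Type: "Ready", Status: metav1.ConditionFalse, Reason: "QuorumLost"}
	}
}
```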

@aobort
Collaborator

aobort commented Mar 15, 2024

Any option like "pods & sts are ready but quorum is lost"? Shouldn't we check the health of the cluster throughout its entire lifecycle? I mean, the etcd cluster controller will react to changes of the sts state and act accordingly since the controller reference is set, but if a loss of quorum or anything else happens to the etcd app that does not affect the pod/sts state, the operator will not be informed. Thinking out loud.

@sircthulhu
Member Author

@aobort
Thanks for your opinion, I really appreciate it :)

If I got it right, the case you mentioned is covered by the first check. We should check that the sts is ready AND that the cluster has quorum at all with the desired number of members. If not, update the cluster state accordingly.
In the future, the controller will perform corrective actions where possible.

@Kirill-Garbar
Collaborator

@sircthulhu I didn't get from the description and the answer whether this status should be continuously updated or just once. I see the etcd-operator (controller) as a "watchdog" that updates the status on certain events/regularly. I will try to dive deeper into the docs and code to get such answers, but currently it is not clear whether this logic should be implemented while working on this issue or not.

@aobort
Collaborator

aobort commented Mar 15, 2024

@Kirill-Garbar if I got things right, the goal at the moment is to implement such a check on etcd cluster creation:

  1. create sts and related stuff
  2. when sts becomes ready, check etcd state

Therefore, from my perspective, the workflow should be like the following:

  1. check conditions; if the conditions list length is 0 -> put all possible conditions with status "False" into the status
  2. check the sts and related stuff; if nothing is found -> create it and set the "Initialized" condition to "True"
  3. update the status and exit

The next reconciliation run should be triggered by the update of the sts state (NotReady -> Ready):

  1. check conditions (already filled)
  2. check sts and related stuff (already created) -> check sts state
  3. check etcd state
  4. update condition "Ready" to True

Continuous state tracking should definitely be implemented. But from my perspective it should be implemented separately from the controller's code, maybe as another service which implements the Runnable interface, so it could be added to the manager with an mgr.Add() call. It could just check the etcd cluster's health and, in case something happens, update an sts annotation, which would trigger reconciliation of the etcd cluster object.
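
For illustration, a rough sketch of such a separate service under controller-runtime's Runnable contract; the EtcdHealthWatcher type, its fields, and the CheckClusters callback are hypothetical:

```go
package controller

import (
	"context"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// EtcdHealthWatcher periodically checks etcd cluster health outside of the
// regular reconcile loop. CheckClusters is a placeholder for whatever probing
// and annotation-bumping logic the operator would use.
type EtcdHealthWatcher struct {
	Interval      time.Duration
	CheckClusters func(ctx context.Context) error
}

// Start implements manager.Runnable; the manager cancels ctx on shutdown.
func (w *EtcdHealthWatcher) Start(ctx context.Context) error {
	ticker := time.NewTicker(w.Interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return nil
		case <-ticker.C:
			if err := w.CheckClusters(ctx); err != nil {
				// A failed probe should not stop the watcher; just log it.
				ctrl.Log.WithName("etcd-health-watcher").Error(err, "health check failed")
			}
		}
	}
}

var _ manager.Runnable = &EtcdHealthWatcher{}
```

It would then be registered with the manager via mgr.Add(&EtcdHealthWatcher{Interval: 30 * time.Second, CheckClusters: ...}).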

@sircthulhu
Member Author

@aobort
You're definitely right about the workflow.

I'd like to make it more precise:

The next reconciliation run should be triggered by the update of the sts state (NotReady -> Ready):

  1. check conditions (already filled)

We do not rely on what's written in the status to get the current state; the Reconcile function should be idempotent.

I agree with the point that continuous tracking should be implemented via the Runnable interface, as it is better for the architecture. For cluster bootstrap it is enough to get an update when the StatefulSet changes somehow.

@aobort
Collaborator

aobort commented Mar 15, 2024

@sircthulhu

We do not rely on what's written in the status to get the current state; the Reconcile function should be idempotent.

Absolutely. I meant that an update of the sts object (it does not matter whether it is a regular status update, a manual one, or whatnot) will trigger reconciliation of the etcd cluster object, because the controller reference is applied to the sts.

@kvaps
Member

kvaps commented Mar 15, 2024

If some pods in the StatefulSet aren't ready but quorum is fine, set Ready, but mark the cluster as degraded in the Reason field.

This requires an additional condition, e.g. Quorum: True|False with reason ClusterHasQuorum.
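
A sketch of what that could look like with the apimachinery condition helpers; where the Conditions slice lives on the EtcdCluster status is assumed here:

```go
package controller

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setQuorumCondition records the suggested Quorum condition alongside Ready.
func setQuorumCondition(conditions *[]metav1.Condition, hasQuorum bool) {
	cond := metav1.Condition{
		Type:   "Quorum",
		Status: metav1.ConditionFalse,
		Reason: "QuorumLost",
	}
	if hasQuorum {
		cond.Status = metav1.ConditionTrue
		cond.Reason = "ClusterHasQuorum"
	}
	// SetStatusCondition bumps LastTransitionTime only when the condition's
	// status actually changes.
	meta.SetStatusCondition(conditions, cond)
}
```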

@kvaps
Member

kvaps commented Mar 15, 2024

Internally we agreed to implement just a single condition for now: Ready: True|False

  • EtcdCluster should be considered Ready when StatefulSet is Ready
  • EtcdCluster should be considered NotReady when StatefulSet is NotReady

Every pod in the StatefulSet should have healthchecks defined.

We can borrow the logic from Kubeadm, for example:

https://github.com/kubernetes/kubernetes/blob/89b1db79d75e367f567ea06c0528ef4b8f3fecb4/cmd/kubeadm/app/phases/etcd/local.go#L227-L228

Here is the latest issue where they were updated:
kubernetes/kubeadm#2567

However, @Uburro noted that:

Since v3.5.12, two new endpoints /livez and /readyz are added.

the /livez endpoint reflects whether the process is alive or if it needs a restart.
the /readyz endpoint reflects whether the process is ready to serve traffic.
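
For illustration, a sketch of probes targeting those endpoints; the port (2381, assuming a plain-HTTP health/metrics listener) and the timings are assumptions that depend on how the operator configures etcd's listen URLs:

```go
package controller

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// etcdProbes sketches liveness/readiness probes against the /livez and
// /readyz endpoints available since etcd v3.5.12. Port and timings are
// placeholders, not the operator's actual defaults.
func etcdProbes() (liveness, readiness *corev1.Probe) {
	liveness = &corev1.Probe{
		ProbeHandler: corev1.ProbeHandler{
			HTTPGet: &corev1.HTTPGetAction{Path: "/livez", Port: intstr.FromInt(2381)},
		},
		InitialDelaySeconds: 10,
		PeriodSeconds:       10,
		FailureThreshold:    8,
	}
	readiness = &corev1.Probe{
		ProbeHandler: corev1.ProbeHandler{
			HTTPGet: &corev1.HTTPGetAction{Path: "/readyz", Port: intstr.FromInt(2381)},
		},
		PeriodSeconds:    5,
		FailureThreshold: 3,
	}
	return liveness, readiness
}
```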

@gaalw

gaalw commented Mar 15, 2024

Any option like "pods & sts are ready but quorum is lost"? Shouldn't we check the health of the cluster throughout its entire lifecycle?

The operator is not a monitoring solution. Also, if you implement it, the data in the CR will always be stale.

@aobort
Collaborator

aobort commented Mar 15, 2024

@gaalw

operator is not monitoring solution

Agree. However, given the plans to implement cluster maintenance, monitoring of its state becomes a mandatory capability.

Also if you would implement it, the data in CR will be always stale.

It depends. If the cluster health has not changed, there is no reason for the watchdog to update the CR.

@kvaps added this to the v0.0.1 milestone Mar 15, 2024
@Uburro
Collaborator

Uburro commented Mar 15, 2024

According to this KEP, etcd introduced full-fledged readiness and liveness checks. For us, /readyz signals conditions such as no raft leader, a raft loop deadlock, or data corruption. In that case the pod will disappear from the endpoints, and for our controller this will be the signal that everything is ok or not ok with the cluster. We also subscribe to events from the shared informer, so there is no need for the controller to go through the clusters and poll their status; Kubernetes does this natively via the kubelet. /livez will be a signal about a problem with the database (app) itself and a trigger for its restart. This way we avoid unnecessarily restarting etcd, which is useful in cases where etcd is used outside Kubernetes.
In this case we have to start supporting etcd from versions 3.5.12 and 3.4.29.

@Uburro
Collaborator

Uburro commented Mar 17, 2024

etcd-io/etcd#16666: we can think about --experimental-wait-cluster-ready-timeout with a startup probe in the future.

sircthulhu added a commit that referenced this issue Mar 18, 2024
Update cluster Ready status according to StatefulSet status and update
ConfigMap cluster state to `existing` after first time STS is ready
fixes #24