🐛 requeue KCP object if ControlPlaneComponentsHealthyCondition is not yet true #9032
Conversation
// Make sure KCP gets requeued if ControlPlaneComponentsHealthyCondition is still false.
// Otherwise KCP would only get requeued when KCP or the Cluster gets a change or via reaching the resync period.
// That would lead to a delay in provisioning MachineDeployments when preflight checks are enabled.
// The alternative solution to this requeue would be watching the relevant pods inside each workload
// cluster which would be very expensive.
Suggested change:

// Make KCP requeue if ControlPlaneComponentsHealthyCondition is false so we can check for control plane
// component status without waiting for a full resync (by default 10 minutes). Only requeue if there is no
// error, Requeue or RequeueAfter and the object does not have a deletion timestamp.
// Otherwise this condition can lead to a delay in provisioning MachineDeployments when MachineSet
// preflight checks are enabled.
// The alternative solution to this requeue would be watching the relevant pods inside each workload
// cluster which would be very expensive.
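For context, a minimal sketch of the guarded requeue this comment describes, in controller-runtime terms. The helper name `requeueIfComponentsUnhealthy`, the 20-second interval, and the exact wiring are illustrative assumptions, not the PR's actual code:

```go
package controllers

import (
	"time"

	ctrl "sigs.k8s.io/controller-runtime"

	controlplanev1 "sigs.k8s.io/cluster-api/controlplane/kubeadm/api/v1beta1"
	"sigs.k8s.io/cluster-api/util/conditions"
)

// requeueIfComponentsUnhealthy is an illustrative helper, not the PR's
// actual code. It turns a clean "done" result into a delayed requeue while
// ControlPlaneComponentsHealthyCondition is not yet true, so the controller
// re-checks control plane component health without waiting for a watch
// event or the full resync period.
func requeueIfComponentsUnhealthy(kcp *controlplanev1.KubeadmControlPlane, res ctrl.Result, err error) (ctrl.Result, error) {
	// Leave the result untouched if the reconcile errored, already
	// requested a requeue, or the object is being deleted.
	if err != nil || res.Requeue || res.RequeueAfter > 0 || !kcp.DeletionTimestamp.IsZero() {
		return res, err
	}
	if !conditions.IsTrue(kcp, controlplanev1.ControlPlaneComponentsHealthyCondition) {
		// The interval is an assumed value for illustration only.
		return ctrl.Result{RequeueAfter: 20 * time.Second}, nil
	}
	return res, err
}
```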
Looks good to me +/- the nits above. I would say let's get those fixed and then merge before the weekend so we get some CI coverage. I would also propose to cherry-pick onto release-1.5. I think overall the change is safe because we just requeue a bit more while control plane components are unhealthy.
Force-pushed from bf812d8 to 205a1f5
Updated the comments + moved the
/area provider/control-plane-kubeadm
Thanks!
/lgtm
LGTM label has been added. Git tree hash: 16b2fd21a507788b34975152ae972c85889cee59
/retest
Thx!! /approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: sbueringer. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/cherry-pick release-1.5
@sbueringer: once the present PR merges, I will cherry-pick it on top of release-1.5 in a new PR and assign it to you.
/cherry-pick release-1.4
@sbueringer: once the present PR merges, I will cherry-pick it on top of release-1.4 in a new PR and assign it to you.
/cherry-pick release-1.3
@sbueringer: once the present PR merges, I will cherry-pick it on top of release-1.3 in a new PR and assign it to you.
Let's see if it's cherry-pick'able in 1.4 and 1.3 as well
@sbueringer: new pull request created: #9034
@sbueringer: new pull request created: #9035
@sbueringer: new pull request created: #9036
What this PR does / why we need it:
When the cluster is in a state where ControlPlaneComponentsHealthyCondition is not yet true and the reconcile otherwise completes without an error or an explicit requeue, the controller reaches the end of the reconcile function and does a return ctrl.Result{}, nil.
When the relevant workload pods (etcd, kube-apiserver, kube-controller-manager, kube-scheduler) later become ready and report their ready state inside the workload cluster, no new event gets injected for the KCP object. The KCP controller has to wait for a different change to the watched objects, or for the resync period, before it marks the condition true.
This delays provisioning when the preflight checks for MachineSets are active, which also leads to flaky tests because the test timeout is reached before the resync period.
This PR removes the delay by ensuring a requeue in this special case.
Which issue(s) this PR fixes:
Fixes #8786
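To make the event gap concrete, here is a minimal sketch of what the two return shapes mean to controller-runtime. This is illustrative only; the 10-minute resync figure comes from the comment discussed above, and the 20-second delay is an assumed value:

```go
package controllers

import (
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

// done signals "nothing left to do": the object is only reconciled again
// when a watched object changes or the resync period (10 minutes by
// default in this project, per the comment above) fires.
func done() (ctrl.Result, error) {
	return ctrl.Result{}, nil
}

// checkAgainSoon re-adds the request to the workqueue after a delay,
// independent of watch events, so a condition that flips inside the
// workload cluster is picked up without adding an extra event source.
func checkAgainSoon() (ctrl.Result, error) {
	return ctrl.Result{RequeueAfter: 20 * time.Second}, nil
}
```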