KCP rollout with new config gets stuck when there is a node with an unhealthy APIServer #10093
Labels
kind/bug
triage/accepted
What steps did you take and what happened?
What did you expect to happen?
In step 4, the KCP controller should delete the Machine with the unhealthy APIServer so that the KCP rollout can succeed.
Cluster API version
1.5.2
Kubernetes version
1.25.15
Anything else you would like to add?
The root cause is that either reconcileUnhealthyMachines() (for MHC) or upgradeControlPlane(ctx, controlPlane, machinesNeedingRollout) / selectMachineForScaleDown() does not take into account a Machine CR whose APIServerPodHealthy condition is false.
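To illustrate the behaviour I would expect, here is a minimal, self-contained Go sketch. It is not the actual KCP code: the Machine struct, its fields, and pickForScaleDown are hypothetical simplifications that I made up for this example. It only shows the idea that, among machines needing rollout, one whose APIServerPodHealthy condition is false should be picked for deletion first.

```go
package main

import "fmt"

// Machine is a hypothetical, simplified stand-in for a control plane
// Machine CR; it is not the Cluster API type.
type Machine struct {
	Name                string
	Outdated            bool // needs rollout because the KCP config changed
	APIServerPodHealthy bool // assumed mirror of the APIServerPodHealthy condition
}

// pickForScaleDown (hypothetical) returns the machine to delete next during
// a rollout. It prefers an outdated machine with an unhealthy API server and
// otherwise falls back to the first outdated machine.
func pickForScaleDown(machines []Machine) (Machine, bool) {
	var fallback *Machine
	for i := range machines {
		m := &machines[i]
		if !m.Outdated {
			continue
		}
		if !m.APIServerPodHealthy {
			return *m, true
		}
		if fallback == nil {
			fallback = m
		}
	}
	if fallback != nil {
		return *fallback, true
	}
	return Machine{}, false
}

func main() {
	machines := []Machine{
		{Name: "cp-0", Outdated: true, APIServerPodHealthy: true},
		{Name: "cp-1", Outdated: true, APIServerPodHealthy: false},
		{Name: "cp-2", Outdated: true, APIServerPodHealthy: true},
	}
	if m, ok := pickForScaleDown(machines); ok {
		fmt.Println("delete first:", m.Name) // prints "delete first: cp-1"
	}
}
```

With this ordering, the unhealthy cp-1 would be removed before any healthy machine, so the rollout would not have to wait on a control plane that cannot become healthy.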
Label(s) to be applied
/kind bug
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.