🐛 Fix node deletion under managed control planes #3673
Conversation
@vincepri As far as I know, Node objects for the control plane are only used under a kubeadm control plane. EKS and GKE don't list control plane nodes.
/ok-to-test
/approve
/assign @ncdc
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: vincepri. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Approvers can indicate their approval by writing `/approve` in a comment.
@vincepri or @ncdc It occurred to me that returning an error here may also be a breaking change for non-externally-managed control plane clusters in the case of a 404. I think if this error is a 404 we should simply skip the call to
Good catch, we should fallback to the old logic (checking machines) if the error is NotFound |
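The fallback shape being proposed can be sketched as follows. This is a minimal, self-contained illustration with hypothetical names: the real controller would use `apierrors.IsNotFound` from `k8s.io/apimachinery/pkg/api/errors` and its own decision logic, and the stub functions here only simulate the two inputs.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for a Kubernetes 404 API error; real code would
// check apierrors.IsNotFound from k8s.io/apimachinery/pkg/api/errors.
var errNotFound = errors.New("not found")

// externalManagedCheck stands in for reading the control plane object's
// status.externalManagedControlPlane field; here it simulates the control
// plane object being gone already (a 404).
func externalManagedCheck() (bool, error) {
	return false, errNotFound
}

// machineBasedCheck stands in for the old logic: deciding based on the
// cluster's control plane Machine resources.
func machineBasedCheck() bool {
	return true
}

// isExternallyManaged prefers the status field, but falls back to the old
// Machine-based check when the control plane object is NotFound, rather
// than surfacing the 404 as a new, breaking error.
func isExternallyManaged() (bool, error) {
	managed, err := externalManagedCheck()
	switch {
	case err == nil:
		return managed, nil
	case errors.Is(err, errNotFound):
		return machineBasedCheck(), nil
	default:
		return false, err
	}
}

func main() {
	managed, err := isExternallyManaged()
	fmt.Println(managed, err)
}
```

The key design point is that only a NotFound error triggers the fallback; any other error is still returned so genuine API failures are not silently swallowed.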
/test pull-cluster-api-test
/milestone v0.3.10
/assign @CecileRobertMichon
lgtm from me; only a small thing that can also be addressed in a follow-up PR
```go
// IsExternalManagedControlPlane returns a bool indicating whether the control plane referenced
// in the passed Unstructured resource is an externally managed control plane such as AKS, EKS, GKE, etc.
func IsExternalManagedControlPlane(controlPlane *unstructured.Unstructured) bool {
	managed, found, err := unstructured.NestedBool(controlPlane.Object, "status", "externalManagedControlPlane")
	if err != nil || !found {
		return false
	}
	return managed
}
```
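`NestedBool`'s three return values are what make the field safely optional: `found` is false when the field is absent, and `err` is non-nil on a type mismatch, so both cases can fall through to the old behavior. As a stdlib-only illustration of those semantics, here is a small imitation over plain maps; `nestedBool` is a hypothetical stand-in for the real helper in k8s.io/apimachinery's `unstructured` package, and the map literal stands in for `controlPlane.Object`.

```go
package main

import "fmt"

// nestedBool walks the given field path through nested
// map[string]interface{} values and reports the bool found, whether the
// full path existed, and whether a non-map or non-bool value was hit.
func nestedBool(obj map[string]interface{}, fields ...string) (value, found bool, err error) {
	var cur interface{} = obj
	for i, f := range fields {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return false, false, fmt.Errorf("%v is not a map", fields[:i])
		}
		cur, ok = m[f]
		if !ok {
			// Field absent: not an error, just not found.
			return false, false, nil
		}
	}
	b, ok := cur.(bool)
	if !ok {
		return false, false, fmt.Errorf("%v is not a bool", fields)
	}
	return b, true, nil
}

func main() {
	// Stands in for the Unstructured content of a managed control plane.
	obj := map[string]interface{}{
		"status": map[string]interface{}{
			"externalManagedControlPlane": true,
		},
	}

	managed, found, err := nestedBool(obj, "status", "externalManagedControlPlane")
	fmt.Println(managed, found, err) // true true <nil>

	// Absent field: found is false, so callers treat the control plane
	// as not externally managed and use the existing Machine-based logic.
	_, found, _ = nestedBool(obj, "status", "somethingElse")
	fmt.Println(found) // false
}
```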
Should we document somewhere that managed control plane implementations should have this field? (Might be in https://cluster-api.sigs.k8s.io/developer/architecture/controllers/control-plane.html#required-status-fields.)
+1 this isn't part of the contract currently AFAIK
cc @alexeldeib
+1 to updating the documentation; just a note, though: the field is completely optional, and there is a fallback to the current behavior during delete.
Makes sense. What are your thoughts on making this new field an optional part of the contract to avoid a breaking change?
This is an unusual case in that the field is really required for managed control planes like AKS, EKS, GKE, etc., otherwise the Machine controller will suffer this pretty bad bug (instance termination without draining nodes), but it is not required at all for KubeadmControlPlane. No other optional status field currently appears to modify functionality; they are purely informational. If we make it a new required status field, though, I think this would need to wait for a new minor version to be released. I'd like to get the bugfix released sooner rather than later if possible.
I'm 👍 to optional. Please update the control plane architecture doc linked above and add it to the list of optional status fields, assuming everyone else agrees.
* `failureReason` - is a string that explains why an error has occurred, if possible.
* `failureMessage` - is a string that holds the message contained by the error.
* `externalManagedControlPlane` - is a bool that should be set to true if the Node objects do not
  exist in the cluster. For example, managed control plane providers for AKS, EKS, GKE, etc., should
  set this field to `true`.
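For a provider reporting this field, the control plane resource's status might look roughly like the following. This is an illustrative fragment only: the `apiVersion`, `kind`, and the status fields other than `externalManagedControlPlane` are hypothetical placeholders, not part of any real provider's API.

```yaml
apiVersion: controlplane.example.io/v1alpha3   # hypothetical group/version
kind: ExampleManagedControlPlane               # hypothetical kind
metadata:
  name: my-cluster-control-plane
status:
  initialized: true
  ready: true
  # Optional status field introduced by this PR: set to true when the
  # provider manages the control plane externally and no control plane
  # Node objects exist in the workload cluster.
  externalManagedControlPlane: true
```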
One small note: the AKS implementation doesn't have a separate control plane provider, see https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/19e80de4a7e92082ca0c5eb672e486801da23766/templates/cluster-template-aks.yaml
Would that still work as a field on the AzureManagedControlPlane https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/995da71c822c201cfa72cf99ae1b96652544e352/exp/api/v1alpha3/azuremanagedcontrolplane_types.go#L79 as part of the Azure infra provider?
(I think it should since we're using controlPlaneRef the same way)
@CecileRobertMichon Yes, as long as you reference the AzureManagedControlPlane in the cluster's `controlPlaneRef` field, you should be able to use the status in your resource as well. Since the check here in CAPI is using the unstructured API, the underlying type is not important. Whatever object is referenced by `controlPlaneRef` just needs to have the status field `externalManagedControlPlane` set if there are no control plane `Machine` resources in the cluster.
lgtm
/lgtm
What this PR does / why we need it:
This PR allows the Machine controller to drain and delete Nodes when running under managed control planes.
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Fixes #3631