CAPI won't delete Nodes in "managed" control planes because there are no control plane machines #3631
Comments
This is indeed a bug / use case we never thought of. Looking at the code, we don't seem to take into account that a control plane provider might be set; maybe we should check whether there are remaining control plane nodes in the actual cluster instead of checking Machines |
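For illustration, a minimal sketch of that alternative, assuming a controller-runtime client pointed at the workload cluster and the conventional control plane node label; the helper name is made up and not actual Cluster API code:

```go
import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// hasRemainingControlPlaneNodes reports whether the workload cluster still has
// any control plane nodes, by listing Nodes carrying the conventional control
// plane label. Illustrative sketch only.
func hasRemainingControlPlaneNodes(ctx context.Context, workloadClient client.Client) (bool, error) {
	nodes := &corev1.NodeList{}
	if err := workloadClient.List(ctx, nodes, client.HasLabels{"node-role.kubernetes.io/master"}); err != nil {
		return false, err
	}
	return len(nodes.Items) > 0, nil
}
```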
/assign |
Trying to think of the best approach here. Is this section of code necessary at all? As I understand it, by the time we get to this code the infrastructure will be deleted regardless of the result of this check, so this check just seems to result in orphaned Nodes. |
Copying my comment from the PR:
@dthorsen These checks were added a while back to make sure we wouldn't get stuck trying to delete a node if the control plane nodes were being deleted, or if there weren't any more machines available. I'd try to see if we can agree on a status field and have a fallback that uses the current logic |
@vincepri What do you think about a status field on the controlplane objects like |
🤔 That could be interesting. One question before we proceed: does EKS or any other cloud provider expose the number of control plane nodes available somehow, or is it all done behind the scenes? |
EKS, GKE, and AKS do not expose the details of the number of control plane nodes or anything like that. The control plane is totally provided as a managed service and how they design it is internal to those organizations. Another possible name for the status field could be something like |
I like the managedControlPlane idea. Where would that live? On the provider-specific ManagedControlPlane struct in a status field? |
Yes exactly. I have most of the code written now, just testing some things out locally. |
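For illustration only, the flag could surface on a provider's status roughly like this; the type and field names are assumptions, not the actual AWS provider API:

```go
// ExampleManagedControlPlaneStatus is a hypothetical status block for a
// provider-specific managed control plane resource.
type ExampleManagedControlPlaneStatus struct {
	// Ready denotes that the managed control plane is ready to accept requests.
	Ready bool `json:"ready"`

	// ManagedControlPlane signals to Cluster API that the control plane is
	// externally managed, so no control plane Machines will ever exist.
	ManagedControlPlane bool `json:"managedControlPlane,omitempty"`
}
```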
Given that this is internal, we could also use an annotation on the referenced control plane object itself that signals to Cluster API that this is an externally managed control plane. |
This seems like something that should be set by the controller though as opposed to the user. That is why I gravitated toward a status field. |
Wouldn't using an actual field make more sense / haven't we been trying to get away from API-in-annotations? |
Definitely should be set by the controller |
I asked myself, "is this useful for users to see/know?" and couldn't come up with a clear answer. Also, being a managed control plane, is that something that should be in status? It all seems internal to Cluster API rather than something we'd want to expose, but I'm ok either way. One other benefit of having it as an annotation is using ObjectMeta in code, so that Cluster API doesn't have to write custom code to get |
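For comparison, a sketch of the annotation variant; the annotation key is hypothetical and the object is the unstructured control plane referenced by the Cluster:

```go
import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// exampleManagedAnnotation is a hypothetical key; the real key, if any, would
// need to be agreed on by the project.
const exampleManagedAnnotation = "example.cluster.x-k8s.io/externally-managed-control-plane"

// isManagedViaAnnotation reads the referenced control plane's ObjectMeta
// directly, which is the benefit mentioned above: no custom status parsing.
func isManagedViaAnnotation(controlPlane *unstructured.Unstructured) bool {
	_, ok := controlPlane.GetAnnotations()[exampleManagedAnnotation]
	return ok
}
```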
Was thinking something like this in util.go:

```go
// IsManagedControlPlane returns a bool indicating whether the control plane referenced
// in the passed ObjectReference is a managed control plane.
func IsManagedControlPlane(controlPlaneRef *corev1.ObjectReference) bool {
	controlPlane := ObjectReferenceToUnstructured(*controlPlaneRef)
	managed, found, err := unstructured.NestedBool(controlPlane.Object, "status", "managedControlPlane")
	if err != nil || !found {
		return false
	}
	return managed
}
```
|
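As a rough sketch of the fallback idea (not the actual controller code), the node-deletion check could consume that helper and only run the current Machine-counting logic when the control plane is not externally managed; the function and parameter names here are placeholders:

```go
import (
	clusterv1 "sigs.k8s.io/cluster-api/api/v1alpha3"
)

// isDeleteNodeAllowedSketch illustrates the fallback: skip the control plane
// Machine count when the referenced control plane reports itself as managed,
// otherwise run the existing check (passed in as a closure to keep the sketch
// self-contained).
func isDeleteNodeAllowedSketch(cluster *clusterv1.Cluster, existingMachineCheck func() error) error {
	if cluster.Spec.ControlPlaneRef != nil && IsManagedControlPlane(cluster.Spec.ControlPlaneRef) {
		// Externally managed control plane: there will never be control plane
		// Machines, so allow the Node to be deleted.
		return nil
	}
	// Fallback: current logic that requires at least one control plane Machine.
	return existingMachineCheck()
}
```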
LGTM, maybe have |
Works for me |
/lifecycle active |
PR #3673 is ready for review and should fix this. I will open a follow-up PR to the AWS provider to add the |
/milestone v0.3.10 |
What steps did you take and what happened:
[A clear and concise description on how to REPRODUCE the bug.]
What did you expect to happen:
Deleting Machines deletes Nodes too.
Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]
I'm fairly certain the check here is the issue: cluster-api/controllers/machine_controller.go, lines 379 to 396 at commit feaff31. There are no control plane Machines in an EKS cluster.
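A hedged paraphrase of that guard, with hypothetical names rather than the literal code at those lines: node deletion is refused whenever no control plane Machines remain, which is always the case for a managed control plane like EKS.

```go
import "errors"

// checkControlPlaneMembers is a hypothetical paraphrase of the guard: if no
// control plane Machines remain, node deletion is refused. With EKS there are
// never control plane Machines, so this branch always triggers and the Node
// is left orphaned.
func checkControlPlaneMembers(numControlPlaneMachines int) error {
	if numControlPlaneMachines == 0 {
		return errors.New("no remaining control plane members") // illustrative message
	}
	return nil
}
```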
Environment:
- Kubernetes version (use `kubectl version`): v1.17.9
- OS (e.g. from `/etc/os-release`): Amazon Linux 2

/kind bug
[One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels]