[kubeadm control plane] upgrade: etcd CA was regenerated #2455
Comments
I wonder if there is an odd race condition that could be taking place in the way we are doing
/milestone v0.3.0
Yeah, that's what was so weird about it – the whole control plane bootstrapped with 3 nodes, everything was great, and then as soon as controlplane-0 finishes deleting, the etcd certs change. The only other guess I have is that controlplane-0's kubeadm config was different from the others (it was init, not join), and since it was up first there might have been some weird owner references – I haven't gotten a chance to investigate yet, though.
One thing that probably deserves a check... What resource owns the secrets? If it is a Machine, that would explain the bug.
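A minimal sketch of that ownership check, assuming the usual cluster-api secret naming convention `<cluster-name>-etcd` and a placeholder cluster named `my-cluster`:

```sh
# List the owner references on the etcd CA secret. A Machine owner (rather than
# the KubeadmControlPlane) would mean the secret gets garbage-collected when
# that Machine is deleted during an upgrade.
kubectl get secret my-cluster-etcd \
  -o jsonpath='{range .metadata.ownerReferences[*]}{.kind}/{.name}{"\n"}{end}'
```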
Ah ha, sure enough:
But when I create a control plane using a KCP:
Which means this is an omission from the adoption bits – closing in favor of #2214.
/close
@sethp-nr: Closing this issue.
What steps did you take and what happened:
After the first of three control plane machines was upgraded from v1.15.9 to v1.16.6, I started getting etcd health check failures (see #2454 and #2451). After a while, it became clear that the cert & private key stored in the management cluster's Secret had diverged from what was on disk on the control plane nodes.
I'm not sure what caused the secret to be re-generated, but it seemed worth noting.
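For reference, one hedged way to see that divergence, assuming the usual `<cluster-name>-etcd` secret name (with a `tls.crt` key), a placeholder cluster named `my-cluster`, and SSH access to a control plane node:

```sh
# Fingerprint of the etcd CA as stored in the management cluster's Secret
kubectl get secret my-cluster-etcd -o jsonpath='{.data.tls\.crt}' \
  | base64 -d | openssl x509 -noout -fingerprint -sha256

# Fingerprint of the etcd CA actually on a control plane node's disk
ssh <control-plane-node> \
  'sudo openssl x509 -noout -fingerprint -sha256 -in /etc/kubernetes/pki/etcd/ca.crt'

# If the two fingerprints differ, the Secret has been regenerated out from
# under the nodes.
```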
Anything else you would like to add:
I was running my management cluster with `tilt up` against a local kind cluster, which on my machine has a side effect of... let's call it "timing issue detection." Everything slows way down, both in my userland and inside the controllers, and there are non-infrequent crashes in the controller. I recall that the kubeadm control plane controller in particular was restarted around the time the etcd certs changed.

Environment:
- Kubernetes version: (use `kubectl version`): mixed
- OS (e.g. from `/etc/os-release`): ubuntu

/kind bug