[kubeadm control plane] upgrade: etcd CA was regenerated #2455
Comments
I wonder if there is an odd race condition that could be taking place in the way we are doing
/milestone v0.3.0
Yeah, that's what was so weird about it – the whole control plane bootstrapped with 3 nodes, everything was great, and then as soon as controlplane-0 finishes deleting, the etcd certs change. The only other guess I have is that controlplane-0's kubeadm config was different from the others (it was init, not join), and since it was up first there might have been some weird owner references – I haven't gotten a chance to investigate yet, though.
One thing that probably deserves a check... What resource owns the secrets? If it is a Machine, that would explain the bug.
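A minimal sketch of that ownership check, assuming the usual cluster-api secret naming convention `<cluster-name>-etcd` and a placeholder cluster named `my-cluster`:

```sh
# List the owner references on the etcd CA secret. A Machine owner (rather than
# the KubeadmControlPlane) would mean the secret gets garbage-collected when
# that Machine is deleted during an upgrade.
kubectl get secret my-cluster-etcd \
  -o jsonpath='{range .metadata.ownerReferences[*]}{.kind}/{.name}{"\n"}{end}'
```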
Ah ha, sure enough:
But when I create a control plane using a KCP:
Which means this is an omission from the adoption bits – closing in favor of #2214.
/close
@sethp-nr: Closing this issue.
What steps did you take and what happened:
After the first of three control plane machines was upgraded from v1.15.9 to v1.16.6, I started getting etcd health check failures (see #2454 and #2451). After a while, it became clear that the cert & private key stored in the management cluster's Secret had diverged from what was on disk on the control plane nodes.
I'm not sure what caused the secret to be re-generated, but it seemed worth noting.
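For reference, one hedged way to see that divergence, assuming the usual `<cluster-name>-etcd` secret name (with a `tls.crt` key), a placeholder cluster named `my-cluster`, and SSH access to a control plane node:

```sh
# Fingerprint of the etcd CA as stored in the management cluster's Secret
kubectl get secret my-cluster-etcd -o jsonpath='{.data.tls\.crt}' \
  | base64 -d | openssl x509 -noout -fingerprint -sha256

# Fingerprint of the etcd CA actually on a control plane node's disk
ssh <control-plane-node> \
  'sudo openssl x509 -noout -fingerprint -sha256 -in /etc/kubernetes/pki/etcd/ca.crt'

# If the two fingerprints differ, the Secret has been regenerated out from
# under the nodes.
```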
Anything else you would like to add:
I was running my management cluster with `tilt up` against a local kind cluster, which on my machine has a side effect of... let's call it "timing issue detection." Everything slows way down, both in my userland and inside the controllers, and there are non-infrequent crashes in the controller. I recall that the kubeadm control plane controller in particular was restarted around the time the etcd certs changed.

Environment:
- Kubernetes version: (use `kubectl version`): mixed
- OS (e.g. from `/etc/os-release`): ubuntu

/kind bug