Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change in kubelet --node-ip behavior in 1.29 breaks some deployment models on clouds #125348

Closed
danwinship opened this issue Jun 5, 2024 · 7 comments · Fixed by #125337
Closed
Labels
sig/network Categorizes an issue or PR as relevant to SIG Network. sig/node Categorizes an issue or PR as relevant to SIG Node. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@danwinship
Copy link
Contributor

(from #125300)

#121028 changed kubelet's startup behavior when using an external cloud provider, so that it no longer tries to guess the right primary node IP before the cloud-controller-manager starts (since in the case where it guessed wrong, this would cause problems). But this means it can no longer set pod.status.hostIP / host-network pod.status.podIP on pods created before the ccm starts, which breaks some deployment models involving using static pods to bootstrap the cluster. (In particular, this is breaking some OpenShift functionality, which assumed that kubelet could deploy etcd and/or kube-apiserver via static pods and pass the node IP to them via the downward API.)

One suggested workaround for this problem is for the administrator to pass --node-ip to kubelet, but that suggestion runs into the problem that doing this would cause the node's node.status.addresses to be stripped of secondary addresses, which may have further side effects, if there is software that assumes that all node IPs will be listed in node.status.address. #125300 considered changing the way ccm treats --node-ip to make it not strip secondary node IPs when --node-ip was used, but as pointed out there, this also threatens to break other things, since some existing software may be depending on the fact that this behavior does happen.

We briefly thought that the problem could be solved by passing --node-ip 0.0.0.0, but it turns out that this doesn't actually do anything different from just not passing --node-ip.

/sig network
/sig node

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. sig/node Categorizes an issue or PR as relevant to SIG Node. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 5, 2024
@danwinship
Copy link
Contributor Author

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 5, 2024
@aojea
Copy link
Member

aojea commented Jun 5, 2024

cat /var/lib/kubelet/kubeadm-flags.env
KUBELET_KUBEADM_ARGS="--cloud-provider=external --container-runtime-endpoint=unix:///run/containerd/containerd.sock --node-ip=0.0.0.0 --node-labels= --pod-infra-container-image=registry.k8s.io/pause:3.9 --provider-id=kind://docker/kind/kind-worker"
  1. Start with cloud-controller-manager down
 kubectl get nodes kind-worker -o json | jq .status.addresses
[
  {
    "address": "192.168.8.4",
    "type": "InternalIP"
  },
  {
    "address": "kind-worker",
    "type": "Hostname"
  }
]
  1. Start the cloud controller manager, it adds the additional IP address missing
kubectl get nodes kind-worker -o json | jq .status.addresses
[
  {
    "address": "kind-worker",
    "type": "Hostname"
  },
  {
    "address": "192.168.8.4",
    "type": "InternalIP"
  },
  {
    "address": "fc00:f853:ccd:e793::4",
    "type": "InternalIP"
  }
]

can this be specific to some cloud providers and not to the generic cloud provider?

@danwinship
Copy link
Contributor Author

It's not about the cloud provider behavior, it's about kubelet's behavior. The problem is that if you start kubelet with --cloud-provider external and either no --node-ip or --node-ip 0.0.0.0 or --node-ip ::, then kubelet will not fill in node.status.addresses.

@aojea
Copy link
Member

aojea commented Jun 5, 2024

Ok, my bad, somehow the environment I'm using didn't get the flag changes on kubelet, I can repro now

I can see that we can implement the unspecified behavior easily and solve this problem, it is also aligned with the existing comment

diff --git a/pkg/kubelet/nodestatus/setters.go b/pkg/kubelet/nodestatus/setters.go
index 19bb6ce7295..522f2407254 100644
--- a/pkg/kubelet/nodestatus/setters.go
+++ b/pkg/kubelet/nodestatus/setters.go
@@ -81,6 +81,9 @@ func NodeAddress(nodeIPs []net.IP, // typically Kubelet.nodeIPs
        secondaryNodeIPSpecified := secondaryNodeIP != nil && !secondaryNodeIP.IsUnspecified()
 
        return func(ctx context.Context, node *v1.Node) error {
+               defer func() {
+                       klog.Infof("DEBUG externalCloudProvider: %v  nodeIPs: %v addresses: %v", externalCloudProvider, nodeIPs, node.Status.Addresses)
+               }()
                if nodeIPSpecified {
                        if err := validateNodeIPFunc(nodeIP); err != nil {
                                return fmt.Errorf("failed to validate nodeIP: %v", err)
@@ -134,7 +137,7 @@ func NodeAddress(nodeIPs []net.IP, // typically Kubelet.nodeIPs
                        // required to ensure the external cloud provider will use the same addresses to avoid the issues explained
                        // in https://github.com/kubernetes/kubernetes/issues/120720.
                        // We are already hinting the external cloud provider via the annotation AnnotationAlphaProvidedIPAddr.
-                       if !nodeIPSpecified {
+                       if nodeIP == nil {
                                node.Status.Addresses = []v1.NodeAddress{
                                        {Type: v1.NodeHostName, Address: hostname},

This time I double checked it works fine

/usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --runtime-cgroups=/system.slice/containerd.service --cloud-provider=external -v=4 --node-ip=0.0.0.0 2>&1 | grep DEBUG
I0605 18:36:53.149182  140889 setters.go:85] DEBUG externalCloudProvider: true  nodeIPs: [0.0.0.0] addresses: [{InternalIP 192.168.8.4} {Hostname kind-worker}]
I0605 18:36:53.224030  140889 setters.go:85] DEBUG externalCloudProvider: true  nodeIPs: [0.0.0.0] addresses: [{InternalIP 192.168.8.4} {Hostname kind-worker}]
I0605 18:36:53.231474  140889 setters.go:85] DEBUG externalCloudProvider: true  nodeIPs: [0.0.0.0] addresses: [{InternalIP 192.168.8.4} {Hostname kind-worker}]
I0605 18:36:53.244818  140889 setters.go:85] DEBUG externalCloudProvider: true  nodeIPs: [0.0.0.0] addresses: [{InternalIP 192.168.8.4} {Hostname kind-worker}]

To be honest, this is a bug on my previous change, I was consider !nodeIPSpecified equal to node-ip not set, so I will be +1 on backport this change

@liggitt
Copy link
Member

liggitt commented Jun 7, 2024

that suggestion runs into the problem that doing this would cause the node's node.status.addresses to be stripped of secondary addresses, which may have further side effects, if there is software that assumes that all node IPs will be listed in node.status.address

External providers have to update node status addresses dynamically with the full set of addresses. The kubelet reacts dynamically to that and gets an updated serving certificate, so that should all work dynamically and correctly.

@danwinship
Copy link
Contributor Author

External providers have to update node status addresses dynamically with the full set of addresses.

They only do that if you don't pass --node-ip to kubelet. If you pass --node-ip to kubelet to override the default choice of node IP, then the cloud provider will omit all other InternalIPs from node.status.addresses. (This was ~always the case for legacy cloud providers and has been the case for external cloud providers at least since "before they were cool".)

@liggitt
Copy link
Member

liggitt commented Jun 7, 2024

They only do that if you don't pass --node-ip to kubelet.

Well, that seems like the spot to change if you want external providers report more addresses.

Reverting kubelet to report auto-detected-and-known-to-sometimes-be-incorrect addresses when running in external cloud provider mode seems like the wrong way to fix this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/network Categorizes an issue or PR as relevant to SIG Network. sig/node Categorizes an issue or PR as relevant to SIG Node. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
4 participants