
Cannot scale new nodes with CNI version 1.5.3 #625

Closed
cgkades opened this issue Sep 25, 2019 · 17 comments
Labels
bug priority/P1 Must be staffed and worked currently or soon. Is a candidate for next release

Comments

@cgkades

cgkades commented Sep 25, 2019

When trying to scale new nodes after creating an EKS cluster, the new nodes will not leave the NotReady state. The aws-node pod is in a CrashLoopBackOff state.

This seems to be caused by the file 10-aws.conflist not being placed on the node. It looks like the install script had the line that copied the file removed, and that logic was moved into main.go. The problem is that execution never reaches the code that copies the file.

bash-4.2# cat /host/var/log/aws-routed-eni/ipamd.log.2019-09-25-20
2019-09-25T20:01:32.800Z [INFO] Starting L-IPAMD v1.5.3  ...
2019-09-25T20:02:02.803Z [INFO] Testing communication with server
2019-09-25T20:02:32.803Z [INFO] Failed to communicate with K8S Server. Please check instance security groups or http proxy setting
2019-09-25T20:02:32.803Z [ERROR]        Failed to create client: error communicating with apiserver: Get https://172.20.0.1:443/version?timeout=32s: dial tcp 172.20.0.1:443: i/o timeout
2019-09-25T20:02:33.901Z [INFO] Starting L-IPAMD v1.5.3  ...
2019-09-25T20:03:03.903Z [INFO] Testing communication with server

I apologize, but I wasn't able to dig in far enough to know why it needs 10-aws.conflist in /etc/cni/net.d first, but that does seem to be the issue.
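
For anyone hitting the same symptom, a minimal check and stopgap on an affected node, assuming the default host paths from the aws-node manifest (the /app/10-aws.conflist location inside the image is an assumption based on this repo's Dockerfile):

# Confirm the CNI config is missing; without it, kubelet keeps the node NotReady.
ls -l /etc/cni/net.d/

# Stopgap: copy the config shipped inside the running aws-node container.
CID=$(docker ps -q --filter "name=aws-node")
docker cp "$CID":/app/10-aws.conflist /etc/cni/net.d/10-aws.conflist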

@matthsmi

To expand on what @cgkades is saying...

What we've found is that k8sapi.CreateKubeClient() fails with Failed to create client: error communicating with apiserver: Get https://172.20.0.1:443/version?timeout=32s: dial tcp 172.20.0.1:443: i/o timeout (https://github.com/aws/amazon-vpc-cni-k8s/blob/release-1.5.3/main.go#L47-L51).

As such, it never gets to copy over /etc/cni/net.d/10-aws.conflist. If we manually create 10-aws.conflist, everything is fine and aws-node starts up.

This behavior isn't present in 1.5.1 and appears to have been introduced in commit 7fd7c93.

@mogren mogren added bug priority/P1 Must be staffed and worked currently or soon. Is a candidate for next release labels Sep 25, 2019
@mogren
Contributor

mogren commented Sep 25, 2019

@matthsmi @cgkades Thanks for opening a new ticket. (#283 is not related to this case.)

The change to not copy the config until ipamd is running happened in commit f9acdeb, and was done to avoid scheduling pods to the node until we have ENIs and IPs available. If ipamd cannot talk to the Kubernetes API server, it will not copy the plugin binary or the config file, since copying them is what moves the node into the Ready state.

The main issue is that your nodes cannot reach your API server. If this happens again, could you please run the log-collector-script and send the results through support, together with the cluster ARN?
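
A quick way to reproduce the failing call from an affected node (172.20.0.1 is the kubernetes service ClusterIP in this thread; substitute your own):

# Hits the same endpoint ipamd queries; hanging and then timing out matches the ipamd log above.
curl -k --connect-timeout 5 https://172.20.0.1:443/version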

@cgkades
Author

cgkades commented Sep 26, 2019

@mogren This happens with every cluster we launch. I'll work with @matthsmi to see if we can get the logs you need.

@marcel4303

Having the same issue as @cgkades.
I've informed AWS Support and attached the results of the log-collector-script, as @mogren requested.

@matthsmi

matthsmi commented Oct 8, 2019

Thanks for the update, @marcel4303. I'm glad we're not the only ones experiencing this, and thanks for opening a ticket with AWS support, too.

We also have an internal high-priority ticket open with AWS. In the meantime, we've been using AWS CNI 1.5.1. We'll be sure to update this issue with relevant details as we hear back internally.

@mogren
Contributor

mogren commented Oct 8, 2019

Looking at the reports I have received so far, the main issue seems to be the DNS call the CNI makes when trying to reach the API server. The first two lines in the ipamd logs...

2019-09-25T20:01:32.800Z [INFO] Starting L-IPAMD v1.5.3  ...
2019-09-25T20:02:02.803Z [INFO] Testing communication with server

...usually show that it took around 30s to run kubeClient := k8sclient.GetKubeClient() on nodes that have issues. That looks a lot like a 30-second timeout. That code comes from the operator-framework: it tries to read the KUBERNETES_SERVICE_HOST environment variable, and if that fails, it does a net.LookupHost("kubernetes.default.svc"). For some reason, on these nodes, that takes 30s to time out.

Is there some custom DNS setup on these nodes? What does /etc/resolv.conf look like?
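
A rough way to reproduce those two lookups from inside an affected aws-node container (assumes getent and a bourne shell are available in the image; replace <aws-node-pod> with the pod name):

kubectl -n kube-system exec <aws-node-pod> -- env | grep KUBERNETES_SERVICE_HOST
# If the variable is unset, the client falls back to DNS; time that lookup:
kubectl -n kube-system exec <aws-node-pod> -- sh -c 'time getent hosts kubernetes.default.svc'
kubectl -n kube-system exec <aws-node-pod> -- cat /etc/resolv.conf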

@marcel4303

You're welcome.
I also tried downgrading the CNI version to 1.5.1. That results in Ready nodes, but the pods then run into issues and remain stuck in ContainerCreating:

Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "f9b30e13de653234be1dec6d91a4fa7fcd9b9aaddfec191f6d8b839d2722b0c4" network for pod "coffee-7dbb5795f6-ckvrc": NetworkPlugin cni failed to set up pod "coffee-7dbb5795f6-ckvrc_cafe" network: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:50051: connect: connection refused", failed to clean up sandbox container "f9b30e13de653234be1dec6d91a4fa7fcd9b9aaddfec191f6d8b839d2722b0c4" network for pod "coffee-7dbb5795f6-ckvrc": NetworkPlugin cni failed to teardown pod "coffee-7dbb5795f6-ckvrc_cafe" network: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:50051: connect: connection refused"]
  Normal   SandboxChanged          5s (x93 over 20m)  kubelet, ip-172-31-30-47.eu-central-1.compute.internal  Pod sandbox changed, it will be killed and re-created.
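
The dial tcp 127.0.0.1:50051 errors above are the CNI plugin failing to reach ipamd's local gRPC endpoint. A quick check on the node (port taken from this log; assumes iproute2 is installed):

# Nothing listening on 50051 means aws-k8s-agent never came up on this node.
ss -tlnp | grep 50051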

@blakelead

FYI, we also had this issue with CNI 1.5.4.
Downgrading to 1.5.1 before scaling nodes resolved it for us.

@mogren
Contributor

mogren commented Oct 10, 2019

Does this happen on worker nodes using the 1.13.7 kubelet? The reason I ask is that there was an upstream bug:

If a pod's restart policy is "Never", and kubelet failed to create the PodSandbox (for whatever reason) once, the pod will be stuck at "ContainerCreating" forever.

1.13.8 and later have the fix: kubernetes/kubernetes#79398
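
To see which kubelet version each node is running (a standard kubectl query, not specific to this issue):

kubectl get nodes -o custom-columns=NAME:.metadata.name,KUBELET:.status.nodeInfo.kubeletVersion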

@yannick

yannick commented Jan 3, 2020

I seem to be hit by this as well. Using kops, I bring up a cluster, e.g. with kope.io/k8s-1.15-debian-stretch-amd64-hvm-ebs-2019-09-26 on kubernetesVersion: 1.16.4.
The masters come up fine, but with the nodes it's hit and miss. I also tried imageName: "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:v1.6.0-rc5".

@mogren
Contributor

mogren commented Jan 20, 2020

@yannick Are the pods crash-looping because they are unable to talk to the API server, or is it some other issue?

@matthsmi

I've made some more progress with this, today using v1.6.0 of the aws-node CNI.

To restate what I'm seeing: when I initially spin up a cluster and associate the instances in an ASG with the EKS cluster, the nodes join the cluster, aws-node starts, and they become Ready.

When I try to expand the ASG and add new instances, these new instances are able to join the cluster, but aws-node does not start up. You can see them in the cluster via kubectl get nodes, but they are in the NotReady state.

When aws-k8s-agent tries to start up, it reaches this point: https://github.com/aws/amazon-vpc-cni-k8s/blob/release-1.6/pkg/k8sapi/discovery.go#L89

The kubeClient endpoint is https://172.20.0.1:443, which is not yet available to the system, since no kube-proxy is running at that time. No kube-proxy will start because no CNI files are in place yet, because aws-k8s-agent hasn't bootstrapped. If aws-k8s-agent doesn't bootstrap in a timely fashion, the container's entrypoint.sh fails and exits without copying any CNI files.

In my testing, I shuffled things around in entrypoint.sh:

diff --git a/scripts/entrypoint.sh b/scripts/entrypoint.sh
index d7dc7f50..636b02d3 100755
--- a/scripts/entrypoint.sh
+++ b/scripts/entrypoint.sh
@@ -58,16 +58,6 @@ echo -n "starting IPAM daemon in background ... "
 ./aws-k8s-agent > "$AGENT_LOG_PATH" 2>&1 &
 echo "ok."
 
-echo -n "checking for IPAM connectivity ... "
-
-if ! wait_for_ipam; then
-    echo " failed."
-    echo "timed out waiting for IPAM daemon to start."
-    exit 1
-fi
-
-echo "ok."
-
 echo -n "copying CNI plugin binaries and config files ... "
 
 cp portmap "$HOST_CNI_BIN_PATH"
@@ -84,6 +74,16 @@ if [[ -f "$HOST_CNI_CONFDIR_PATH/aws.conf" ]]; then
     rm "$HOST_CNI_CONFDIR_PATH/aws.conf"
 fi
 
+echo -n "checking for IPAM connectivity ... "
+
+if ! wait_for_ipam; then
+    echo " failed."
+    echo "timed out waiting for IPAM daemon to start."
+    exit 1
+fi
+
+echo "ok."
+
 # bring the aws-k8s-agent process back into the foreground
 echo "foregrounding IPAM daemon ... "
 fg %1 >/dev/null 2>&1 || { echo "failed (process terminated)" && cat "$AGENT_LOG_PATH" && exit 1; }

I moved the wait_for_ipam check to after the CNI files get copied. At that point kube-proxy starts up, aws-k8s-agent gets its act together, and things start working.

I'm not sure what the adverse effects of this are from the IPAM perspective, but my initial testing looks good, and I've been able to scale an ASG in and out with no issues using the modified entrypoint.sh.

I'm going to continue testing, but this is where I'm at.

Do you have any thoughts on this, @mogren?

Thank you!

@jaypipes
Contributor

jaypipes commented Mar 2, 2020

The kubeClient endpoint is: https://172.20.0.1:443, which is not yet available to the system, since no kube-proxy is running at that time. No kube-proxy will start since no CNI files are in place yet, since the aws-k8s-agent hasn't bootstrapped. If aws-k8s-agent doesn't bootstrap in a timely fashion, the container entrypoint.sh will fail out, and not copy any CNI files.

AFAIK, the above statement is not correct. kube-proxy does not have anything to do with the aws-node Pod's ability to contact the Kubernetes API server. kube-proxy enables routing for a Kubernetes Service object's backend Endpoints (other Pods).

The timeout waiting for IPAMd (aws-k8s-agent) to come up is 30 seconds. My suspicion is that there is a problem with DNS resolution, not kube-proxy, that is causing the IPAMd startup to take longer than 30 seconds. On startup, IPAMd attempts to communicate with the Kubernetes API server using the k8sclient.GetKubeClient() function from the [email protected] release (as @mogren notes above).

Is there some custom DNS setup on these nodes? What is the value of your KUBERNETES_SERVICE_HOST env var? What does your /etc/resolv.conf look like?

@matthsmi

matthsmi commented Mar 2, 2020

Hi @jaypipes ,

Thanks for the input. It seems I missed the comment from @mogren above.

That said, I'm still confident that this isn't a DNS resolution issue.

I put some additional debug lines inside the k8s-agent and had it print out the kubeconfig returned by k8sclient.GetKubeConfig() (which should also be what k8sclient.GetKubeClient() uses).

Here's the Host that aws-node is trying to use:

kubeConfig=&rest.Config{Host:"https://172.20.0.1:443 ", APIPath:"", ContentConfig:rest.ContentConfig{AcceptContentTypes:"",.... stuff removed}

There shouldn't be any DNS resolution happening there.

This also matches up to the ClusterIP that the aws-cni is trying to speak to:

% k get service
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   172.20.0.1   <none>        443/TCP   73m
% 

That said, here is what aws-node is attempting to use for KUBERNETES_SERVICE_* and for DNS:

# docker inspect d356efd553b9 | jq '.[0].Config.Env'                           
[
  "AWS_VPC_K8S_CNI_LOGLEVEL=DEBUG",
 [...]
  "KUBERNETES_SERVICE_HOST=172.20.0.1",
  "KUBERNETES_SERVICE_PORT=443",
[...]
  "KUBERNETES_SERVICE_PORT_HTTPS=443",
[...]
  "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
]
# 
# cat /etc/resolv.conf 
nameserver 169.254.169.253
search us-west-2.compute.internal
# 

If aws-node is attempting to reach the API server via that ClusterIP, then kube-proxy has to be working for the connection to succeed.
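
One way to verify that on the node itself, assuming kube-proxy runs in its default iptables mode (where it programs the KUBE-SERVICES chain):

# No rule matching 172.20.0.1 here means kube-proxy has not programmed the ClusterIP yet.
sudo iptables -t nat -L KUBE-SERVICES -n | grep 172.20.0.1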

@acegrader33

The kubeClient endpoint is: https://172.20.0.1:443, which is not yet available to the system, since no kube-proxy is running at that time. No kube-proxy will start since no CNI files are in place yet, since the aws-k8s-agent hasn't bootstrapped.

I believe that adding tolerations to our kube-proxy daemonset that allow it to start on nodes with NotReady status fixed this error for us.

tolerations:
  - operator: "Exists"
    effect: "NoExecute"
  - operator: "Exists"
    effect: "NoSchedule"
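
One way to apply those tolerations, assuming kube-proxy runs as a daemonset named kube-proxy in kube-system (the EKS default); note that this JSON patch replaces any tolerations already on the daemonset:

kubectl -n kube-system patch daemonset kube-proxy --type=json -p '[
  {"op": "add", "path": "/spec/template/spec/tolerations", "value": [
    {"operator": "Exists", "effect": "NoExecute"},
    {"operator": "Exists", "effect": "NoSchedule"}]}]'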

@jaypipes
Contributor

Sorry for the delayed response, @matthsmi! You are of course correct. We suspect that this is the same issue as #872 and #865, where we believe the root cause is, as you allude to, the aws-node daemonset failing to start when kube-proxy has not (yet) fully come online.

We are going to add a configurable timeout to the entrypoint.sh script and raise the default above the current 30 seconds. We hope this will address the issues.
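
A sketch of overriding that timeout once the change ships (IPAMD_TIMEOUT_SECONDS is the variable named in the commits below; the 120-second value is illustrative):

kubectl -n kube-system set env daemonset/aws-node IPAMD_TIMEOUT_SECONDS=120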

jaypipes added a commit to jaypipes/amazon-vpc-cni-k8s that referenced this issue Mar 18, 2020
Adds a configurable timeout to the aws-k8s-agent (ipamd) startup in the
entrypoint.sh script. Increases the default timeout from ~30 seconds to
60 seconds.

Users can set the IPAMD_TIMEOUT_SECONDS environment variable to change
the timeout.

Related: aws#625, aws#865 aws#872
mogren pushed a commit that referenced this issue Jun 24, 2020
* add configurable timeout for ipamd startup

Adds a configurable timeout to the aws-k8s-agent (ipamd) startup in the
entrypoint.sh script. Increases the default timeout from ~30 seconds to
60 seconds.

Users can set the IPAMD_TIMEOUT_SECONDS environment variable to change
the timeout.

Related: #625, #865 #872

* This is a local gRPC call, so just try every 1 second indefinitely

Since we have a liveness probe restarting the pod, we can rely on that to kill the pod.

Co-authored-by: Claes Mogren <[email protected]>
bnapolitan added a commit to bnapolitan/amazon-vpc-cni-k8s that referenced this issue Jul 1, 2020
@mogren
Contributor

mogren commented Jul 8, 2020

This all seems to be related to issues with kube-proxy; closing.

@mogren mogren closed this as completed Jul 8, 2020