Ingress Nginx Ingress Controller fails to start on EKS Node #9230

PhilipBehrenberg · 2022-10-28T21:13:22Z

What happened:

The ingress controller pods fail on startup and enter CrashLoopBackOff. The cluster runs on EKS, on nodes built from customized versions of the official AWS EKS node images.

Ingress Controller Pod Logs

W1028 20:10:58.192748       6 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I1028 20:10:58.192851       6 main.go:209] "Creating API client" host="https://172.20.0.1:443"
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.4.0
  Build:         50be2bf95fd1ef480420e2aa1d6c5c7c138c95ea
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.19.10

-------------------------------------------------------------------------------

I1028 20:10:58.216752       6 main.go:253] "Running in Kubernetes cluster" major="1" minor="23+" git="v1.23.10-eks-15b7512" state="clean" commit="cd6399691d9b1fed9ec20c9c5e82f5993c3f42cb" platform="linux/amd64"
I1028 20:10:58.435662       6 main.go:104] "SSL fake certificate created" file="/etc/ingress-controller/ssl/default-fake-certificate.pem"
W1028 20:10:58.459181       6 nginx.go:83] Error reading system nameservers: open /etc/resolv.conf: permission denied
I1028 20:10:58.460332       6 ssl.go:533] "loading tls certificate" path="/usr/local/certificates/cert" key="/usr/local/certificates/key"
I1028 20:10:58.482546       6 nginx.go:260] "Starting NGINX Ingress controller"
I1028 20:10:58.497973       6 event.go:285] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"ingress-nginx", Name:"ingress-nginx-controller", UID:"6e89c60a-a27d-4d88-af8c-eec4c5b583f9", APIVersion:"v1", ResourceVersion:"221955", FieldPath:""}): type: 'Normal' reason: 'CREATE' ConfigMap ingress-nginx/ingress-nginx-controller
I1028 20:10:58.498017       6 event.go:285] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"ingress-nginx", Name:"ingress-nginx-tcp", UID:"d00ca974-f5b1-4ac4-a69b-d20a9d615dbe", APIVersion:"v1", ResourceVersion:"221956", FieldPath:""}): type: 'Normal' reason: 'CREATE' ConfigMap ingress-nginx/ingress-nginx-tcp
I1028 20:10:59.684612       6 nginx.go:303] "Starting NGINX process"
I1028 20:10:59.686089       6 leaderelection.go:248] attempting to acquire leader lease ingress-nginx/ingress-controller-leader...
I1028 20:10:59.688208       6 nginx.go:323] "Starting validation webhook" address=":8443" certPath="/usr/local/certificates/cert" keyPath="/usr/local/certificates/key"
W1028 20:10:59.691719       6 controller.go:424] Error getting Service "default/connection-manager": no object matching key "default/connection-manager" in local store
I1028 20:10:59.692016       6 controller.go:168] "Configuration changes detected, backend reload required"
I1028 20:10:59.703401       6 leaderelection.go:258] successfully acquired lease ingress-nginx/ingress-controller-leader
I1028 20:10:59.703539       6 status.go:84] "New leader elected" identity="ingress-nginx-controller-58cmx"
I1028 20:10:59.762241       6 controller.go:185] "Backend successfully reloaded"
I1028 20:10:59.762473       6 controller.go:196] "Initial sync, sleeping for 1 second"
I1028 20:10:59.762799       6 event.go:285] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress-nginx", Name:"ingress-nginx-controller-58cmx", UID:"fd0cbd96-d926-402a-a572-3e6a6761419d", APIVersion:"v1", ResourceVersion:"413390", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
2022/10/28 20:10:59 [error] 25#25: init_by_lua error: init_by_lua:9: require failed: /etc/nginx/lua/util/resolv_conf.lua:70: could not open /etc/resolv.conf: /etc/resolv.conf: Permission denied
stack traceback:
        [C]: in function 'error'
        init_by_lua:9: in main chunk
W1028 20:11:00.763534       6 controller.go:216] Dynamic reconfiguration failed (retrying; 15 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W1028 20:11:01.778427       6 controller.go:216] Dynamic reconfiguration failed (retrying; 14 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W1028 20:11:03.203239       6 controller.go:216] Dynamic reconfiguration failed (retrying; 13 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W1028 20:11:05.011439       6 controller.go:216] Dynamic reconfiguration failed (retrying; 12 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W1028 20:11:07.240442       6 controller.go:216] Dynamic reconfiguration failed (retrying; 11 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W1028 20:11:10.355045       6 controller.go:216] Dynamic reconfiguration failed (retrying; 10 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W1028 20:11:14.279282       6 controller.go:216] Dynamic reconfiguration failed (retrying; 9 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W1028 20:11:19.265449       6 controller.go:216] Dynamic reconfiguration failed (retrying; 8 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W1028 20:11:26.012199       6 controller.go:216] Dynamic reconfiguration failed (retrying; 7 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W1028 20:11:34.984880       6 controller.go:216] Dynamic reconfiguration failed (retrying; 6 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W1028 20:11:45.920391       6 controller.go:216] Dynamic reconfiguration failed (retrying; 5 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
I1028 20:11:56.847909       6 sigterm.go:36] "Received SIGTERM, shutting down"
I1028 20:11:56.847935       6 nginx.go:379] "Shutting down controller queues"
I1028 20:11:56.872811       6 nginx.go:387] "Stopping admission controller"
E1028 20:11:56.872857       6 nginx.go:326] "Error listening for TLS connections" err="http: Server closed"
I1028 20:11:56.872864       6 nginx.go:395] "Stopping NGINX process"
2022/10/28 20:11:56 [warn] 41#41: the "http2_max_field_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /etc/nginx/nginx.conf:150
nginx: [warn] the "http2_max_field_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /etc/nginx/nginx.conf:150
2022/10/28 20:11:56 [warn] 41#41: the "http2_max_header_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /etc/nginx/nginx.conf:151
nginx: [warn] the "http2_max_header_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /etc/nginx/nginx.conf:151
2022/10/28 20:11:56 [warn] 41#41: the "http2_max_requests" directive is obsolete, use the "keepalive_requests" directive instead in /etc/nginx/nginx.conf:152
nginx: [warn] the "http2_max_requests" directive is obsolete, use the "keepalive_requests" directive instead in /etc/nginx/nginx.conf:152
2022/10/28 20:11:56 [notice] 41#41: signal process started
I1028 20:11:57.902547       6 nginx.go:408] "NGINX process has stopped"
I1028 20:11:57.902570       6 sigterm.go:44] Handled quit, delaying controller exit for 10 seconds
E1028 20:11:59.703761       6 queue.go:78] "queue has been shutdown, failed to enqueue" key="&ObjectMeta{Name:sync status,GenerateName:,Namespace:,SelfLink:,UID:,ResourceVersion:,Generation:0,CreationTimestamp:0001-01-01 00:00:00 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{},OwnerReferences:[]OwnerReference{},Finalizers:[],ManagedFields:[]ManagedFieldsEntry{},}"
W1028 20:12:00.322104       6 controller.go:216] Dynamic reconfiguration failed (retrying; 4 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
I1028 20:12:07.903088       6 sigterm.go:47] "Exiting" code=0
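
The "connection refused" retries in the log above come after the two "permission denied" messages for /etc/resolv.conf, so the file open failure looks like the first thing to chase. As a rough aid for reading logs like this one, here is a minimal Python sketch that pulls the denied file paths out of a pasted log. The function name `denied_paths` and the regular expression are my own illustration and are not part of ingress-nginx.

```python
import re

# Hypothetical pattern: matches both the klog line ("open <path>: permission denied")
# and the nginx/Lua line ("could not open <path>: ... Permission denied").
PERMISSION_DENIED = re.compile(
    r"open (?P<path>/[^\s:]+):.*permission denied", re.IGNORECASE
)


def denied_paths(log_text: str) -> list:
    """Return the distinct file paths reported as 'permission denied', in first-seen order."""
    seen = []
    for line in log_text.splitlines():
        match = PERMISSION_DENIED.search(line)
        if match and match.group("path") not in seen:
            seen.append(match.group("path"))
    return seen
```

Calling `denied_paths` on the log text above should list only /etc/resolv.conf, which suggests the retry warnings are a consequence of that failure and not a separate problem.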

What you expected to happen:

The ingress controller pods should start up and accept incoming connections from the NLB that was created by ingress-nginx.

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):

NGINX Ingress controller
  Release:       v1.4.0
  Build:         50be2bf95fd1ef480420e2aa1d6c5c7c138c95ea
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.19.10

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"23+", GitVersion:"v1.23.7-eks-4721010", GitCommit:"b77d9473a02fbfa834afa67d677fd12d690b195f", GitTreeState:"clean", BuildDate:"2022-06-27T22:22:16Z", GoVersion:"go1.17.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23+", GitVersion:"v1.23.10-eks-15b7512", GitCommit:"cd6399691d9b1fed9ec20c9c5e82f5993c3f42cb", GitTreeState:"clean", BuildDate:"2022-08-31T19:17:01Z", GoVersion:"go1.17.13", Compiler:"gc", Platform:"linux/amd64"}

Environment:

Cloud provider or hardware configuration: AWS EKS customized nodes built on top of official EKS nodes
OS (e.g. from /etc/os-release): Amazon Linux 2
Kernel (e.g. uname -a): Linux 5.4.209-116.367.amzn2.x86_64 #1 SMP Wed Aug 31 00:09:52 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Install tools:
- The nodes are run with kubelet, via the built-in EKS bootstrap script.
- The nodes are created using the EC2 image builder. The base image is the most recent EKS-ready image.
- Additional recipes:
  - stig-build-linux-high
  - Custom recipe that installs several additional monitoring systems
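
Since the nodes carry extra hardening recipes, one possibility (not confirmed in this report) is that the file mode of /etc/resolv.conf as seen by the container no longer allows reads by a non-root user. The Python sketch below checks the "other" read bit on a path. It assumes the controller process runs as a non-root user that is neither the owner nor in the group of the file, and the helper name `readable_by_others` is made up for illustration.

```python
import os
import stat


def readable_by_others(path: str) -> bool:
    """Return True if the 'other' read bit (o+r) is set on the file at path."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IROTH)
```

Running `readable_by_others("/etc/resolv.conf")` from a shell inside an affected pod, or against the resolv.conf that the kubelet hands to the container, would show whether the mode is the problem. A result of False would be consistent with the "permission denied" lines in the controller log, though ownership, SELinux, or other controls could cause the same error.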
Basic cluster related info:
- Client Version: v1.23.7-eks-4721010
- Server Version: v1.23.10-eks-15b7512

NAME                                              STATUS   ROLES    AGE   VERSION               INTERNAL-IP   EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION                 CONTAINER-RUNTIME
ip-10-0-3-198.[region removed].compute.internal   Ready    <none>   39h   v1.23.9-eks-ba74326   10.0.3.198    <none>        Amazon Linux 2   5.4.209-116.367.amzn2.x86_64   docker://20.10.17
ip-10-0-5-135.[region removed].compute.internal   Ready    <none>   39h   v1.23.9-eks-ba74326   10.0.5.135    <none>        Amazon Linux 2   5.4.209-116.367.amzn2.x86_64   docker://20.10.17

How was the ingress-nginx-controller installed:
- ingress-nginx ingress-nginx 1 2022-10-27 23:40:51.1602531 +0000 UTC deployed ingress-nginx-4.3.0 1.4.0

Ingress Controller Values File

controller:
  config:
    allow-snippet-annotations: "true"
    http-snippet: |
      server {
        listen 2443;
        return 308 https://$host$request_uri;
      }
    proxy-real-ip-cidr: 10.0.0.0/16
    use-forwarded-headers: "true"
  kind: DaemonSet
  containerPort:
    http: 80
    https: 80
    tohttps: 2443
  service:
    targetPorts:
      http: tohttps
      https: http
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-subnets: subnet-xxxxxxxxxxxxxxxxx, subnet-xxxxxxxxxxxxxxxxx, subnet-xxxxxxxxxxxxxxxxx
      service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws-us-gov:acm:[region-removed]:xxxxxxxxxxxx:certificate/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
      service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
      service.beta.kubernetes.io/aws-load-balancer-ssl-ports: https
      service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "3600"
      service.beta.kubernetes.io/aws-load-balancer-type: nlb
tcp:
  "32443": default/app:9443

Current State of the controller:

Describe IngressClass

Name:         nginx
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=ingress-nginx
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/part-of=ingress-nginx
              app.kubernetes.io/version=1.4.0
              helm.sh/chart=ingress-nginx-4.3.0
Annotations:  meta.helm.sh/release-name: ingress-nginx
              meta.helm.sh/release-namespace: ingress-nginx
Controller:   k8s.io/ingress-nginx
Events:       <none>

Describe All

Name:         ingress-nginx-controller-58cmx
Namespace:    ingress-nginx
Priority:     0
Node:         ip-10-0-5-135.[region removed].compute.internal/10.0.5.135
Start Time:   Thu, 27 Oct 2022 23:40:56 +0000
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=ingress-nginx
              app.kubernetes.io/name=ingress-nginx
              controller-revision-hash=6ffbc58587
              pod-template-generation=1
Annotations:  kubernetes.io/psp: eks.privileged
Status:       Running
IP:           10.0.5.85
IPs:
  IP:           10.0.5.85
Controlled By:  DaemonSet/ingress-nginx-controller
Containers:
  controller:
    Container ID:  docker://63fa2d76ba1f47d4ea09c9a393bb33a09dc28ea43e7a8d49af94c7bc19f89bad
    Image:         registry.k8s.io/ingress-nginx/controller:v1.4.0@sha256:34ee929b111ffc7aa426ffd409af44da48e5a0eea1eb2207994d9e0c0882d143
    Image ID:      docker-pullable://registry.k8s.io/ingress-nginx/controller@sha256:34ee929b111ffc7aa426ffd409af44da48e5a0eea1eb2207994d9e0c0882d143
    Ports:         80/TCP, 80/TCP, 2443/TCP, 8443/TCP, 32443/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    Args:
      /nginx-ingress-controller
      --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
      --election-id=ingress-controller-leader
      --controller-class=k8s.io/ingress-nginx
      --ingress-class=nginx
      --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
      --tcp-services-configmap=$(POD_NAMESPACE)/ingress-nginx-tcp
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 28 Oct 2022 20:55:18 +0000
      Finished:     Fri, 28 Oct 2022 20:56:27 +0000
    Ready:          False
    Restart Count:  349
    Requests:
      cpu:      100m
      memory:   90Mi
    Liveness:   http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
    Readiness:  http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       ingress-nginx-controller-58cmx (v1:metadata.name)
      POD_NAMESPACE:  ingress-nginx (v1:metadata.namespace)
      LD_PRELOAD:     /usr/local/lib/libmimalloc.so
    Mounts:
      /usr/local/certificates/ from webhook-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-g6hpb (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  webhook-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ingress-nginx-admission
    Optional:    false
  kube-api-access-g6hpb:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age                     From                      Message
  ----     ------     ----                    ----                      -------
  Normal   RELOAD     56m                     nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Normal   RELOAD     54m                     nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Normal   RELOAD     48m                     nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Normal   Pulled     47m (x338 over 21h)     kubelet                   Container image "registry.k8s.io/ingress-nginx/controller:v1.4.0@sha256:34ee929b111ffc7aa426ffd409af44da48e5a0eea1eb2207994d9e0c0882d143" already present on machine
  Normal   RELOAD     47m                     nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Normal   RELOAD     41m                     nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Normal   RELOAD     40m                     nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Normal   RELOAD     33m                     nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Normal   RELOAD     32m                     nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Normal   RELOAD     26m                     nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Normal   RELOAD     25m                     nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Normal   RELOAD     18m                     nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Normal   RELOAD     17m                     nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Warning  Unhealthy  17m (x2417 over 21h)    kubelet                   Readiness probe failed: HTTP probe failed with statuscode: 500
  Normal   RELOAD     11m                     nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Normal   RELOAD     10m                     nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Warning  BackOff    7m20s (x4235 over 21h)  kubelet                   Back-off restarting failed container
  Normal   RELOAD     4m10s                   nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Normal   RELOAD     3m6s                    nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Warning  Unhealthy  2m19s (x1749 over 21h)  kubelet                   Liveness probe failed: HTTP probe failed with statuscode: 500


Name:         ingress-nginx-controller-vm52x
Namespace:    ingress-nginx
Priority:     0
Node:         ip-10-0-3-198.[region removed].compute.internal/10.0.3.198
Start Time:   Thu, 27 Oct 2022 23:40:56 +0000
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=ingress-nginx
              app.kubernetes.io/name=ingress-nginx
              controller-revision-hash=6ffbc58587
              pod-template-generation=1
Annotations:  kubernetes.io/psp: eks.privileged
Status:       Running
IP:           10.0.3.119
IPs:
  IP:           10.0.3.119
Controlled By:  DaemonSet/ingress-nginx-controller
Containers:
  controller:
    Container ID:  docker://1d9ced8e9c230d14cb4dd915f4d7a134006dd75dd169ef490e929ea30a4c3ae1
    Image:         registry.k8s.io/ingress-nginx/controller:v1.4.0@sha256:34ee929b111ffc7aa426ffd409af44da48e5a0eea1eb2207994d9e0c0882d143
    Image ID:      docker-pullable://registry.k8s.io/ingress-nginx/controller@sha256:34ee929b111ffc7aa426ffd409af44da48e5a0eea1eb2207994d9e0c0882d143
    Ports:         80/TCP, 80/TCP, 2443/TCP, 8443/TCP, 32443/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    Args:
      /nginx-ingress-controller
      --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
      --election-id=ingress-controller-leader
      --controller-class=k8s.io/ingress-nginx
      --ingress-class=nginx
      --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
      --tcp-services-configmap=$(POD_NAMESPACE)/ingress-nginx-tcp
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 28 Oct 2022 20:55:48 +0000
      Finished:     Fri, 28 Oct 2022 20:56:57 +0000
    Ready:          False
    Restart Count:  350
    Requests:
      cpu:      100m
      memory:   90Mi
    Liveness:   http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
    Readiness:  http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       ingress-nginx-controller-vm52x (v1:metadata.name)
      POD_NAMESPACE:  ingress-nginx (v1:metadata.namespace)
      LD_PRELOAD:     /usr/local/lib/libmimalloc.so
    Mounts:
      /usr/local/certificates/ from webhook-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gxqdn (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  webhook-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ingress-nginx-admission
    Optional:    false
  kube-api-access-gxqdn:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age                     From                      Message
  ----     ------     ----                    ----                      -------
  Normal   RELOAD     55m                     nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Normal   RELOAD     54m                     nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Normal   RELOAD     48m                     nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Normal   RELOAD     46m                     nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Normal   RELOAD     40m                     nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Normal   RELOAD     39m                     nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Normal   RELOAD     33m                     nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Normal   RELOAD     32m                     nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Normal   RELOAD     25m                     nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Normal   RELOAD     24m                     nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Normal   RELOAD     18m                     nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Warning  Unhealthy  17m (x2425 over 21h)    kubelet                   Readiness probe failed: HTTP probe failed with statuscode: 500
  Normal   RELOAD     17m                     nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Normal   RELOAD     10m                     nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Normal   RELOAD     9m56s                   nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Warning  BackOff    7m26s (x4227 over 21h)  kubelet                   Back-off restarting failed container
  Normal   RELOAD     3m39s                   nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Normal   RELOAD     2m36s                   nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Warning  Unhealthy  2m19s (x1751 over 21h)  kubelet                   Liveness probe failed: HTTP probe failed with statuscode: 500


Name:                     ingress-nginx-controller
Namespace:                ingress-nginx
Labels:                   app.kubernetes.io/component=controller
                          app.kubernetes.io/instance=ingress-nginx
                          app.kubernetes.io/managed-by=Helm
                          app.kubernetes.io/name=ingress-nginx
                          app.kubernetes.io/part-of=ingress-nginx
                          app.kubernetes.io/version=1.4.0
                          helm.sh/chart=ingress-nginx-4.3.0
Annotations:              meta.helm.sh/release-name: ingress-nginx
                          meta.helm.sh/release-namespace: ingress-nginx
                          service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
                          service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: 3600
                          service.beta.kubernetes.io/aws-load-balancer-ssl-cert:
                            arn:aws-us-gov:acm:[region removed]:xxxxxxxxxxxx:certificate/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
                          service.beta.kubernetes.io/aws-load-balancer-ssl-ports: https
                          service.beta.kubernetes.io/aws-load-balancer-subnets: subnet-xxxxxxxxxxxxxxxxx, subnet-xxxxxxxxxxxxxxxxx, subnet-xxxxxxxxxxxxxxxxx
                          service.beta.kubernetes.io/aws-load-balancer-type: nlb
Selector:                 app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       172.20.157.170
IPs:                      172.20.157.170
LoadBalancer Ingress:     a3ebaf5eb07b24c08b1118a33ca6ebfd-1a8a5b6538b5f2a3.elb.[region removed].amazonaws.com
Port:                     http  80/TCP
TargetPort:               tohttps/TCP
NodePort:                 http  32057/TCP
Endpoints:                
Port:                     https  443/TCP
TargetPort:               http/TCP
NodePort:                 https  30263/TCP
Endpoints:                
Port:                     32443-tcp  32443/TCP
TargetPort:               32443-tcp/TCP
NodePort:                 32443-tcp  30514/TCP
Endpoints:                
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>


Name:              ingress-nginx-controller-admission
Namespace:         ingress-nginx
Labels:            app.kubernetes.io/component=controller
                   app.kubernetes.io/instance=ingress-nginx
                   app.kubernetes.io/managed-by=Helm
                   app.kubernetes.io/name=ingress-nginx
                   app.kubernetes.io/part-of=ingress-nginx
                   app.kubernetes.io/version=1.4.0
                   helm.sh/chart=ingress-nginx-4.3.0
Annotations:       meta.helm.sh/release-name: ingress-nginx
                   meta.helm.sh/release-namespace: ingress-nginx
Selector:          app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                172.20.52.146
IPs:               172.20.52.146
Port:              https-webhook  443/TCP
TargetPort:        webhook/TCP
Endpoints:         
Session Affinity:  None
Events:            <none>


Name:           ingress-nginx-controller
Selector:       app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
Node-Selector:  kubernetes.io/os=linux
Labels:         app.kubernetes.io/component=controller
                app.kubernetes.io/instance=ingress-nginx
                app.kubernetes.io/managed-by=Helm
                app.kubernetes.io/name=ingress-nginx
                app.kubernetes.io/part-of=ingress-nginx
                app.kubernetes.io/version=1.4.0
                helm.sh/chart=ingress-nginx-4.3.0
Annotations:    deprecated.daemonset.template.generation: 1
                meta.helm.sh/release-name: ingress-nginx
                meta.helm.sh/release-namespace: ingress-nginx
Desired Number of Nodes Scheduled: 2
Current Number of Nodes Scheduled: 2
Number of Nodes Scheduled with Up-to-date Pods: 2
Number of Nodes Scheduled with Available Pods: 0
Number of Nodes Misscheduled: 0
Pods Status:  2 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app.kubernetes.io/component=controller
                    app.kubernetes.io/instance=ingress-nginx
                    app.kubernetes.io/name=ingress-nginx
  Service Account:  ingress-nginx
  Containers:
   controller:
    Image:       registry.k8s.io/ingress-nginx/controller:v1.4.0@sha256:34ee929b111ffc7aa426ffd409af44da48e5a0eea1eb2207994d9e0c0882d143
    Ports:       80/TCP, 80/TCP, 2443/TCP, 8443/TCP, 32443/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    Args:
      /nginx-ingress-controller
      --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
      --election-id=ingress-controller-leader
      --controller-class=k8s.io/ingress-nginx
      --ingress-class=nginx
      --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
      --tcp-services-configmap=$(POD_NAMESPACE)/ingress-nginx-tcp
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
    Requests:
      cpu:      100m
      memory:   90Mi
    Liveness:   http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
    Readiness:  http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:        (v1:metadata.name)
      POD_NAMESPACE:   (v1:metadata.namespace)
      LD_PRELOAD:     /usr/local/lib/libmimalloc.so
    Mounts:
      /usr/local/certificates/ from webhook-cert (ro)
  Volumes:
   webhook-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ingress-nginx-admission
    Optional:    false
Events:          <none>

Describe Svc

Name:                     ingress-nginx-controller
Namespace:                ingress-nginx
Labels:                   app.kubernetes.io/component=controller
                          app.kubernetes.io/instance=ingress-nginx
                          app.kubernetes.io/managed-by=Helm
                          app.kubernetes.io/name=ingress-nginx
                          app.kubernetes.io/part-of=ingress-nginx
                          app.kubernetes.io/version=1.4.0
                          helm.sh/chart=ingress-nginx-4.3.0
Annotations:              meta.helm.sh/release-name: ingress-nginx
                          meta.helm.sh/release-namespace: ingress-nginx
                          service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
                          service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: 3600
                          service.beta.kubernetes.io/aws-load-balancer-ssl-cert:
                            arn:aws-us-gov:acm:[region removed]:xxxxxxxxxxxx:certificate/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
                          service.beta.kubernetes.io/aws-load-balancer-ssl-ports: https
                          service.beta.kubernetes.io/aws-load-balancer-subnets: subnet-xxxxxxxxxxxxxxxxx, subnet-xxxxxxxxxxxxxxxxx, subnet-xxxxxxxxxxxxxxxxx
                          service.beta.kubernetes.io/aws-load-balancer-type: nlb
Selector:                 app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       172.20.157.170
IPs:                      172.20.157.170
LoadBalancer Ingress:     a3ebaf5eb07b24c08b1118a33ca6ebfd-1a8a5b6538b5f2a3.elb.[region removed].amazonaws.com
Port:                     http  80/TCP
TargetPort:               tohttps/TCP
NodePort:                 http  32057/TCP
Endpoints:                
Port:                     https  443/TCP
TargetPort:               http/TCP
NodePort:                 https  30263/TCP
Endpoints:                
Port:                     32443-tcp  32443/TCP
TargetPort:               32443-tcp/TCP
NodePort:                 32443-tcp  30514/TCP
Endpoints:                
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>


Name:              ingress-nginx-controller-admission
Namespace:         ingress-nginx
Labels:            app.kubernetes.io/component=controller
                   app.kubernetes.io/instance=ingress-nginx
                   app.kubernetes.io/managed-by=Helm
                   app.kubernetes.io/name=ingress-nginx
                   app.kubernetes.io/part-of=ingress-nginx
                   app.kubernetes.io/version=1.4.0
                   helm.sh/chart=ingress-nginx-4.3.0
Annotations:       meta.helm.sh/release-name: ingress-nginx
                   meta.helm.sh/release-namespace: ingress-nginx
Selector:          app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                172.20.52.146
IPs:               172.20.52.146
Port:              https-webhook  443/TCP
TargetPort:        webhook/TCP
Endpoints:         
Session Affinity:  None
Events:            <none>

How to reproduce this issue:

This is only happening in 1 of 3 environments that, as far as I can tell, are the same. The environment itself is somehow causing this issue. It happens with every helm install I've done in this environment.

Additional Information:

The chart successfully creates all of the NLB pieces; the only piece that's failing is the ingress controller (IC) pods.
The permissions on /etc/resolv.conf are different in this pod than they are in the pods in the other environments where this has worked correctly:

-rw-r-----    1 root     root           141 Oct 28 21:28 /etc/resolv.conf
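
One reading of the listing above, offered as a sketch rather than a confirmed diagnosis: mode `-rw-r-----` with owner root:root grants no read bit to "other", so a process that is neither root nor in the root group would get the `permission denied` reported for /etc/resolv.conf in the controller log. The Python below models only the classic owner/group/other check. The UID 101 and GID 82 are assumptions for illustration (check `securityContext.runAsUser` in your own Deployment), `can_read` is a hypothetical helper, and ACLs, capabilities, and SELinux are ignored.

```python
import stat


def can_read(mode: int, file_uid: int, file_gid: int, uid: int, gids: set) -> bool:
    """Simplified POSIX read check: owner, then group, then other.

    Ignores ACLs, Linux capabilities, and SELinux/AppArmor policy.
    """
    if uid == 0:
        # Simplification: treat root as always able to read.
        return True
    if uid == file_uid:
        return bool(mode & stat.S_IRUSR)
    if file_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


# Assumed (hypothetical) identity of the controller process.
CONTROLLER_UID = 101
CONTROLLER_GIDS = {82}

# 0o640 corresponds to -rw-r----- as shown in the listing, owned by root:root.
print(can_read(0o640, 0, 0, CONTROLLER_UID, CONTROLLER_GIDS))
# 0o644 (-rw-r--r--) is a world-readable mode, shown for comparison.
print(can_read(0o644, 0, 0, CONTROLLER_UID, CONTROLLER_GIDS))
```

Running `ls -l /etc/resolv.conf` and `id` inside a working pod and inside the failing pod would show whether the two environments really differ in this way.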


k8s-ci-robot · 2022-10-28T21:13:29Z

@PhilipBehrenberg: This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

longwuyuan · 2022-10-29T00:47:02Z

Does this work in its default install? https://kubernetes.github.io/ingress-nginx/deploy/#aws
Do you have the required ports open and allowing connections?

/remove-kind bug
/kind support

PhilipBehrenberg · 2022-10-29T18:52:19Z

Does this work in its default install? https://kubernetes.github.io/ingress-nginx/deploy/#aws

The default install has the exact same issue. By default install, I assume you mean running the following command:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.4.0/deploy/static/provider/aws/deploy.yaml

Do you have the required ports open and allowing connections?

I have a bunch of other pods running without any issue. They are able to communicate with each other, even across separate nodes in different AZ. The DNS is also working as expected.

longwuyuan · 2022-10-30T01:38:32Z

Then please post the error messages and other logs. Thanks.

PhilipBehrenberg · 2022-10-30T18:31:05Z

All of my logs and everything are in the original post. Running the default install failed in the exact same way, with the same error and everything.

strongjz · 2022-11-11T02:56:26Z

/assign @strongjz

sorind-broadsign · 2022-12-01T18:24:58Z

I have the same issue.
Any idea what could cause this?

For me, it starts failing on appVersion: 1.2.1 and chart version: 4.1.3

bcessa · 2022-12-05T15:59:47Z

In case this is helpful to someone else: I was having this exact issue with DOKS. In my case the reason was using hostNetwork: true.

This was causing the health checks to fail due to a missing node address. Specifically, the value of controller.healthCheckHost:

Address to bind the health check endpoint. It is better to set this option to the internal node address if the ingress nginx controller is running in the hostNetwork: true mode.

Simply turning it off with hostNetwork: false (the default value) solved the issue for me.

app version = 1.5.1
chart = ingress-nginx-4.4.0
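
To check whether the health endpoint answers on a particular address, a probe similar in spirit to the liveness check can be run by hand. The sketch below is an illustration under stated assumptions, not the kubelet's implementation: `healthz_ok` is a hypothetical helper, port 10254 and path `/healthz` are taken from the probe definition in the Deployment description earlier in this issue, and the host is whatever address you want to test (for example the node's internal address when hostNetwork: true is in use).

```python
import http.client
import urllib.error
import urllib.request


def healthz_ok(host: str, port: int = 10254, path: str = "/healthz",
               timeout: float = 1.0) -> bool:
    """Return True if an HTTP GET to the health endpoint succeeds.

    Responses with status 200-399 count as success; connection
    failures, timeouts, and 4xx/5xx responses count as failure.
    """
    url = f"http://{host}:{port}{path}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # urlopen raises HTTPError (a URLError subclass) for 4xx/5xx responses.
        return False
```

Calling `healthz_ok` with the pod IP and then with the node address shows which of the two the endpoint is actually bound to.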

longwuyuan · 2024-09-08T09:27:06Z

This is 2 years old and lots of users are currently using the controller on EKS without this failure.
The error message posted was related to networking and specifically port 10246. That is an unfamiliar port number.

In any case, the version of the controller reported is not supported anymore, and AWS now requires the install of the AWS LoadBalancer Controller along with AWS-specific annotations set during the install.

This issue is adding to the open issues count without an action item, so I will close it for now. Please use the latest release of the controller as documented in the Deployment docs, and make sure to open the required ports and match the standard OS config as required by K8S. Then post all the info asked for in the issue description, editing out the old info and pasting in the new test info. Then reopen the issue if you are still tracking this. Otherwise it can remain closed. It will help us reduce the count of real issues being tracked with action items. Thanks.

/close

k8s-ci-robot · 2024-09-08T09:27:11Z

@longwuyuan: Closing this issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

PhilipBehrenberg added the kind/bug Categorizes issue or PR as related to a bug. label Oct 28, 2022

k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Oct 28, 2022

k8s-ci-robot added the needs-priority label Oct 28, 2022

k8s-ci-robot added kind/support Categorizes issue or PR as a support question. and removed kind/bug Categorizes issue or PR as related to a bug. labels Oct 29, 2022

k8s-ci-robot assigned strongjz Nov 11, 2022

strongjz added this to [SIG Network] Ingress NGINX Feb 16, 2023

strongjz moved this to Todo in [SIG Network] Ingress NGINX Feb 16, 2023

k8s-ci-robot closed this as completed Sep 8, 2024

github-project-automation bot moved this from Todo to Done in [SIG Network] Ingress NGINX Sep 8, 2024
