
Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system #2730

Closed
PierreBrisorgueil opened this issue Oct 29, 2021 · 6 comments



PierreBrisorgueil commented Oct 29, 2021

RKE version:

rke version v1.3.1

Docker version: (docker version, docker info preferred)

Client: Docker Engine - Community
 Version:           20.10.5
 API version:       1.40
 Go version:        go1.13.15
 Git commit:        55c4c88
 Built:             Tue Mar  2 20:17:50 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          19.03.14
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       5eb3275d40
  Built:            Tue Dec  1 19:18:50 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.4
  GitCommit:        05f951a3781f4f2c1911b05e61c160e9c30eaa8e
 runc:
  Version:          1.0.0-rc93
  GitCommit:        12644e614e25b05da6fd08a38ffa0cfe1903fdec
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Operating system and kernel: (cat /etc/os-release, uname -r preferred)

PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO)

Dedicated server

cluster.yml file:

nodes:
- address: xx.xx.xx.xx
  port: "xx"
  internal_address: ""
  role:
  - controlplane
  - worker
  - etcd
  hostname_override: ""
  user: xxxx
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: xx.xx.xx.xx
  port: "xx"
  internal_address: ""
  role:
  - controlplane
  - worker
  - etcd
  hostname_override: ""
  user: xxxx
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
services:
  etcd:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    external_urls: []
    ca_cert: ""
    cert: ""
    key: ""
    path: ""
    uid: 0
    gid: 0
    snapshot: null
    retention: ""
    creation: ""
    backup_config:
      enabled: true     # enables recurring etcd snapshots
      interval_hours: 12 # time increment between snapshots
      retention: 50     # time in days before snapshot purge
  kube-api:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    service_cluster_ip_range: 10.43.0.0/16
    service_node_port_range: ""
    pod_security_policy: false
    always_pull_images: false
    secrets_encryption_config: null
    audit_log: null
    admission_configuration: null
    event_rate_limit: null
  kube-controller:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    cluster_cidr: 10.42.0.0/16
    service_cluster_ip_range: 10.43.0.0/16
  scheduler:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
  kubelet:
    image: ""
    extra_args: {}
    extra_binds:
      - "/mnt/rancher:/mnt/rancher"
    extra_env: []
    cluster_domain: cluster.local
    infra_container_image: ""
    cluster_dns_server: 10.43.0.10
    fail_swap_on: false
    generate_serving_certificate: false
  kubeproxy:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
network:
  plugin: canal
  options: {}
  mtu: 0
  node_selector: {}
authentication:
  strategy: x509
  sans: []
  webhook: null
addons: ""
addons_include: []
system_images:
  etcd: rancher/coreos-etcd:v3.4.3-rancher1
  alpine: rancher/rke-tools:v0.1.56
  nginx_proxy: rancher/rke-tools:v0.1.56
  cert_downloader: rancher/rke-tools:v0.1.56
  kubernetes_services_sidecar: rancher/rke-tools:v0.1.56
  kubedns: rancher/k8s-dns-kube-dns:1.15.0
  dnsmasq: rancher/k8s-dns-dnsmasq-nanny:1.15.0
  kubedns_sidecar: rancher/k8s-dns-sidecar:1.15.0
  kubedns_autoscaler: rancher/cluster-proportional-autoscaler:1.7.1
  coredns: rancher/coredns-coredns:1.6.5
  coredns_autoscaler: rancher/cluster-proportional-autoscaler:1.7.1
  nodelocal: rancher/k8s-dns-node-cache:1.15.7
  kubernetes: rancher/hyperkube:v1.17.5-rancher1
  flannel: rancher/coreos-flannel:v0.11.0-rancher1
  flannel_cni: rancher/flannel-cni:v0.3.0-rancher5
  calico_node: rancher/calico-node:v3.13.0
  calico_cni: rancher/calico-cni:v3.13.0
  calico_controllers: rancher/calico-kube-controllers:v3.13.0
  calico_ctl: rancher/calico-ctl:v2.0.0
  calico_flexvol: rancher/calico-pod2daemon-flexvol:v3.13.0
  canal_node: rancher/calico-node:v3.13.0
  canal_cni: rancher/calico-cni:v3.13.0
  canal_flannel: rancher/coreos-flannel:v0.11.0
  canal_flexvol: rancher/calico-pod2daemon-flexvol:v3.13.0
  weave_node: weaveworks/weave-kube:2.5.2
  weave_cni: weaveworks/weave-npc:2.5.2
  pod_infra_container: rancher/pause:3.1
  ingress: rancher/nginx-ingress-controller:nginx-0.25.1-rancher1
  ingress_backend: rancher/nginx-ingress-controller-defaultbackend:1.5-rancher1
  metrics_server: rancher/metrics-server:v0.3.6
  windows_pod_infra_container: rancher/kubelet-pause:v0.1.3
ssh_key_path: ~/.ssh/id_rsa
ssh_cert_path: ""
ssh_agent_auth: false
authorization:
  mode: rbac
  options: {}
ignore_docker_version: false
kubernetes_version: ""
private_registries:
  - url: docker.io
    user: xxxxxx
    password: xxxxxxxxxxxxx
ingress:
  provider: ""
  options: {}
  node_selector: {}
  extra_args: {}
  dns_policy: ""
  extra_envs: []
  extra_volumes: []
  extra_volume_mounts: []
cluster_name: ""
cloud_provider:
  name: ""
prefix_path: ""
addon_job_timeout: 0
bastion_host:
  address: ""
  port: ""
  user: ""
  ssh_key: ""
  ssh_key_path: ""
  ssh_cert: ""
  ssh_cert_path: ""
monitoring:
  provider: ""
  options: {}
  node_selector: {}
restore:
  restore: false
  snapshot_name: ""
dns: null

Steps to Reproduce:

Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.7", GitCommit:"1dd5338295409edcfff11505e7bb246f0d325d15", GitTreeState:"clean", BuildDate:"2021-01-13T13:23:52Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"darwin/amd64"}

Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.5", GitCommit:"e0fccafd69541e3750d460ba0f9743b90336f24f", GitTreeState:"clean", BuildDate:"2020-04-16T11:35:47Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

rke up with version v1.3.1

FATA[0120] Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system

kubectl logs -l rke-network-plugin-deploy-job -n kube-system

nothing
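
Note: pods created by a Kubernetes Job are labeled job-name=&lt;job name&gt; rather than with the job name as a bare label key, so if the pods existed, a selector along these lines should match them:

kubectl -n kube-system logs -l job-name=rke-network-plugin-deploy-job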

kubectl get pods --all-namespaces

the rke-network-plugin-deploy-job pod does not appear to be present

**Description**

I tried to update the cluster with rke up and got stuck on the error Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system. I have already gone through similar issues without success, and running rke up a second time changes nothing. 🤔
I also tried adding addon_job_timeout: 60 to cluster.yml.
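
For reference, this is roughly how I set it, as a top-level key in cluster.yml (the value is in seconds, if I read the docs correctly):

addon_job_timeout: 60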

@superseb (Contributor)

Use of system_images has been discouraged for a long time, see https://rancher.com/docs/rke/latest/en/upgrades/.

I think it also helps if you specify the previous RKE version used, as the k8s version is pretty old.
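
As a sketch, the upgrade config then reduces to something like this (the version string below is only an example; rke config --list-version --all prints the versions your RKE binary actually supports):

# remove the whole system_images block and pin the release instead
kubernetes_version: "v1.21.5-rancher1-1"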

PierreBrisorgueil (Author) commented Oct 29, 2021

Thanks @superseb for your time! It's been a while since I looked into this subject; I'm going to follow the updates more regularly.

  1. Do I just have to remove the system_images config to let kubernetes_version take over?
  2. Do you suggest upgrading in several steps?

Hmm, I'm not sure about the previous RKE version; it was run from another computer...

I would say summer 2020, around v1.1.

@superseb (Contributor)

Yes, kubernetes_version is the only thing you need. For a setup this old, I would say yes, upgrade in several steps, although having no logs in the job doesn't make it easier to debug. It also depends on whether you made an etcd snapshot before upgrading, so you can roll back when it doesn't work.
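
For reference, a one-off snapshot before each upgrade step looks roughly like this (the snapshot name is arbitrary):

rke etcd snapshot-save --config cluster.yml --name pre-upgrade-snapshot
# and, if the upgrade goes wrong, roll back with:
rke etcd snapshot-restore --config cluster.yml --name pre-upgrade-snapshot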

PierreBrisorgueil (Author) commented Oct 29, 2021

@superseb Thank you! The upgrade seems to have succeeded without the system_images definition 🙏

Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.5", GitCommit:"aea7bbadd2fc0cd689de94a54e5b7b758869d691", GitTreeState:"clean", BuildDate:"2021-09-15T21:04:16Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}

All pods are up; on the other hand, the Rancher UI returns 502 Bad Gateway or, randomly, 504 Gateway Time-out.

#2702 with network_mode: hostNetwork does not seem to solve the problem on my end. Any idea?
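
For clarity, this is roughly the override from #2702 that I tried in cluster.yml:

ingress:
  provider: nginx
  network_mode: hostNetwork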

PierreBrisorgueil (Author) commented Oct 29, 2021

kubectl -n cattle-system get pods

NAME                               READY   STATUS    RESTARTS   AGE
rancher-76cf4684c7-jbz8g           1/1     Running   6          57d
rancher-76cf4684c7-q4vtb           1/1     Running   5          57d
rancher-76cf4684c7-rbgqc           1/1     Running   6          57d
rancher-webhook-56cf7b8669-8rffp   1/1     Running   1          57d

kubectl -n cattle-system describe certificate

Name:         tls-rancher-ingress
Namespace:    cattle-system
Labels:       app=rancher
              app.kubernetes.io/managed-by=Helm
              chart=rancher-2.6.0
              heritage=Helm
              release=rancher
Annotations:  <none>
API Version:  cert-manager.io/v1
Kind:         Certificate
Metadata:
  Creation Timestamp:  2021-10-25T07:16:12Z
  Generation:          1
  Owner References:
    API Version:           networking.k8s.io/v1beta1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Ingress
    Name:                  rancher
    UID:                   70d08056-c368-45f2-a6c2-bf07c7c7c7ee
  Resource Version:        237743616
  UID:                     79098a42-ee68-4705-b97e-d1123611c3f5
Spec:
  Dns Names:
    xxxxxxx
  Issuer Ref:
    Group:      cert-manager.io
    Kind:       Issuer
    Name:       letsencrypt-prod
  Secret Name:  tls-rancher-ingress
  Usages:
    digital signature
    key encipherment
Status:
  Conditions:
    Last Transition Time:  2021-10-25T07:16:13Z
    Message:               Certificate is up to date and has not expired
    Observed Generation:   1
    Reason:                Ready
    Status:                True
    Type:                  Ready
  Not After:               2021-12-28T07:06:28Z
  Not Before:              2021-09-29T07:06:29Z
  Renewal Time:            2021-11-28T07:06:28Z
Events:                    <none>

kubectl get pods --all-namespaces

NAMESPACE                  NAME                                                      READY   STATUS             RESTARTS   AGE
cattle-fleet-system        fleet-agent-bc4877788-f6p5s                               1/1     Running            2          57d
cattle-fleet-system        fleet-controller-756df96fd-xfc22                          1/1     Running            3          57d
cattle-fleet-system        gitjob-56fb9cfdd8-6svdr                                   1/1     Running            2          57d
cattle-monitoring-system   alertmanager-rancher-monitoring-alertmanager-0            2/2     Running            2          57d
cattle-monitoring-system   prometheus-rancher-monitoring-prometheus-0                4/4     Running            4          57d
cattle-monitoring-system   pushprox-kube-controller-manager-client-mdwcl             1/1     Running            9          214d
cattle-monitoring-system   pushprox-kube-controller-manager-client-sq49c             1/1     Running            1          214d
cattle-monitoring-system   pushprox-kube-controller-manager-proxy-56f7987f9f-gcwpd   1/1     Running            3          59d
cattle-monitoring-system   pushprox-kube-etcd-client-jp96k                           1/1     Running            9          214d
cattle-monitoring-system   pushprox-kube-etcd-client-tlxrx                           1/1     Running            1          214d
cattle-monitoring-system   pushprox-kube-etcd-proxy-84b67fdc55-c6f67                 1/1     Running            3          59d
cattle-monitoring-system   pushprox-kube-proxy-client-qzrhf                          1/1     Running            9          214d
cattle-monitoring-system   pushprox-kube-proxy-client-wtzws                          1/1     Running            1          214d
cattle-monitoring-system   pushprox-kube-proxy-proxy-78d7f45847-vlck2                1/1     Running            3          59d
cattle-monitoring-system   pushprox-kube-scheduler-client-g9hxl                      1/1     Running            1          214d
cattle-monitoring-system   pushprox-kube-scheduler-client-h9sgj                      1/1     Running            9          214d
cattle-monitoring-system   pushprox-kube-scheduler-proxy-57b6cf4bfc-9l224            1/1     Running            3          59d
cattle-monitoring-system   rancher-monitoring-grafana-5f6fc9c67b-2gtjl               3/3     Running            10         59d
cattle-monitoring-system   rancher-monitoring-kube-state-metrics-5c755d5d-ktrz7      1/1     Running            6          59d
cattle-monitoring-system   rancher-monitoring-operator-f87788c5-bfk56                2/2     Running            6          59d
cattle-monitoring-system   rancher-monitoring-prometheus-adapter-666fd8d955-ppqt6    1/1     Running            43         59d
cattle-monitoring-system   rancher-monitoring-prometheus-node-exporter-l9x4t         1/1     Running            17         214d
cattle-monitoring-system   rancher-monitoring-prometheus-node-exporter-sl7k8         1/1     Running            1          214d
cattle-system              rancher-76cf4684c7-jbz8g                                  1/1     Running            6          57d
cattle-system              rancher-76cf4684c7-q4vtb                                  1/1     Running            5          57d
cattle-system              rancher-76cf4684c7-rbgqc                                  1/1     Running            6          57d
cattle-system              rancher-webhook-56cf7b8669-8rffp                          1/1     Running            1          57d
cert-manager               cert-manager-669f48df7f-7qbr6                             1/1     Running            2          4d12h
cert-manager               cert-manager-cainjector-587649f6db-bcgbn                  1/1     Running            3          4d12h
cert-manager               cert-manager-webhook-69645d95cf-8x2pm                     1/1     Running            1          4d12h
default                    mariadb-0                                                 1/1     Running            1          57d
default                    mongodb-replicaset-0                                      1/1     Running            1          57d
default                    mongodb-replicaset-1                                      1/1     Running            1          57d
default                    mongodb-replicaset-2                                      1/1     Running            2          57d
default                    mongodb-replicaset-3                                      1/1     Running            1          57d
ingress-nginx              nginx-ingress-controller-7fbkp                            1/1     Running            0          5h7m
ingress-nginx              nginx-ingress-controller-qfjpw                            1/1     Running            0          5h8m
kube-system                calico-kube-controllers-55d84c6548-xtlft                  1/1     Running            1          106m
kube-system                canal-6wfx9                                               2/2     Running            0          105m
kube-system                canal-hprkl                                               2/2     Running            0          106m
kube-system                coredns-567c9cd8fb-fcff2                                  1/1     Running            0          106m
kube-system                coredns-567c9cd8fb-jmxf8                                  1/1     Running            0          7h36m
kube-system                coredns-autoscaler-6cd44c94ff-mdkpm                       1/1     Running            0          7h36m
kube-system                metrics-server-7bf4b68b78-4ml54                           1/1     Running            2          7h35m
kube-system                rke-coredns-addon-deploy-job-vg4fl                        0/1     Completed          0          7h36m
kube-system                rke-ingress-controller-deploy-job-gxx4j                   0/1     Completed          0          5h9m
kube-system                rke-metrics-addon-deploy-job-zwfxj                        0/1     Completed          0          7h36m
kube-system                rke-network-plugin-deploy-job-t6bfj                       0/1     Completed          0          106m

kubectl -n ingress-nginx describe pod

Name:         nginx-ingress-controller-7fbkp
Namespace:    ingress-nginx
Priority:     0
Node:         xx.xx.xx.xx/xx.xx.xx.xx
Start Time:   Fri, 29 Oct 2021 16:20:35 +0200
Labels:       app=ingress-nginx
              app.kubernetes.io/component=controller
              app.kubernetes.io/instance=ingress-nginx
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/version=0.48.1
              controller-revision-hash=586775fdf6
              pod-template-generation=6
Annotations:  cni.projectcalico.org/containerID: 5d6c9b2f667eecf3cb586b47c4724992acda7635abb2daa2f86d191955bc5b1d
              cni.projectcalico.org/podIP: 10.42.2.127/32
              cni.projectcalico.org/podIPs: 10.42.2.127/32
Status:       Running
IP:           10.42.2.127
IPs:
  IP:           10.42.2.127
Controlled By:  DaemonSet/nginx-ingress-controller
Containers:
  nginx-ingress-controller:
    Container ID:  docker://91494e0aa043be1689b51b80040ab4b6f4757ab4033507ee876c2e5c74822ca7
    Image:         rancher/nginx-ingress-controller:nginx-0.48.1-rancher1
    Image ID:      docker-pullable://rancher/nginx-ingress-controller@sha256:b34ba096ab65cc410e4967477326ce4178ce8c37a7bebdb2e262630665a24d52
    Ports:         8443/TCP, 80/TCP, 443/TCP
    Host Ports:    8443/TCP, 80/TCP, 443/TCP
    Args:
      /nginx-ingress-controller
      --election-id=ingress-controller-leader
      --ingress-class=nginx
      --configmap=$(POD_NAMESPACE)/nginx-configuration
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
    State:          Running
      Started:      Fri, 29 Oct 2021 16:20:37 +0200
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
    Readiness:      http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       nginx-ingress-controller-7fbkp (v1:metadata.name)
      POD_NAMESPACE:  ingress-nginx (v1:metadata.namespace)
      LD_PRELOAD:     /usr/local/lib/libmimalloc.so
    Mounts:
      /usr/local/certificates/ from webhook-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-sl74n (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  webhook-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ingress-nginx-admission
    Optional:    false
  kube-api-access-sl74n:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 :NoExecute op=Exists
                             :NoSchedule op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:                      <none>


Name:         nginx-ingress-controller-qfjpw
Namespace:    ingress-nginx
Priority:     0
Node:         xx.xx.xx.xx/xx.xx.xx.xx
Start Time:   Fri, 29 Oct 2021 16:19:50 +0200
Labels:       app=ingress-nginx
              app.kubernetes.io/component=controller
              app.kubernetes.io/instance=ingress-nginx
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/version=0.48.1
              controller-revision-hash=586775fdf6
              pod-template-generation=6
Annotations:  cni.projectcalico.org/containerID: 5875d9af1ea1a585dc3548ee22e5ca9a902f1726a21ea26127c1f50daee68346
              cni.projectcalico.org/podIP: 10.42.0.46/32
              cni.projectcalico.org/podIPs: 10.42.0.46/32
Status:       Running
IP:           10.42.0.46
IPs:
  IP:           10.42.0.46
Controlled By:  DaemonSet/nginx-ingress-controller
Containers:
  nginx-ingress-controller:
    Container ID:  docker://4860a1cd5d14b9d8e69758f10216037e9ac17b82b90b3b544a928faf6906e731
    Image:         rancher/nginx-ingress-controller:nginx-0.48.1-rancher1
    Image ID:      docker-pullable://rancher/nginx-ingress-controller@sha256:b34ba096ab65cc410e4967477326ce4178ce8c37a7bebdb2e262630665a24d52
    Ports:         8443/TCP, 80/TCP, 443/TCP
    Host Ports:    8443/TCP, 80/TCP, 443/TCP
    Args:
      /nginx-ingress-controller
      --election-id=ingress-controller-leader
      --ingress-class=nginx
      --configmap=$(POD_NAMESPACE)/nginx-configuration
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
    State:          Running
      Started:      Fri, 29 Oct 2021 16:19:54 +0200
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
    Readiness:      http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       nginx-ingress-controller-qfjpw (v1:metadata.name)
      POD_NAMESPACE:  ingress-nginx (v1:metadata.namespace)
      LD_PRELOAD:     /usr/local/lib/libmimalloc.so
    Mounts:
      /usr/local/certificates/ from webhook-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dbtls (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  webhook-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ingress-nginx-admission
    Optional:    false
  kube-api-access-dbtls:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 :NoExecute op=Exists
                             :NoSchedule op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:                      <none>

kubectl -n ingress-nginx logs -f nginx-ingress-controller-qfjpw

xx.xx.xx.xx - - [29/Oct/2021:19:38:04 +0000] "GET / HTTP/2.0" 499 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36" 19 12.528 [cattle-system-rancher-80] [] 10.42.2.13:80 0 12.528 - 18c5de65a9a8416537e77ccd9220c2e8
2021/10/29 19:38:05 [error] 34#34: *310626 connect() failed (113: Host is unreachable) while connecting to upstream, client: xx.xx.xx.xx, server: xxxxx.com, request: "GET / HTTP/2.0", upstream: "http://10.42.2.18:80/", host: "xxxxx.com"
2021/10/29 19:38:08 [error] 34#34: *310626 connect() failed (113: Host is unreachable) while connecting to upstream, client: xx.xx.xx.xx, server: xxxxx.com, request: "GET / HTTP/2.0", upstream: "http://10.42.2.10:80/", host: "xxxxx.com"
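
To narrow down whether this is a network problem rather than a Rancher problem, a quick cross-node check (the pod IP is taken from the error above; run it from the node that does not host that pod):

curl -m 5 http://10.42.2.18:80/
# "no route to host" here would point at dropped inter-node pod traffic
# (e.g. host firewall rules) rather than at the application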

curl -k -vvv https://xxxxx.com

*   Trying xx.xx.xx.xx:443...
* Connected to xxxx.com (xx.xx.xx.xx) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=xxxx.com
*  start date: Sep 29 07:06:29 2021 GMT
*  expire date: Dec 28 07:06:28 2021 GMT
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7f8be680f600)
> GET / HTTP/2
> Host: xxxx.com
> user-agent: curl/7.77.0
> accept: */*
> 
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
< HTTP/2 504 
< date: Fri, 29 Oct 2021 19:45:18 GMT
< content-type: text/html
< content-length: 160
< strict-transport-security: max-age=15724800; includeSubDomains
< 
<html>
<head><title>504 Gateway Time-out</title></head>
<body>
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>nginx</center>
</body>
</html>
* Connection #0 to host xxxx.com left intact

kubectl -n cattle-system logs -l app=rancher

2021/10/29 19:44:52 [ERROR] Failed to connect to peer wss://10.42.2.10/v3/connect [local ID=10.42.2.18]: dial tcp 10.42.2.10:443: connect: no route to host
W1029 19:44:54.812139      32 warnings.go:80] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
2021/10/29 19:44:58 [ERROR] Failed to connect to peer wss://10.42.2.10/v3/connect [local ID=10.42.2.18]: dial tcp 10.42.2.10:443: connect: no route to host
W1029 19:45:02.587559      32 warnings.go:80] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
2021/10/29 19:45:04 [ERROR] Failed to connect to peer wss://10.42.2.10/v3/connect [local ID=10.42.2.18]: dial tcp 10.42.2.10:443: connect: no route to host
W1029 19:45:04.903601      32 warnings.go:80] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
2021/10/29 19:45:10 [ERROR] Failed to connect to peer wss://10.42.2.10/v3/connect [local ID=10.42.2.18]: dial tcp 10.42.2.10:443: connect: no route to host

@PierreBrisorgueil (Author)

Hey, after further tests,

# persist the rules so they survive a firewalld reload/reboot
# (cali+ matches the Calico/Canal pod veth interfaces)
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 99 -o cali+ -j ACCEPT
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 99 -i cali+ -j ACCEPT
# apply the same rules to the running firewall immediately
firewall-cmd --direct --add-rule ipv4 filter FORWARD 99 -o cali+ -j ACCEPT
firewall-cmd --direct --add-rule ipv4 filter FORWARD 99 -i cali+ -j ACCEPT

did the trick.

It was linked to rancher/rancher#28840, and the solution was rancher/rancher#28840 (comment).

Thanks again for your help @superseb 🙏
