Update from RKE v1.2.11 to v1.3.1 leads to unreachable Rancher UI #2702
Comments
You are talking about an upgrade but the steps to reproduce show a clean install. But I am more interested in this part
What I want to achieve is a Rancher installation that is reachable via my load balancer, as it was with the previous RKE version and Rancher 2.5. If that works, I will then switch to a multi-node setup. Sorry if I didn't describe my case clearly. I added the ingress network part because of this text in the linked documentation:
I first tried it without this addition, but that didn't work either (it always ended in a 502 Bad Gateway error when I curled the Rancher server through the load balancer). So after some searching I found the mentioned documentation and added the ingress network part.
There is no change needed; your change made the ingress controller not be exposed on the outside, which caused the connection refused logging in your NGINX load balancer. We will need the logs from the setup without this setting, outputs from https://rancher.com/docs/rancher/v2.6/en/troubleshooting/rancherha/, and from the
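For completeness, a hedged sketch of commands that typically produce the outputs requested on that troubleshooting page (the label selectors are assumptions based on the default Rancher and RKE deployments):

```sh
# Rancher pods and their logs
kubectl -n cattle-system get pods -l app=rancher -o wide
kubectl -n cattle-system logs -l app=rancher --tail=100
# Ingress controller pods and logs
kubectl -n ingress-nginx get pods -o wide
kubectl -n ingress-nginx logs -l app=ingress-nginx --tail=100
```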
Ok, I redeployed the cluster without the changes. Here are the requested logs and outputs.
Current cluster.conf
curl -k -vvv https://rancheradm.internal.de
The corresponding log entry from the nginx load balancer
Bypassing the load balancer (a hedged curl sketch follows this comment)
Rancher log
Ingress log
Leader Election
I hope this helps. I see that there is something wrong, but I cannot make much sense of these log outputs.
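For the load balancer bypass mentioned above, a hedged curl sketch that resolves the Rancher hostname directly to one node's IP (NODE_IP is a placeholder):

```sh
# Send the request straight to a node's ingress controller, skipping the NGINX load balancer
curl -k -vvv --resolve rancheradm.internal.de:443:NODE_IP https://rancheradm.internal.de
```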
I am running into this exact same issue: after upgrading, the UI is unreachable; downgrading brings back the UI. This is what the network and ingress sections look like in the cluster.yml file:
network:
  plugin: calico
ingress:
  provider: nginx
  options:
    use-forwarded-headers: "true"
We'll need the same outputs and logging requested above to cross-reference. Also, please both run the networking and DNS tests from https://rancher.com/docs/rancher/v2.6/en/troubleshooting/ to make sure the basics are working.
@superseb, sorry I had to roll back. I might try again later in the week and will collect logs. Briefly, this is what works:
- rke version 1.2.11, kubernetes 1.20.9
- rke version 1.3.1, kubernetes 1.20.11
This does not work:
- rke version 1.3.1, kubernetes 1.21.5
I also had ingress issues when I upgraded rancher (via helm) to kubernetes. I have a feeling it could be related to kubernetes/ingress-nginx#7510.
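One way to confirm whether the ingress-nginx version changed across these RKE releases is to check the controller image each deployment runs (a sketch; the DaemonSet name and namespace are assumptions based on the default RKE deployment):

```sh
# Print the ingress-nginx controller image currently deployed by RKE
kubectl -n ingress-nginx get ds nginx-ingress-controller \
  -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'
```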
Overlaytest
MTU settings on nodes
I cannot check them on the network devices at the moment, but I will try to get hold of the network admin.
DNS
kubectl -n kube-system get pods -l k8s-app=kube-dns
kubectl -n kube-system get svc -l k8s-app=kube-dns
kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup kubernetes.default
- If you don't see a command prompt, try pressing enter.
- nslookup: can't resolve 'kubernetes.default'
- pod "busybox" deleted
- pod default/busybox terminated (Error)
kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup www.google.com
- If you don't see a command prompt, try pressing enter.
- nslookup: can't resolve 'www.google.com'
- pod "busybox" deleted
- pod default/busybox terminated (Error)
DNS resolution on the host system works fine though.
DNS resolve test on all nodes with ds-dnstest.yml
CoreDNS specific
kubectl -n kube-system logs -l k8s-app=kube-dns
kubectl -n kube-system get configmap coredns -o go-template={{.data.Corefile}}
CoreDNS resolv.conf
CoreDNS log file after enabling query logging
Again, I really appreciate the help. It seems CoreDNS has an issue. Any ideas what this might be?
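For reference, a hedged sketch of how query logging like the output above is typically enabled, by adding the log plugin to the CoreDNS Corefile (the deployment name assumes the stock RKE CoreDNS):

```sh
# Add the `log` plugin inside the server block of the Corefile
kubectl -n kube-system edit configmap coredns
# Restart CoreDNS so the pods pick up the changed Corefile
kubectl -n kube-system rollout restart deployment coredns
```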
What OS are you running, and can you share the output of docker info?
Hi, I'm running on SLES 15 SP3. Sorry, I completely forgot about docker info in my first post.
Docker info
Ping to the outside is not possible due to the corporate proxy. But using
And I assume manually querying the DNS server for a DNS record does not work for any pod (but does work on host)?
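A hedged way to run that check, querying the upstream DNS server directly from a pod and from the host (DNS_SERVER_IP is a placeholder for the corporate DNS server):

```sh
# From a pod: query the corporate DNS server directly
kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 \
  -- nslookup rancheradm.internal.de DNS_SERVER_IP
# From the host, for comparison
nslookup rancheradm.internal.de DNS_SERVER_IP
```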
Yes, nslookup on the host side works but not for pods. Ah, and I forgot: these are the ports I opened in my firewall according to the documentation:
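The specific port list above was not captured in this thread; as a general check, the firewalld configuration that is actually active on a node can be inspected with:

```sh
# Show the zones bound to interfaces and everything they currently allow
sudo firewall-cmd --get-active-zones
sudo firewall-cmd --list-all
```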
I'd have to run some more checks, but if the pods can't talk to your DNS server, it has to be either CNI related (the CNI was updated between 1.20 and 1.21), so please check/supply logs from the CNI pods, or something with the host firewall, but that is more unlikely as it's the same hosts being used.
Hmm ok, but the pods have no problem pinging my DNS servers' IP addresses or the IPs of the other nodes. Just pinging outside of our company network is not possible due to our corporate proxy.
Calico-kube-controllers log
kubectl -n kube-system logs -f canal-z8tht calico-node
kubectl -n kube-system logs -f canal-z8tht kube-flannel
kubectl -n kube-system logs -f canal-z8tht install-cni
kubectl -n kube-system logs -f canal-z8tht flexvol-driver
kubectl -n kube-system logs -f rke-network-plugin-deploy-job-jfxvw
If you can think of anything else you may need, please tell me. These were all the logs I could think of.
Ok, I just looked around and found this issue. After that I tried disabling firewalld, and DNS resolution for the pods works. It seems there was a change in either calico or firewalld that is causing this problem. Edit:
I will run the above-mentioned checks regarding the Rancher HA deployment again tomorrow and see if I can spot anything.
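A commonly reported workaround (a sketch only, assuming the default RKE pod and service CIDRs 10.42.0.0/16 and 10.43.0.0/16; adjust to your cluster.yml) is to trust the cluster CIDRs and enable masquerading instead of disabling firewalld entirely:

```sh
# Allow traffic from the pod and service networks and masquerade outgoing traffic
sudo firewall-cmd --permanent --zone=trusted --add-source=10.42.0.0/16
sudo firewall-cmd --permanent --zone=trusted --add-source=10.43.0.0/16
sudo firewall-cmd --permanent --add-masquerade
sudo firewall-cmd --reload
```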
Ok, I did the Rancher HA checks again without firewalld on any node, and there are just some logs that seem strange to me.
The corresponding curl output
Pod details
Pod Container logs
Namespace Events
Ingress log
kubectl -n cattle-system describe ingress
The only thing I can see is that there is still some DNS issue according to
The path is client -> load balancer -> host's ingress controller -> pod. As before, you can bypass the load balancer to rule that out and connect to the host's ingress controller directly. Tailing the ingress controller log of the host that you are connecting to should give you info on what happens when the connection is made. You can also raise the logging verbosity on the ingress controller; the first step is to see if you can properly connect to the host's ingress controller. If that works, you can check whether the ingress controller can connect to any of the Rancher pods. I assume they are all active and ready.

Because of the addition of the admission webhook to the new ingress-nginx controller, the network mode was changed from hostNetwork to hostPort. This might also give some problems (although it shouldn't, especially with everything disabled firewall-wise), but a last resort would be to force the ingress controller to the old mode (although this exposes the admission webhook to the outside, which is why we changed it to hostPort).
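For that last-resort option, a hedged sketch of the cluster.yml change (the network_mode values follow the RKE v1.3 ingress options referenced earlier in this issue; verify them against the docs for your release before applying):

```sh
# Write an ingress fragment, merge it into cluster.yml by hand, then re-apply
cat > ingress-fragment.yml <<'EOF'
ingress:
  provider: nginx
  network_mode: hostNetwork
EOF
rke up --config cluster.yml
```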
So after checking the ingress connection from different angles without a positive result, I redeployed the cluster with RKE. After that it worked with the firewall disabled, and I am able to connect to the previously deployed Rancher server. After using the fix described in this comment, I am now able to use Rancher together with firewalld on all nodes. Thank you very much for your help and patience. From my side this issue can be closed. It seems that it was a problem between calico/canal and firewalld.
Thanks for letting us know. @samstride, please file a new issue with the requested info if you are still experiencing issues.
@superseb, setting
What is the attack vector here? How can it be exploited? Should I worry if I change it back to the previous setting?
That is a breaking change and wasn't announced properly. Also, hostPort seems to not be configured correctly: it only forwards 127.0.0.1 and internal addresses, but doesn't do so for everything else.
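To see how ports 80/443 are actually exposed on a node in hostPort mode, a hedged check (with hostPort the mapping is typically done by the CNI portmap plugin via iptables, so nothing may show up as a listening socket):

```sh
# Listening sockets on 80/443 (expected with hostNetwork, usually absent with hostPort)
sudo ss -lntp | grep -E ':(80|443)\b'
# iptables rules installed by the CNI portmap plugin for hostPort mappings
sudo iptables-save | grep -i hostport
```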
I recently tried to update to Rancher version 2.6, and since I was at it I also updated rke from version 1.2.11 to version 1.3.1. I'm running a single-node setup on a bare-metal server with self-signed certificates behind a layer 4 load balancer (a multi-node setup is planned for the future). After the update I read in this documentation for the configuration of the network options that I need to add
network_mode: none
to my ingress configuration, since there was a change in Kubernetes. I did this as seen in my cluster configuration below. Nevertheless, after deploying the cluster, creating the namespace, adding the certificates, and deploying Rancher as shown in the steps below, I'm unable to curl or browse the Rancher UI, even though kubectl -n cattle-system rollout status deploy/rancher
tells me the deployment was successful and kubectl -n cattle-system get pods
shows that it is running.
I tried to gather as much information as possible, including the ingress logs, the Rancher logs, and the log output from my load balancer when trying to curl the Rancher server.
All I can see is that there seems to be no endpoint reachable. The configuration (without the ingress addition) worked fine for rke 1.2.11 and Rancher 2.5. I have no idea where I should look next or what might be the cause of the issue, so any help in that direction is much appreciated.
RKE version:
Docker version: (docker version)
Operating system and kernel: (cat /etc/os-release, uname -r preferred)
os-release:
uname -r:
5.3.18-59.19-default
Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO)
Bare-metal
cluster.yml file:
Steps to Reproduce:
kubectl create namespace cattle-system
kubectl -n cattle-system create secret tls tls-rancher-ingress --cert=/opt/rancher_inst/ssl_rancheradm.internal.de/tls.crt --key=/opt/rancher_inst/ssl_rancheradm.internal.de/tls.key
kubectl -n cattle-system create secret generic tls-ca --from-file=cacerts.pem=/opt/rancher_inst/lff_root_ca/cacerts.pem
helm install rancher rancher-latest/rancher --namespace cattle-system --set hostname=rancheradm.internal.de --set ingress.tls.source=secret --set privateCA=true --set bootstrapPassword=startPass --set replicas=3 --set proxy=http://www.proxy.internal.de:80 --set noProxy=127.0.0.1\\,localhost\\,0.0.0.0\\,10.0.0.0/8\\,cattle-system.svc\\,.svc\\,.cluster.local\\,.internal.de
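After the helm install, a hedged set of checks that corresponds to the logs gathered below (pod names will differ per deployment):

```sh
# Wait for the Rancher deployment, then confirm pods and ingress exist
kubectl -n cattle-system rollout status deploy/rancher
kubectl -n cattle-system get pods,ingress
# Test the hostname through the load balancer
curl -k -vvv https://rancheradm.internal.de
```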
Results:
Ingress log
Log output from
kubectl -n cattle-system logs -f rancher-75b8bc6df6-k8vcs
rancher_log.txt
Log output from nginx load balancer