Ingress-nginx-controller v1.8.1 version will cause intermittent network requests to get stuck #10276
Comments
This issue is currently awaiting triage. If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Have you tried testing the network in your cluster first? For example, without ingress-nginx |
/remove-kind bug |
Yes, I'm very sure. Access to every service other than ingress-nginx is completely normal; there is no such network hang anywhere else. Test results:
|
@tony-liuliu There are no answers to the questions asked in the issue template, so everything you are saying here assumes that your cluster and environment are in a 100% perfect, acceptable state. It also assumes that your installation of the ingress-nginx controller is 100% perfect. That does not work when a deep dive is required. Please provide the details asked for in a new issue template. |
Have the same issue, latest helm chart. Everything else is working besides ingress-nginx. It works sometimes; other times it holds the connection open and nothing happens. Will report back with the issue template answers later in the day. |
any logs? |
After my test today, I found that the intermittent hangs in the nginx-controller network may be related to this: the node running the nginx-controller pod is a KVM virtual machine with 16 CPU cores, and the default worker-processes configuration is auto. Normally that should create 16 worker processes, but only 13 were actually created here. In other words, with worker-processes left at the default auto (16), the nginx-controller network hangs intermittently; since only 13 worker processes actually exist, this mismatch may be the main cause of the problem.
When I manually set worker-processes to 13 or less, requests to the controller behave normally:
I kept adjusting the value of worker-processes and found that as long as it matches the number of worker processes actually created, the intermittent network hangs do not occur. |
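A minimal sketch of applying that workaround, assuming the controller was installed from the default manifest (namespace ingress-nginx, ConfigMap and Deployment both named ingress-nginx-controller); adjust the names to match your installation:

```bash
# Assumed names: "ingress-nginx" namespace and "ingress-nginx-controller" ConfigMap/Deployment
# come from the default deploy manifest; change them if your install differs.
kubectl -n ingress-nginx patch configmap ingress-nginx-controller \
  --type merge -p '{"data":{"worker-processes":"13"}}'

# The controller reloads on ConfigMap changes, but a rollout restart makes the
# new worker count take effect cleanly.
kubectl -n ingress-nginx rollout restart deployment ingress-nginx-controller
```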
Can confirm. My config had worker-processes at 16, but the container only had 8. After fixing the setting, the issue goes away.
|
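One way to check for this mismatch is to compare the worker_processes value rendered into nginx.conf with the number of worker processes actually running. This is only a sketch; the namespace and label selector assume a default ingress-nginx installation, and the exact ps flags available depend on the controller image:

```bash
# Find a controller pod (assumes the default labels of the official chart/manifest)
POD=$(kubectl -n ingress-nginx get pods \
  -l app.kubernetes.io/name=ingress-nginx \
  -o jsonpath='{.items[0].metadata.name}')

# worker_processes as rendered into the controller's nginx.conf
kubectl -n ingress-nginx exec "$POD" -- grep worker_processes /etc/nginx/nginx.conf

# worker processes actually running inside the container
kubectl -n ingress-nginx exec "$POD" -- sh -c "ps aux | grep 'nginx: worker' | grep -v grep"
```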
If your issue can be solved by adjusting worker-processes, then you need to consider factors such as load, the network card, interrupts, etc. |
This is stale, but we won't close it automatically; just bear in mind that the maintainers may be busy with other tasks and will get to your issue as soon as possible. If you have any question or request to prioritize this, please reach out. |
The same here.
I had 13 workers, exactly as mentioned above. My setup: |
we are still hitting this, not sure why, but strangely the intermittent failures NEVER happen when we set a single replica count of |
Issue was solved so closing. /close |
@longwuyuan: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
@longwuyuan sorry, how come it was solved? Can you please point us to the PR fixing this? Thank you! 💯 |
Adjusting workers as mentioned here #10276 (comment). If that does not solve it, then kindly re-open the issue after posting information that can be analyzed. Please use a kind cluster to reproduce the issue. Please use helm to install the controller and provide the values file used to install it. You can also fork the project, create a branch and clone the branch locally. Then, from the root of the local clone, you can run |
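A minimal sketch of the reproduction setup requested above, assuming an illustrative cluster name and release name, and a values.yaml containing the same values as the original installation:

```bash
# Create a throwaway kind cluster for the reproduction
kind create cluster --name ingress-nginx-repro

# Install the controller with helm from the official chart repo,
# using the same values file as the original installation
helm upgrade --install ingress-nginx ingress-nginx \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress-nginx --create-namespace \
  -f values.yaml
```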
Problem phenomenon:
After deploying the latest ingress-nginx-controller, requests to port 80 or 443 of the nginx-controller pod IP address always hang. Even entering the ingress-nginx-controller container and running curl 127.0.0.1 hangs as well. Please help me find out what the problem is.
All requests to services other than the ingress-nginx-controller work normally, including the ingress-nginx-controller's own health check port 10254.
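To make the hang visible without blocking the shell indefinitely, a bounded curl against the controller pod IP can be used; the pod IP below is a placeholder and the paths are illustrative:

```bash
# Replace <controller-pod-ip> with the actual pod IP from `kubectl get pods -o wide`
curl -v  --max-time 10 http://<controller-pod-ip>/                 # port 80: hangs intermittently
curl -vk --max-time 10 https://<controller-pod-ip>/                # port 443: hangs intermittently
curl -v  --max-time 10 http://<controller-pod-ip>:10254/healthz    # health endpoint: responds normally
```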
Environmental information:
kubernetes version: 1.27.4
OS: CentOS Linux release 7.9.2009 (Core)
Linux kernel: Linux dong-k8s-90 4.20.13-1.el7.elrepo.x86_64 #1 SMP Wed Feb 27 10:02:05 EST 2019 x86_64 x86_64 x86_64 GNU/Linux
runtime: containerd://1.7.2
Install tools:
CNI: calico-3.26.1 using IPIP mode. Deployment manifest used: https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/calico.yaml
How was the ingress-nginx-controller installed:
ingress-nginx-controller version: v1.8.1. Deployment manifest used: https://github.com/kubernetes/ingress-nginx/blob/main/deploy/static/provider/baremetal/deploy.yaml
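For reference, a sketch of applying that manifest, assuming it is fetched as the raw file behind the blob URL above:

```bash
# kubectl needs the raw manifest, not the GitHub blob page
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/baremetal/deploy.yaml
```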
Current State of the controller:
The following is the packet capture information when something goes wrong:
The client initiates a curl request
It has been stuck in this state and has not returned.
PS: Because the pod was restarted, the IP address seen has changed, so the captured information differs.
The request packet captured by the client
ingress-nginx-controller container network capture
It causes the client to be stuck all the time, and this happens very frequently. Please help me find out what is causing the problem.