AKS load balancer health check fails with ingress-nginx version v1.9.5 (but works with up to and including v1.8.4) #10869
This issue is currently awaiting triage. If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Why are there AWS and Azure annotations in
The controller pod looks healthy. Can you reach the service for the controller? The endpoints? Is there a firewall or network policy in place?
We implemented Annotation Validation in 1.9.0. Are you using any annotations in the Ingress objects?
@617m4rc Can you please test the latest version of the ingress-nginx controller with the configurations that are mentioned in these links:
@strongjz @Gacko Wondering if #9601 is related and whether we need a new static YAML for Azure.
@617m4rc We also added instructions for some basic testing to get status reported with data here: https://kubernetes.github.io/ingress-nginx/troubleshooting/#a-simple-test-of-the-basic-ingress-controller-routing. Please try it and see if you can add more info on your use of the latest version of the ingress-nginx controller using these instructions.
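The linked test boils down to exposing a backend through an Ingress and curling the controller with an explicit Host header. As a purely illustrative sketch (all hostnames and backend names here are made up), host-header based routing works like this:

```python
# Toy model of the name-based routing an Ingress controller performs: the
# controller picks a backend by the request's Host header, which is why the
# linked test sends requests with an explicit Host header to the controller
# address. Hostnames and backend names are hypothetical.

routes = {
    "app.example.test": "backend-app",
    "api.example.test": "backend-api",
}

def route(host: str) -> str:
    """Return the backend for a Host header, or the default 404 backend."""
    return routes.get(host, "default-backend-404")

print(route("app.example.test"))   # backend-app
print(route("nope.example.test"))  # default-backend-404
```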
/assign
Hello! So, I just created a fresh AKS cluster with the "Production Standard" preset, but without Azure Policy & Monitor. I was able to deploy and successfully request an Ingress NGINX using the following command and values:

helm install --namespace ingress-nginx ingress-nginx https://github.com/kubernetes/ingress-nginx/releases/download/helm-chart-4.9.1/ingress-nginx-4.9.1.tgz --values values.yaml

controller:
  service:
    annotations:
      service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: /healthz

controller:
  service:
    externalTrafficPolicy: Local

Some notes on what's actually happening when setting the above values:

By default the Ingress NGINX chart comes with externalTrafficPolicy: Cluster. Internally, requests sent to the HTTP node port get forwarded to the Ingress NGINX Controller pod's HTTP port. Since Azure is requesting / by default and the controller answers that with a 404, the health check fails. This is why you're setting the health probe request path annotation to /healthz.

For the second values example, things are a bit easier: with externalTrafficPolicy: Local, the service gets a dedicated health check node port. At least on Azure, the little piece of software backing this health check node port is answering with a success response only on nodes actually running a controller pod.

So at best you set both of them if you do not want to forward traffic to nodes not running a pod of your workload and want to use the proper health check request path. From your service manifest I can see you are already using externalTrafficPolicy: Local.

The only other differences I can spot between chart version 4.7.3 (the one you're using according to the image version you're mentioning in the title) and 4.9.0 (the one you're using according to your values) are related to network policies.

As you're actively enabling network policies for admission webhooks, I assume your environment is relying on network policies. In chart version 4.7.3, we had a network policy which was intended for the admission webhook port of the Ingress NGINX Controller. Unfortunately, and due to how this was defined, it simply allowed all ingress traffic to the Ingress NGINX Controller pods. Luckily you're enabling this network policy by setting the corresponding value. In chart version 4.9.0, this value does not affect the creation of the now fixed network policy anymore. Using your values and diffing the resulting templates of both chart versions, you can see that this "allow all ingress" network policy is gone. You need to actively set the new network policy values if you want network policies to be created.

Could you please check my suggestions? As stated before, I was able to get everything up and running on AKS, so I guess it's more related to your setup and the actual changes being made between 4.7.3 and 4.9.0. Hope to hear from you!
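Taken together, the two values examples above end up as one values file. A minimal sketch, with only the fields discussed here and everything else left at chart defaults:

```yaml
controller:
  service:
    annotations:
      # Make the Azure LB probe the controller's /healthz endpoint instead of
      # the default path /, which the controller answers with a 404.
      service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: /healthz
    # With Local, only nodes actually running a controller pod pass the LB
    # health check, and the client source IP is preserved.
    externalTrafficPolicy: Local
```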
I'm just wondering why your Azure Load Balancer health checks are failing with the new chart. There's no change related to that, and network policies should also not affect them. Could you please set the following values? They are the new ones for what you want to achieve:

controller:
  networkPolicy:
    enabled: true
  admissionWebhooks:
    patch:
      networkPolicy:
        enabled: true

In the meantime I will further investigate the possible root cause.
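For illustration only (this is an assumption, not the chart's exact manifest): a network policy scoped to the admission webhook port admits just that traffic, whereas an ingress rule without any ports or peers admits everything, which is the 4.7.3 pitfall described earlier in the thread.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ingress-nginx-admission   # hypothetical name
  namespace: ingress-nginx
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
  policyTypes:
    - Ingress
  ingress:
    - ports:
        - protocol: TCP
          port: 8443              # admission webhook container port
    # An empty rule (`- {}`) here instead would allow ALL ingress traffic.
```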
btw, if those values are the ones you usually use, I think you have a typo:

topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        app.kubernetes.io/instance: XXX-nginx-ingress # <-- Shouldn't this be "XXX-ingress-nginx"?
    maxSkew: 1
    minDomains: 3
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
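To see why the typo matters, here is a minimal sketch of Kubernetes label-selector matching: a topologySpreadConstraint whose matchLabels don't match the controller pods' labels selects zero pods and so does nothing useful. The release name "XXX-ingress-nginx" is the placeholder from this thread.

```python
# Minimal model of matchLabels semantics: every selector key/value must be
# present in the pod's labels for the pod to be selected.

def matches(selector: dict, pod_labels: dict) -> bool:
    """Return True if every selector key/value pair appears in pod_labels."""
    return all(pod_labels.get(k) == v for k, v in selector.items())

# Label the Helm release would actually put on the controller pods.
pod_labels = {"app.kubernetes.io/instance": "XXX-ingress-nginx"}

typo_selector  = {"app.kubernetes.io/instance": "XXX-nginx-ingress"}   # from the values
fixed_selector = {"app.kubernetes.io/instance": "XXX-ingress-nginx"}   # corrected

print(matches(typo_selector, pod_labels))   # False: constraint selects no pods
print(matches(fixed_selector, pod_labels))  # True
```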
I created another AKS cluster and used the values you provided, with a few exceptions:

% diff *.yaml
41,42c41,42
< extraArgs:
< default-ssl-certificate: management/tls-secret
---
> # extraArgs:
> # default-ssl-certificate: management/tls-secret
57c57
< enabled: true
---
> # enabled: true
70c70
< - XXX-ingress-nginx
---
> - ingress-nginx
90c90
< service.beta.kubernetes.io/azure-dns-label-name: XXX-YYY
---
> # service.beta.kubernetes.io/azure-dns-label-name: XXX-YYY
91a92
> service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: /healthz
98,105c99,106
< topologySpreadConstraints:
< - labelSelector:
< matchLabels:
< app.kubernetes.io/instance: XXX-nginx-ingress
< maxSkew: 1
< minDomains: 3
< topologyKey: topology.kubernetes.io/zone
< whenUnsatisfiable: DoNotSchedule
---
> # topologySpreadConstraints:
> # - labelSelector:
> # matchLabels:
> # app.kubernetes.io/instance: XXX-nginx-ingress
> # maxSkew: 1
> # minDomains: 3
> # topologyKey: topology.kubernetes.io/zone
> # whenUnsatisfiable: DoNotSchedule

I needed to disable the topology spread constraints as my cluster didn't have multiple availability zones, and the service monitor as I did not install the Prometheus CRDs. I also commented out the default SSL certificate and the DNS label name annotations as they do not apply to my setup. Last but not least, I added the service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path annotation.

In the end my Ingress NGINX and the AKS load balancer health checks were working perfectly fine.

As stated above, using your values, the two chart versions only differ by network policies and pod security stuff. The former can be mitigated by setting the new values as mentioned above. With them in place, the only real diff remaining is the PSS stuff. You can compare them on your own using this command:

diff <(helm template --namespace ingress-nginx ingress-nginx https://github.com/kubernetes/ingress-nginx/releases/download/helm-chart-4.7.3/ingress-nginx-4.7.3.tgz --values provided.yaml | grep --invert-match --extended-regexp "(app.kubernetes.io/version|helm.sh/chart):") <(helm template --namespace ingress-nginx ingress-nginx https://github.com/kubernetes/ingress-nginx/releases/download/helm-chart-4.9.1/ingress-nginx-4.9.1.tgz --values new.yaml | grep --invert-match --extended-regexp "(app.kubernetes.io/version|helm.sh/chart):")
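The diff command above compares the rendered manifests of the two chart versions while filtering out labels that always change between releases. A self-contained sketch of the same filter-then-diff pattern, using two local stand-in files (the file contents are made up) instead of rendered Helm templates:

```shell
# Stand-ins for two rendered Helm templates; only the chart label differs.
printf 'kind: Deployment\nhelm.sh/chart: ingress-nginx-4.7.3\nreplicas: 1\n' > old.yaml
printf 'kind: Deployment\nhelm.sh/chart: ingress-nginx-4.9.1\nreplicas: 1\n' > new.yaml

# Drop the always-differing label lines before diffing, so only meaningful
# changes remain; diff exits 0 (and prints nothing) when the rest matches.
diff \
  <(grep --invert-match --extended-regexp "(app.kubernetes.io/version|helm.sh/chart):" old.yaml) \
  <(grep --invert-match --extended-regexp "(app.kubernetes.io/version|helm.sh/chart):" new.yaml) \
  && echo "no meaningful differences"
```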
I'm closing this since we did not receive any feedback and verified the chart is working with the provided documentation. Feel free to reopen if you have further questions or information about your use case.

/close
@Gacko: Closing this issue.
I experienced the same issue. Thanks @Gacko, your solution (setting the health probe path) fixed it.
What happened:
The Azure load balancer stops working after upgrading ingress-nginx from v1.8.4 to v1.9.5 via the Helm chart. The Azure LB health checks fail permanently, and there is no error message in the ingress-nginx or Azure LB logs.
What you expected to happen:
Azure LB continues to work as before.
NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):
NGINX Ingress controller
Release: v1.9.5
Build: f503c4b
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.21.6
Kubernetes version (use kubectl version):
Environment:
Azure AKS
uname -a:
Linux version 5.15.0-1053-azure (buildd@bos03-amd64-012) (gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #61-Ubuntu SMP Tue Nov 21 14:16:01 UTC 2023
Terraform
Helm
Anything else we need to know:
Seems to be related to #10863