Incorrect path for nginx ingress health probes #2903
Hi shiv-rajawat, AKS bot here 👋 I might be just a bot, but I'm told my suggestions are normally quite good, as such:
This is also affecting us. My conclusions are:
Same for me. Scenario: AKS (EastUS2) + ingress-nginx (via helm). My observations:
Workaround: once the probe path is corrected, the Load Balancer starts routing traffic again and we finally get our nice nginx 404 page again.
I just lost 4 hours of my weekend trying to diagnose this issue. Region: eastus2.
Well, I lost my entire day debugging this issue. Surprisingly, I'm not able to reproduce it in other clusters in the same region.
The only notable difference I can think of is that my dev cluster, which didn't see the issue, was created on 1.22.6, whereas my production cluster, which did see the issue, was upgraded from 1.21.9.
@miwithro Support might be spared a ton of tickets if this could be tracked down. I almost opened a Sev A over it.
We already have a Sev B opened.
Same issue with my DR cluster in centralus.
Same issue here, AKS 1.22.6 in various regions (at least Sweden Central and Brazil South). Did anybody figure out a solution yet, apart from the manual workaround?
It seems the same as, or similar to, a previously solved issue. I'm not 100% sure whether using the flag mentioned there is just a workaround or the actual fix, but I'll leave that discussion to the AKS team.
Hi, previously the default probe path was "/healthz", but a recent change (kubernetes-sigs/cloud-provider-azure@6cdd799) modified that default. The cloud provider logic can be found here. One workaround would be to supply the annotation "service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path" to the nginx helm install.
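For anyone looking for the exact invocation, here is a minimal sketch of applying that annotation through the ingress-nginx chart (the release and namespace names are hypothetical; the annotation key is the one quoted above, and dots in the key must be escaped for --set):

helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx --create-namespace \
  --set controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-health-probe-request-path"=/healthz

After the upgrade, the Azure load balancer probe for the controller Service should be recreated with /healthz as its request path.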
Setting …
From what I see, the reason for this issue comes down to the helm chart adopting the appProtocol spec, combined with the cloud provider now respecting the appProtocol setting. When a service is created with appProtocol, the LB probes the pod process directly (via the kube-proxy hop). I think previously the probe was based on the NodeHealthPort created on each node; in that case a 503 would be returned from the Kubernetes API if the pod endpoint was not available on the target host. In general, I would guess the underlying cloud provider was updated on AKS so that appProtocol is now respected, hence the change where the probe path is set to "/" when no path annotation is supplied on the service definition. Possible fixes:
I expect other applications that use the Service spec appProtocol to be affected: #2907
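To illustrate the appProtocol mechanism discussed above, here is a minimal, hypothetical sketch of what the controller Service looks like with appProtocol set on its ports, together with the annotation that restores the /healthz probe path (field values are illustrative, not the chart's exact output):

apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  annotations:
    # Without this annotation, the newer cloud provider probes "/" for http/https appProtocol ports.
    service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: /healthz
spec:
  type: LoadBalancer
  ports:
  - name: http
    port: 80
    targetPort: http
    appProtocol: http
  - name: https
    port: 443
    targetPort: https
    appProtocol: https
  selector:
    app.kubernetes.io/name: ingress-nginx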
A documentation PR has been merged to add …
@phealy thanks for updating the docs. While this all makes sense to me, including the solution, I want to point out that this is a major breaking change for many people out there, and it was clearly not announced as such.
Looks like this issue started around 20/04 or slightly after. For AKS 1.22.6 clusters built before then, we find the health probe is HTTPS with the /healthz path already set; since then it is a plain '/'. This is for the exact same AKS 1.22.6 version and the same ingress version and automated helm deployment pipeline. One other thing I note is that the earlier clusters had aks-link in kube-system and the newer, broken ones have tunnelfront in kube-system. Is this what handles the health probe? I find it very odd that the fix for this is a documentation update, and that everyone stumbling into this issue needs a workaround hack on their helm deployment rather than the issue itself being fixed. This is going to cause more and more issues for everyone, especially anyone who happily tested the AKS upgrade to 1.22 in UAT before 20/04 and then suddenly finds it breaking when they go to a production rollout. Shouldn't there be a general support notification about this issue?
This is exactly what happened to me on Saturday. I literally panicked since there is no rollback. I also missed a Little League game over it.
@phealy - is a "beta" option really considered a suitable solution for a production issue? When is this GA?
@idmurphy all the cloud provider-specific annotations use the same syntax; this does not mean any of them are in preview. https://kubernetes-sigs.github.io/cloud-provider-azure/topics/loadbalancer/#loadbalancer-annotations (same for AWS: https://kubernetes.io/docs/concepts/services-networking/service/#other-elb-annotations)
THANK YOU mkoertgen! I was running into this issue and was about to open a ticket for it.
Sorry to see that appProtocol support inside the cloud provider has broken ingress-nginx for AKS clusters >=1.22. The issue was caused by two things:
AKS is rolling out a fix to keep backward compatibility (ETA is 48 hours from now): cloud-provider-azure#1626. After this fix, the default probe protocol will only be changed to appProtocol for clusters >=1.24 (refer to the docs for more details). Before the rollout of this fix, please upgrade the existing ingress-nginx release with the health probe path annotation described above, and for newly created helm releases you can also set it via the chart values (see the sketch below).
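For new installs, a minimal values sketch (assuming the chart's controller.service.annotations value) that bakes the annotation in from the start:

# values.yaml (sketch) for the ingress-nginx chart
controller:
  service:
    annotations:
      service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: /healthz

Then install with: helm install ingress-nginx ingress-nginx/ingress-nginx -f values.yaml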
What happened: After installing the nginx ingress using helm, the default HTTP/HTTPS health probe is configured with the path "/". As a result, inbound connectivity to the nginx public IP does not work.
What you expected to happen: The path should be "/healthz" with HTTP/HTTPS health probes.
How to reproduce it: Install nginx ingress using helm
helm install ingress-nginx ingress-nginx/ingress-nginx
Check the path of the nginx ingress health probes (HTTP/HTTPS) on the AKS load balancer; one way to do this with the Azure CLI is sketched below.
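A hedged sketch with the Azure CLI (replace the node resource group name with your cluster's MC_ resource group; the load balancer created by AKS is normally named kubernetes):

az network lb probe list \
  --resource-group MC_myResourceGroup_myAKSCluster_westeurope \
  --lb-name kubernetes \
  --output table

On an affected cluster, the HTTP/HTTPS probes show "/" as the request path instead of "/healthz".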
Anything else we need to know?: The issue only reproduces for us in selected West Europe AKS clusters.
Environment: