Incorrect path for nginx ingress health probes #2903

shiv-rajawat · 2022-04-22T10:24:58Z

What happened: Upon installing nginx ingress using helm, the default health probe with HTTP/HTTPS is getting configured with path as "/". As a result, the inbound connectivity to that nginx public IP is not working.

What you expected to happen: The path should be "/healthz" with HTTP/HTTPS health probes.

How to reproduce it: Install nginx ingress using helm
helm install ingress-nginx ingress-nginx/ingress-nginx

Check the AKS loadbalancer health probes for path in nginx ingress probes (HTTP/HTTPS).

Anything else we need to know?: The issue is only reproducing for us in selected West Europe AKS clusters.

Environment:

Kubernetes version: 1.22.6
Size of cluster: 30 nodes
General description of workloads in the cluster - HTTP microservices, Angular applications

The text was updated successfully, but these errors were encountered:

ghost · 2022-04-22T10:25:03Z

Hi shiv-rajawat, AKS bot here 👋
Thank you for posting on the AKS Repo, I'll do my best to get a kind human from the AKS team to assist you.

I might be just a bot, but I'm told my suggestions are normally quite good, as such:

If this case is urgent, please open a Support Request so that our 24/7 support team may help you faster.
Please abide by the AKS repo Guidelines and Code of Conduct.
If you're having an issue, could it be described on the AKS Troubleshooting guides or AKS Diagnostics?
Make sure your subscribed to the AKS Release Notes to keep up to date with all that's new on AKS.
Make sure there isn't a duplicate of this issue already reported. If there is, feel free to close this one and '+1' the existing issue.
If you have a question, do take a look at our AKS FAQ. We place the most common ones there!

LennartHaggkvistHM · 2022-04-22T12:44:10Z

This is also affecting us.

My conclusions are:

It is not only west Europe - we have the problem in east Australia as well
Old clusters has LB health probes of type TCP - but the newly created clusters use HTTP or HTTPS
Those health probes seems to be using the wrong path - manually changing to /healtz worked
Changing the HTTP/HTTPS probe to type TCP worked also (note that the port from the HTTP/HTTPS must be used and not the one that the portal suggests...)

mkoertgen · 2022-04-22T19:08:07Z

Same for me. Scenario: AKS (EastUS2) + ingress-nginx (via helm).

My observations:

Frontend-IP-config for Azure Load Balancer in the MC-rg then fails.
Old clusters are using TCP-Probes mapped to the k8s-service-ports.
New clusters are using HTTP/HTTPS-Probes mapped to 80/443 respectively.
As a result no traffic is routed to the ingress-controller since health probes fail.

Workaround:
When manually changing health probes back to TCP and mapped ports, health probes reconcile after a while.

Load Balancer starts routing traffic again

And we finally get our nice nginx 404-page again

jabbera · 2022-04-23T16:40:39Z

I just lost 4 hours of my weekend trying to diagnose this issue..... Eastus2

shiv-rajawat · 2022-04-23T16:59:58Z

Well, I lost my entire day to debug this issue. Surprisingly I’m not able to reproduce this issue in other clusters in the same region.

jabbera · 2022-04-23T17:07:03Z

The only notable change I can think of is my dev cluster that didn't see the issue was created as 1.22.06 whereas my production cluster that saw the issue was an upgrade from 1.21.9

jabbera · 2022-04-23T17:25:32Z

@miwithro support might be saved a ton of tickets if this could be tracked down. I almost opened a sev A over it.

shiv-rajawat · 2022-04-23T17:44:54Z

@miwithro support might be saved a ton of tickets if this could be tracked down. I almost opened a sev A over it.

We already have a Sev B opened.

jabbera · 2022-04-23T19:11:52Z

Same issue with my DR cluster in centralus.

sebader · 2022-04-25T08:39:14Z

Same issue here, AKS 1.22.6 in various regions (at least Sweden Central and Brazil south). Did anybody figure out a solution yet apart from the manual workaround?

aristosvo · 2022-04-25T14:20:43Z

It seems the same or similar as this solved issue on the kubernetes/ingress-nginx repo itself.

Usage of the flag --set controller.service.externalTrafficPolicy=Local should probably fix it.

I'm not 100% sure if this is just a workaround or the fix, but I'll leave that discussion for the AKS team.

ivanthelad · 2022-04-26T08:11:54Z

Hi,
is this behaviour by design?The cloud provider documentations states that the default path for the probe will be "/" if not specified.
The table thats that for http/https endpoints the default probe will be "/"

previously the behaviour used to be "/healthz" but a recent change altered this kubernetes-sigs/cloud-provider-azure@6cdd799 where the default path was modified

The cloud provides logic can be found here.
https://github.com/kubernetes-sigs/cloud-provider-azure/blob/2b91a85af9436e8826bf7c8c79fb055e6d58ad8a/pkg/provider/azure_loadbalancer.go#L1977

one work around would be to supply the annotation "service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path" to the nginx helm install
https://github.com/kubernetes/ingress-nginx/blob/main/charts/ingress-nginx/values.yaml#L454

heoelri · 2022-04-26T08:35:57Z

It seems the same or similar as this solved issue on the kubernetes/ingress-nginx repo itself.

Usage of the flag --set controller.service.externalTrafficPolicy=Local should probably fix it.

I'm not 100% sure if this is just a workaround or the fix, but I'll leave that discussion for the AKS team.

Setting controller.service.externalTrafficPolicy to Local when deploying the ingress-nginx helm chart as mentioned by @aristosvo in #2903 (comment) did the trick on our end and solved the issue. The LB's health probe then points again to path /healthz instead of / - now for HTTP and HTTPS in a single health probe.

ivanthelad · 2022-04-26T12:24:39Z

What I see, the reason for this issue comes down to the helm chart adopting appProtocol spec combined with the cloud provider respecting the appProtocol setting. So when a service is created with appProtocol the LB will probe directly to pod process (via the kube-proxy hop). I think previously the probe was based on the NodeHealthPort created on each node. In this case the 503 would be returned from the kubernetes api if the pod endpoint was not available on the target host.
https://github.com/kubernetes/ingress-nginx/blob/main/charts/ingress-nginx/templates/controller-service.yaml#L70

In general, i would guess the underlying CloudProvider was updated on AKS that has resulted in the appProtocol been respected and hence the changes to the probe where "/" is set when no path annotation is supplied on the service definition. possible fixes

Manually set the annotation on the service that includes the health check path
upgrade your nginx helm chart setting the param "Values.controller.service.appProtocol" to false

ivanthelad · 2022-04-26T12:25:19Z

i expect other applications that use to service spec appProtocol to be effected #2907

phealy · 2022-04-26T12:52:21Z

A documentation PR has been merged to add --set controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-health-probe-request-path"=/healthz to the helm commands, which will tell Cloud Provider how to set the health probe properly. Docs will be updated in the next build, later today.

sebader · 2022-04-26T12:59:35Z

@phealy thanks for updating the docs. While this makes all sense to me, including the solution, I want to point out this this is a major breaking change for many people out there and it was clearly not announced as such.

idmurphy · 2022-04-28T14:20:04Z

Looks like this issue started around 20/04 or slightly after. For AKS 1.22.6 built before then, we find the health probe is HTTPS with /healthz path added already, since then it has plain '/'. This is for exact same AKS 1.22.6 and same ingress/version helm automated deployment pipeline.

One other thing I note is that earlier clusters had aks-link in kube-system and the newer ones which are broken have tunnelfront in kube-system. Is this handling the health probe?

I find it very odd that the fix for this this is a documentation update and everyone stumbling into this issue needs to have a work-around hack on the helm deployment rather than the issue itself being fixed?! This is going to start causing more and more issues for everyone, especially anyone who has happily testing AKS upgrade to 1.22 in UAT before 20/04 and then suddenly find when they go to a production roll out it starts breaking. Shouldn't there be a general support notification of this issue?

jabbera · 2022-04-28T16:42:47Z

happily testing AKS upgrade to 1.22 in UAT before 20/04 and then suddenly find when they go to a production roll out it starts breaking.

This is exactly what happened to me on Saturday. I had a literal panic since there is no rollback. Also missed a little league game over it.

idmurphy · 2022-04-28T22:16:38Z

@phealy - is a "beta" option really considered a suitable solution for a production issue? When is this GA?

sebader · 2022-04-29T10:29:57Z

@idmurphy all the cloud provider-specific annotations are using the same syntax. This does not mean any of these are in preview. https://kubernetes-sigs.github.io/cloud-provider-azure/topics/loadbalancer/#loadbalancer-annotations (same for AWS https://kubernetes.io/docs/concepts/services-networking/service/#other-elb-annotations)

knunery · 2022-05-01T22:56:58Z

THANK YOU mkoertgen! Running into this issue. I was about to open a ticket for this issue.

feiskyer · 2022-05-03T04:03:07Z

Sorry to see the appProtocol support inside cloud provider has broken ingress-nginx for AKS clusters >=1.22. The issue was caused by two reasons:

1. the new version of nginx ingress controller added appProtocol and its probe path has to be /healthz;
1. the new version of cloud-controller-manager added HTTP probing with default path / for appProtocol=http services.

AKS is rolling out a fix to keep backward compatibility (ETA is 48 hours from now) with fix cloud-provider-azure#1626. After this fix, the default probe protocol would only be changed to appProtocol for clusters >=1.24 (refer docs for more details).

Before the rollout of this fix, please upgrade the existing ingress-nginx with --set controller.service.externalTrafficPolicy=Local for faster mitigation.

And for newly created helm releases, you could also set --set controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-health-probe-request-path"=/healthz to ensure /healthz probe path be used with HTTP/HTTPS protocol (AKS docs has been updated with this annotation as well, e.g. here).

ghost added the triage label Apr 22, 2022

aristosvo mentioned this issue Apr 25, 2022

AKS 1.22.*, 1.23.* Ingress-Nginx Not Working. #2908

Closed

heoelri mentioned this issue Apr 26, 2022

Port ingress-nginx configuration changes and Helm namespace creation from online Azure/Mission-Critical-Connected#171

Merged

ivanthelad mentioned this issue Apr 26, 2022

Private AKS with NGINX ingress controller using Azure Internal Load Balancer. Can't reach service via Load Balancer outside subnet/pod. #2907

Closed

phealy added question networking/load-balancers labels Apr 26, 2022

ghost removed the triage label Apr 26, 2022

heoelri mentioned this issue Apr 26, 2022

Change ingress-nginx health probe path Azure/Mission-Critical-Online#310

Merged

feiskyer closed this as completed May 3, 2022

phealy mentioned this issue May 19, 2022

Letsencrypt acme cert challenges no longer working on AKS (nginx-ingress + cert-manager + clusterissuer + letsencrypt) #2955

Closed

ghost locked as resolved and limited conversation to collaborators Jun 2, 2022

martingegenleitner added a commit to martingegenleitner/thales-dke-service-setup that referenced this issue Apr 6, 2023

Fixed Azure/AKS/issues/2903

1734870

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incorrect path for nginx ingress health probes #2903

Incorrect path for nginx ingress health probes #2903

shiv-rajawat commented Apr 22, 2022 •

edited

Loading

ghost commented Apr 22, 2022

LennartHaggkvistHM commented Apr 22, 2022 •

edited

Loading

mkoertgen commented Apr 22, 2022 •

edited

Loading

jabbera commented Apr 23, 2022 •

edited

Loading

shiv-rajawat commented Apr 23, 2022

jabbera commented Apr 23, 2022 •

edited

Loading

jabbera commented Apr 23, 2022 •

edited

Loading

shiv-rajawat commented Apr 23, 2022

jabbera commented Apr 23, 2022

sebader commented Apr 25, 2022

aristosvo commented Apr 25, 2022 •

edited

Loading

ivanthelad commented Apr 26, 2022 •

edited

Loading

heoelri commented Apr 26, 2022

ivanthelad commented Apr 26, 2022 •

edited

Loading

ivanthelad commented Apr 26, 2022 •

edited

Loading

phealy commented Apr 26, 2022

sebader commented Apr 26, 2022

idmurphy commented Apr 28, 2022

jabbera commented Apr 28, 2022

idmurphy commented Apr 28, 2022

sebader commented Apr 29, 2022

knunery commented May 1, 2022

feiskyer commented May 3, 2022

Incorrect path for nginx ingress health probes #2903

Incorrect path for nginx ingress health probes #2903

Comments

shiv-rajawat commented Apr 22, 2022 • edited Loading

ghost commented Apr 22, 2022

LennartHaggkvistHM commented Apr 22, 2022 • edited Loading

mkoertgen commented Apr 22, 2022 • edited Loading

jabbera commented Apr 23, 2022 • edited Loading

shiv-rajawat commented Apr 23, 2022

jabbera commented Apr 23, 2022 • edited Loading

jabbera commented Apr 23, 2022 • edited Loading

shiv-rajawat commented Apr 23, 2022

jabbera commented Apr 23, 2022

sebader commented Apr 25, 2022

aristosvo commented Apr 25, 2022 • edited Loading

ivanthelad commented Apr 26, 2022 • edited Loading

heoelri commented Apr 26, 2022

ivanthelad commented Apr 26, 2022 • edited Loading

ivanthelad commented Apr 26, 2022 • edited Loading

phealy commented Apr 26, 2022

sebader commented Apr 26, 2022

idmurphy commented Apr 28, 2022

jabbera commented Apr 28, 2022

idmurphy commented Apr 28, 2022

sebader commented Apr 29, 2022

knunery commented May 1, 2022

feiskyer commented May 3, 2022

shiv-rajawat commented Apr 22, 2022 •

edited

Loading

LennartHaggkvistHM commented Apr 22, 2022 •

edited

Loading

mkoertgen commented Apr 22, 2022 •

edited

Loading

jabbera commented Apr 23, 2022 •

edited

Loading

jabbera commented Apr 23, 2022 •

edited

Loading

jabbera commented Apr 23, 2022 •

edited

Loading

aristosvo commented Apr 25, 2022 •

edited

Loading

ivanthelad commented Apr 26, 2022 •

edited

Loading

ivanthelad commented Apr 26, 2022 •

edited

Loading

ivanthelad commented Apr 26, 2022 •

edited

Loading