Private AKS with NGINX ingress controller using Azure Internal Load Balancer. Can't reach service via Load Balancer outside subnet/pod. #2907

Closed
KIRY4 opened this issue Apr 25, 2022 · 15 comments
Labels
docs resolution/answer-provided Provided answer to issue, question or feedback.

Comments

@KIRY4

KIRY4 commented Apr 25, 2022

What happened:
Hello everyone! I have the following issue (probably a misconfiguration), but I can't figure out what exactly I'm doing wrong.

From the following VMs: vm-ubuntu-aks-mgmt-01 (10.3.0.4) and vm-ilb-01 (10.3.1.6), I'm getting the following.

curl 10.3.1.100 -k -v
Trying 10.3.1.100:80...
TCP_NODELAY set

curl http://helloworld.spg.internal -k -v
Trying 10.3.1.100:80...
TCP_NODELAY set

nslookup helloworld.spg.internal
Server: 127.0.0.53
Address: 127.0.0.53#53

Non-authoritative answer:
Name: helloworld.spg.internal
Address: 10.3.1.100

But if I jump inside a pod, for example kubectl exec -it nginx-ingress-ingress-nginx-controller-7475cb6b9f-x7nf2 -n ingress-basic sh, and then curl from inside the pod:

I'm getting SUCCESS: status code 200 OK and the HTML page.

What you expected to happen:
curl http://helloworld.spg.internal - success (status code: 200 OK) from everywhere in the VNET, i.e. from all the VMs mentioned below (vm-ubuntu-aks-mgmt-kmazurov-01 - 10.3.0.4, vm-ilb-01 - 10.3.1.6).

How to reproduce it (as minimally and precisely as possible):
For testing/learning purposes I deployed a private AKS cluster in a pre-created VNET.

Here is the VNET configuration (no NSG configured, since this is a playground for learning purposes):
vnet-cloud-01: 10.3.0.0/22

Subnets:
snet-cloud-mgmt-01: 10.3.0.0/24 - here I deployed the mgmt VM (vm-ubuntu-aks-mgmt-01 - 10.3.0.4) with kubectl, helm, and az cli to access the cluster;
snet-cloud-priv-aks-01: 10.3.1.0/24 - here I deployed the private AKS cluster plus one VM just for testing purposes (vm-ilb-01 - 10.3.1.6);
GatewaySubnet: 10.3.2.0/24 - I'm using it for the P2S VPN connection to the VNET.

Following this article: https://docs.microsoft.com/en-us/azure/aks/ingress-internal-ip, I deployed the NGINX ingress controller with the internal Load Balancer IP 10.3.1.100. The service was created successfully and the IP was assigned correctly.

I also created a private DNS zone (spg.internal), linked it to vnet-cloud-01, and added the following A record:
Name       | Type | TTL | Value
helloworld | A    | 10  | 10.3.1.100
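
For reference, the zone link and A record can also be created via the Azure CLI; this is only a minimal sketch, assuming a hypothetical resource group named rg-cloud-01 (the real group name in my setup may differ):

    # Link the private DNS zone to the VNET (rg-cloud-01 is a hypothetical resource group name)
    az network private-dns link vnet create \
        --resource-group rg-cloud-01 \
        --zone-name spg.internal \
        --name link-vnet-cloud-01 \
        --virtual-network vnet-cloud-01 \
        --registration-enabled false

    # Add the A record pointing at the internal load balancer IP
    az network private-dns record-set a add-record \
        --resource-group rg-cloud-01 \
        --zone-name spg.internal \
        --record-set-name helloworld \
        --ipv4-address 10.3.1.100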

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): v1.22.6 - cluster, v1.23.6 - kubectl.
  • Size of cluster (how many worker nodes are in the cluster?). 1 worker node.
  • General description of workloads in the cluster (e.g. HTTP microservices, Java app, Ruby on Rails, machine learning, etc.)
    https://docs.microsoft.com/en-us/azure/aks/ingress-internal-ip - workload from this article.
  • Others:
@ghost ghost added the triage label Apr 25, 2022
@ghost

ghost commented Apr 25, 2022

Hi KIRY4, AKS bot here 👋
Thank you for posting on the AKS Repo, I'll do my best to get a kind human from the AKS team to assist you.

I might be just a bot, but I'm told my suggestions are normally quite good, as such:

  1. If this case is urgent, please open a Support Request so that our 24/7 support team may help you faster.
  2. Please abide by the AKS repo Guidelines and Code of Conduct.
  3. If you're having an issue, could it be described on the AKS Troubleshooting guides or AKS Diagnostics?
  4. Make sure you're subscribed to the AKS Release Notes to keep up to date with all that's new on AKS.
  5. Make sure there isn't a duplicate of this issue already reported. If there is, feel free to close this one and '+1' the existing issue.
  6. If you have a question, do take a look at our AKS FAQ. We place the most common ones there!

@KIRY4 KIRY4 changed the title Private AKS with NGINX ingress controlled using Azure Internal Load Balancer. Can't reach service via Load Balancer outside subnet. Private AKS with NGINX ingress controller using Azure Internal Load Balancer. Can't reach service via Load Balancer outside subnet/pod. Apr 25, 2022
@PAKalucki

@KIRY4 Hey, I'm facing an almost identical issue, and after 2 days of troubleshooting I managed to establish that the internal load balancer is unhealthy. It would seem that the health probes are misconfigured, but I'm still unclear on how to fix that part. Any and all input from the AKS team would be appreciated.
[screenshot]

@KIRY4
Author

KIRY4 commented Apr 25, 2022

Hi @PAKalucki! How did you get this diagram? I want to generate the same diagram on my side...

@PAKalucki

@KIRY4 In the Azure Portal, go to Load Balancing and find the load balancer created by AKS. In my case it was called "kubernetes-internal". Then go to Insights (you might need to enable Insights in AKS).
[screenshot]

@KIRY4
Author

KIRY4 commented Apr 25, 2022

@PAKalucki thank you! I have the same issue.
[screenshot]

@PAKalucki

PAKalucki commented Apr 25, 2022

@KIRY4 I have found a workaround in the meantime. Go to Load balancer > Health probes. If you only have probes for the TCP protocol, make sure they point to the proper ports (you can find them in AKS, in the nginx ingress service). If you have HTTP/HTTPS protocol probes, make sure they use the proper ports from AKS and that the path is /healthz. The rules created by AKS that I had in there were invalid.
It should look similar to mine (your ports might be different); a few seconds after applying this change, all traffic started working:
[screenshot]

In case anyone from the AKS team stumbles onto this: it's only a workaround. Please propose a solution for fixing this misconfiguration at the AKS or Helm level so we don't have to change it manually every time.
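
If you prefer the CLI over the portal, the same probe change can be sketched with az network lb probe; the node resource group, probe name, and port below are placeholders for whatever your AKS-managed load balancer actually shows:

    # Inspect the probes on the AKS-managed internal load balancer
    az network lb probe list \
        --resource-group MC_myResourceGroup_myAKSCluster_myRegion \
        --lb-name kubernetes-internal \
        --output table

    # Point an existing probe at the ingress controller's HTTP health endpoint
    az network lb probe update \
        --resource-group MC_myResourceGroup_myAKSCluster_myRegion \
        --lb-name kubernetes-internal \
        --name <probe-name> \
        --protocol Http \
        --port <ingress-health-node-port> \
        --path /healthz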

@aristosvo

aristosvo commented Apr 25, 2022

It seems to be the same as, or similar to, this solved issue on the kubernetes/ingress-nginx repo itself.

Usage of the flag --set controller.service.externalTrafficPolicy=Local should probably fix it.

I'm not 100% sure if this is just a nicer workaround or the fix, but I'll leave that discussion up to the AKS team.

@KIRY4
Author

KIRY4 commented Apr 25, 2022

@aristosvo Thank you, --set controller.service.externalTrafficPolicy=Local solved the issue for me. I should note that a simple reinstall with this flag was not enough for me. I did the following:

  1. Uninstall the helm chart;
  2. Delete the existing internal load balancer (without this step it didn't work for me);
  3. Install the chart again with the flag mentioned above:
helm install nginx-ingress ingress-nginx/ingress-nginx \
    --namespace ingress-basic --create-namespace \
    --set controller.replicaCount=1 \
    --set controller.service.externalTrafficPolicy=Local \
    -f internal-ingress.yaml
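
For reference, internal-ingress.yaml here is the values file from the AKS internal-ingress article linked above; a minimal sketch of its contents, assuming only the internal load balancer annotation and my internal IP are set:

    controller:
      service:
        loadBalancerIP: 10.3.1.100
        annotations:
          service.beta.kubernetes.io/azure-load-balancer-internal: "true"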

@PAKalucki

In my case the --set controller.service.externalTrafficPolicy=Local flag is only a partial fix. The load balancer is not only missing the /healthz path in the health probes, the ports are also completely wrong. This might be caused by the fact that I'm using both HTTP and HTTPS, but it's hard to tell.

@jamesstanleystewart

We hit this same issue today; something seems to have changed in the configuration of the health probes for new AKS installations between February and now. We have several AKS instances (the most recent installed in February) and found they all had TCP health probes by default; only a recently added one had HTTP probes. Service health was showing "degraded" when the new AKS cluster was first installed. When we changed the probes to TCP to match our other installations, traffic flowed to our ingresses as it had with previous installs, and the service health event was resolved.

@andbaraks

andbaraks commented Apr 26, 2022

I was able to reproduce the behavior by using the following yaml file for testing and two AKS clusters, one with Kubernetes version 1.21 and the other with 1.22.

apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress-ingress-nginx-controller
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: nginx-ingress
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/version: 1.1.0
    helm.sh/chart: ingress-nginx-4.0.12
    helm.toolkit.fluxcd.io/name: nginx-ingress
    helm.toolkit.fluxcd.io/namespace: pluops-shared-nginx-ingress-main
  annotations:
    meta.helm.sh/release-name: nginx-ingress
    meta.helm.sh/release-namespace: pluops-shared-nginx-ingress-main
  finalizers:
    - service.kubernetes.io/load-balancer-cleanup
spec:
  ports:
    - name: https
      protocol: TCP
      appProtocol: https
      port: 443
      targetPort: https
  selector:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: nginx-ingress
    app.kubernetes.io/name: ingress-nginx
  type: LoadBalancer
  sessionAffinity: None
  externalTrafficPolicy: Cluster
  ipFamilies:
    - IPv4
  ipFamilyPolicy: SingleStack
  allocateLoadBalancerNodePorts: true
  internalTrafficPolicy: Cluster

Then I removed appProtocol: https from the yaml, created the same service under another name, and concluded that it creates the health probe with protocol TCP:

1.21 AKS cluster
[screenshot]

1.22 AKS cluster
[screenshot]
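
For clarity, the only change in the second (renamed) service was dropping the appProtocol field, so the ports section of the yaml above becomes:

    ports:
      - name: https
        protocol: TCP
        port: 443
        targetPort: https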

About appProtocol / health probe:
https://kubernetes-sigs.github.io/cloud-provider-azure/topics/loadbalancer/#custom-load-balancer-health-probe
https://kubernetes.io/docs/concepts/services-networking/service/#application-protocol

I think this is the expected behavior (when you have appProtocol: https, the HTTPS protocol is used for the health probe), and 1.22 just adopts it. The reason this happens is that 1.22 uses the out-of-tree Cloud Controller Manager by default (https://docs.microsoft.com/en-us/azure/aks/out-of-tree).

To confirm that, I enabled CCM (there is an issue in the documentation; the correct command is “az aks update -n aks -g myResourceGroup --aks-custom-headers EnableCloudControllerManager=True”) on my 1.21 AKS cluster, applied the service yaml which includes appProtocol: https (just with a different name), and confirmed the above.
[screenshot]

This confirms that this is the expected behavior: in Kubernetes version 1.22, "appProtocol" is reflected in the health probe of the Load Balancer.

@ivanthelad

@andbaraks @KIRY4 see #2903 (comment)

@phealy
Contributor

phealy commented Apr 26, 2022

A documentation PR has been merged to add --set controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-health-probe-request-path"=/healthz to the helm commands, which will tell the cloud provider how to set the health probe properly. The docs will be updated in the next build, later today. Please follow #2903 for updates on the general issue.

@phealy phealy closed this as completed Apr 26, 2022
@phealy phealy reopened this Apr 26, 2022
@phealy phealy added docs resolution/answer-provided Provided answer to issue, question or feedback. labels Apr 26, 2022
@ghost ghost removed the triage label Apr 26, 2022
@ghost

ghost commented Apr 28, 2022

Thanks for reaching out. I'm closing this issue as it was marked with "Answer Provided" and it hasn't had activity for 2 days.

@feiskyer
Member

feiskyer commented May 3, 2022

Sorry to see that the appProtocol support inside the cloud provider has broken ingress-nginx for AKS clusters >=1.22. The issue was caused by two things:

    1. the new version of the nginx ingress controller added appProtocol, and its probe path has to be /healthz;
    2. the new version of cloud-controller-manager added HTTP probing with the default path / for appProtocol=http services.

AKS is rolling out a fix (cloud-provider-azure#1626) to keep backward compatibility (ETA is 48 hours from now). After this fix, the default probe protocol will only be changed to appProtocol for clusters >=1.24 (refer to the docs for more details).

Before the rollout of this fix, please upgrade the existing ingress-nginx with --set controller.service.externalTrafficPolicy=Local for faster mitigation.

And for newly created helm releases, you can also set --set controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-health-probe-request-path"=/healthz to ensure the /healthz probe path is used with the HTTP/HTTPS protocol (the AKS docs have been updated with this annotation as well, e.g. here).
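
Putting the two mitigations together, a newly created release could be installed roughly like this; this is only a sketch combining the flags mentioned in this thread, reusing the release name, namespace, and values file from the earlier commands:

    helm install nginx-ingress ingress-nginx/ingress-nginx \
        --namespace ingress-basic --create-namespace \
        --set controller.replicaCount=1 \
        --set controller.service.externalTrafficPolicy=Local \
        --set controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-health-probe-request-path"=/healthz \
        -f internal-ingress.yaml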

@ghost ghost locked as resolved and limited conversation to collaborators Jun 2, 2022
This issue was closed.