Disabling public network access and using UserDefinedRouting #3690
Comments
For what it's worth, you can find here a repository that reproduces the issue. I have been experiencing the very same problem since last Tuesday. Until then it had been working fine for months. |
Same issue here while trying to upgrade from az aks upgrade --name <cluster> --resource-group <group> --subscription <subscription> --no-wait --kubernetes-version "1.25.6"
Kubernetes may be unavailable during cluster upgrades.
Are you sure you want to perform this operation? (y/N): y
Since control-plane-only argument is not specified, this will upgrade the control plane AND all nodepools to version 1.25.6. Continue? (y/N): y
(BadRequest) UserDefinedRouting is not supported when Cluster has public network access set to Disabled.
Code: BadRequest
Message: UserDefinedRouting is not supported when Cluster has public network access set to Disabled.
It seems as if the Azure RM API has changed, as I was able to upgrade another cluster (created with the exact same version of the Terraform module) two weeks ago. The Terraform module uses:
...
network_profile {
...
outbound_type = "userDefinedRouting"
}
public_network_access_enabled = false
private_cluster_enabled = true
How am I now able to upgrade my clusters? 😐 |
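One way to catch the combination named in the error message before it reaches the Azure API is a plan-time guard inside the module. The following is only a sketch under stated assumptions: the variable names and the `terraform_data` guard resource are hypothetical (they are not part of the module quoted above), and it needs Terraform 1.4 or later. It encodes the behaviour observed in this thread, not a documented AKS rule.

```hcl
# Hypothetical module inputs mirroring the settings quoted above.
variable "outbound_type" {
  type    = string
  default = "userDefinedRouting"
}

variable "public_network_access_enabled" {
  type    = bool
  default = true
}

# Guard resource (assumption: exists only to carry the precondition).
resource "terraform_data" "aks_network_guard" {
  lifecycle {
    precondition {
      # Fails the plan when UDR is combined with public network access disabled.
      condition     = !(var.outbound_type == "userDefinedRouting" && !var.public_network_access_enabled)
      error_message = "outbound_type userDefinedRouting with public_network_access_enabled = false was rejected by the AKS API with BadRequest."
    }
  }
}
```

With a guard like this, `terraform plan` should stop with the custom message rather than failing later during the upgrade or update call.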
According to my ticket with Microsoft, this is the new normal:
Will update any solution we find - it is not happening to all subscriptions, so might be a phased rollout |
What is the |
It's my understanding it places the default loadbalancer in your vnet as opposed to being publicly accessible |
Well, I don't know how you deploy your AKS cluster, but in my case, my internal load-balancer (ILB) is deployed through the nginx ingress with the following annotations (among others):
Therefore, my ILB gets deployed to a private subnet, and this has nothing to do with the |
ya I didn't have to add those annotations because the loadbalancer was created in the subnet we have designated to aks (nodepools, api server private endpoint, loadbalancer's frontend ips (private)) |
@andrewkreuzer ok I didn't know it was possible |
I'm beginning to believe it shouldn't be |
my bad I do have those annotations
annotations:
service.beta.kubernetes.io/azure-load-balancer-internal: true
service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: /healthz |
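For reference, the same two annotations can be expressed with the Terraform Kubernetes provider. This is a sketch, not what the nginx ingress chart generates: the resource name, namespace, selector, and ports are hypothetical placeholders, and the provider configuration is assumed to exist elsewhere. Annotation values have to be strings, hence the quoted `"true"`.

```hcl
# Hypothetical internal LoadBalancer service; only the two annotations come from this thread.
resource "kubernetes_service_v1" "internal_ingress" {
  metadata {
    name      = "internal-ingress" # placeholder name
    namespace = "ingress-nginx"    # placeholder namespace

    annotations = {
      # Requests an internal (private) Azure load balancer instead of a public one.
      "service.beta.kubernetes.io/azure-load-balancer-internal" = "true"
      # Path used by the load balancer health probe.
      "service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path" = "/healthz"
    }
  }

  spec {
    type = "LoadBalancer"

    selector = {
      "app.kubernetes.io/name" = "ingress-nginx" # placeholder selector
    }

    port {
      name        = "http"
      port        = 80
      target_port = 80
    }
  }
}
```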
I haven't tried this yet because my test cluster was recreated, but this is a quick fix that Microsoft gave me to change the public access property without recreating the cluster:
Apparently if you have the private cluster enabled the public access option isn't needed and the cluster is still closed off from public access. |
@jvikes11 what does the content of |
that might be true if, and only if, your workers are deployed to a private subnet |
In my case, there are two potential methods for fixing this configuration issue (taken from the troubleshooting guide from within the azure portal):
For convenience, here is the ARM version as well
|
does this create a public load balancer "kubernetes"? |
I have a pretty big battery of tests for my private infrastructure on Azure, and just removing the problematic option has kept my tests green. I validate AKS privacy, hopefully deep enough. So my initial guess was very likely correct: the parameter is redundant with other settings like the NSG of the Subnet where the AKS workers are deployed into. |
Just verified with Microsoft support, that the cluster remains private.
|
ya the api private endpoint, which is controlled by |
in my case, where I explicitly set
I get the |
@phealy @chasewilson - please take a look at this |
Thanks everyone for the feedback.
The fix posted by @matthiasguentert allows you to update the cluster's configuration without having to redeploy.
I think there are a few things which can be added to Azure's documentation to better describe what these parameters do. It's still unclear exactly what the
The documentation for UDR configuration[2] describes that with UDR no load balancer is created until a service of type LoadBalancer is created within the cluster, which explains the above configuration outcomes.
And you won't get a public IP unless you explicitly request one with a loadbalancer service
A section on public network access being required for this configuration would be helpful, as there is no mention of it in this documentation.
And finally, after running the above fix there is no perceived change in the cluster configuration displayed in the Azure portal.
[1] Rest API AKS - PublicNetworkAccess |
Why was this closed? This is a very misleading configuration setting |
Indeed, I don't think anything has been solved. The parameter is still there and its use is still a mystery. |
from support:
strange though that the issue on the azurerm tf provider linked above and from my own experience did have a public loadbalancer created when the public_network_access was set to true. Anyways hopefully this clears things up for you |
@andrewkreuzer I am a bit lost with that explanation from your support engineer, as what they describe seems to correspond rather to parameter |
Hey folks,
Hoping this can clear up some questions. This went unnoticed, apologies.
The full functionality of PublicNetworkAccess (PNA) is not really completed yet, which is why we shouldn't have any docs about it out; let us know if you found any out there that we need to look into. It seems TF released this and there might be some interpretations of what it means/does. We're rushing some docs for this reason, but for the time being Private Cluster or the equivalent API Server VNet integration (in preview) are really the only things that affect your cluster control plane networking exposure. Your nodes and services exposure is controlled by you: internal/external services, NSG/FW, etc.
PNA has no effect with private clusters. This change was part of the development process of the feature, but we were not aware TF was already exposing it, and with a default (to disable, if I understood correctly). For private clusters, in both the current mode and VNet integration, PNA is N/A since they already have no public connections allowed.
We were testing whether, for public clusters, we should allow setting it to disabled and what the behaviors would be there, and outbound type UDR was an outlier since we can't get communication back from the nodes (but our change caught all clusters, not just public ones). It was an oversight not to check whether some client was already using this config; it was wrongly assumed that none was, since this was not fully out.
This is not a required or finished property, and I'm not sure of the TF context for starting to support it, but we'll try to reach out to revert that. Sorry for the confusion |
Currently, it looks like TF azure provider always sets a value (default is true). See: kubernetes_cluster_resource.go#L1416 (and PR #18705). |
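Putting the comments above together, a private cluster with user-defined routing can be declared without touching the public network access argument at all. The sketch below assumes an azurerm provider version in which `public_network_access_enabled` still exists and defaults to `true`, as stated in the previous comment; all names, the VM size, and the input variables are hypothetical. Per the original report, changing this argument on an existing cluster may force re-creation, so review `terraform plan` before applying anything like this to a running cluster.

```hcl
# Hypothetical inputs; the subnet is assumed to already have a route table for UDR.
variable "resource_group_name" { type = string }
variable "location" { type = string }
variable "node_subnet_id" { type = string }

resource "azurerm_kubernetes_cluster" "example" {
  name                = "example-aks" # placeholder
  location            = var.location
  resource_group_name = var.resource_group_name
  dns_prefix          = "exampleaks" # placeholder

  # Keeps the API server endpoint private; per the thread this is what
  # controls control-plane exposure.
  private_cluster_enabled = true

  # public_network_access_enabled is intentionally not set, so the provider
  # default applies instead of the rejected "false" value.

  default_node_pool {
    name           = "default"
    node_count     = 1
    vm_size        = "Standard_D2s_v3" # placeholder size
    vnet_subnet_id = var.node_subnet_id
  }

  identity {
    type = "SystemAssigned"
  }

  network_profile {
    network_plugin = "azure"
    outbound_type  = "userDefinedRouting"
  }
}
```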
Describe your scenario
I have created a cluster with outbound_type = UserDefinedRouting and public_network_access_enabled = false using the Terraform provider. I am now receiving error:
or from the portal:
A support ticket was opened and I was told:
Feedback
I'm confused as to why this is not supported.
Setting private_cluster_enabled keeps the API endpoint within the vnet, setting public_network_access_enabled to false keeps the loadbalancer within our vnet, and using outbound_type UserDefinedRouting to control egress traffic through our firewall ensures we control all outbound traffic. The fact that this was allowed and the cluster is functioning is more confusing. If this is intended to not be supported, why does it work?
We're now stuck in a state where we can't make changes to the cluster unless we enable public access (which would cause cluster re-creation)... and we have three clusters.
If there's something I'm misunderstanding, or a technical reason why this is not supported, I would be grateful for some insight.