Impossible to change hostname<->ingress mapping without downtime due to admission webhook #10820
Comments
This issue is currently awaiting triage. If Ingress contributors determine this is a relevant issue, they will accept it by applying the `triage/accepted` label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/remove-kind bug

What you expect is not possible currently, because the rules you create are transformed into nginx.conf directives, and you are concurrently expecting 2 directives with the exact same config for the server block and location block.
Is that fully invalid nginx configuration that would cause the reload to fail, or just bad practice? And that's how it works now, but we could change that, no? Could we not do a first pass and omit the second one? Or add an option that is on by default, but when off, makes the admission controller allow this situation? Also, I would still call this a bug even if it is a wontfix: if I were using raw nginx by myself, I could coordinate a zero-downtime reload by preparing the new configuration ahead of time and atomically reloading to pick it up, but because I'm using ingress-nginx, I can't atomically delete and create the new ingress objects all at once.
I cannot comment on bug/wontfix type of topics just yet based on the data here. You could try this on vanilla nginx, without Kubernetes, and present the results here. For one thing, the mass use case of the ingress controller does not include the use case you described. And allowing duplicate host+path is not viable for real-life production use, because the fundamental spec of the ingress resource is routing based on hostname and path. What you expect seems like a desired feature for users who design and re-design as you described, but my opinion is that the amount of resources required to develop such a feature is not available.
I have done this in the past, but it requires careful orchestration, though it's the same kind you'd have to do whenever reloading nginx at all. Let's say you have one
Well, the k8s Ingress resource supports this just fine; it is the admission webhook that is declaring this configuration invalid. What is not viable about the semantics I described above, where ingress-nginx resolves the ambiguity using some set of rules? I think viability and "we don't want to encourage this because it is confusing" are two different things, and it may very well be that allowing this situation is the wrong default. But for folks with flowing production traffic, we're kind of sunk without some way around this, so to me this seems like a real bug.
What would be involved in a fix, do you think?
Also, what did versions 1.15 and before do, when the admission webhook didn't prevent this situation? This comment from the previous thread seems to indicate this actually used to work. If those setups managed to stay up, to me that is evidence this is technically possible, and it might just work if we added an option to disable this specific check in the admission webhook.
I don't know what is right when there are 2 identical configurations for routing. Although momentary and maybe even incidental, I really do not understand that scenario. My limited understanding is that the uniqueness of hostname+path decides routing. Unlike a vanilla non-K8s nginx reverse proxy, the connection terminates on the controller, so even a brief, flashing, momentary config where 2 identical configurations hinder routing could potentially break established connections. This could be the reason that check/validation was put in place; I don't know for sure. I agree that downtime can not be avoided without the change you described, but please join the community meeting and bring up this feature requirement. I am nowhere near that decision making.
/kind feature
See also #10090
Since you state that the first of 2 among duplicates wins, you could also write up all the scenarios that will occur when duplicates are valid, then wait for comments on that change to the controller from others.
Hi @longwuyuan, the webhook is not required for operation of the controller. The controller already has defined behaviour for duplicate host+path. So there is no need to re-identify all scenarios; there is only a request for the webhook validator to NOT enforce a check that is unnecessary.
Sorry, I was not clear. In the scenario of a transition from clubbed rules to individual rules, temporarily/momentarily allowing duplicates is an advantage I see. But removing the protection against duplication from validation for the entire user community is not an improvement, you will agree (because unexpected routing would be an impact). Hence my comment that you write up how you would handle duplicates in the real world. That will be input for readers here. It will help dive into the "oldest wins" design.
I feel like if there is already existing behaviour in the controller itself for handling duplicates, we'd have to come up with a really good reason for making a breaking change to that behaviour. This is subjective, but I don't think there is really a resolution to the ambiguity that 90% of people would predict -- it's kind of arbitrary IMO. You could argue the oldest one is best, or the newest one is best, or that there should be some tool for specifying which one is the right one, but IMO because there isn't an obviously-the-best solution that folks would guess is what would happen, anyone dealing with this needs to read the docs to understand what is going to happen anyway. With that being the case, I feel like the existing logic would be fine for my use case, so no need to change anything in the controller at all! Why churn if we don't have to?

The solution I think would be best (and it sounds like @hobti01 could use it too) would be to add a new option that governs the behaviour of just the webhook validator. By default, this option would keep the behaviour of today to avoid making a breaking change, and duplicates wouldn't be allowed, to catch mistakes. But, for brave folks like us, we could enable duplicate support and have duplicates still pass webhook validation. That seems like a low-risk way to add support for this.
My understanding is that the controller has long-established behaviour for handling duplicate host+path. The existing behaviour of the controller allows zero-downtime Ingress replacement by creating two Ingress resources with the same host+path but different names. This is a valuable use case. However, the webhook validation has long prevented this use case and there is no method to disable only the overlap check without disabling the entire admission webhook.
We're currently trying to integrate the admission webhook in our clusters, and this issue blocks the integration, since we have used the nginx ingress controller with allow-duplicates behaviour for years now. The admission webhook could emit a warning instead of a deny. I think it would be good to have a flag like the one sketched below.
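A minimal sketch of how such a webhook-only opt-out might look in the controller Deployment args. To be clear, the flag name `--disable-duplicate-host-path-check` is hypothetical and not an existing ingress-nginx option:

```yaml
# Sketch only: --disable-duplicate-host-path-check is a hypothetical flag,
# not an existing ingress-nginx option.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ingress-nginx
    spec:
      containers:
        - name: controller
          image: registry.k8s.io/ingress-nginx/controller:v1.10.0
          args:
            - /nginx-ingress-controller
            - --validating-webhook=:8443
            # Keep the webhook running, but let duplicate host+path pairs
            # pass validation; the controller then falls back to its
            # documented "oldest rule wins" behaviour.
            - --disable-duplicate-host-path-check
```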
## What this PR does / why we need it:

In kubernetes#5651 there was a request to throw an error when there are two ingresses defining the same host and path, which was implemented as part of the validation webhook. Despite this, there are clear rules on the ingress controller that describe what the controller will do in [such a situation (the oldest rule wins)](https://github.com/kubernetes/ingress-nginx/blob/main/docs/how-it-works.md?plain=1#L27).

Some users are relying on this validation behaviour to prevent misconfigurations, but there are use cases where allowing it, maybe temporarily, is helpful. Those use cases include:

- Splitting large ingress objects into smaller ones kubernetes#10820
- Moving ingress objects between namespaces without downtime (like when you transfer an ingress from team A that owns namespace A to team B that owns namespace B) kubernetes#10090

## Types of changes

- [ ] Bug fix (non-breaking change which fixes an issue)
- [X] New feature (non-breaking change which adds functionality)
- [ ] CVE Report (Scanner found CVE and adding report)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation only

## Which issue/s this PR fixes

fixes kubernetes#10820
fixes kubernetes#10090

## How Has This Been Tested?

Building an image and testing it in a local cluster; will update later with some real-life scenarios.

## Checklist:

- [X] My change requires a change to the documentation.
- [ ] I have updated the documentation accordingly.
- [X] I've read the [CONTRIBUTION](https://github.com/kubernetes/ingress-nginx/blob/main/CONTRIBUTING.md) guide
- [X] I have added unit and/or e2e tests to cover my changes.
- [X] All new and existing tests passed.

Change-Id: I9d4124d1c36876b06d63100cd10988eaf2f41db9
Another alternative/workaround here that my team is using in our prod environment is to disable validations for a subset of ingresses with a particular label. This was pretty easy to do by updating the admission webhook's object selectors. This still guarantees that other ingresses created without our special label are fully validated. This has been working well in our case as we are mostly copying existing, validated ingresses and moving them around / renaming them.
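For illustration, a minimal sketch of that approach, assuming a default Helm-chart install (webhook, service, and path names follow the chart defaults; the label key `example.com/skip-validation` is an invented example). Ingresses carrying the label bypass the webhook entirely, while everything else is still validated:

```yaml
# ingress-nginx ValidatingWebhookConfiguration, scoped so it only validates
# Ingresses that do NOT carry the (hypothetical) example.com/skip-validation
# label. Objects without the label still satisfy NotIn, so they stay validated.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: ingress-nginx-admission
webhooks:
  - name: validate.nginx.ingress.kubernetes.io
    objectSelector:
      matchExpressions:
        - key: example.com/skip-validation
          operator: NotIn
          values: ["true"]
    rules:
      - apiGroups: ["networking.k8s.io"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["ingresses"]
    clientConfig:
      service:
        name: ingress-nginx-controller-admission
        namespace: ingress-nginx
        path: /networking/v1/ingresses
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail
```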
## What this PR does / why we need it:

In kubernetes#5651 there was a request to throw an error when there are two ingresses defining the same host and path, which was implemented as part of the validation webhook. Despite this, there are clear rules on the ingress controller that describe what the controller will do in [such a situation (the oldest rule wins)](https://github.com/kubernetes/ingress-nginx/blob/main/docs/how-it-works.md?plain=1#L27).

Some users are relying on this validation behaviour to prevent misconfigurations, but there are use cases where allowing it, maybe temporarily, is helpful. Those use cases include:

- Splitting large ingress objects into smaller ones kubernetes#10820
- Moving ingress objects between namespaces without downtime (like when you transfer an ingress from team A that owns namespace A to team B that owns namespace B) kubernetes#10090

## Types of changes

- [ ] Bug fix (non-breaking change which fixes an issue)
- [X] New feature (non-breaking change which adds functionality)
- [ ] CVE Report (Scanner found CVE and adding report)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation only

## Which issue/s this PR fixes

It might help with kubernetes#10820

## How Has This Been Tested?

Building an image and testing it in a local cluster; will update later with some real-life scenarios.

## Checklist:

- [X] My change requires a change to the documentation.
- [ ] I have updated the documentation accordingly.
- [X] I've read the [CONTRIBUTION](https://github.com/kubernetes/ingress-nginx/blob/main/CONTRIBUTING.md) guide
- [X] I have added unit and/or e2e tests to cover my changes.
- [X] All new and existing tests passed.

Change-Id: I9d4124d1c36876b06d63100cd10988eaf2f41db9
**What happened**:
I would like to break up one multi-host ingress into many single-host ingresses so I can adjust the annotations between them independently.
I have an ingress like this:
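(The manifest itself was not captured in this thread; a minimal reconstruction with hypothetical hostnames, names, and services:)

```yaml
# Reconstruction with hypothetical names: one ingress serving two hosts.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: main-ingress
spec:
  ingressClassName: nginx
  rules:
    - host: foo.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: foo-svc
                port:
                  number: 80
    - host: bar.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: bar-svc
                port:
                  number: 80
```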
I am trying to add a second ingress like this:
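(Again a reconstruction; the annotation shown is a real ingress-nginx annotation, but only a plausible example of one you might want on `foo` alone:)

```yaml
# Reconstruction: a single-host ingress for foo, carrying an annotation
# (chosen arbitrarily for illustration) that must not apply to bar.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: foo-ingress
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: 50m
spec:
  ingressClassName: nginx
  rules:
    - host: foo.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: foo-svc
                port:
                  number: 80
```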
The second ingress has a different annotation than the main one that I want to apply to `foo` only, and not to `bar`. For this reason, I need to create two different ingresses, one for `foo` and one for `bar`. I can't currently, as I get a message like this when trying to create the second one:

**What you expected to happen**:
I expected that creating a second, new ingress that re-used one of my existing hostnames would succeed, and there would be some precedence rule which selected which ruleset processed requests until I deleted one or the other. It did not -- instead, I can't create the second ingress.
If this was a test cluster, this would be very easy, as I could just delete the one multi-hostname ingress and then create several new, single-hostname ingresses. But it's a production cluster where traffic is flowing through all these hostnames at once. I am looking for a zero-downtime way of splitting up this ingress and I cannot find one.
**NGINX Ingress controller version** (exec into the pod and run `nginx-ingress-controller --version`):

**Kubernetes version** (use `kubectl version`):

**Environment**:

- Kernel (e.g. `uname -a`): GKE default

**How to reproduce this issue**:
**Anything else we need to know**:
This issue is very similar to #8216, but different. That issue concerned re-using hostnames across different ingress classes; I am finding that works fine. This issue is about re-using hostnames within the same ingress class on purpose.
Key comments concerning this issue specifically from the previous thread: #8216 (comment) #8216 (comment)
My take:
I'm opening this issue because I believe it is not cosmetic. If you start an ingress out with multiple hostnames, you can't ever change your mind about the arrangement of them. If you ever need to adjust the annotations on some but not other hostnames, you have to have picked the arrangement correctly from the get-go.
I suspect a workaround is deploying a new ingress controller with a different class, re-creating all my ingresses on that ingress controller, deleting the one I want to change on that side, creating hostname-split ingresses there, validating that it is working, and then doing a scary big-bang DNS change to move over to that ingress. That is a pretty risky operation and a lot of effort just to be able to split one ingress apart. I'd really like to avoid it if possible.
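For reference, that parallel-controller workaround hinges on a second IngressClass watched by a second controller deployment. A sketch with invented names (`--controller-class` is a real ingress-nginx flag; the values here are hypothetical):

```yaml
# Sketch: IngressClass for a parallel ingress-nginx install used only
# during the migration. Names are hypothetical.
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: nginx-migration
spec:
  # Must match the --controller-class passed to the second controller,
  # so the two installs don't fight over the same Ingress objects.
  controller: k8s.io/ingress-nginx-migration
```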
Another workaround would be to disable the admission controller. But, as captured in that thread, the admission controller does a lot of critical validation that I don't want to lose. I just want to pass through an ambiguous state for a brief time, and return to the nice normal state. I am OK with saying "when there are duplicate hostnames, there is some rule that says one ingress is totally ignored", but allowing the creation of the duplicate ingresses allows for a zero-downtime cutover when the ambiguity is resolved by deleting one or the other.
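For completeness: with the official Helm chart, disabling the webhook entirely is a single value (key as in the chart; verify against your chart version). This is exactly the blunt instrument being argued against here, since it drops all validation rather than just the overlap check:

```yaml
# values.yaml override: removes the validating admission webhook entirely.
controller:
  admissionWebhooks:
    enabled: false
```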
Also, if I disable the admission webhook and create ambiguous ingresses, and the nginx reload succeeds, that means the ingress controller is a more zealous validator than nginx itself, which to me merits a configuration option, as it's not a strictly necessary validation for continued operation.