-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace router support for ingress with an ingress-to-route controller #18658
Replace router support for ingress with an ingress-to-route controller #18658
Conversation
@openshift/sig-networking @liggitt re: our security discussion. Implementing a full secret cache a la the Kubelet is not worth it. Route API is better designed for actual security than ingress in this case. |
2fc0c75
to
dd0be4c
Compare
/retest |
1 similar comment
/retest |
eventRecorder record.EventRecorder | ||
|
||
// Allows injection for testing, controls requeues on errors | ||
resourceFailureDelayFn func(requeue int) (time.Duration, bool) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dead?
|
||
// defaultResourceFailureDelay will retry failures forever, but implements an exponential | ||
// capped backoff after a certain limit. | ||
func defaultResourceFailureDelay(requeue int) (time.Duration, bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which would make this dead
ingressLister: ingresses.Lister(), | ||
secretLister: secrets.Lister(), | ||
routeLister: routes.Lister(), | ||
serviceLister: services.Lister(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
normally, you have a sync functions collected from these for a "wait for sync before run" check.
resourceFailureDelayFn: defaultResourceFailureDelay, | ||
} | ||
|
||
processNamespaceFn := func(obj interface{}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are usually separate queuing functions off the Controller struct.
case *v1.Secret: | ||
return t.Type == v1.SecretTypeTLS | ||
} | ||
return true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you're not a secret, why bother queuing. It won't work right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tombstones?
} | ||
|
||
processNamespaceFn := func(obj interface{}) { | ||
switch t := obj.(type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meta.Accessor
is general case, right? It feels really close to a general pattern. I tried once before. I don't remember what stopped it.
}, | ||
Handler: cache.ResourceEventHandlerFuncs{ | ||
AddFunc: processNamespaceFn, | ||
DeleteFunc: processNamespaceFn, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
your function handles tombstones? I didn't realize they were metav1.objects
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we queue a tombstone by assuming that anything not a secret is probably a tombstone. It didn't feel like handling it specially added anything over "if we don't recognize it, queue anyway"
}, | ||
}) | ||
|
||
c.syncs = []cache.InformerSynced{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. Yeah, reasonable.
return | ||
} | ||
|
||
if c.queue.NumRequeues(key) < maxRetries { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't usually do this. The last AddRateLimited is order of tens of minutes. I keeps us from having to worry about permafails
|
||
func (c *Controller) sync(key queueKey) error { | ||
// sync all ingresses in the namespace | ||
if len(key.name) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not do this when you enqueue? Self feeding like this is unusual.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess no reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember, because it collapses all namespace retries for these entries. Instead of each change traversing the cache, rapid changes result in de-duping all changes to the namespace into one, which then gets queued. Both secrets and services can go through burst creation. So instead of queuing everything in each namespace multiple times, we queue the namespace key. I think this is a better pattern when the controller has to react to multiple blanket events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could clean the terminology up, but it makes the tests a lot more centralized
|
||
ingress, err := c.ingressLister.Ingresses(key.namespace).Get(key.name) | ||
if err != nil { | ||
if errors.IsNotFound(err) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do this above the err check to reduce nesting
processRouteFn := func(obj interface{}) { | ||
switch t := obj.(type) { | ||
case *routev1.Route: | ||
name, ok := hasIngressOwnerRef(t.OwnerReferences) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ingressName
please. My eyes skipped past this detail
} | ||
|
||
if len(errs) > 1 { | ||
return fmt.Errorf("multiple errors occurred while creating the ingress: %v", errs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NewAggregate
?
@smarterclayton I didn't go down into the details, but this does read like a pretty simple controller. I would queue directly instead of sometimes queuing an invalid key that you expand in your sync func though. |
dd0be4c
to
048067e
Compare
/test gcp |
There appears to be a race condition in the router storage where a rapidly updated route is being seen as conflicting with itself (it could be the controller executing multiple writes but I don't see any evidence of that happening). That led me to identify an inconsistency in how we handle rejections inside the router - we should be passing down to our nested admission chain delete events when we reject an input, even if we may not have admitted it before (can leave old routes in the router state). Will continue to test, but highlights that we need to have a consistency test for the router before we exit 3.10. |
/retest |
1 similar comment
/retest |
4ee1dc4
to
c237f98
Compare
/retest |
@openshift/networking PTAL. This looks reasonable to me, but I would really like more reviewers. |
We don't support the router value defaulting on ingress correctly today, so I wanted to punt that to a follow up where we add a boolean on route spec like |
/retest |
1 similar comment
/retest |
FWIW, I tested this in a devenv ( |
@@ -296,6 +297,18 @@ func init() { | |||
}, | |||
}) | |||
|
|||
// ingress-secretref-controller |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stale comment.
@@ -56,6 +56,10 @@ func (p *ExtendedValidator) HandleRoute(eventType watch.EventType, route *routea | |||
if ok && reflect.DeepEqual(old.Spec, route.Spec) { | |||
// Route spec was unchanged and it is already marked in | |||
// error, we don't need to do anything more. | |||
p.plugin.HandleRoute(watch.Deleted, route) | |||
if eventType == watch.Deleted { | |||
delete(p.invalidRoutes, routeName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first I thought this was a fix for a memory leak, but then I realized that nothing appears to add entries to invalidRoutes
; is the outer if
block ever executed? (Perhaps @ramr can clarify.)
test/extended/testdata/ingress.yaml
Outdated
kind: List | ||
apiVersion: v1 | ||
items: | ||
# an ingress that should be captured as three routes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/three/four/
} | ||
for name, stdout := range values { | ||
stdout = strings.TrimSuffix(stdout, "\n") | ||
e2e.Logf(name + ": " + strings.Join(strings.Split(stdout, "\n"), fmt.Sprintf("\n%s: ", name))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Safer to use e2e.Logf("%s", name + ": " + strings.Join(strings.Split(stdout, "\n"), fmt.Sprintf("\n%s: ", name)))
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how it's safer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the off chance there's (e.g.) a %s
in the output. Granted the worst case is probably that you get some spurious "%!s(MISSING)" in the log output, but why wouldn't you just say what you mean?
case metav1.Object: | ||
ns := t.GetNamespace() | ||
if len(ns) == 0 { | ||
utilruntime.HandleError(fmt.Errorf("object %T has no namespace", obj)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why %T
(type) and not %+v
or %#v
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All we care about here is the type of object that we don't recognize.
r.Spec.To.Weight = existing.Spec.To.Weight | ||
if r.Spec.TLS != nil && existing.Spec.TLS != nil { | ||
r.Spec.TLS.CACertificate = existing.Spec.TLS.CACertificate | ||
r.Spec.TLS.DestinationCACertificate = existing.Spec.TLS.DestinationCACertificate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to preserve existing.Spec.TLS.DestinationCACertificate
without preserving existing.Spec.TLS.Termination
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't set the destination in an ingress, so this was really just a sop left to allow someone to still get that feature. We could add the more complex logic to reset based on type.
return tlsConfig == nil | ||
} | ||
if secret.Type != v1.SecretTypeTLS { | ||
return tlsConfig == nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic does not match that of newRouteForIngress
, which does not return a route for an ingress that references a secret that has the wrong type. Shouldn't you just return false
here?
If there is no reason for the logic to differ, you could factor the common logic out of newRouteForIngress
:
func routeTLSConfigForSecret(secret *v1.Secret) *routev1.TLSConfig {
if secret.Type != v1.SecretTypeTLS {
return nil
}
if _, ok := secret.Data[v1.TLSCertKey]; !ok {
return nil
}
if _, ok := secret.Data[v1.TLSPrivateKeyKey]; !ok {
return nil
}
return &routev1.TLSConfig{
Termination: routev1.TLSTerminationEdge,
Certificate: string(secret.Data[v1.TLSCertKey]),
Key: string(secret.Data[v1.TLSPrivateKeyKey]),
InsecureEdgeTerminationPolicy: routev1.InsecureEdgeTerminationPolicyRedirect,
}
}
Then newRouteForIngress
would use the new function as follows:
var tlsConfig *routev1.TLSConfig
if name, ok := referencesSecret(ingress, rule.Host); ok {
secret, err := secretLister.Secrets(ingress.Namespace).Get(name)
if err != nil {
// Secret doesn't exist yet; wait.
return nil
}
tlsConfig = routeTLSConfigForSecret(secret)
if tlsConfig == nil {
return nil
}
}
secretMatchesRoute
would use the new function as follows:
if secret == nil {
return tlsConfig == nil
}
expectTLSConfig := routeTLSConfigForSecret(secret)
if expectTLSConfig == nil {
return false
}
if tlsConfig == nil {
return false
}
return tlsConfig.Termination == expectTLSConfig.Termination &&
tlsConfig.Certificate == expectTLSConfig.Certificate &&
tlsConfig.Key == expectTLSConfig.Key
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to avoid allocating in the common case. This controller gets woken up on any secret change in a namespace which is expected to be common, therefore we have to be very careful not to over allocate during evaluation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't follow. Is it common that
- an ingress exists,
- a corresponding route exists,
- the ingress references a secret,
- and that secret exists,
- but the secret is the wrong type?
If this is a case worth optimizing, would it also make sense to return true in the case that the secret does not exist (i.e., return true
from routeMatchesIngress
if secretLister.Secrets(ingress.Namespace).Get(name)
is nil
)? Why treat secret-does-not-exist differently from secret-exists-but-has-the-wrong-type (which probably means the secret does not exist, and another secret is using its name)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any change to a namespace for a secret will cause us to need to reevaluate every ingress in the namespace. This is a level driven controller, so we will run over this code path for every ingress
- everytime a secret changes in a namespace
- everytime a service changes in a namespace
- everytime a resync interval hits
So it does matter to be as efficient as possible because this line of code will be invoked quite often.
The reason we return false is because the method does what it says on the tin - you can't reference a non TLS secret type from an ingress, so returning false means "doesn't match".
// 2. For every TLS hostname that has a corresponding path rule and points to a secret | ||
// that exists, a route should exist with a valid TLS config from that secret. | ||
// 3. For every service referenced by the ingress path rule, the route should have | ||
// an update to date target port based on the service. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"up to date"?
// 3. For every service referenced by the ingress path rule, the route should have | ||
// an update to date target port based on the service. | ||
// 4. A route owned by an ingress that no longer satisfies the first three invariants | ||
// should be deleted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the wording of invariant 4 is quite right. When an ingress has two rules where on rule satisfies all requirements (nonempty host, nonempty backend, no TLS or all requisite secrets, valid service reference) and the other rule fails some requirements, the controller will create and keep the route for the latter rule.
Maybe, "For every route owned by an ingress, if that route and its corresponding path rule no longer satisfy the conditions in the first three invariants, the route should be deleted"?
// controller creates a controller owner reference from the route to the parent ingress, | ||
// allowing users to orphan their ingress. All owned routes have specific spec fields | ||
// managed (those attributes present on the ingress), while any other fields may be | ||
// modified by the user. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does "managed" mean "set" (to the corresponding ingress attributes), or something else?
Since you mention that the user may modify fields, it may warrant mentioning that the only certain fields (namely name, weight, CA and destination CA certificates, and insecure edge termination policy) are preserved should the ingress be updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Managed" means "overwritten by the controller whenever the ingress changes".
1a87746
to
7ca5f80
Compare
Updated with comments and a fix for duplicate creation. I reproduced the issue on the server side that caused multiple route creations and deletions to be recorded and fixed that in the last commit (like other controllers that generate resources that don't have fixed names, we needed to use an expectation to avoid racing with the caches). There is indeed a real problem in the router that it looks like a stress test should be able to reproduce. I suspect it is due to improperly clearing an underlying cache entry for duplicate names. Going to spawn a separate issue for that and ensure that whatever stress infrastructure Ram and I create that it correctly finds that issue. |
@smarterclayton: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
7ca5f80
to
47840db
Compare
/test extended_conformance_k8s |
After looking at unique_host, it looks like route claiming is broken when paths are involved. I'm going to open a separate PR that fixes the issues. At heart, when a route is deleted other valid routes aren't being restored. This looks like a bug that we've had for a very long time. |
Fixed the unique_host issue in #19175. |
The router itself can't watch secrets, which means any changes to ingress may take a long time to propogate. The ingress-to-route controller replaces the need for the router to watch secrets or ingress. During a 3.9 to 3.10 upgrade, the router will continue to show old ingress until a rolling update happens, at which time it should begin serving hosts and paths from the newer routes created by the controller.
A controller maintains the invariant that for every valid ingress there exists zero or more routes that represent the ingress faithfully. The controller watches ingresses, routes, and secrets referenced by the ingress, and synthesizes an ingress equivalent. Features not supported by the router are ignored by the controller.
Any admission plugin that can vary its output due to ordering could leave nested plugins in an inconsistent state. Whenever we exit from a plugin early *except* in the status admitter we should propogate a delete down to the lower plugin to ensure that the route is truly removed. Also prevent unbounded items in the extended validator map. Observed that when transitioning routes from admitted to rejected or vice versa that some routes were being preserved in the lower levels.
Create a new test harness to simplify sending URL requests to running routers.
When a controller creates multiple routes from a single ingress it may observe arbitrary delays between the time of creation and when it observes that route in its cache. Like other controllers that use random name generation (secret controller, replica set controller) we therefore need to wait until our expectation is met. To simplify the code, we bypass server side name generation and use our own name generation to avoid racing against the server (create route, server sends event, set expectation = expectation is never cleared).
47840db
to
c36b2e5
Compare
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: knobunc, smarterclayton The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
The existing ingress support in the router has a number of serious problems:
This PR changes from the routers handling ingress to a new
ingress-to-route
controller that creates actual routes for each ingress, then keeps them in sync. It also has permission to watch secrets and will update the routes when those change, which means routers observe ingress secret changes as quickly as possible. Each child route has an owner/controller ref back to the ingress, which means garbage collection manages its lifecycle.Example:
It is also much cleaner, easier to test (unit and in isolation), easier to review, and simplifies router code significantly. I had originally looked at implementing this as an RBAC sync controller (granting routers access to secrets) but that only exposed the real problem which was that routers can't watch on secrets. Since Routes are generally better designed as a secure API than ingress, and ingress is likely to substantially change over the next few releases, this approach allows us to continue to shield our users from the impact of that uncertainty while preserving all the good things about routes.
Supporting ingress correctly is a key short term goal so that we can adapt users in the ecosystem who depend on ingress.
This adds no feature support above for ingress above and beyond, but it is relatively easy for us to adapt community annotations that can use our features.
Still remaining: