Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace router support for ingress with an ingress-to-route controller #18658

Merged

Conversation

smarterclayton
Copy link
Contributor

@smarterclayton smarterclayton commented Feb 18, 2018

The existing ingress support in the router has a number of serious problems:

  1. We cannot give a router "watch all secrets safely", so secret updates are delayed to the full resync interval
  2. Granting a router access to individual secrets requires some sort of ingress aware controller
  3. There are race conditions between secret update where a secret may not change for a full resync interval
  4. Users get no status on ingress, and the UI cannot show anything about them, which means cert and overlap failures are opaque

This PR changes from the routers handling ingress to a new ingress-to-route controller that creates actual routes for each ingress, then keeps them in sync. It also has permission to watch secrets and will update the routes when those change, which means routers observe ingress secret changes as quickly as possible. Each child route has an owner/controller ref back to the ingress, which means garbage collection manages its lifecycle.

Example:

kind: Ingress
apiVersion: extensions/v1beta1
metadata:
  name: test
spec:
  rules:
  - host: test.com
     http:
       paths:
       - path: /test 
         backend:
           serviceName: test-1
           servicePort: 80

-->

kind: Route
apiVersion: route.openshift.io/v1
metadata:
  name: test-a34th
  ownerReferences:
  - apiVersion: extensions/v1beta1
     kind: Ingress
     name: test
     controller: true
spec:
  host: test.com
  path: /test
  to:
    name: test-1
  port:
     targetPort: 80

It is also much cleaner, easier to test (unit and in isolation), easier to review, and simplifies router code significantly. I had originally looked at implementing this as an RBAC sync controller (granting routers access to secrets) but that only exposed the real problem which was that routers can't watch on secrets. Since Routes are generally better designed as a secure API than ingress, and ingress is likely to substantially change over the next few releases, this approach allows us to continue to shield our users from the impact of that uncertainty while preserving all the good things about routes.

Supporting ingress correctly is a key short term goal so that we can adapt users in the ecosystem who depend on ingress.

This adds no feature support above for ingress above and beyond, but it is relatively easy for us to adapt community annotations that can use our features.

Still remaining:

  • Add an e2e test that verifies the router serves a materialized ingress
  • Determine how/whether to default spec.host (arguably we shouldn't be supporting this since ingress controllers don't have this concept)
  • Decide whether to expose load balancer status

@openshift-ci-robot openshift-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Feb 18, 2018
@smarterclayton
Copy link
Contributor Author

@openshift/sig-networking @liggitt re: our security discussion. Implementing a full secret cache a la the Kubelet is not worth it. Route API is better designed for actual security than ingress in this case.

@smarterclayton
Copy link
Contributor Author

/retest

1 similar comment
@smarterclayton
Copy link
Contributor Author

/retest

@smarterclayton
Copy link
Contributor Author

@ramr

eventRecorder record.EventRecorder

// Allows injection for testing, controls requeues on errors
resourceFailureDelayFn func(requeue int) (time.Duration, bool)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dead?


// defaultResourceFailureDelay will retry failures forever, but implements an exponential
// capped backoff after a certain limit.
func defaultResourceFailureDelay(requeue int) (time.Duration, bool) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which would make this dead

ingressLister: ingresses.Lister(),
secretLister: secrets.Lister(),
routeLister: routes.Lister(),
serviceLister: services.Lister(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

normally, you have a sync functions collected from these for a "wait for sync before run" check.

resourceFailureDelayFn: defaultResourceFailureDelay,
}

processNamespaceFn := func(obj interface{}) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are usually separate queuing functions off the Controller struct.

case *v1.Secret:
return t.Type == v1.SecretTypeTLS
}
return true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you're not a secret, why bother queuing. It won't work right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tombstones?

}

processNamespaceFn := func(obj interface{}) {
switch t := obj.(type) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meta.Accessor is general case, right? It feels really close to a general pattern. I tried once before. I don't remember what stopped it.

},
Handler: cache.ResourceEventHandlerFuncs{
AddFunc: processNamespaceFn,
DeleteFunc: processNamespaceFn,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your function handles tombstones? I didn't realize they were metav1.objects

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we queue a tombstone by assuming that anything not a secret is probably a tombstone. It didn't feel like handling it specially added anything over "if we don't recognize it, queue anyway"

},
})

c.syncs = []cache.InformerSynced{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Yeah, reasonable.

return
}

if c.queue.NumRequeues(key) < maxRetries {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't usually do this. The last AddRateLimited is order of tens of minutes. I keeps us from having to worry about permafails


func (c *Controller) sync(key queueKey) error {
// sync all ingresses in the namespace
if len(key.name) == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not do this when you enqueue? Self feeding like this is unusual.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess no reason.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember, because it collapses all namespace retries for these entries. Instead of each change traversing the cache, rapid changes result in de-duping all changes to the namespace into one, which then gets queued. Both secrets and services can go through burst creation. So instead of queuing everything in each namespace multiple times, we queue the namespace key. I think this is a better pattern when the controller has to react to multiple blanket events.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could clean the terminology up, but it makes the tests a lot more centralized


ingress, err := c.ingressLister.Ingresses(key.namespace).Get(key.name)
if err != nil {
if errors.IsNotFound(err) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do this above the err check to reduce nesting

processRouteFn := func(obj interface{}) {
switch t := obj.(type) {
case *routev1.Route:
name, ok := hasIngressOwnerRef(t.OwnerReferences)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ingressName please. My eyes skipped past this detail

}

if len(errs) > 1 {
return fmt.Errorf("multiple errors occurred while creating the ingress: %v", errs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NewAggregate?

@deads2k
Copy link
Contributor

deads2k commented Mar 1, 2018

@smarterclayton I didn't go down into the details, but this does read like a pretty simple controller. I would queue directly instead of sometimes queuing an invalid key that you expand in your sync func though.

@smarterclayton
Copy link
Contributor Author

/test gcp

@smarterclayton
Copy link
Contributor Author

smarterclayton commented Mar 4, 2018

There appears to be a race condition in the router storage where a rapidly updated route is being seen as conflicting with itself (it could be the controller executing multiple writes but I don't see any evidence of that happening).

That led me to identify an inconsistency in how we handle rejections inside the router - we should be passing down to our nested admission chain delete events when we reject an input, even if we may not have admitted it before (can leave old routes in the router state). Will continue to test, but highlights that we need to have a consistency test for the router before we exit 3.10.

@smarterclayton
Copy link
Contributor Author

/retest

1 similar comment
@smarterclayton
Copy link
Contributor Author

/retest

@openshift-bot openshift-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 21, 2018
@smarterclayton
Copy link
Contributor Author

/retest

@knobunc
Copy link
Contributor

knobunc commented Mar 22, 2018

@openshift/networking PTAL. This looks reasonable to me, but I would really like more reviewers.

@smarterclayton
Copy link
Contributor Author

We don't support the router value defaulting on ingress correctly today, so I wanted to punt that to a follow up where we add a boolean on route spec like useDefaultHost or something which lets the router decide, and if the router isn't enforcing a default hostname it would be served on the wildcard domain. That would make the router then compatible with ingress, although I think we need to figure out how to access control that.

@smarterclayton
Copy link
Contributor Author

/retest

1 similar comment
@smarterclayton
Copy link
Contributor Author

/retest

@Miciah
Copy link
Contributor

Miciah commented Mar 27, 2018

FWIW, I tested this in a devenv (oc cluster up and then oc create -f /data/src/github.com/openshift/origin/test/extended/testdata/ingress.yaml) and am seeing that raciness you mention. For example, the ingress-to-route controller created two routes, "test-52lll" and "test-9kgxp", for host 2.ingress-test.com; "test-52lll" got admitted, "test-9kgxp" got denied ("HostAlreadyClaimed"), then "test-52lll" got deleted, and 10 minutes later (on the next router sync), "test-9kgxp" finally got admitted.

@@ -296,6 +297,18 @@ func init() {
},
})

// ingress-secretref-controller
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale comment.

@@ -56,6 +56,10 @@ func (p *ExtendedValidator) HandleRoute(eventType watch.EventType, route *routea
if ok && reflect.DeepEqual(old.Spec, route.Spec) {
// Route spec was unchanged and it is already marked in
// error, we don't need to do anything more.
p.plugin.HandleRoute(watch.Deleted, route)
if eventType == watch.Deleted {
delete(p.invalidRoutes, routeName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first I thought this was a fix for a memory leak, but then I realized that nothing appears to add entries to invalidRoutes; is the outer if block ever executed? (Perhaps @ramr can clarify.)

kind: List
apiVersion: v1
items:
# an ingress that should be captured as three routes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/three/four/

}
for name, stdout := range values {
stdout = strings.TrimSuffix(stdout, "\n")
e2e.Logf(name + ": " + strings.Join(strings.Split(stdout, "\n"), fmt.Sprintf("\n%s: ", name)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Safer to use e2e.Logf("%s", name + ": " + strings.Join(strings.Split(stdout, "\n"), fmt.Sprintf("\n%s: ", name))).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how it's safer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the off chance there's (e.g.) a %s in the output. Granted the worst case is probably that you get some spurious "%!s(MISSING)" in the log output, but why wouldn't you just say what you mean?

case metav1.Object:
ns := t.GetNamespace()
if len(ns) == 0 {
utilruntime.HandleError(fmt.Errorf("object %T has no namespace", obj))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why %T (type) and not %+v or %#v?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All we care about here is the type of object that we don't recognize.

r.Spec.To.Weight = existing.Spec.To.Weight
if r.Spec.TLS != nil && existing.Spec.TLS != nil {
r.Spec.TLS.CACertificate = existing.Spec.TLS.CACertificate
r.Spec.TLS.DestinationCACertificate = existing.Spec.TLS.DestinationCACertificate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to preserve existing.Spec.TLS.DestinationCACertificate without preserving existing.Spec.TLS.Termination?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't set the destination in an ingress, so this was really just a sop left to allow someone to still get that feature. We could add the more complex logic to reset based on type.

return tlsConfig == nil
}
if secret.Type != v1.SecretTypeTLS {
return tlsConfig == nil
Copy link
Contributor

@Miciah Miciah Mar 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic does not match that of newRouteForIngress, which does not return a route for an ingress that references a secret that has the wrong type. Shouldn't you just return false here?

If there is no reason for the logic to differ, you could factor the common logic out of newRouteForIngress:

func routeTLSConfigForSecret(secret *v1.Secret) *routev1.TLSConfig {
	if secret.Type != v1.SecretTypeTLS {
		return nil
	}
	if _, ok := secret.Data[v1.TLSCertKey]; !ok {
		return nil
	}
	if _, ok := secret.Data[v1.TLSPrivateKeyKey]; !ok {
		return nil
	}
	return &routev1.TLSConfig{
		Termination: routev1.TLSTerminationEdge,
		Certificate: string(secret.Data[v1.TLSCertKey]),
		Key:         string(secret.Data[v1.TLSPrivateKeyKey]),
		InsecureEdgeTerminationPolicy: routev1.InsecureEdgeTerminationPolicyRedirect,
	}
}

Then newRouteForIngress would use the new function as follows:

	var tlsConfig *routev1.TLSConfig
	if name, ok := referencesSecret(ingress, rule.Host); ok {
		secret, err := secretLister.Secrets(ingress.Namespace).Get(name)
		if err != nil {
			// Secret doesn't exist yet; wait.
			return nil
		}

		tlsConfig = routeTLSConfigForSecret(secret)
		if tlsConfig == nil {
			return nil
		}
	}

secretMatchesRoute would use the new function as follows:

	if secret == nil {
		return tlsConfig == nil
	}

	expectTLSConfig := routeTLSConfigForSecret(secret)
	if expectTLSConfig == nil {
		return false
	}

	if tlsConfig == nil {
		return false
	}

	return tlsConfig.Termination == expectTLSConfig.Termination &&
		tlsConfig.Certificate == expectTLSConfig.Certificate &&
		tlsConfig.Key == expectTLSConfig.Key

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to avoid allocating in the common case. This controller gets woken up on any secret change in a namespace which is expected to be common, therefore we have to be very careful not to over allocate during evaluation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow. Is it common that

  • an ingress exists,
  • a corresponding route exists,
  • the ingress references a secret,
  • and that secret exists,
  • but the secret is the wrong type?

If this is a case worth optimizing, would it also make sense to return true in the case that the secret does not exist (i.e., return true from routeMatchesIngress if secretLister.Secrets(ingress.Namespace).Get(name) is nil)? Why treat secret-does-not-exist differently from secret-exists-but-has-the-wrong-type (which probably means the secret does not exist, and another secret is using its name)?

Copy link
Contributor Author

@smarterclayton smarterclayton Mar 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any change to a namespace for a secret will cause us to need to reevaluate every ingress in the namespace. This is a level driven controller, so we will run over this code path for every ingress

  1. everytime a secret changes in a namespace
  2. everytime a service changes in a namespace
  3. everytime a resync interval hits

So it does matter to be as efficient as possible because this line of code will be invoked quite often.

The reason we return false is because the method does what it says on the tin - you can't reference a non TLS secret type from an ingress, so returning false means "doesn't match".

// 2. For every TLS hostname that has a corresponding path rule and points to a secret
// that exists, a route should exist with a valid TLS config from that secret.
// 3. For every service referenced by the ingress path rule, the route should have
// an update to date target port based on the service.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"up to date"?

// 3. For every service referenced by the ingress path rule, the route should have
// an update to date target port based on the service.
// 4. A route owned by an ingress that no longer satisfies the first three invariants
// should be deleted.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the wording of invariant 4 is quite right. When an ingress has two rules where on rule satisfies all requirements (nonempty host, nonempty backend, no TLS or all requisite secrets, valid service reference) and the other rule fails some requirements, the controller will create and keep the route for the latter rule.

Maybe, "For every route owned by an ingress, if that route and its corresponding path rule no longer satisfy the conditions in the first three invariants, the route should be deleted"?

// controller creates a controller owner reference from the route to the parent ingress,
// allowing users to orphan their ingress. All owned routes have specific spec fields
// managed (those attributes present on the ingress), while any other fields may be
// modified by the user.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does "managed" mean "set" (to the corresponding ingress attributes), or something else?

Since you mention that the user may modify fields, it may warrant mentioning that the only certain fields (namely name, weight, CA and destination CA certificates, and insecure edge termination policy) are preserved should the ingress be updated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Managed" means "overwritten by the controller whenever the ingress changes".

@smarterclayton smarterclayton force-pushed the ingress_to_route branch 2 times, most recently from 1a87746 to 7ca5f80 Compare March 30, 2018 19:52
@smarterclayton
Copy link
Contributor Author

smarterclayton commented Mar 30, 2018

Updated with comments and a fix for duplicate creation. I reproduced the issue on the server side that caused multiple route creations and deletions to be recorded and fixed that in the last commit (like other controllers that generate resources that don't have fixed names, we needed to use an expectation to avoid racing with the caches).

There is indeed a real problem in the router that it looks like a stress test should be able to reproduce. I suspect it is due to improperly clearing an underlying cache entry for duplicate names. Going to spawn a separate issue for that and ensure that whatever stress infrastructure Ram and I create that it correctly finds that issue.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Mar 30, 2018

@smarterclayton: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/openshift-jenkins/extended_networking_minimal 4ee1dc4 link /test extended_networking_minimal

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@smarterclayton
Copy link
Contributor Author

/test extended_conformance_k8s

@smarterclayton
Copy link
Contributor Author

After looking at unique_host, it looks like route claiming is broken when paths are involved. I'm going to open a separate PR that fixes the issues. At heart, when a route is deleted other valid routes aren't being restored. This looks like a bug that we've had for a very long time.

@smarterclayton
Copy link
Contributor Author

Fixed the unique_host issue in #19175.

The router itself can't watch secrets, which means any changes to
ingress may take a long time to propogate. The ingress-to-route
controller replaces the need for the router to watch secrets or ingress.

During a 3.9 to 3.10 upgrade, the router will continue to show old
ingress until a rolling update happens, at which time it should begin
serving hosts and paths from the newer routes created by the controller.
A controller maintains the invariant that for every valid ingress there
exists zero or more routes that represent the ingress faithfully. The
controller watches ingresses, routes, and secrets referenced by the
ingress, and synthesizes an ingress equivalent. Features not supported
by the router are ignored by the controller.
Any admission plugin that can vary its output due to ordering could
leave nested plugins in an inconsistent state. Whenever we exit from a
plugin early *except* in the status admitter we should propogate a
delete down to the lower plugin to ensure that the route is truly
removed.

Also prevent unbounded items in the extended validator map.

Observed that when transitioning routes from admitted to rejected or
vice versa that some routes were being preserved in the lower levels.
Create a new test harness to simplify sending URL requests to running
routers.
When a controller creates multiple routes from a single ingress it may
observe arbitrary delays between the time of creation and when it
observes that route in its cache. Like other controllers that use random
name generation (secret controller, replica set controller) we therefore
need to wait until our expectation is met. To simplify the code, we
bypass server side name generation and use our own name generation to
avoid racing against the server (create route, server sends event, set
expectation = expectation is never cleared).
@knobunc
Copy link
Contributor

knobunc commented Apr 3, 2018

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 3, 2018
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: knobunc, smarterclayton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. sig/networking size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants