Skip to content

Commit

Permalink
fix(controller): Sticky session correction for AWS ALB. Fixes #1572 (#…
Browse files Browse the repository at this point in the history
…1577)

* fix: use patch to update workload-generation annotation

Signed-off-by: Alexander Matyushentsev <[email protected]>

* Add support for AWS ALB TargetGroupStickinessConfig

Adds support for AWS ALB [TargetGroupStickinessConfig](https://aws.amazon.com/blogs/aws/new-application-load-balancer-simplifies-deployment-with-weighted-target-groups/)

This is required to support sticky session on the listener level while Argo is using ALB's weighting

Signed-off-by: Sebastian J <[email protected]>

* Remove code smells

Signed-off-by: Sebastian J <[email protected]>

* more code smells

Signed-off-by: Sebastian J <[email protected]>

* PR feedback

Signed-off-by: Sebastian J <[email protected]>

* Revert

Signed-off-by: Sebastian J <[email protected]>

* Another test round

Signed-off-by: Sebastian J <[email protected]>

* Force codegen GO 1.16

Forced codegen via downgrading to Go 1.16:

```
$ env|grep GO

GOPATH=/Users/sebastian/go
```

```
$ go version

go version go1.16.10 darwin/amd64
```

```
$ echo $PATH

/Users/sebastian/.sdkman/candidates/micronaut/current/bin:/Users/sebastian/.sdkman/candidates/java/current/bin:/Users/sebastian/.cargo/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Library/TeX/texbin:/usr/local/MacGPG2/bin:/usr/local/share/dotnet:/Library/Frameworks/Mono.framework/Versions/Current/Commands:/bin:/Users/sebastian/go/bin
```

Signed-off-by: Sebastian J <[email protected]>

* Added documentaiton

Signed-off-by: Sebastian J <[email protected]>

Co-authored-by: Alexander Matyushentsev <[email protected]>
  • Loading branch information
derjust and alexmt authored Dec 14, 2021
1 parent 3c1e87c commit 5e188f9
Show file tree
Hide file tree
Showing 15 changed files with 1,698 additions and 546 deletions.
46 changes: 34 additions & 12 deletions docs/features/traffic-management/alb.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@

## Overview

[AWS Load Balancer Controller](https://github.com/kubernetes-sigs/aws-load-balancer-controller)
[AWS Load Balancer Controller](https://github.com/kubernetes-sigs/aws-load-balancer-controller)
(also known as AWS ALB Ingress Controller) enables traffic management through an Ingress object,
which configures an AWS Application Load Balancer (ALB) to route traffic to one or more Kubernetes
services. ALBs provides advanced traffic splitting capability through the concept of
Expand All @@ -31,7 +31,7 @@ the desired traffic weights.

## Usage

To configure a Rollout to use the ALB integration and split traffic between the canary and stable
To configure a Rollout to use the ALB integration and split traffic between the canary and stable
services during updates, the Rollout should be configured with the following fields:

```yaml
Expand Down Expand Up @@ -83,9 +83,9 @@ spec:
During an update, the rollout controller injects the `alb.ingress.kubernetes.io/actions.<SERVICE-NAME>`
annotation, containing a JSON payload understood by the AWS Load Balancer Controller, directing it
to split traffic between the `canaryService` and `stableService` according to the current canary weight.
to split traffic between the `canaryService` and `stableService` according to the current canary weight.

The following is an example of our example Ingress after the rollout has injected the custom action
The following is an example of our example Ingress after the rollout has injected the custom action
annotation that splits traffic between the canary-service and stable-service, with a traffic weight
of 10 and 90 respectively:

Expand All @@ -97,16 +97,16 @@ metadata:
annotations:
kubernetes.io/ingress.class: alb
alb.ingress.kubernetes.io/actions.root-service: |
{
{
"Type":"forward",
"ForwardConfig":{
"TargetGroups":[
{
"ForwardConfig":{
"TargetGroups":[
{
"Weight":10,
"ServiceName":"canary-service",
"ServicePort":"80"
},
{
{
"Weight":90,
"ServiceName":"stable-service",
"ServicePort":"80"
Expand Down Expand Up @@ -158,9 +158,31 @@ spec:
...
```

### Sticky session

Because at least two target groups (canary and stable) are used, target group stickiness requires additional configuration:
Sticky session must be activated on the target group via

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
spec:
strategy:
canary:
...
trafficRouting:
alb:
stickinessConfig:
enabled: true
durationSeconds: 3600
...
```

More information can be found in the [AWS ALB API](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/sticky-sessions.html)

### Zero-Downtime Updates with AWS TargetGroup Verification

Argo Rollouts contains two features to help ensure zero-downtime updates when used with the AWS
Argo Rollouts contains two features to help ensure zero-downtime updates when used with the AWS
LoadBalancer controller: TargetGroup IP verification and TargetGroup weight verification. Both
features involve the Rollout controller performing additional safety checks to AWS, to verify
the changes made to the Ingress object are reflected in the underlying AWS TargetGroup.
Expand All @@ -185,7 +207,7 @@ errors when the TargetGroup points to pods which have already been scaled down.

To mitigate this risk, AWS recommends the use of
[pod readiness gate injection](https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.2/deploy/pod_readiness_gate/)
when running the AWS LoadBalancer in IP mode. Readiness gates allow for the AWS LoadBalancer
when running the AWS LoadBalancer in IP mode. Readiness gates allow for the AWS LoadBalancer
controller to verify that TargetGroups are accurate before marking newly created Pods as "ready",
preventing premature scale down of the older ReplicaSet.

Expand Down Expand Up @@ -218,7 +240,7 @@ downtime in the following problematic scenario during an update from V1 to V2:
5. V1 ReplicaSet is scaled down to complete the update

After step 5, when the V1 ReplicaSet is scaled down, the outdated TargetGroup would still be pointing
to the V1 Pods IPs which no longer exist, causing downtime.
to the V1 Pods IPs which no longer exist, causing downtime.

To allow for zero-downtime updates, Argo Rollouts has the ability to perform TargetGroup IP
verification as an additional safety measure during an update. When this feature is enabled, whenever
Expand Down
8 changes: 8 additions & 0 deletions ingress/alb.go
Original file line number Diff line number Diff line change
Expand Up @@ -101,6 +101,14 @@ func getResetALBActionStr(ingress *ingressutil.Ingress, action string) (string,
},
},
}

if previousAction.ForwardConfig.TargetGroupStickinessConfig != nil {
albAction.ForwardConfig.TargetGroupStickinessConfig = &ingressutil.ALBTargetGroupStickinessConfig{
Enabled: previousAction.ForwardConfig.TargetGroupStickinessConfig.Enabled,
DurationSeconds: previousAction.ForwardConfig.TargetGroupStickinessConfig.DurationSeconds,
}
}

bytes := jsonutil.MustMarshal(albAction)
return string(bytes), nil
}
62 changes: 55 additions & 7 deletions ingress/alb_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -32,15 +32,39 @@ const actionTemplate = `{
}
}`

const actionTemplateWithStickyConfig = `{
"Type":"forward",
"ForwardConfig":{
"TargetGroups":[
{
"ServiceName":"%s",
"ServicePort":"%d",
"Weight": 85
},{
"ServiceName":"%s",
"ServicePort":"%d",
"Weight": 15
}
],
"TargetGroupStickinessConfig":{
"DurationSeconds" : 300,
"Enabled" : true
}
}
}`

func albActionAnnotation(stable string) string {
return fmt.Sprintf("%s%s%s", ingressutil.ALBIngressAnnotation, ingressutil.ALBActionPrefix, stable)
}

func newALBIngress(name string, port int, serviceName string, rollout string) *extensionsv1beta1.Ingress {
func newALBIngress(name string, port int, serviceName string, rollout string, includeStickyConfig bool) *extensionsv1beta1.Ingress {
canaryService := fmt.Sprintf("%s-canary", serviceName)
albActionKey := albActionAnnotation(serviceName)
managedBy := fmt.Sprintf("%s:%s", rollout, albActionKey)
action := fmt.Sprintf(actionTemplate, serviceName, port, canaryService, port)
if includeStickyConfig {
action = fmt.Sprintf(actionTemplateWithStickyConfig, serviceName, port, canaryService, port)
}
return &extensionsv1beta1.Ingress{
ObjectMeta: metav1.ObjectMeta{
Name: name,
Expand Down Expand Up @@ -98,7 +122,7 @@ func rollout(name, service, ingress string) *v1alpha1.Rollout {

func TestInvalidManagedALBActions(t *testing.T) {
rollout := rollout("rollout", "stable-service", "test-ingress")
ing := newALBIngress("test-ingress", 80, "stable-service", rollout.Name)
ing := newALBIngress("test-ingress", 80, "stable-service", rollout.Name, false)
ing.Annotations[ingressutil.ManagedActionsAnnotation] = "invalid-managed-by"

ctrl, kubeclient, enqueuedObjects := newFakeIngressController(t, ing, rollout)
Expand All @@ -110,7 +134,7 @@ func TestInvalidManagedALBActions(t *testing.T) {
}

func TestInvalidPreviousALBActionAnnotationValue(t *testing.T) {
ing := newALBIngress("test-ingress", 80, "stable-service", "not-existing-rollout")
ing := newALBIngress("test-ingress", 80, "stable-service", "not-existing-rollout", false)
ing.Annotations[albActionAnnotation("stable-service")] = "{"

ctrl, kubeclient, enqueuedObjects := newFakeIngressController(t, ing, nil)
Expand All @@ -122,7 +146,7 @@ func TestInvalidPreviousALBActionAnnotationValue(t *testing.T) {
}

func TestInvalidPreviousALBActionAnnotationKey(t *testing.T) {
ing := newALBIngress("test-ingress", 80, "stable-service", "not-existing-rollout")
ing := newALBIngress("test-ingress", 80, "stable-service", "also-not-existing-rollout", false)
ing.Annotations[ingressutil.ManagedActionsAnnotation] = "invalid-action-key"
ctrl, kubeclient, enqueuedObjects := newFakeIngressController(t, ing, nil)

Expand All @@ -133,7 +157,7 @@ func TestInvalidPreviousALBActionAnnotationKey(t *testing.T) {
}

func TestResetActionFailureFindNoPort(t *testing.T) {
ing := newALBIngress("test-ingress", 80, "stable-service", "not-existing-rollout")
ing := newALBIngress("test-ingress", 80, "stable-service", "still-not-existing-rollout", false)
ing.Annotations[albActionAnnotation("stable-service")] = "{}"

ctrl, kubeclient, enqueuedObjects := newFakeIngressController(t, ing, nil)
Expand All @@ -146,7 +170,7 @@ func TestResetActionFailureFindNoPort(t *testing.T) {

func TestALBIngressNoModifications(t *testing.T) {
rollout := rollout("rollout", "stable-service", "test-ingress")
ing := newALBIngress("test-ingress", 80, "stable-service", rollout.Name)
ing := newALBIngress("test-ingress", 80, "stable-service", rollout.Name, false)

ctrl, kubeclient, enqueuedObjects := newFakeIngressController(t, ing, rollout)

Expand All @@ -157,7 +181,7 @@ func TestALBIngressNoModifications(t *testing.T) {
}

func TestALBIngressResetAction(t *testing.T) {
ing := newALBIngress("test-ingress", 80, "stable-service", "non-existing-rollout")
ing := newALBIngress("test-ingress", 80, "stable-service", "non-existing-rollout", false)

ctrl, kubeclient, enqueuedObjects := newFakeIngressController(t, ing, nil)
err := ctrl.syncIngress("default/test-ingress")
Expand All @@ -179,3 +203,27 @@ func TestALBIngressResetAction(t *testing.T) {
expectedAction := `{"Type":"forward","ForwardConfig":{"TargetGroups":[{"ServiceName":"stable-service","ServicePort":"80","Weight":100}]}}`
assert.Equal(t, expectedAction, annotations[albActionAnnotation("stable-service")])
}

func TestALBIngressResetActionWithStickyConfig(t *testing.T) {
ing := newALBIngress("test-ingress", 80, "stable-service", "non-existing-rollout", true)

ctrl, kubeclient, enqueuedObjects := newFakeIngressController(t, ing, nil)
err := ctrl.syncIngress("default/test-ingress")
assert.Nil(t, err)
assert.Len(t, enqueuedObjects, 0)
actions := kubeclient.Actions()
assert.Len(t, actions, 1)
updateAction, ok := actions[0].(k8stesting.UpdateAction)
if !ok {
assert.Fail(t, "Client call was not an update")
updateAction.GetObject()
}
acc, err := meta.Accessor(updateAction.GetObject())
if err != nil {
panic(err)
}
annotations := acc.GetAnnotations()
assert.NotContains(t, annotations, ingressutil.ManagedActionsAnnotation)
expectedAction := `{"Type":"forward","ForwardConfig":{"TargetGroups":[{"ServiceName":"stable-service","ServicePort":"80","Weight":100}],"TargetGroupStickinessConfig":{"Enabled":true,"DurationSeconds":300}}}`
assert.Equal(t, expectedAction, annotations[albActionAnnotation("stable-service")])
}
11 changes: 11 additions & 0 deletions manifests/crds/rollout-crd.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -549,6 +549,17 @@ spec:
servicePort:
format: int32
type: integer
stickinessConfig:
properties:
durationSeconds:
format: int64
type: integer
enabled:
type: boolean
required:
- durationSeconds
- enabled
type: object
required:
- ingress
- servicePort
Expand Down
11 changes: 11 additions & 0 deletions manifests/install.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -11012,6 +11012,17 @@ spec:
servicePort:
format: int32
type: integer
stickinessConfig:
properties:
durationSeconds:
format: int64
type: integer
enabled:
type: boolean
required:
- durationSeconds
- enabled
type: object
required:
- ingress
- servicePort
Expand Down
11 changes: 11 additions & 0 deletions manifests/namespace-install.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -11012,6 +11012,17 @@ spec:
servicePort:
format: int32
type: integer
stickinessConfig:
properties:
durationSeconds:
format: int64
type: integer
enabled:
type: boolean
required:
- durationSeconds
- enabled
type: object
required:
- ingress
- servicePort
Expand Down
16 changes: 16 additions & 0 deletions pkg/apiclient/rollout/rollout.swagger.json
Original file line number Diff line number Diff line change
Expand Up @@ -526,6 +526,10 @@
"type": "string",
"title": "RootService references the service in the ingress to the controller should add the action to"
},
"stickinessConfig": {
"$ref": "#/definitions/github.com.argoproj.argo_rollouts.pkg.apis.rollouts.v1alpha1.StickinessConfig",
"title": "AdditionalForwardConfig allows to specify further settings on the ForwaredConfig\n+optional"
},
"annotationPrefix": {
"type": "string",
"title": "AnnotationPrefix has to match the configured annotation prefix on the alb ingress controller\n+optional"
Expand Down Expand Up @@ -1455,6 +1459,18 @@
},
"title": "SetCanaryScale defines how to scale the newRS without changing traffic weight"
},
"github.com.argoproj.argo_rollouts.pkg.apis.rollouts.v1alpha1.StickinessConfig": {
"type": "object",
"properties": {
"enabled": {
"type": "boolean"
},
"durationSeconds": {
"type": "string",
"format": "int64"
}
}
},
"github.com.argoproj.argo_rollouts.pkg.apis.rollouts.v1alpha1.TLSRoute": {
"type": "object",
"properties": {
Expand Down
Loading

0 comments on commit 5e188f9

Please sign in to comment.