Merge pull request #310 from weaveworks/canary-promotion
Canary promotion improvements
stefanprodan authored Sep 24, 2019
2 parents 9845578 + 2ff86fa commit 9df6bfb
Showing 8 changed files with 140 additions and 93 deletions.
1 change: 1 addition & 0 deletions artifacts/flagger/crd.yaml
@@ -254,6 +254,7 @@ spec:
- Initialized
- Waiting
- Progressing
- Promoting
- Finalising
- Succeeded
- Failed
1 change: 1 addition & 0 deletions charts/flagger/templates/crd.yaml
@@ -255,6 +255,7 @@ spec:
- Initialized
- Waiting
- Progressing
- Promoting
- Finalising
- Succeeded
- Failed
36 changes: 30 additions & 6 deletions docs/gitbook/how-it-works.md
@@ -143,7 +143,7 @@ status:
```

The `Promoted` status condition can have one of the following reasons:
Initialized, Waiting, Progressing, Finalising, Succeeded or Failed.
Initialized, Waiting, Progressing, Promoting, Finalising, Succeeded or Failed.
A failed canary will have the promoted status set to `false`,
the reason to `failed` and the last applied spec will be different to the last promoted one.
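
The status and reason pair can be interpreted programmatically, for example in a CI gate. The following is a minimal sketch, not Flagger code: `rolloutOutcome` is a hypothetical helper, and it assumes that a successful rollout reports status `True` with reason `Succeeded`, while the in-flight phases (Progressing, Promoting, Finalising) report status `Unknown`.

```go
package main

import (
	"fmt"
	"strings"
)

// rolloutOutcome is a hypothetical helper, not part of Flagger. It interprets
// the status and reason of the Promoted condition as described above.
// Assumption: success is reported as status "True" with reason "Succeeded".
func rolloutOutcome(status, reason string) string {
	switch {
	case strings.EqualFold(status, "False") && strings.EqualFold(reason, "Failed"):
		return "failed"
	case strings.EqualFold(status, "True") && strings.EqualFold(reason, "Succeeded"):
		return "succeeded"
	case strings.EqualFold(status, "Unknown"):
		// Progressing, Promoting and Finalising all report an unknown status.
		return "in progress"
	default:
		return "other"
	}
}

func main() {
	fmt.Println(rolloutOutcome("Unknown", "Promoting"))
	fmt.Println(rolloutOutcome("True", "Succeeded"))
	fmt.Println(rolloutOutcome("False", "Failed"))
}
```

The comparison is case-insensitive because the prose above writes `false` and `failed` in lower case while the CRD enum uses capitalised values.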

@@ -153,6 +153,26 @@ Wait for a successful rollout:
kubectl wait canary/podinfo --for=condition=promoted
```

CI example:

```bash
# update the container image
kubectl set image deployment/podinfo podinfod=stefanprodan/podinfo:3.0.1
# wait for Flagger to detect the change
ok=false
until ${ok}; do
kubectl get canary/podinfo | grep 'Progressing' && ok=true || ok=false
sleep 5
done
# wait for the canary analysis to finish
kubectl wait canary/podinfo --for=condition=promoted --timeout=5m
# check if the deployment was successful
kubectl get canary/podinfo | grep Succeeded
```

### Istio routing

Flagger creates an Istio Virtual Service and Destination Rules based on the Canary service spec.
@@ -344,12 +364,13 @@ A canary deployment is triggered by changes in any of the following objects:
Gated canary promotion stages:

* scan for canary deployments
* check Istio virtual service routes are mapped to primary and canary ClusterIP services
* check primary and canary deployments status
* check primary and canary deployment status
* halt advancement if a rolling update is underway
* halt advancement if pods are unhealthy
* call pre-rollout webhooks are check results
* halt advancement if any hook returned a non HTTP 2xx result
* call confirm-rollout webhooks and check results
* halt advancement if any hook returns a non HTTP 2xx result
* call pre-rollout webhooks and check results
* halt advancement if any hook returns a non HTTP 2xx result
* increment the failed checks counter
* increase canary traffic weight percentage from 0% to 5% (step weight)
* call rollout webhooks and check results
@@ -366,8 +387,11 @@ Gated canary promotion stages:
* halt advancement if any webhook call fails
* halt advancement while canary request success rate is under the threshold
* halt advancement while canary request duration P99 is over the threshold
* halt advancement while any custom metric check fails
* halt advancement if the primary or canary deployment becomes unhealthy
* halt advancement while canary deployment is being scaled up/down by HPA
* call confirm-promotion webhooks and check results
* halt advancement if any hook returns a non HTTP 2xx result
* promote canary to primary
* copy ConfigMaps and Secrets from canary to primary
* copy canary deployment spec template over primary
@@ -377,7 +401,7 @@ Gated canary promotion stages:
* scale to zero the canary deployment
* mark rollout as finished
* call post-rollout webhooks
* post the analysis result to Slack
* post the analysis result to Slack or MS Teams
* wait for the canary deployment to be updated and start over
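
The tail end of these stages maps onto the phase sequence Progressing, Promoting, Finalising, Succeeded. The sketch below illustrates that ordering only; `Phase` and `nextPhase` are hypothetical names rather than Flagger's API, and it assumes that each step succeeds and that Finalising ends in Succeeded once the canary has been scaled to zero.

```go
package main

import "fmt"

// Phase mirrors the string values of the CRD phase enum.
type Phase string

const (
	Progressing Phase = "Progressing"
	Promoting   Phase = "Promoting"
	Finalising  Phase = "Finalising"
	Succeeded   Phase = "Succeeded"
)

// nextPhase is a hypothetical helper, not a Flagger function. Given the
// current phase, it returns the phase that follows a successful step and a
// short description of the action performed during that step.
func nextPhase(p Phase) (Phase, string) {
	switch p {
	case Progressing:
		// Analysis finished: the canary spec is copied over the primary.
		return Promoting, "copy canary spec to primary"
	case Promoting:
		// Primary has been updated: traffic is shifted back to it.
		return Finalising, "route all traffic to primary"
	case Finalising:
		// Assumption: the rollout is marked finished after scaling down.
		return Succeeded, "scale canary to zero"
	default:
		return p, "no action"
	}
}

func main() {
	p := Progressing
	for p != Succeeded {
		next, action := nextPhase(p)
		fmt.Printf("%s -> %s: %s\n", p, next, action)
		p = next
	}
}
```

Splitting promotion and traffic routing into separate phases means each reconciliation pass performs one action and records it in the status before the next pass continues.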

### Canary Analysis
1 change: 1 addition & 0 deletions kustomize/base/flagger/crd.yaml
@@ -254,6 +254,7 @@ spec:
- Initialized
- Waiting
- Progressing
- Promoting
- Finalising
- Succeeded
- Failed
4 changes: 3 additions & 1 deletion pkg/apis/flagger/v1alpha3/status.go
@@ -47,7 +47,9 @@ const (
CanaryPhaseWaiting CanaryPhase = "Waiting"
// CanaryPhaseProgressing means the canary analysis is underway
CanaryPhaseProgressing CanaryPhase = "Progressing"
// CanaryPhaseProgressing means the canary analysis is finished and traffic has been routed back to primary
// CanaryPhasePromoting means the canary analysis is finished and the primary spec has been updated
CanaryPhasePromoting CanaryPhase = "Promoting"
// CanaryPhaseFinalising means the canary promotion is finished and traffic has been routed back to primary
CanaryPhaseFinalising CanaryPhase = "Finalising"
// CanaryPhaseSucceeded means the canary analysis has been successful
// and the canary deployment has been promoted
3 changes: 3 additions & 0 deletions pkg/canary/status.go
@@ -211,6 +211,9 @@ func (c *Deployer) MakeStatusConditions(canaryStatus flaggerv1.CanaryStatus,
case flaggerv1.CanaryPhaseProgressing:
status = corev1.ConditionUnknown
message = "New revision detected, starting canary analysis."
case flaggerv1.CanaryPhasePromoting:
status = corev1.ConditionUnknown
message = "Canary analysis completed, starting primary rolling update."
case flaggerv1.CanaryPhaseFinalising:
status = corev1.ConditionUnknown
message = "Canary analysis completed, routing all traffic to primary."
145 changes: 61 additions & 84 deletions pkg/controller/scheduler.go
@@ -214,7 +214,27 @@ func (c *Controller) advanceCanary(name string, namespace string, skipLivenessCh
return
}

// scale canary to zero if analysis has succeeded
// route all traffic to primary if analysis has succeeded
if cd.Status.Phase == flaggerv1.CanaryPhasePromoting {
if provider != "kubernetes" {
c.recordEventInfof(cd, "Routing all traffic to primary")
if err := meshRouter.SetRoutes(cd, 100, 0); err != nil {
c.recordEventWarningf(cd, "%v", err)
return
}
c.recorder.SetWeight(cd, 100, 0)
}

// update status phase
if err := c.deployer.SetStatusPhase(cd, flaggerv1.CanaryPhaseFinalising); err != nil {
c.recordEventWarningf(cd, "%v", err)
return
}

return
}

// scale canary to zero if promotion has finished
if cd.Status.Phase == flaggerv1.CanaryPhaseFinalising {
if err := c.deployer.Scale(cd, 0); err != nil {
c.recordEventWarningf(cd, "%v", err)
@@ -304,7 +324,7 @@ func (c *Controller) advanceCanary(name string, namespace string, skipLivenessCh
}
}

// canary fix routing: A/B testing
// strategy: A/B testing
if len(cd.Spec.CanaryAnalysis.Match) > 0 && cd.Spec.CanaryAnalysis.Iterations > 0 {
// route traffic to canary and increment iterations
if cd.Spec.CanaryAnalysis.Iterations > cd.Status.Iterations {
@@ -336,38 +356,19 @@ func (c *Controller) advanceCanary(name string, namespace string, skipLivenessCh
c.recordEventWarningf(cd, "%v", err)
return
}
// increment iterations
if err := c.deployer.SetStatusIterations(cd, cd.Status.Iterations+1); err != nil {
c.recordEventWarningf(cd, "%v", err)
return
}
return
}

// route all traffic to primary
if cd.Spec.CanaryAnalysis.Iterations < cd.Status.Iterations {
primaryWeight = 100
canaryWeight = 0
if err := meshRouter.SetRoutes(cd, primaryWeight, canaryWeight); err != nil {
c.recordEventWarningf(cd, "%v", err)
return
}
c.recorder.SetWeight(cd, primaryWeight, canaryWeight)

// update status phase
if err := c.deployer.SetStatusPhase(cd, flaggerv1.CanaryPhaseFinalising); err != nil {
if err := c.deployer.SetStatusPhase(cd, flaggerv1.CanaryPhasePromoting); err != nil {
c.recordEventWarningf(cd, "%v", err)
return
}

c.recordEventInfof(cd, "Routing all traffic to primary")
return
}

return
}

// canary fix routing: B/G
// strategy: Blue/Green
if cd.Spec.CanaryAnalysis.Iterations > 0 {
// increment iterations
if cd.Spec.CanaryAnalysis.Iterations > cd.Status.Iterations {
@@ -405,105 +406,79 @@ func (c *Controller) advanceCanary(name string, namespace string, skipLivenessCh
}

// promote canary - max iterations reached
if cd.Spec.CanaryAnalysis.Iterations+1 == cd.Status.Iterations {
if cd.Spec.CanaryAnalysis.Iterations < cd.Status.Iterations {
c.recordEventInfof(cd, "Copying %s.%s template spec to %s.%s",
cd.Spec.TargetRef.Name, cd.Namespace, primaryName, cd.Namespace)
if err := c.deployer.Promote(cd); err != nil {
c.recordEventWarningf(cd, "%v", err)
return
}

// increment iterations
if err := c.deployer.SetStatusIterations(cd, cd.Status.Iterations+1); err != nil {
// update status phase
if err := c.deployer.SetStatusPhase(cd, flaggerv1.CanaryPhasePromoting); err != nil {
c.recordEventWarningf(cd, "%v", err)
return
}
return
}

// route all traffic to primary
if cd.Spec.CanaryAnalysis.Iterations < cd.Status.Iterations {
if provider != "kubernetes" {
c.recordEventInfof(cd, "Routing all traffic to primary")
if err := meshRouter.SetRoutes(cd, 100, 0); err != nil {
c.recordEventWarningf(cd, "%v", err)
return
}
c.recorder.SetWeight(cd, 100, 0)
return
}

// strategy: Canary progressive traffic increase
if cd.Spec.CanaryAnalysis.StepWeight > 0 {
// increase traffic weight
if canaryWeight < maxWeight {
primaryWeight -= cd.Spec.CanaryAnalysis.StepWeight
if primaryWeight < 0 {
primaryWeight = 0
}
canaryWeight += cd.Spec.CanaryAnalysis.StepWeight
if canaryWeight > 100 {
canaryWeight = 100
}

// update status phase
if err := c.deployer.SetStatusPhase(cd, flaggerv1.CanaryPhaseFinalising); err != nil {
if err := meshRouter.SetRoutes(cd, primaryWeight, canaryWeight); err != nil {
c.recordEventWarningf(cd, "%v", err)
return
}

return
}

return
}
if err := c.deployer.SetStatusWeight(cd, canaryWeight); err != nil {
c.recordEventWarningf(cd, "%v", err)
return
}

// canary incremental traffic weight
if canaryWeight < maxWeight {
primaryWeight -= cd.Spec.CanaryAnalysis.StepWeight
if primaryWeight < 0 {
primaryWeight = 0
}
canaryWeight += cd.Spec.CanaryAnalysis.StepWeight
if primaryWeight > 100 {
primaryWeight = 100
c.recorder.SetWeight(cd, primaryWeight, canaryWeight)
c.recordEventInfof(cd, "Advance %s.%s canary weight %v", cd.Name, cd.Namespace, canaryWeight)
return
}

// check promotion gate
// promote canary - max weight reached
if canaryWeight >= maxWeight {
// check promotion gate
if promote := c.runConfirmPromotionHooks(cd); !promote {
return
}
}

if err := meshRouter.SetRoutes(cd, primaryWeight, canaryWeight); err != nil {
c.recordEventWarningf(cd, "%v", err)
return
}

// update weight status
if err := c.deployer.SetStatusWeight(cd, canaryWeight); err != nil {
c.recordEventWarningf(cd, "%v", err)
return
}

c.recorder.SetWeight(cd, primaryWeight, canaryWeight)
c.recordEventInfof(cd, "Advance %s.%s canary weight %v", cd.Name, cd.Namespace, canaryWeight)

// promote canary
if canaryWeight >= maxWeight {
// update primary spec
c.recordEventInfof(cd, "Copying %s.%s template spec to %s.%s",
cd.Spec.TargetRef.Name, cd.Namespace, primaryName, cd.Namespace)
if err := c.deployer.Promote(cd); err != nil {
c.recordEventWarningf(cd, "%v", err)
return
}
}
} else {
// route all traffic to primary
primaryWeight = 100
canaryWeight = 0
if err := meshRouter.SetRoutes(cd, primaryWeight, canaryWeight); err != nil {
c.recordEventWarningf(cd, "%v", err)
return
}
c.recorder.SetWeight(cd, primaryWeight, canaryWeight)

// update status phase
if err := c.deployer.SetStatusPhase(cd, flaggerv1.CanaryPhaseFinalising); err != nil {
c.recordEventWarningf(cd, "%v", err)
// update status phase
if err := c.deployer.SetStatusPhase(cd, flaggerv1.CanaryPhasePromoting); err != nil {
c.recordEventWarningf(cd, "%v", err)
return
}

return
}

c.recordEventInfof(cd, "Routing all traffic to primary")
return
}

}

func (c *Controller) shouldSkipAnalysis(cd *flaggerv1.Canary, meshRouter router.Interface, primaryWeight int, canaryWeight int) bool {
@@ -555,6 +530,7 @@ func (c *Controller) shouldAdvance(cd *flaggerv1.Canary) (bool, error) {
cd.Status.Phase == flaggerv1.CanaryPhaseInitializing ||
cd.Status.Phase == flaggerv1.CanaryPhaseProgressing ||
cd.Status.Phase == flaggerv1.CanaryPhaseWaiting ||
cd.Status.Phase == flaggerv1.CanaryPhasePromoting ||
cd.Status.Phase == flaggerv1.CanaryPhaseFinalising {
return true, nil
}
@@ -579,6 +555,7 @@ func (c *Controller) shouldAdvance(cd *flaggerv1.Canary) (bool, error) {
func (c *Controller) checkCanaryStatus(cd *flaggerv1.Canary, shouldAdvance bool) bool {
c.recorder.SetStatus(cd, cd.Status.Phase)
if cd.Status.Phase == flaggerv1.CanaryPhaseProgressing ||
cd.Status.Phase == flaggerv1.CanaryPhasePromoting ||
cd.Status.Phase == flaggerv1.CanaryPhaseFinalising {
return true
}