Add option to ensure High Availability for TrafficRouted canary #1738

ssanders1449 · 2022-01-02T16:07:55Z

Summary

When using the TrafficRouted canary strategy, a ReplicaSet can often end up with only a single pod. For example, with 10 pods and a canary weight of <= 10%, the canary ReplicaSet will have only one pod. If the node that pod is running on is ever drained (e.g. by the cluster-autoscaler doing a scale-down), there will be an interruption of service.

What change needs making?
Add a parameter to the TrafficRouted canary strategy specifying the minimum number of pods per ReplicaSet, to be applied even if the percentage calculation would otherwise require fewer pods (assuming the desired number of pods in that ReplicaSet is not zero). Used in conjunction with Pod Disruption Budgets, this can ensure that there are no outages.
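
To illustrate the Pod Disruption Budget side of this, here is a minimal sketch. The issue does not specify a budget, so the `maxUnavailable: 1` setting, the resource name, and the `app: my-app` label are all assumptions chosen for illustration.

```yaml
# Hypothetical example: the name and label are placeholders, not taken from the issue.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  # Allow at most one selected pod to be voluntarily evicted at a time.
  maxUnavailable: 1
  selector:
    matchLabels:
      app: my-app   # assumed to match both stable and canary pods
```

With a one-pod canary ReplicaSet, a budget like this still permits evicting that single pod during a node drain. The budget only helps once each ReplicaSet has at least two pods, which is what the proposed minimum is meant to guarantee.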

When would you use this?
Whenever you are using a TrafficRouted canary but also require high availability.

See https://github.com/ssanders1449/argo-rollouts/pull/1/files for a proposed implementation

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

harikrongali · 2022-01-03T19:14:47Z

@ssanders1449 Rollouts has a setCanaryScale feature that can be used to achieve what you are looking for.
https://argoproj.github.io/argo-rollouts/features/specification/

      # set canary scale to an explicit count without changing traffic weight
      # (supported only with trafficRouting)
      - setCanaryScale:
          replicas: 3

      # set canary scale to a percentage of spec.replicas without changing traffic weight
      # (supported only with trafficRouting)
      - setCanaryScale:
          weight: 25
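
As a rough sketch of how these steps might be combined for the single-pod case described above, the fragment below pins the canary at two pods before sending it 10% of traffic. The service and ingress names are hypothetical, the NGINX traffic router is just one possible choice, and the comments describe the behavior as I understand it from the specification page, so treat it as an assumption rather than a verified recipe.

```yaml
# Sketch only: service and ingress names are hypothetical placeholders.
strategy:
  canary:
    canaryService: my-app-canary
    stableService: my-app-stable
    trafficRouting:
      nginx:
        stableIngress: my-app-ingress
    steps:
      # Pin the canary ReplicaSet at 2 pods before shifting any traffic.
      - setCanaryScale:
          replicas: 2
      # Route 10% of traffic to the canary; the explicit scale above is
      # expected to stay in effect, so the canary keeps 2 pods.
      - setWeight: 10
      - pause: {}
      # Hand canary scaling back to the traffic weight for later steps.
      - setCanaryScale:
          matchTrafficWeight: true
      - setWeight: 50
      - pause: {}
```

Note that this only controls the canary ReplicaSet for the steps where an explicit scale is set; the replica count has to be chosen by hand for each Rollout.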

jessesuen · 2022-01-15T00:13:30Z

setCanaryScale was built for this purpose. We need to avoid adding complexity to the current calculations for canary vs. stable replica counts, so I don't think we should introduce a new feature for something that setCanaryScale should already be able to address.

ssanders1449 · 2022-07-27T06:56:13Z

I agree that this should be addressed by setCanaryScale, but please see #1779, where I show that we need to add a 'minReplicas' property to setCanaryScale.

ssanders1449 added the enhancement label Jan 2, 2022

harikrongali added the answered label Jan 3, 2022

jessesuen closed this as completed Jan 15, 2022

ssanders1449 mentioned this issue Jan 16, 2022

Provide a way to specify a minimum number of pods per Replicaset #1779

Closed
