Provide a way to specify a minimum number of pods per Replicaset #1779
Comments
This issue is stale because it has been open 60 days with no activity.
@zachaller I see the reasoning in #1738 (comment) for not adding complexity to the controller and the Rollout spec. Would a global flag (--canary-min-replicas) affecting all Rollouts be an acceptable compromise here?
You could use a Pod Disruption Budget to do the exact same thing: you should be able to use the spec.minAvailable option, based on https://kubernetes.io/docs/tasks/run-application/configure-pdb/#arbitrary-controllers-and-selectors. FYI, I'm not 100% sure it covers the use case, but asking whether it would work or not.
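For reference, a minimal sketch of the kind of PDB being suggested here, assuming the Rollout's pods carry an `app: my-app` label (the names are illustrative, not from the thread). The linked docs section describes PDBs over pods of "arbitrary" controllers as limited to an integer spec.minAvailable, which is why only that form is shown:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb            # hypothetical name
spec:
  minAvailable: 2             # integer form, per the linked docs on arbitrary selectors
  selector:
    matchLabels:
      app: my-app             # matches every pod of the Rollout (stable and canary alike)
```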
@ssanders1449 any update on whether you could use a PDB to accomplish the same thing or not?
@zachaller as far as I can see, setting a Pod Disruption Budget on a Rollout would count all of the pods in the Rollout (both stable and canary) when making that calculation, with no distinction between them. So it really can't do what we want here.

This is actually one of the enhancements that I would like to see for argo-rollouts. Instead of defining a PDB separately from the Rollout (which will operate on all pods equally), we should add a 'pdb' section to the Rollout definition and have Argo create/delete a PDB with those parameters whenever it creates/deletes a ReplicaSet. In other words, we would have a separate PDB per ReplicaSet. This would allow us to specify, for example, a PDB with a 'maxUnavailable' percentage per ReplicaSet.

Even if we had 'PDB per ReplicaSet' support, I would still say that we need 'min pods per ReplicaSet' support, because a PDB using maxUnavailable will still allow eviction when a ReplicaSet has only a single pod.
I created a new issue for adding support for a PDB per ReplicaSet (#2419), but as I said above, it won't solve this issue if the PDB is defined using 'maxUnavailable'.
So I was also able to spend a little more time looking into this; here is a short summary. With Rollouts' ability to adjust labels on the canary and stable pods individually via canaryMetadata/stableMetadata (sketched below), I am pretty sure you could apply a PDB to the canary pods only. However, the limitation from the k8s docs would then apply that you can only use minAvailable (not maxUnavailable).
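A sketch of that approach, under the assumption that the Rollout stamps distinguishing labels on canary and stable pods via canaryMetadata/stableMetadata (the names, label values, and image are illustrative, not from the thread):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  replicas: 10
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: nginx:1.25        # placeholder image
  strategy:
    canary:
      canaryMetadata:
        labels:
          role: canary             # added only to canary pods
      stableMetadata:
        labels:
          role: stable             # added only to stable pods
      steps:
        - setWeight: 25
        - pause: {}
---
# PDB scoped to the canary pods only, via the canary-only label above.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-canary-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      role: canary
```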
It seems that the premise about the PDB limitation was incorrect. We just performed a test on a Rollout where we defined a PDB with 'maxUnavailable: 10%' (which is what we used for most of our pre-Rollout deployments), and it worked perfectly. Also, using the labels defined by canaryMetadata and stableMetadata, we were able to create 2 separate PDBs with 'maxUnavailable: 10%', and they both worked properly. So I closed #2419.

We could do your proposed workaround using 'minAvailable: 50%'. It would protect a single pod while allowing evictions when there is more than one pod. We cannot use a percentage higher than 50%, because otherwise it won't allow evictions even when there are two pods, which can prevent cluster scaledown and cost money. However, if we use 'minAvailable: 50%' and have many pods, this can reduce capacity by as much as 50% when draining a lot of nodes, which might be a hard sell to our DevOps teams.

Therefore, we prefer to use 'maxUnavailable: 10%', because it scales the 'disruptionsAllowed' field in the PDB automatically as the number of pods scales up, while minimizing the impact on SLA when a lot of nodes are drained at once. The downside of 'maxUnavailable: 10%' is that it only means anything when there are at least 2 pods; it still allows eviction when there is only a single pod. This is why we originally came up with the requirement of having a way to specify a minimum number of pods per ReplicaSet.

We have three proposed implementations:
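The two PDBs described in that test would look roughly like this, assuming the canary/stable labels from the earlier sketch (names and label values are illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-canary-pdb
spec:
  maxUnavailable: "10%"       # disruptionsAllowed scales with the canary pod count
  selector:
    matchLabels:
      role: canary
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-stable-pdb
spec:
  maxUnavailable: "10%"       # same budget applied to the stable pods
  selector:
    matchLabels:
      role: stable
```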
Thank you for the great write-up on the reasoning. It is also good to know that the k8s docs about the PDB limitations do not apply here; I am guessing it's because the parent is still a ReplicaSet. Also, thanks for the 3 implementations; give me just a bit to think about which one makes the most sense.
@ssanders1449 I think going with your option 1 is something we would possibly be interested in. Could you open a PR for that against rollouts so that we can run e2e, etc.?
…MinPodsPerReplicaSet (#2448)
* add rollout.Spec.Strategy.Canary.MinPodsPerRS (for TrafficRoutedCanary only)
* fix lint and test
* fix codegen
* rename MinPodsPerRS to MinPodsPerReplicaSet
* fix lint and codegen
* update specification.md
* codegen
* codegen missed a file
Signed-off-by: Shlomo Sanders <[email protected]>
Signed-off-by: zachaller <[email protected]>
Co-authored-by: Shlomo Sanders <[email protected]>
Co-authored-by: zachaller <[email protected]>
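Based on those commit messages, the resulting Rollout field appears to be spec.strategy.canary.minPodsPerReplicaSet, usable with a traffic-routed canary. A hedged sketch of how it would be set; the service, ingress, and image names are illustrative, and the exact field name should be checked against specification.md:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  replicas: 10
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: nginx:1.25            # placeholder image
  strategy:
    canary:
      minPodsPerReplicaSet: 2          # floor on the pod count per ReplicaSet (traffic-routed canary only)
      canaryService: my-app-canary
      stableService: my-app-stable
      trafficRouting:
        nginx:
          stableIngress: my-app-ingress
      steps:
        - setWeight: 5
        - pause: {}
```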
This issue is stale because it has been open 60 days with no activity.
Summary
The issue that I was trying to solve with #1738 arises when a Horizontal Pod Autoscaler is controlling the number of replicas in the Rollout. I want the number of canary pods to match the canary weight of the Rollout, but I also want to ensure a minimum number of pods (e.g. 2) for high availability. Currently, I can only use setCanaryScale to set either a fixed number of pods or a percentage of the total number of replicas. I want to be able to set a percentage, but apply a minimum as well.
Also, if I am using Dynamic Stable Scale, I want a way to ensure that the stable ReplicaSet never goes below a minimum number of pods (e.g. 2), to ensure high availability.
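For context, the two forms setCanaryScale offers today, shown as the strategy.canary.steps portion of a Rollout (step values are illustrative). Neither form expresses "a percentage, but never fewer than N pods":

```yaml
strategy:
  canary:
    steps:
      - setCanaryScale:
          weight: 5        # percentage of spec.replicas; with a small replica count this can be a single pod
      # - setCanaryScale:
      #     replicas: 2    # alternative: a fixed count, which does not grow if the HPA scales far up
      - pause: {}
```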
Use Cases
Assume a Horizontal Pod Autoscaler is controlling the number of replicas in the Rollout, with a min of 10 pods and a max of 2000. I do not know up front how many pods the Rollout will have. So, let's say I am running a step with a canary weight of 5 percent. If the total number of pods is less than 20, that would only give me one canary pod, and I would not have high availability. However, I can't use the 'replicas' option of setCanaryScale to set a fixed number of replicas, because if the HPA does scale up to 2000 pods, I want to have 100 canary pods.
A second, but related, use case is if I am using Dynamic Stable Scale, so the stable ReplicaSet scales down when the canary scales up. If we have 4 replicas and the canary is at 50%, we are OK (2 pods each for canary and stable). However, if we then go to 75% canary, we will be left with only one pod in the stable ReplicaSet and again lose high availability. Again, the total number of replicas may be controlled by an HPA, so we don't know up front, when creating this step, how many replicas there will be.
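A sketch of that second use case, assuming a traffic-routed canary with dynamicStableScale enabled (service, ingress, and step values are illustrative):

```yaml
strategy:
  canary:
    dynamicStableScale: true          # stable ReplicaSet shrinks as the canary grows
    canaryService: my-app-canary
    stableService: my-app-stable
    trafficRouting:
      nginx:
        stableIngress: my-app-ingress
    steps:
      - setWeight: 50                 # with 4 replicas: 2 canary / 2 stable
      - pause: {}
      - setWeight: 75                 # with 4 replicas: 3 canary / 1 stable, stable loses high availability
      - pause: {}
```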
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.