Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fast-track rollback on Canary doesn't quite skip stages #2939

Closed
yohanb opened this issue Aug 7, 2023 · 2 comments · Fixed by #2953
Closed

fast-track rollback on Canary doesn't quite skip stages #2939

yohanb opened this issue Aug 7, 2023 · 2 comments · Fixed by #2953
Labels
bug Something isn't working

Comments

@yohanb
Copy link
Contributor

yohanb commented Aug 7, 2023

Describe the bug

When using a combination of scaleDownDelaySeconds and rollbackWindow to allow some time to perform a "fast-track" rollback, the Canary rollback still scales according to the steps/weights. Instead of just skipping the steps all together (expected behavior), it seems to perform the setWeight steps rapidly, without pausing.

To Reproduce

Here's a repo with the required app and steps to reproduce the bug.

Expected behavior

When performing a "fast-track" rollback, I would expected the n-1 revision to stay stable and not scale down/up.

Screenshots

Monosnap screencast 2023-08-04 16-20-57

Version

1.5.1

Logs

time="2023-08-07T19:52:56Z" level=info msg="Started syncing rollout" generation=11 namespace=canary resourceVersion=21841 rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="ComputePodTemplateHash hash changed (expected: 7b68998789, actual: 6968596f4b)" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Found 1 TrafficRouting Reconcilers" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Reconciling TrafficRouting with type 'Nginx'" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="No changes to canary ingress - skipping patch" ingress=canary-demo-canary-demo-canary namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Previous weights: &TrafficWeights{Canary:WeightDestination{Weight:0,ServiceName:canary-demo-preview,PodTemplateHash:6968596f4b,},Stable:WeightDestination{Weight:100,ServiceName:canary-demo-stable,PodTemplateHash:96d45bb9b,},Additional:[]WeightDestination{},Verified:nil,}" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="New weights: &TrafficWeights{Canary:WeightDestination{Weight:0,ServiceName:canary-demo-preview,PodTemplateHash:6968596f4b,},Stable:WeightDestination{Weight:100,ServiceName:canary-demo-stable,PodTemplateHash:6968596f4b,},Additional:[]WeightDestination{},Verified:nil,}" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Traffic weight updated " event_reason=TrafficWeightUpdated namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="No StableRS exists to reconcile or matches newRS" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Event(v1.ObjectReference{Kind:\"Rollout\", Namespace:\"canary\", Name:\"canary-demo\", UID:\"9c3ec8ac-8ba6-41c6-a15d-5b664c59df82\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"21841\", FieldPath:\"\"}): type: 'Normal' reason: 'TrafficWeightUpdated' Traffic weight updated "
time="2023-08-07T19:52:56Z" level=info msg="Enqueueing parent of canary/canary-demo-6968596f4b: Rollout canary/canary-demo"
time="2023-08-07T19:52:56Z" level=error msg="roCtx.reconcile err Operation cannot be fulfilled on replicasets.apps \"canary-demo-6968596f4b\": the object has been modified; please apply your changes to the latest version and try again" generation=11 namespace=canary resourceVersion=21841 rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Reconciliation completed" generation=11 namespace=canary resourceVersion=21841 rollout=canary-demo time_ms=5.372875
time="2023-08-07T19:52:56Z" level=error msg="rollout syncHandler error: Operation cannot be fulfilled on replicasets.apps \"canary-demo-6968596f4b\": the object has been modified; please apply your changes to the latest version and try again" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="rollout syncHandler queue retries: 5 : key \"canary/canary-demo\"" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=error msg="Operation cannot be fulfilled on replicasets.apps \"canary-demo-6968596f4b\": the object has been modified; please apply your changes to the latest version and try again\n" error="<nil>"
time="2023-08-07T19:52:56Z" level=info msg="Started syncing rollout" generation=11 namespace=canary resourceVersion=21841 rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="ComputePodTemplateHash hash changed (expected: 7b68998789, actual: 6968596f4b)" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Found 1 TrafficRouting Reconcilers" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Reconciling TrafficRouting with type 'Nginx'" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="No changes to canary ingress - skipping patch" ingress=canary-demo-canary-demo-canary namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Previous weights: &TrafficWeights{Canary:WeightDestination{Weight:0,ServiceName:canary-demo-preview,PodTemplateHash:6968596f4b,},Stable:WeightDestination{Weight:100,ServiceName:canary-demo-stable,PodTemplateHash:96d45bb9b,},Additional:[]WeightDestination{},Verified:nil,}" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="New weights: &TrafficWeights{Canary:WeightDestination{Weight:0,ServiceName:canary-demo-preview,PodTemplateHash:6968596f4b,},Stable:WeightDestination{Weight:100,ServiceName:canary-demo-stable,PodTemplateHash:6968596f4b,},Additional:[]WeightDestination{},Verified:nil,}" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Traffic weight updated " event_reason=TrafficWeightUpdated namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="No StableRS exists to reconcile or matches newRS" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Event(v1.ObjectReference{Kind:\"Rollout\", Namespace:\"canary\", Name:\"canary-demo\", UID:\"9c3ec8ac-8ba6-41c6-a15d-5b664c59df82\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"21841\", FieldPath:\"\"}): type: 'Normal' reason: 'TrafficWeightUpdated' Traffic weight updated "
time="2023-08-07T19:52:56Z" level=info msg="Enqueueing parent of canary/canary-demo-6968596f4b: Rollout canary/canary-demo"
time="2023-08-07T19:52:56Z" level=info msg="Scaled up ReplicaSet canary-demo-6968596f4b (revision 8) from 1 to 3" event_reason=ScalingReplicaSet namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Not finished reconciling new ReplicaSet 'canary-demo-6968596f4b'" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Not finished reconciling ReplicaSets" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Event(v1.ObjectReference{Kind:\"Rollout\", Namespace:\"canary\", Name:\"canary-demo\", UID:\"9c3ec8ac-8ba6-41c6-a15d-5b664c59df82\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"21841\", FieldPath:\"\"}): type: 'Normal' reason: 'ScalingReplicaSet' Scaled up ReplicaSet canary-demo-6968596f4b (revision 8) from 1 to 3"
time="2023-08-07T19:52:56Z" level=info msg="Patched: {\"status\":{\"HPAReplicas\":4,\"availableReplicas\":4,\"canary\":{\"weights\":{\"stable\":{\"podTemplateHash\":\"6968596f4b\"}}},\"conditions\":[{\"lastTransitionTime\":\"2023-08-07T18:21:55Z\",\"lastUpdateTime\":\"2023-08-07T18:21:55Z\",\"message\":\"Rollout has minimum availability\",\"reason\":\"AvailableReason\",\"status\":\"True\",\"type\":\"Available\"},{\"lastTransitionTime\":\"2023-08-07T19:24:41Z\",\"lastUpdateTime\":\"2023-08-07T19:24:41Z\",\"message\":\"Rollout is not healthy\",\"reason\":\"RolloutHealthy\",\"status\":\"False\",\"type\":\"Healthy\"},{\"lastTransitionTime\":\"2023-08-07T19:52:38Z\",\"lastUpdateTime\":\"2023-08-07T19:52:38Z\",\"message\":\"Rollout is paused\",\"reason\":\"RolloutPaused\",\"status\":\"False\",\"type\":\"Paused\"},{\"lastTransitionTime\":\"2023-08-07T19:52:56Z\",\"lastUpdateTime\":\"2023-08-07T19:52:56Z\",\"message\":\"RolloutCompleted\",\"reason\":\"RolloutCompleted\",\"status\":\"True\",\"type\":\"Completed\"},{\"lastTransitionTime\":\"2023-08-07T19:52:38Z\",\"lastUpdateTime\":\"2023-08-07T19:52:56Z\",\"message\":\"ReplicaSet \\\"canary-demo-6968596f4b\\\" is progressing.\",\"reason\":\"ReplicaSetUpdated\",\"status\":\"True\",\"type\":\"Progressing\"}],\"message\":\"more replicas need to be updated\",\"phase\":\"Progressing\",\"readyReplicas\":4,\"replicas\":4,\"updatedReplicas\":1}}" generation=11 namespace=canary resourceVersion=21841 rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="persisted to informer" generation=11 namespace=canary resourceVersion=21857 rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Reconciliation completed" generation=11 namespace=canary resourceVersion=21841 rollout=canary-demo time_ms=22.182917
time="2023-08-07T19:52:56Z" level=info msg="Started syncing rollout" generation=11 namespace=canary resourceVersion=21857 rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="ComputePodTemplateHash hash changed (expected: 7b68998789, actual: 6968596f4b)" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Start processing" resource=canary/canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Processing completed" resource=canary/canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Found 1 TrafficRouting Reconcilers" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Reconciling TrafficRouting with type 'Nginx'" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="No changes to canary ingress - skipping patch" ingress=canary-demo-canary-demo-canary namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="No StableRS exists to reconcile or matches newRS" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Reconciling 1 old ReplicaSets (total pods: 3)" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Found 3 available pods in old RS canary/canary-demo-96d45bb9b" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Found 4 available pods, scaling down old RSes (minAvailable: 3, maxScaleDown: 1)" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Enqueueing parent of canary/canary-demo-6968596f4b: Rollout canary/canary-demo"
time="2023-08-07T19:52:56Z" level=info msg="Enqueueing parent of canary/canary-demo-96d45bb9b: Rollout canary/canary-demo"
time="2023-08-07T19:52:56Z" level=info msg="Set 'scale-down-deadline' annotation on 'canary-demo-96d45bb9b' to 2023-08-07T20:52:56Z (1h0m0s)" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="No Steps remain in the canary steps" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Timed out (false) [last progress check: 2023-08-07 19:52:56 +0000 UTC - now: 2023-08-07 19:52:56.442479917 +0000 UTC m=+5766.480583457]" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="No status changes. Skipping patch" generation=11 namespace=canary resourceVersion=21857 rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Queueing up rollout for a progress after 599s" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Reconciliation completed" generation=11 namespace=canary resourceVersion=21857 rollout=canary-demo time_ms=39.298375
time="2023-08-07T19:52:56Z" level=info msg="Started syncing rollout" generation=11 namespace=canary resourceVersion=21857 rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="ComputePodTemplateHash hash changed (expected: 7b68998789, actual: 6968596f4b)" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Enqueueing parent of canary/canary-demo-6968596f4b: Rollout canary/canary-demo"
time="2023-08-07T19:52:56Z" level=info msg="Found 1 TrafficRouting Reconcilers" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Reconciling TrafficRouting with type 'Nginx'" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="No changes to canary ingress - skipping patch" ingress=canary-demo-canary-demo-canary namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="No StableRS exists to reconcile or matches newRS" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Reconciling 1 old ReplicaSets (total pods: 3)" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Found 3 available pods in old RS canary/canary-demo-96d45bb9b" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Found 4 available pods, scaling down old RSes (minAvailable: 3, maxScaleDown: 1)" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="RS 'canary-demo-96d45bb9b' has not reached the scaleDownTime" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="No Steps remain in the canary steps" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Timed out (false) [last progress check: 2023-08-07 19:52:56 +0000 UTC - now: 2023-08-07 19:52:56.454623875 +0000 UTC m=+5766.492727374]" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="No status changes. Skipping patch" generation=11 namespace=canary resourceVersion=21857 rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Queueing up rollout for a progress after 599s" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Reconciliation completed" generation=11 namespace=canary resourceVersion=21857 rollout=canary-demo time_ms=10.680625
time="2023-08-07T19:52:56Z" level=info msg="Started syncing rollout" generation=11 namespace=canary resourceVersion=21857 rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="ComputePodTemplateHash hash changed (expected: 7b68998789, actual: 6968596f4b)" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Found 1 TrafficRouting Reconcilers" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Reconciling TrafficRouting with type 'Nginx'" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="No changes to canary ingress - skipping patch" ingress=canary-demo-canary-demo-canary namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="No StableRS exists to reconcile or matches newRS" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Reconciling 1 old ReplicaSets (total pods: 3)" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Found 3 available pods in old RS canary/canary-demo-96d45bb9b" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Found 4 available pods, scaling down old RSes (minAvailable: 3, maxScaleDown: 1)" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="RS 'canary-demo-96d45bb9b' has not reached the scaleDownTime" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="No Steps remain in the canary steps" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Patched: {\"status\":{\"HPAReplicas\":6,\"message\":null,\"phase\":\"Healthy\",\"replicas\":6,\"updatedReplicas\":3}}" generation=11 namespace=canary resourceVersion=21857 rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="persisted to informer" generation=11 namespace=canary resourceVersion=21868 rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Reconciliation completed" generation=11 namespace=canary resourceVersion=21857 rollout=canary-demo time_ms=17.482167
time="2023-08-07T19:52:56Z" level=info msg="Start processing" resource=canary/canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Processing completed" resource=canary/canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Started syncing rollout" generation=11 namespace=canary resourceVersion=21868 rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="ComputePodTemplateHash hash changed (expected: 7b68998789, actual: 6968596f4b)" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Found 1 TrafficRouting Reconcilers" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Reconciling TrafficRouting with type 'Nginx'" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="No changes to canary ingress - skipping patch" ingress=canary-demo-canary-demo-canary namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="No StableRS exists to reconcile or matches newRS" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Reconciling 1 old ReplicaSets (total pods: 3)" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Found 3 available pods in old RS canary/canary-demo-96d45bb9b" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Found 4 available pods, scaling down old RSes (minAvailable: 3, maxScaleDown: 1)" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="RS 'canary-demo-96d45bb9b' has not reached the scaleDownTime" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="No Steps remain in the canary steps" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Timed out (false) [last progress check: 2023-08-07 19:52:56 +0000 UTC - now: 2023-08-07 19:52:56.478217625 +0000 UTC m=+5766.516321166]" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="No status changes. Skipping patch" generation=11 namespace=canary resourceVersion=21868 rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Queueing up rollout for a progress after 599s" namespace=canary rollout=canary-demo
time="2023-08-07T19:52:56Z" level=info msg="Reconciliation completed" generation=11 namespace=canary resourceVersion=21868 rollout=canary-demo time_ms=1.539459


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

@yohanb yohanb added the bug Something isn't working label Aug 7, 2023
@amazingandyyy
Copy link
Contributor

We bumped into this issue as well, from the docs,

It is often undesirable to re-run analysis and steps for a rollout, when the desired behavior is to rollback as soon as possible.

BUT the bug is that it still doing all the step and trigger pods scale up all over again :/

amazingandyyy pushed a commit to yohanb/argo-rollouts that referenced this issue Aug 11, 2023
amazingandyyy pushed a commit to yohanb/argo-rollouts that referenced this issue Aug 11, 2023
amazingandyyy pushed a commit to yohanb/argo-rollouts that referenced this issue Aug 11, 2023
amazingandyyy pushed a commit to yohanb/argo-rollouts that referenced this issue Aug 11, 2023
amazingandyyy pushed a commit to yohanb/argo-rollouts that referenced this issue Aug 28, 2023
@zachaller
Copy link
Collaborator

Just to add a bit more context rollbackWindow does seem to work properly pre this PR, this PR fixes and edge case when you do a rollback within the scaleDownDelaySeconds period. A workaround would be to abort instead of rollback because your still in the "rollout"

zachaller added a commit that referenced this issue Aug 31, 2023
…ollbackWindow (#2953)

* fix(canary): skip steps when in rollback window and rs is still active

Resolve #2939

Signed-off-by: Andy Chen <[email protected]>

* test(canary): add case where rollback when in window and rs is still active

Signed-off-by: Andy Chen <[email protected]>

* test(replicaset): add test case for IsActive function

Signed-off-by: Andy Chen <[email protected]>

---------

Signed-off-by: Andy Chen <[email protected]>
Co-authored-by: Yohan Belval <[email protected]>
Co-authored-by: zachaller <[email protected]>
zachaller added a commit that referenced this issue Aug 31, 2023
…ollbackWindow (#2953)

* fix(canary): skip steps when in rollback window and rs is still active

Resolve #2939

Signed-off-by: Andy Chen <[email protected]>

* test(canary): add case where rollback when in window and rs is still active

Signed-off-by: Andy Chen <[email protected]>

* test(replicaset): add test case for IsActive function

Signed-off-by: Andy Chen <[email protected]>

---------

Signed-off-by: Andy Chen <[email protected]>
Co-authored-by: Yohan Belval <[email protected]>
Co-authored-by: zachaller <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
3 participants