Argo rollouts will scale down stable when canary is missing pods #2050

Closed
MarkSRobinson opened this issue May 25, 2022 · 20 comments · Fixed by #2441
Labels: bug

Summary

Argo Rollouts scaled down the stable ReplicaSet (RS) while the new ReplicaSet was not yet fully available. In this case, it scaled the stable RS from 110 pods down to 0, but it didn't update the routing until the new RS was 100% ready.

  1. Argo shouldn't scale down the stable RS until it switches to using the new canary as stable.
  2. Argo shouldn't scale down the stable RS until the canary RS is ready.
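
To make the two expectations above concrete, here is a minimal sketch of the guard I would expect before the stable RS is scaled down. It is not Argo Rollouts code: `ReplicaSetState`, `safe_to_scale_down_stable`, and the parameter names are hypothetical, and the canary's available pod count in the usage lines is an illustrative value, not something taken from the logs.

```python
from dataclasses import dataclass


@dataclass
class ReplicaSetState:
    """Hypothetical, simplified view of a ReplicaSet (not an Argo Rollouts type)."""
    pod_template_hash: str
    desired_replicas: int
    available_replicas: int

    def fully_available(self) -> bool:
        # Ready only when every desired pod is available.
        return (
            self.desired_replicas > 0
            and self.available_replicas >= self.desired_replicas
        )


def safe_to_scale_down_stable(canary: ReplicaSetState, stable_service_hash: str) -> bool:
    """Return True only when both expectations from the list above hold.

    stable_service_hash is the pod-template hash the stable service
    currently selects (an assumed input for this sketch).
    """
    # Expectation 1: the stable service already points at the new RS.
    service_switched = stable_service_hash == canary.pod_template_hash
    # Expectation 2: the new RS is fully available.
    return service_switched and canary.fully_available()


# Situation in this report: the service still selects the old hash and the
# new RS is only partly available (80 is an illustrative number).
canary = ReplicaSetState("5bb4f8b6b6", desired_replicas=110, available_replicas=80)
print(safe_to_scale_down_stable(canary, stable_service_hash="779d9dfb64"))
```

With a check like this, the scale-down deadline alone would not be enough to remove the old pods; both conditions would have to hold first.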

Diagnostics

What version of Argo Rollouts are you running?

1.2.0

time="2022-05-24T07:57:11Z" level=info msg="Previous weights: &TrafficWeights{Canary:WeightDestination{Weight:100,ServiceName:api-v2-canary,PodTemplateHash:5bb4f8b6b6,},Stable:WeightDestination{Weight:0,ServiceName:api-v2-stable,PodTemplateHash:779d9dfb64,},Additional:[]WeightDestination{},Verified:nil,}" namespace=cs-team rollout=api-v2
--
  |   | time="2022-05-24T07:57:11Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:11Z" level=info msg="Enqueueing parent of cs-team/api-v2-779d9dfb64: Rollout cs-team/api-v2"
  |   | time="2022-05-24T07:57:11Z" level=info msg="Enqueueing parent of cs-team/api-v2-779d9dfb64: Rollout cs-team/api-v2"
  |   | time="2022-05-24T07:57:11Z" level=info msg="Enqueueing parent of cs-team/api-v2-779d9dfb64: Rollout cs-team/api-v2"
  |   | time="2022-05-24T07:57:11Z" level=info msg="Set 'scale-down-deadline' annotation on 'api-v2-779d9dfb64' to 2022-05-24T07:57:41Z (30s)" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:12Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:12Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:12Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:12Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:12Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:12Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:12Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:12Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:12Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:12Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:12Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:12Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:14Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:14Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:14Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:33Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:33Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:33Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:34Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:34Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:34Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:34Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:34Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:34Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:34Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:34Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:34Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:34Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:34Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:34Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:34Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:34Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:34Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:35Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:35Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:35Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:35Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:35Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:35Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:35Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:35Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:35Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:35Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:35Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:35Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:35Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:35Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:35Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:39Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:39Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:39Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:39Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:39Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:39Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:39Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:39Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:39Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:39Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:39Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:39Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:39Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:39Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:40Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:40Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:40Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:40Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:40Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:40Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:40Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:40Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:40Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:40Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:40Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:40Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:40Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:40Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:40Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:40Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:40Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:40Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:40Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:40Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:41Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:41Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:41Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:41Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:41Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:41Z" level=info msg="Enqueueing parent of cs-team/api-v2-779d9dfb64: Rollout cs-team/api-v2"
  |   | time="2022-05-24T07:57:41Z" level=info msg="Enqueueing parent of cs-team/api-v2-779d9dfb64: Rollout cs-team/api-v2"
  |   | time="2022-05-24T07:57:41Z" level=info msg="Enqueueing parent of cs-team/api-v2-779d9dfb64: Rollout cs-team/api-v2"
  |   | time="2022-05-24T07:57:41Z" level=info msg="Scaled down ReplicaSet api-v2-779d9dfb64 (revision 3292) from 110 to 0" event_reason=ScalingReplicaSet namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:41Z" level=info msg="Event(v1.ObjectReference{Kind:\"Rollout\", Namespace:\"cs-team\", Name:\"api-v2\", UID:\"2a76d2a7-e138-4113-b3aa-eb1ec96011cd\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"1213092265\", FieldPath:\"\"}): type: 'Normal' reason: 'ScalingReplicaSet' Scaled down ReplicaSet api-v2-779d9dfb64 (revision 3292) from 110 to 0"

....



time="2022-05-24T07:59:48Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
--
  |   | time="2022-05-24T07:59:59Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:59:59Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:59:59Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:59:59Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:59:59Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:59:59Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T08:00:08Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T08:00:08Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T08:00:08Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T08:00:08Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T08:00:08Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T08:00:08Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T08:00:25Z" level=info msg="Event(v1.ObjectReference{Kind:\"Rollout\", Namespace:\"cs-team\", Name:\"api-v2\", UID:\"2a76d2a7-e138-4113-b3aa-eb1ec96011cd\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"1213103929\", FieldPath:\"\"}): type: 'Normal' reason: 'SwitchService' Switched selector for service 'api-v2-stable' from '779d9dfb64' to '5bb4f8b6b6'"
  |   | time="2022-05-24T08:00:25Z" level=info msg="Switched selector for service 'api-v2-stable' from '779d9dfb64' to '5bb4f8b6b6'" event_reason=SwitchService namespace=cs-team rollout=api-v2

I've tried to keep these logs brief, but I can get the full logs if you would like.


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

@MarkSRobinson MarkSRobinson added the bug Something isn't working label May 25, 2022
@harikrongali
Contributor

can you post manifest files that you are using?

@MarkSRobinson
Contributor Author

@harikrongali

I've attached the Rollout manifest. I can attach others if they'd be useful. We're using Linkerd as the service mesh in the cluster.

rollout.txt

@harikrongali
Contributor

@perenesenko can you provide your findings here?

@perenesenko
Member

perenesenko commented May 31, 2022

@MarkSRobinson
Could you provide the whole rollout content with the status field:

kubectl get rollout [rolloutname] -o yaml

Could you also provide the rollout info:

kubectl argo rollouts get rollouts [rolloutname]

@perenesenko
Member

From the logs, it looks like the canary pods should have reached 100% ready, given this entry on the first line:

Previous weights: &TrafficWeights{Canary:WeightDestination{Weight:100,ServiceName:api-v2-canary,.....

We only shift the traffic once the number of ready canary pods has reached the desired number, so it should have been at 100%.
After that we switch the service labels. But that does not happen here, because the canary pods are not 100% ready:

delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available

We have to figure out why the canary was 100% ready at the previous step but is not ready at the step that switches the labels.

@MarkSRobinson
Contributor Author

@perenesenko

Here's the full rollout object
rollout.txt

What I can figure happened is that we had a networking problem causing 30 pods to fall off being ready (either the node itself died or networking on the node died, we're not certain).

From the logs, it sounds like this could be prevented either by forcing the networking switch even though pods are known to be missing, or by cancelling the scale-down until the networking is switched successfully.

@jandersen-plaid

I think the key issue is that the rollout is continuing reconciliation when the selectors are delayed in being swapped.

That is, https://sourcegraph.com/github.com/argoproj/argo-rollouts/-/blob/rollout/service.go?L284 returns nil as if everything is normal when we cannot swap the replicaset hashes of the stable and canary replicasets. This allows the reconciliation to proceed onward to this step: https://sourcegraph.com/github.com/argoproj/argo-rollouts@25f40d2bb8d6432e54d4eba8f37842af6f0138ad/-/blob/rollout/canary.go?L56:14-56:37 where the canary set (which should be the stable set, but was delayed in being set to stable because of unhealthy replicas) is set to 0 and the stable (which is already 0) is set to 100.

If there was an error within the service "ensureSVCTargets" function then the controller would skip this cycle (as is expected) and not continue to update the weights on the rollout (resulting in no traffic loss).

I have added a pull request that I think will fix this issue, but let me know if I am incorrect here.

@mubarak-j
Contributor

mubarak-j commented Sep 13, 2022

One of our apps with over 170 pods suffered an outage due to this bug #2235. Reading this issue, it seems this bug has always had the potential to hit us since we started using dynamicStableScale: true (~30 releases/rollouts since then). But it seems to have been made possible by a longer-than-usual delay for EKS to spin up new nodes/pods during the rollout (~5 mins longer), and so that much longer for the new replicaset to be ready.

Disabling dynamicStableScale as a workaround can sometimes be cost-prohibitive, especially with long canary releases. I'm curious whether this bug hits in the last step. In my case, would changing the last step from a 50% to 100% traffic switch to something with a smaller increment, e.g. 95% to 100%, and therefore waiting on fewer pods to be ready, reduce the chances of encountering this bug?

@harikrongali harikrongali added this to the v1.4 milestone Oct 20, 2022
jandersen-plaid added a commit to jandersen-plaid/argo-rollouts that referenced this issue Nov 8, 2022
jandersen-plaid added a commit to jandersen-plaid/argo-rollouts that referenced this issue Nov 8, 2022
@zachaller
Collaborator

zachaller commented Nov 9, 2022

This one is also interesting: #1820

jandersen-plaid added a commit to jandersen-plaid/argo-rollouts that referenced this issue Nov 26, 2022
jandersen-plaid added a commit to jandersen-plaid/argo-rollouts that referenced this issue Nov 26, 2022
@jandersen-plaid

For what it is worth, I think #2187 is ready to go -- I had originally scoped it out to all configurations of argo rollouts, but some of the end to end tests actually rely on swapping service selectors when the canary replicaset is still not ready, so I dialed it back to just dynamicStableScale.

I have applied the patch of the change to the release-1.3 branch (jandersen-plaid#1) if watchers of this issue want to try it out themselves. It is up to the maintainers if they will accept it into the next minor release or a patch release of 1.3.

Please test out this new version before you put it into a production environment: I was not able to construct a consistent test for this because the failure mode relies on obstructing pods from becoming ready at a specific point in time. That being said, I am confident that it is ready to be tested and cautiously rolled out.

Should also help with #1820 and #2235

@zachaller
Collaborator

@jandersen-plaid Thanks for updating that. I will also take a look at what you have done to "dial" it back a bit. I have been very slowly working on another fix for this that just changes the calculation to account for availability instead of introducing an error state. If it pans out, I think it would make a bit more sense to go that route. If it does not pan out, I think what you did will also end up making sense. I will try to get the calculation changes figured out soon; I have just been busy with lots of other things. But I should have some time to really dedicate to finishing it.

@jandersen-plaid

I have been very slowly working on another fix for this that just changes the calculation to account for availability instead of introducing an error state. If it pans out, I think it would make a bit more sense to go that route.

Great! I took a look at the PR and your approach overall seems more correct to me (adding error states generally leads to difficulty in discerning when the state is updated 😢 ), so I look forward to the final result!

For what it is worth, I had the exact same end-to-end tests fail for me as well (TestALBExperimentStepNoSetWeight and TestIstioUpdateInMiddleZeroCanaryReplicas). I think that their success actually depends on the replicasets not being ready. There is likely a timing issue between when the tests think the rollout is in a "final" state vs. when the rollout is actually in a final state with an available replicaset. Adjusting those tests to account for different conditions before ExpectRevisionPodCount("3", 1) (for TestIstioUpdateInMiddleZeroCanaryReplicas in https://github.com/argoproj/argo-rollouts/blob/master/test/e2e/istio_test.go#L277) and the Assert (for TestALBExperimentStepNoSetWeight in https://github.com/argoproj/argo-rollouts/blob/master/test/e2e/aws_test.go#L156-L167) should be enough to get all tests passing.

Adding the condition that dynamicStableScale: true effectively skips these tests and ensures that existing tested behavior will be kept (as opposed to adjusting the tests to account for the new behavior). I felt this was generally okay, considering the failure mode with dynamicStableScale is more dire than normal rollouts (double pods available in normal rollouts vs. 0 in the old RS and 100% in the new RS for dynamicStableScale), and the tests that were failing with dynamicStableScale: false were failing because they relied on service selectors being switched before the replicasets were ready.
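
The gating described above can be sketched roughly as follows. This is not the code from #2187: the `rolloutState` type, its fields, and `canSwitchStableSelector` are hypothetical names chosen for illustration. The point is only that the stricter readiness check applies when dynamicStableScale is enabled, and the existing behavior is preserved otherwise.

```go
package main

import "fmt"

// rolloutState is a hypothetical, simplified view of a rollout
// (it does not mirror the real Argo Rollouts types).
type rolloutState struct {
	DynamicStableScale bool  // mirrors the dynamicStableScale setting
	DesiredReplicas    int32 // replicas requested by the rollout spec
	CanaryAvailable    int32 // available pods in the canary ReplicaSet
}

// canSwitchStableSelector reports whether the stable service selector
// may be pointed at the canary ReplicaSet.
func canSwitchStableSelector(s rolloutState) bool {
	if !s.DynamicStableScale {
		// Existing behavior: the selector may be swapped before the
		// canary is ready, which some end-to-end tests rely on.
		return true
	}
	// With dynamicStableScale the stable ReplicaSet may already be at
	// zero, so require a fully available canary before switching.
	return s.CanaryAvailable >= s.DesiredReplicas
}

func main() {
	fmt.Println(canSwitchStableSelector(rolloutState{true, 110, 0}))   // false
	fmt.Println(canSwitchStableSelector(rolloutState{true, 110, 110})) // true
	fmt.Println(canSwitchStableSelector(rolloutState{false, 110, 0}))  // true
}
```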

@zachaller
Collaborator

zachaller commented Dec 1, 2022

@jandersen-plaid Here is the PR. @MarkSRobinson, are you able to reproduce this reliably enough that you could help test it if I were to get you a build?

@MarkSRobinson
Contributor Author

@zachaller The bug doesn't happen reliably, so I can test this out, but it might take a while to get feedback. My concern is that this PR is built on the 1.4 branch and I'm not entirely sure I want to test all the changes in production.

Let me see if I can backport this to 1.3 release branch.

@MarkSRobinson
Contributor Author

Ok, fix back-ported - #2449

@zachaller
Collaborator

zachaller commented Dec 5, 2022

@MarkSRobinson Do you plan on building a Docker image with that patch based on 1.3, or would you like me to? Also note that I refactored the PR a bit to simplify it.

@MarkSRobinson
Contributor Author

@zachaller I built it and pushed it to our internal repo. We're testing it out on the testing cluster right now.

@jstewart612

@MarkSRobinson status on this? This has now caused TWO production outages for our organization and, as far as we are concerned, is a massive bug that needs immediate fixing.

@raxod502-plaid

(Just so you know, @MarkSRobinson, @jandersen-plaid, and I aren't affiliated with the Argo projects; we're users like yourself. Right there with you: this issue has caused outages for us as well, and we're excited to help get it fixed as soon as possible.)

@jstewart612

@raxod502-plaid @jandersen-plaid @MarkSRobinson apologies: was doing a lot of avatar clicking to see who was officially on the project and mistook Mark for one of them.
