Describe the bug
A scaling event that occurs while a post-promotion analysis template is running scales only the active ReplicaSet and leaves the stable ReplicaSet untouched. This in turn leaves the rollout in an irreconcilable state, because it waits for the stable ReplicaSet to reach the replica count the rollout now requires.
To Reproduce
1. Deploy a new version of a blue-green rollout that has a post-promotion analysis template.
2. Wait for the rollout to automatically promote the new ReplicaSet and run its post-promotion analysis template. While the analysis runs, the new ReplicaSet is considered active but not stable.
3. Trigger a scaling event during this period (while the new ReplicaSet is active but not yet stable). The active (new) ReplicaSet is scaled to the requested replica count, but the stable ReplicaSet is untouched.
4. Wait for the analysis template to finish.
5. Observe that, because the stable (old) ReplicaSet's replica count does not match the count desired by the rollout, the rollout is stuck waiting for minimum availability.
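The stuck state can be modeled with a small simulation. This is a minimal sketch under stated assumptions: `ReplicaSet`, `BlueGreenRollout`, `scale_observed`, and `stable_is_fully_available` are hypothetical names invented for illustration, not Argo Rollouts code, and the availability check only mirrors the waiting condition described in the steps above rather than the controller's real logic. The name `my-app-new` is likewise hypothetical; `my-app-6b6649b84d` is the stable ReplicaSet from the logs.

```python
from dataclasses import dataclass


# Hypothetical model of the state described in this report; these names are
# illustrative only and do not come from the Argo Rollouts codebase.
@dataclass
class ReplicaSet:
    name: str
    replicas: int


@dataclass
class BlueGreenRollout:
    desired_replicas: int
    active: ReplicaSet
    stable: ReplicaSet


def scale_observed(rollout: BlueGreenRollout, new_count: int) -> None:
    """Observed v1.7.0 behavior: only the active ReplicaSet is resized."""
    rollout.desired_replicas = new_count
    rollout.active.replicas = new_count
    # The stable ReplicaSet is never touched here.


def stable_is_fully_available(rollout: BlueGreenRollout) -> bool:
    """Assumed waiting condition: stable must match the desired count."""
    return rollout.stable.replicas == rollout.desired_replicas


rollout = BlueGreenRollout(
    desired_replicas=3,
    active=ReplicaSet("my-app-new", 3),           # hypothetical new ReplicaSet
    stable=ReplicaSet("my-app-6b6649b84d", 3),    # stable ReplicaSet from the logs
)
scale_observed(rollout, 5)  # scaling event during post-promotion analysis
print(rollout.active.replicas, rollout.stable.replicas)  # 5 3
print(stable_is_fully_available(rollout))  # False, and nothing ever changes it
```

Because no later step resizes the stable ReplicaSet, the condition stays false on every reconcile, which matches the repeating log lines in this report.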
Expected behavior
During post-promotion analysis, a scaling event should scale both the stable and the active ReplicaSets. This ensures that if a genuine traffic spike coincides with an error that triggers a rollback, the rollout rolls back to a stable ReplicaSet with enough replicas, and it prevents the rollout from getting stuck in an irreconcilable state.
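The expected behavior can be sketched as follows. This is a hypothetical illustration, not the actual Argo Rollouts fix: `ReplicaSet`, `BlueGreenRollout`, `in_post_promotion_analysis`, and `scale_expected` are invented names, and the analysis window is approximated as "the active ReplicaSet is not the stable one", following the reproduction steps.

```python
from dataclasses import dataclass


# Hypothetical model; names are illustrative, not Argo Rollouts code.
@dataclass
class ReplicaSet:
    name: str
    replicas: int


@dataclass
class BlueGreenRollout:
    desired_replicas: int
    active: ReplicaSet
    stable: ReplicaSet

    def in_post_promotion_analysis(self) -> bool:
        # Assumption: the new ReplicaSet is active but not yet stable.
        return self.active.name != self.stable.name


def scale_expected(rollout: BlueGreenRollout, new_count: int) -> None:
    """Expected behavior: resize both ReplicaSets during post-promotion analysis."""
    rollout.desired_replicas = new_count
    rollout.active.replicas = new_count
    if rollout.in_post_promotion_analysis():
        # Keep the rollback target sized for current traffic.
        rollout.stable.replicas = new_count


rollout = BlueGreenRollout(
    desired_replicas=3,
    active=ReplicaSet("my-app-new", 3),           # hypothetical new ReplicaSet
    stable=ReplicaSet("my-app-6b6649b84d", 3),    # stable ReplicaSet from the logs
)
scale_expected(rollout, 5)
print(rollout.active.replicas, rollout.stable.replicas)  # 5 5
```

Outside the analysis window the active and stable ReplicaSets are the same object in this model, so the extra branch is skipped and scaling behaves as before.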
Screenshots
Version
v1.7.0
Logs
The controller logs the following in an infinite loop:
time="2024-06-21T10:59:40Z" level=info msg="Syncing replicas only due to scaling event" namespace=staging rollout=my-app
time="2024-06-21T10:59:40Z" level=info msg="Reconciling stable ReplicaSet 'my-app-6b6649b84d'" namespace=staging rollout=my-app
time="2024-06-21T10:59:40Z" level=info msg="No status changes. Skipping patch" generation=1324 namespace=staging resourceVersion=159634477 rollout=my-app
time="2024-06-21T10:59:40Z" level=info msg="Queueing up Rollout for a progress check now" namespace=staging rollout=my-app
time="2024-06-21T10:59:40Z" level=info msg="Reconciliation completed" generation=1324 namespace=staging resourceVersion=159634477 rollout=my-app time_ms=29.734882
time="2024-06-21T10:59:40Z" level=info msg="Started syncing rollout" generation=1324 namespace=staging resourceVersion=159634477 rollout=my-app
time="2024-06-21T10:59:40Z" level=info msg="Syncing replicas only due to scaling event" namespace=staging rollout=my-app