Unable to scale fleet after an update when there are allocated game servers from both old and new version #3287
Comments
Ooh tricky! Thanks for the detailed replication steps!
I've replicated the issue in an e2e test 👍🏻
Got a fix that seems to be working; now I just need to tidy it up and make sure I didn't break anything else:
Fixes bug wherein if a set of Allocations occurred across two or more GameServerSets that had yet to be deleted for a RollingUpdate (because of Allocated GameServers), and a scale down operation moved the Fleet replica count to below the current number of Allocated GameServers -- scaling back up would not move above the current number of Allocated GameServers. Or to put it another way, the current Fleet update logic didn't consider old GameServerSets with Allocated GameServers but a 0 value for `Spec.Replicas` as a complete rollout when scaling back up, so the logic went back into rolling update logic, and it all went sideways. This short circuits that scenario up front. Close #3287 Co-authored-by: Mengye (Max) Gong <[email protected]>
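A minimal sketch of the short-circuit described above. This is not the actual Agones controller code; the type and the helper name `rolloutComplete` are assumptions used purely to illustrate treating old GameServerSets that still hold Allocated GameServers but have `Spec.Replicas == 0` as already rolled out:

```go
package main

import "fmt"

// gameServerSet is a pared-down stand-in (hypothetical) for the Agones
// GameServerSet type, keeping only the fields this illustration needs.
type gameServerSet struct {
	name              string
	specReplicas      int32
	allocatedReplicas int32
	active            bool // true for the GameServerSet matching the current Fleet spec
}

// rolloutComplete (hypothetical helper) reports whether the rolling update is
// effectively done: every old GameServerSet has been scaled to zero desired
// replicas, even if it still holds Allocated GameServers that cannot be
// deleted yet. In that case a scale-up should go straight to the active
// GameServerSet instead of re-entering the rolling-update logic.
func rolloutComplete(sets []gameServerSet) bool {
	for _, gsSet := range sets {
		if gsSet.active {
			continue
		}
		if gsSet.specReplicas != 0 {
			return false // an old set still has desired replicas: rollout in progress
		}
	}
	return true
}

func main() {
	sets := []gameServerSet{
		{name: "fleet-old", specReplicas: 0, allocatedReplicas: 3, active: false},
		{name: "fleet-new", specReplicas: 2, allocatedReplicas: 2, active: true},
	}
	fmt.Println("treat scale-up as a plain scale, not a rolling update:", rolloutComplete(sets))
}
```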
What happened: The fleet gets stuck; no new game servers are created.
What you expected to happen: The fleet scales up successfully.
How to reproduce it (as minimally and precisely as possible):
1. Create a fleet and allocate some of its game servers.
2. Update the fleet spec so a rolling update starts; the old gss is kept around because of its allocated game servers.
3. Allocate game servers from the new gss as well.
4. Scale the fleet down to fewer replicas than the current number of allocated game servers.
5. Scale the fleet back up.
The fleet will then get stuck in this state, and the gss will look like this:
Notice that for the active gss, `.Spec.Replicas` (DESIRED) becomes lower than both the `.Status.Replicas` (CURRENT) and `.Status.AllocatedReplicas` (ALLOCATED) columns, and stays like that forever.
The fact that `.Spec.Replicas` < `.Status.Replicas` causes the fleet rolling-update logic to always skip the active gss (see pkg/fleets/controller.go#L459), which also means `.Spec.Replicas` won't change. `.Status.Replicas` won't change either, since it cannot be lower than `.Status.AllocatedReplicas`. Therefore, the active gss cannot get out of the stuck state on its own unless those allocated game servers quit. The code that causes `.Spec.Replicas` to become lower than `.Status.AllocatedReplicas` is probably this particular line: pkg/fleets/controller.go#L471.
Anything else we need to know?:
Possibly related to #2617 and #2574. The fix there doesn't solve this corner case.
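To make the stuck state described above concrete, here is a small, self-contained Go sketch. It is not the actual controller code; the type, the helper name `reconcileOnce`, and the numbers are assumptions chosen only to show how a `Spec.Replicas < Status.Replicas` guard can skip the active gss forever once a scale-down has pushed `Spec.Replicas` below `Status.AllocatedReplicas`:

```go
package main

import "fmt"

// gssState mirrors the three columns discussed above for a single GameServerSet.
type gssState struct {
	specReplicas      int32 // DESIRED
	statusReplicas    int32 // CURRENT (never drops below allocatedReplicas)
	allocatedReplicas int32 // ALLOCATED
}

// reconcileOnce (hypothetical) applies the skip condition described in the issue:
// when Spec.Replicas < Status.Replicas the rolling-update code skips the set,
// so neither Spec.Replicas nor Status.Replicas ever changes.
func reconcileOnce(gss *gssState, fleetReplicas int32) bool {
	if gss.specReplicas < gss.statusReplicas {
		return false // skipped: nothing is updated, mirroring the check at controller.go#L459
	}
	gss.specReplicas = fleetReplicas
	return true
}

func main() {
	// A scale-down has already pushed DESIRED (2) below CURRENT/ALLOCATED (5).
	gss := &gssState{specReplicas: 2, statusReplicas: 5, allocatedReplicas: 5}

	// Scaling the Fleet back up to 10 never reaches the active gss:
	for i := 0; i < 3; i++ {
		updated := reconcileOnce(gss, 10)
		fmt.Printf("reconcile %d: updated=%v desired=%d current=%d allocated=%d\n",
			i, updated, gss.specReplicas, gss.statusReplicas, gss.allocatedReplicas)
	}
}
```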
Environment:
Kubernetes version (use `kubectl version`): 1.23