
Packed Allocation is very not packed #783

Closed
markmandel opened this issue May 18, 2019 · 9 comments · Fixed by #804
@markmandel
Member

What happened:

  • Cluster with 20 nodes for gameservers
  • Created a Fleet with 100 GameServers
  • Created 6 Allocations, and as you can see, they are almost all on different nodes:
root@30dacd3ba219:/go/src/agones.dev/agones# kubectl get gs | grep Alloc | sort
xonotic-27wqm-82mxs   Allocated   34.85.93.181     7607      gke-test-cluster-default-ace71fa2-gpw3   1m
xonotic-27wqm-dr5ql   Allocated   34.85.103.79     7341      gke-test-cluster-default-ace71fa2-1193   2m
xonotic-27wqm-jsbld   Allocated   34.85.115.151    7155      gke-test-cluster-default-ace71fa2-jzql   2m
xonotic-27wqm-v79zp   Allocated   34.85.103.79     7768      gke-test-cluster-default-ace71fa2-1193   1m
xonotic-27wqm-vbcqw   Allocated   35.243.109.91    7555      gke-test-cluster-default-ace71fa2-9f80   1m

What you expected to happen:

GameServers should have been allocated to the same node (or at least as close to that as we can get, given some performance constraints).

This is far from ideal: at large scale it will cost our users significant money, as they will be unable to scale down nodes that are mostly empty. It also defies the "Packed" contract.

How to reproduce it (as minimally and precisely as possible):

Create any of the fleets we have, push it up to a scale such that it will encompass multiple nodes (xonotic is good for this, as it has a larger memory footprint).

Then allocate several gameservers, and run kubectl get gs to see the results.

Environment: Linux

  • Agones version: 1.10.0
  • Kubernetes version (use kubectl version): 1.11.x
  • Cloud provider or hardware configuration: GKE
  • Install method (yaml/helm): helm
  • Troubleshooting guide log(s): N/A
  • Others: N/A

⚠️ I am marking this as a 1.0 blocker, because of the large cost it could incur on our users. Please feel free to disagree in the comments below ⚠️

@markmandel markmandel added kind/bug These are bugs. area/user-experience Pertaining to developers trying to use Agones, e.g. SDK, installation, etc labels May 18, 2019
@markmandel markmandel added this to the 0.11.0 milestone May 18, 2019
@markmandel
Member Author

/cc @ilkercelikyilmaz as this will likely impact your performance work

@markmandel
Member Author

markmandel commented May 18, 2019

Just thinking about this statistically (at least I think I am - let me know if my math/logic is off here 😄 ).

If I remember/read the logic right, then with the current model (which is definitely aimed more at throughput) we:

  • sort gameservers by the ones on nodes with the most number of gameservers first
  • then grab the top 100 of that list
  • then randomly select one in that top 100, until one allocates successfully.

Which is awesome for throughput and contention, but doesn't seem to pack very well (unless this changes at scale in a way I'm not seeing? That would be super nice)

But at first pass in my head, this feels statistically less likely to pack than not.

For example:

  1. Say we have 4 nodes, each with 30 ready gameservers on each: [30,30,30,30].
  2. We allocate one, so we end up with [29,30,30,30] (for argument's sake).
  3. The pool to allocate from would become something like [29,30,30,30,1].
  4. This means we are statistically more likely to allocate from node 1, 2 or 3 rather than node 0 (going by a 0 index), because we end up with more options on nodes that have yet to have anything allocated to them.

Drawing this out, we could end up in scenarios that are something akin to [1,2,4,30,30,30,3] (or similar), and it seems very unlikely we'll actually fill those first nodes. And at scale, this seems like a big waste of resources our users would need to pay for.

The only other thought is that the fleet scale-down logic (we scale down from the nodes with the least gameservers on them first) will help solve this (at least somewhat) -- need to think about this more.

Not quite sure how to retain the throughput while also keeping things more packed, though.

@roberthbailey
Member

@markmandel - is this a regression from previous releases or just something that hadn't yet been tested?

@aLekSer
Collaborator

aLekSer commented May 20, 2019

@markmandel I was thinking about the source of the problem, and I think we should look at enhancing the packedComparator() function.
If we have 4 nodes, all with 30 gameservers, then we could always select the node with the smallest NodeName when the Ready and Allocated counts are equal. I have tested this approach in a small unit test. WDYT?

aLekSer@fd676e2
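A sketch of that tie-break idea (the struct shape and the ordering of the first two comparisons are assumptions for illustration, not the actual packedComparator signature from the linked commit):

```go
package main

import "fmt"

// nodeCount is an assumed simplified shape; the real comparator in Agones
// works over per-node Ready/Allocated GameServer counts.
type nodeCount struct {
	NodeName  string
	Ready     int
	Allocated int
}

// packedLess sketches the proposed enhancement: prefer nodes with more
// Allocated, then more Ready, and when both are equal, fall back to the
// lexicographically smallest NodeName so every comparison is deterministic.
func packedLess(a, b nodeCount) bool {
	if a.Allocated != b.Allocated {
		return a.Allocated > b.Allocated
	}
	if a.Ready != b.Ready {
		return a.Ready > b.Ready
	}
	return a.NodeName < b.NodeName
}

func main() {
	a := nodeCount{NodeName: "node-a", Ready: 30}
	b := nodeCount{NodeName: "node-b", Ready: 30}
	fmt.Println(packedLess(a, b)) // node-a wins the tie on name
}
```

The deterministic tie-break means every allocator agrees on which node to fill first, at the cost of concentrating contention on that node.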

@aLekSer
Collaborator

aLekSer commented May 20, 2019

The solution proposed above does not fix the problem; the fleet is still not packed. I need to add logging to debug the root cause.
I was able to reproduce the problem on test-cluster with examples/fleet.yaml and
I was able to reproduce the problem on test-cluster with examples/fleet.yaml and

for i in {0..10} ; do kubectl create -f ./examples/gameserverallocation.yaml ; done

where in gameserverallocation.yaml spec:

required:
  matchLabels:
    foo: bar

@markmandel
Member Author

markmandel commented May 20, 2019

@roberthbailey good question. It is a regression - but the sacrifice was made to allow better throughput. To be honest, I didn't realise it was quite this sub-optimal until I started poking at it now. (Lesson: write more tests 😄 )

For comparison, if we do the same thing with FleetAllocations - which use the same logic (but have much, much lower throughput) - we see this:

root@30dacd3ba219:/go/src/agones.dev/agones# kubectl get gs | grep Alloc
xonotic-przc8-2npn8   Allocated   35.236.55.223    7505      gke-test-cluster-default-f11755a7-pgzl   4h
xonotic-przc8-7zcpb   Allocated   35.236.55.223    7740      gke-test-cluster-default-f11755a7-pgzl   4h
xonotic-przc8-9n94z   Allocated   35.236.55.223    7131      gke-test-cluster-default-f11755a7-pgzl   4h
xonotic-przc8-g5gbv   Allocated   35.236.55.223    7681      gke-test-cluster-default-f11755a7-pgzl   4h
xonotic-przc8-mwkcf   Allocated   35.236.55.223    7002      gke-test-cluster-default-f11755a7-pgzl   4h
xonotic-przc8-x9sfx   Allocated   35.236.55.223    7812      gke-test-cluster-default-f11755a7-pgzl   4h
xonotic-przc8-xqfrw   Allocated   35.236.55.223    7049      gke-test-cluster-default-f11755a7-pgzl   4h

But this is throttled by a mutex among other things.

The current idea I have floating around my head is to presort the gameservers, and essentially keep a queue that we can pop values off of on each allocation. Even if the sorting is only eventually consistent, we'll avoid contention issues and get a better-packed result.

Something like this might work: https://github.com/wangjia184/sortedset
You could pop the min for packed, and pop the max for distributed.

The only trickiness I'm seeing is keeping the gameservers sorted regularly without slowing down the allocation process. Maybe a full resync every 30 seconds or so, happening in a different goroutine? (The sortedset sorts by score and then lexicographically, so it should in theory work pretty well.)
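A minimal sketch of that pop-min/pop-max idea using the standard library's container/heap rather than the linked sortedset package (the scoring used here - the free capacity left on a GameServer's node - is an assumption chosen so that popping the minimum gives the most packed pick):

```go
package main

import (
	"container/heap"
	"fmt"
)

// entry is a GameServer candidate with a sortable score. Scoring is an
// assumption for illustration: the free capacity left on the GameServer's
// node, so the minimum score is the most tightly packed pick.
type entry struct {
	name  string
	score float64
}

// gsHeap is a min-heap over scores, ordered lexicographically by name on ties.
type gsHeap []entry

func (h gsHeap) Len() int { return len(h) }
func (h gsHeap) Less(i, j int) bool {
	if h[i].score != h[j].score {
		return h[i].score < h[j].score
	}
	return h[i].name < h[j].name
}
func (h gsHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *gsHeap) Push(x interface{}) { *h = append(*h, x.(entry)) }
func (h *gsHeap) Pop() interface{} {
	old := *h
	e := old[len(old)-1]
	*h = old[:len(old)-1]
	return e
}

// newQueue builds an initialised heap from the given candidates.
func newQueue(entries ...entry) *gsHeap {
	h := gsHeap(entries)
	heap.Init(&h)
	return &h
}

// popMin pops the best candidate for Packed; a mirrored max-heap (or a
// negated score) would give the Distributed behaviour.
func popMin(h *gsHeap) entry { return heap.Pop(h).(entry) }

func main() {
	h := newQueue(
		entry{name: "gs-a", score: 3}, // node has 3 free slots
		entry{name: "gs-b", score: 1}, // nearly full node: best Packed pick
		entry{name: "gs-c", score: 1},
	)
	fmt.Println(popMin(h).name) // gs-b (lowest score, lexicographic tie-break)
}
```

A background goroutine could periodically rebuild the heap from the informer cache to keep the ordering eventually consistent without blocking allocations.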

I might give the implementation a shot and see how it pans out. I'd be curious to hear thoughts.

@markmandel
Member Author

Taking a stab at this here: https://github.com/markmandel/agones/tree/perf/sorted-set - curious to see how the approach performs at scale once complete.

@markmandel
Member Author

Argh, I forgot that we have the required/preferred selectors, doh! My approach needs to be tweaked / may not work 😞

@markmandel
Member Author

Revised plan - I'm not 100% sure it's optimal for throughput + packing, but I think it's worth a shot. I'm also happy to hear dissenting opinions if you can see holes in the logic.

This is based on the assumption that the selectors for a GameServerAllocation are likely not to change very often, and generally work on large fleets rather than very specific small groups (although this might work okay for small groups at lower throughput?).

  1. We keep a cache of sorted preferred + required gameservers for each GameServerAllocation Selector + namespace combo (key). Sorting is by the number of Allocated GameServers on the node each is scheduled on.
  2. When a request comes in for a new key, we build the cache if it doesn't exist.
  3. On allocation, we grab the cache for that key and atomically pop off a gameserver to attempt to allocate.
  4. If it fails due to conflict, just pop off another value and try again (don't put it back; it's likely already been allocated/deleted/something).
  5. If a GameServer is deleted, remove it from all caches.
  6. If a GameServer goes from Ready => anything else, remove it from all caches.
  7. Every 30 seconds, update the sorting of all GameServers.
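The cache described in steps 1 through 6 might look something like this (a minimal sketch - every name here is hypothetical, not the Agones API):

```go
package main

import (
	"fmt"
	"sync"
)

// allocationCache keeps one pre-sorted candidate list per
// Selector+namespace key, built lazily and popped atomically on each
// allocation attempt.
type allocationCache struct {
	mu     sync.Mutex
	queues map[string][]string // key -> GameServer names, best candidate first
	build  func(key string) []string
}

func newAllocationCache(build func(string) []string) *allocationCache {
	return &allocationCache{queues: map[string][]string{}, build: build}
}

// pop returns the next candidate for the key, building the queue on first
// use. On an allocation conflict the caller simply calls pop again; the
// popped entry is never put back, as it has likely already been
// allocated or deleted.
func (c *allocationCache) pop(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	q, ok := c.queues[key]
	if !ok {
		q = c.build(key)
	}
	if len(q) == 0 {
		c.queues[key] = q
		return "", false
	}
	c.queues[key] = q[1:]
	return q[0], true
}

// remove drops a GameServer from every queue, e.g. when it is deleted or
// leaves the Ready state.
func (c *allocationCache) remove(name string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for key, q := range c.queues {
		out := q[:0]
		for _, n := range q {
			if n != name {
				out = append(out, n)
			}
		}
		c.queues[key] = out
	}
}

func main() {
	c := newAllocationCache(func(string) []string { return []string{"gs-1", "gs-2"} })
	name, _ := c.pop("default/foo=bar")
	fmt.Println(name)
}
```

The periodic re-sort from step 7 would replace each queue wholesale under the same mutex, in a separate goroutine, so allocations only ever contend on the short pop.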

Two potential issues and/or solutions

  1. GameServerAllocations that overlap on GameServers they are selecting from.
  2. When we split allocation into multiple pods, each cache is not as closely in sync.

Both of these could drive contention as they would be attempting to pop the same GameServers on each run.

Potential solution: add jitter from 0-1 to each score, so that the order within a node (or among gameservers with the same allocated counts) is randomised per cache. This could potentially be increased beyond 0-1 if we need more randomness.

@markmandel markmandel self-assigned this May 28, 2019
markmandel added a commit to markmandel/agones that referenced this issue Jun 4, 2019
This implements a batching algorithm for Packed and Distributed
GameServerAllocations (GSA).

This ensures that we can still get high throughput on GSAs, while
retaining a tight packing of Allocated GameServers on each node with
the Packed strategy.

The Distributed strategy is now implemented with a randomised Allocation
process, to ensure distributed load under this strategy.

Closes googleforgames#783
markmandel added a commit that referenced this issue Jun 12, 2019
* Batched Packed and Distributed Allocations


* Move updateQueue into allocationUpdateWorkers.

* Add large explanatory code block.