
Packed Allocation is very not packed #783

Closed
markmandel opened this issue May 18, 2019 · 9 comments · Fixed by #804
@markmandel
Member

What happened:

  • Cluster with 20 nodes for gameservers
  • Created a Fleet with 100 GameServers
  • Created 6 Allocations, and as you can see, they are almost all on different nodes:
root@30dacd3ba219:/go/src/agones.dev/agones# kubectl get gs | grep Alloc | sort
xonotic-27wqm-82mxs   Allocated   34.85.93.181     7607      gke-test-cluster-default-ace71fa2-gpw3   1m
xonotic-27wqm-dr5ql   Allocated   34.85.103.79     7341      gke-test-cluster-default-ace71fa2-1193   2m
xonotic-27wqm-jsbld   Allocated   34.85.115.151    7155      gke-test-cluster-default-ace71fa2-jzql   2m
xonotic-27wqm-v79zp   Allocated   34.85.103.79     7768      gke-test-cluster-default-ace71fa2-1193   1m
xonotic-27wqm-vbcqw   Allocated   35.243.109.91    7555      gke-test-cluster-default-ace71fa2-9f80   1m

What you expected to happen:

GameServers should have been allocated to the same node (or at least as close to that as we can get, given some performance constraints).

This is far from ideal: at large scale it will cost our users significant money, as they will be unable to scale down nodes that are mostly empty. It also defies the "Packed" contract.

How to reproduce it (as minimally and precisely as possible):

Create any of the fleets we have, push it up to a scale such that it will encompass multiple nodes (xonotic is good for this, as it has a larger memory footprint).

Then allocate several gameservers, and run kubectl get gs to see the results.

Environment: Linux

  • Agones version: 1.10.0
  • Kubernetes version (use kubectl version): 1.11.x
  • Cloud provider or hardware configuration: GKE
  • Install method (yaml/helm): helm
  • Troubleshooting guide log(s): N/A
  • Others: N/A

⚠️ I am marking this as a 1.0 blocker, because of the large cost it could incur on our users. Please feel free to disagree in the comments below ⚠️

@markmandel markmandel added kind/bug These are bugs. area/user-experience Pertaining to developers trying to use Agones, e.g. SDK, installation, etc labels May 18, 2019
@markmandel markmandel added this to the 0.11.0 milestone May 18, 2019
@markmandel
Member Author

/cc @ilkercelikyilmaz as this will likely impact your performance work

@markmandel
Member Author

markmandel commented May 18, 2019

Just thinking about this statistically (at least I think I am - let me know if my math/logic is off here 😄 ).

If I remember/read the logic right, then with the current model (which is definitely aimed more at throughput) we:

  • sort gameservers by the ones on nodes with the most number of gameservers first
  • then grab the top 100 of that list
  • then randomly select one in that top 100, until one allocates successfully.

Which is awesome for throughput and contention, but doesn't seem to pack very well (unless this changes at scale in a way I'm not seeing? That would be super nice)

But at first pass in my head, this feels statistically less likely to pack than not.

For example:

  1. Say we have 4 nodes, each with 30 ready gameservers on each: [30,30,30,30].
  2. We allocate one, so we end up with [29,30,30,30] (for argument's sake).
  3. The pool to allocate from would become something like [29,30,30,30,1].
  4. This means we are statistically more likely to allocate from node 1, 2 or 3 rather than node 0 (going by a 0 index), because we end up with more options on nodes that have yet to have anything allocated to them.

Drawing this out, we could end up in scenarios that are something akin to [1,2,4,30,30,30,3] (or similar), and it seems very unlikely we'll actually fill those first nodes. And at scale, this seems like a big waste of resources our users would need to pay for.

The only other thought is that the fleet scale-down logic (we scale down from the nodes with the least gameservers on them first) will help solve this (at least somewhat) -- need to think about this more.

Not quite sure how to retain the throughput while also keeping things more packed, though.

@roberthbailey
Member

@markmandel - is this a regression from previous releases or just something that hadn't yet been tested?

@aLekSer
Collaborator

aLekSer commented May 20, 2019

@markmandel I was thinking about the source of the problem, and I think we should look at enhancing the packedComparator() function.
If we have 4 nodes, all with 30 gameservers, then we could always select the node with the smallest NodeName when the Ready and Allocated counts are equal. I have tested this approach in a small unit test. WDYT?

aLekSer@fd676e2
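A sketch of that tie-break idea (the struct shape and the ordering of the first two comparisons are assumptions for illustration, not the actual packedComparator signature from the linked commit):

```go
package main

import "fmt"

// nodeCount is an assumed simplified shape; the real comparator in Agones
// works over per-node Ready/Allocated GameServer counts.
type nodeCount struct {
	NodeName  string
	Ready     int
	Allocated int
}

// packedLess sketches the proposed enhancement: prefer nodes with more
// Allocated, then more Ready, and when both are equal, fall back to the
// lexicographically smallest NodeName so every comparison is deterministic.
func packedLess(a, b nodeCount) bool {
	if a.Allocated != b.Allocated {
		return a.Allocated > b.Allocated
	}
	if a.Ready != b.Ready {
		return a.Ready > b.Ready
	}
	return a.NodeName < b.NodeName
}

func main() {
	a := nodeCount{NodeName: "node-a", Ready: 30}
	b := nodeCount{NodeName: "node-b", Ready: 30}
	fmt.Println(packedLess(a, b)) // node-a wins the tie on name
}
```

The deterministic tie-break means every allocator agrees on which node to fill first, at the cost of concentrating contention on that node.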

@aLekSer
Collaborator

aLekSer commented May 20, 2019

The solution proposed above does not fix the problem; the fleet is still not packed. I need to add logging to debug the root cause.
I was able to reproduce the problem on test-cluster with examples/fleet.yaml and
I was able to reproduce the problem on test-cluster with examples/fleet.yaml and

for i in {0..10} ; do kubectl create -f ./examples/gameserverallocation.yaml ; done

where in gameserverallocation.yaml spec:

required:
  matchLabels:
    foo: bar

@markmandel
Member Author

markmandel commented May 20, 2019

@roberthbailey good question. It is a regression - but the sacrifice was made to allow better throughput. To be honest, I didn't realise it was quite this sub-optimal until I started poking at it now. (Lesson: write more tests 😄 )

For comparison, if we do the same thing with FleetAllocations - which use the same logic (but have much, much lower throughput) - we see this:

root@30dacd3ba219:/go/src/agones.dev/agones# kubectl get gs | grep Alloc
xonotic-przc8-2npn8   Allocated   35.236.55.223    7505      gke-test-cluster-default-f11755a7-pgzl   4h
xonotic-przc8-7zcpb   Allocated   35.236.55.223    7740      gke-test-cluster-default-f11755a7-pgzl   4h
xonotic-przc8-9n94z   Allocated   35.236.55.223    7131      gke-test-cluster-default-f11755a7-pgzl   4h
xonotic-przc8-g5gbv   Allocated   35.236.55.223    7681      gke-test-cluster-default-f11755a7-pgzl   4h
xonotic-przc8-mwkcf   Allocated   35.236.55.223    7002      gke-test-cluster-default-f11755a7-pgzl   4h
xonotic-przc8-x9sfx   Allocated   35.236.55.223    7812      gke-test-cluster-default-f11755a7-pgzl   4h
xonotic-przc8-xqfrw   Allocated   35.236.55.223    7049      gke-test-cluster-default-f11755a7-pgzl   4h

But this is throttled by a mutex among other things.

The current idea I have floating around my head is to presort the gameservers, and essentially keep a queue that we can pop values off of on each allocation. Even if the sorting is only eventually consistent, we'll avoid contention issues and get a better-packed result.

Something like this might work: https://github.com/wangjia184/sortedset
You could pop the min for packed, and pop the max for distributed.

The only trickiness I'm seeing is keeping the gameservers sorted regularly without slowing down the allocation process. Maybe a full resync every 30 seconds or so, happening in a different goroutine? (The sortedset sorts by score and then lexicographically, so it should in theory work pretty well.)
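A minimal sketch of that pop-min/pop-max idea using the standard library's container/heap rather than the linked sortedset package (the scoring used here - the free capacity left on a GameServer's node - is an assumption chosen so that popping the minimum gives the most packed pick):

```go
package main

import (
	"container/heap"
	"fmt"
)

// entry is a GameServer candidate with a sortable score. Scoring is an
// assumption for illustration: the free capacity left on the GameServer's
// node, so the minimum score is the most tightly packed pick.
type entry struct {
	name  string
	score float64
}

// gsHeap is a min-heap over scores, ordered lexicographically by name on ties.
type gsHeap []entry

func (h gsHeap) Len() int { return len(h) }
func (h gsHeap) Less(i, j int) bool {
	if h[i].score != h[j].score {
		return h[i].score < h[j].score
	}
	return h[i].name < h[j].name
}
func (h gsHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *gsHeap) Push(x interface{}) { *h = append(*h, x.(entry)) }
func (h *gsHeap) Pop() interface{} {
	old := *h
	e := old[len(old)-1]
	*h = old[:len(old)-1]
	return e
}

// newQueue builds an initialised heap from the given candidates.
func newQueue(entries ...entry) *gsHeap {
	h := gsHeap(entries)
	heap.Init(&h)
	return &h
}

// popMin pops the best candidate for Packed; a mirrored max-heap (or a
// negated score) would give the Distributed behaviour.
func popMin(h *gsHeap) entry { return heap.Pop(h).(entry) }

func main() {
	h := newQueue(
		entry{name: "gs-a", score: 3}, // node has 3 free slots
		entry{name: "gs-b", score: 1}, // nearly full node: best Packed pick
		entry{name: "gs-c", score: 1},
	)
	fmt.Println(popMin(h).name) // gs-b (lowest score, lexicographic tie-break)
}
```

A background goroutine could periodically rebuild the heap from the informer cache to keep the ordering eventually consistent without blocking allocations.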

I might give the implementation a shot and see how it pans out. I'd be curious to hear thoughts.

@markmandel
Member Author

Taking a stab at this here: https://github.com/markmandel/agones/tree/perf/sorted-set - curious to see how the approach performs at scale once complete.

@markmandel
Member Author

Argh, I forgot that we have the required/preferred selectors, doh! My approach needs to be tweaked / may not work 😞

@markmandel
Member Author

Revised plan - I'm not 100% sure it's optimal for throughput + packing, but I think it's worth a shot. I'm also happy to hear dissenting opinions if you can see holes in the logic.

This is based on the assumption that the selectors for a GameServerAllocation are likely not to change very often, and generally work on large fleets rather than very specific small groups (although this might work okay for small groups at lower throughput?).

  1. We keep a cache of sorted preferred + required gameservers for each GameServerAllocation Selector + namespace combo (key). Sorting is by the number of Allocated GameServers on the node each is scheduled on.
  2. When a request comes in for a new key, we build the cache if it doesn't exist.
  3. On allocation, we grab the cache for that key and atomically pop off a gameserver to attempt to allocate.
  4. If it fails due to conflict, just pop off another value and try again (don't put it back; it's likely already been allocated/deleted/something).
  5. If a GameServer is deleted, remove it from all caches.
  6. If a GameServer goes from Ready => anything else, remove it from all caches.
  7. Every 30 seconds, update the sorting of all GameServers.
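The cache described in steps 1 through 6 might look something like this (a minimal sketch - every name here is hypothetical, not the Agones API):

```go
package main

import (
	"fmt"
	"sync"
)

// allocationCache keeps one pre-sorted candidate list per
// Selector+namespace key, built lazily and popped atomically on each
// allocation attempt.
type allocationCache struct {
	mu     sync.Mutex
	queues map[string][]string // key -> GameServer names, best candidate first
	build  func(key string) []string
}

func newAllocationCache(build func(string) []string) *allocationCache {
	return &allocationCache{queues: map[string][]string{}, build: build}
}

// pop returns the next candidate for the key, building the queue on first
// use. On an allocation conflict the caller simply calls pop again; the
// popped entry is never put back, as it has likely already been
// allocated or deleted.
func (c *allocationCache) pop(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	q, ok := c.queues[key]
	if !ok {
		q = c.build(key)
	}
	if len(q) == 0 {
		c.queues[key] = q
		return "", false
	}
	c.queues[key] = q[1:]
	return q[0], true
}

// remove drops a GameServer from every queue, e.g. when it is deleted or
// leaves the Ready state.
func (c *allocationCache) remove(name string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for key, q := range c.queues {
		out := q[:0]
		for _, n := range q {
			if n != name {
				out = append(out, n)
			}
		}
		c.queues[key] = out
	}
}

func main() {
	c := newAllocationCache(func(string) []string { return []string{"gs-1", "gs-2"} })
	name, _ := c.pop("default/foo=bar")
	fmt.Println(name)
}
```

The periodic re-sort from step 7 would replace each queue wholesale under the same mutex, in a separate goroutine, so allocations only ever contend on the short pop.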

Two potential issues and/or solutions

  1. GameServerAllocations that overlap on GameServers they are selecting from.
  2. When we split allocation into multiple pods, each cache is not as closely in sync.

Both of these could drive contention as they would be attempting to pop the same GameServers on each run.

Potential solution: add jitter from 0-1 to each score, so that the order within a node (or among gameservers with the same allocated counts) is randomised per cache. This could potentially be increased beyond 0-1 if we need more randomness.

@markmandel markmandel self-assigned this May 28, 2019
markmandel added a commit to markmandel/agones that referenced this issue Jun 4, 2019
This implements a batching algorithm for Packed and Distributed
GameServerAllocations (GSA).

This ensures that we can still get high throughput on GSAs, while
retaining a tight packing of Allocated GameServers on each node with
the Packed strategy.

The Distributed strategy is now implemented with a randomised Allocation
process, to ensure distributed load under this strategy.

Closes googleforgames#783
markmandel added a commit that referenced this issue Jun 12, 2019
* Batched Packed and Distributed Allocations


* Move updateQueue into allocationUpdateWorkers.

* Add large explanatory code block.