Support Multiple Cluster spanning the globe #301

markmandel · 2018-07-20T18:56:31Z

Agones should support running multiple Agones clusters around the world, and provide tools for streamlining that process.

I see this come under two main feature sets:

Cluster Registries

Be able to add/remove clusters from a single registry
Be aware of the health of a registry
That registry should be queryable in some way as well.

There should be existing work, most likely coming out of sig-multicluster, that we can take advantage of - as many applications have this need.

Research

Standard Ping/Latency tooling

We should integrate some tooling into each cluster, plus accompanying tooling that can be incorporated into game clients, so that determining ping time to clusters works out of the box (or as close to out of the box as we can make it).

victor-prodan · 2018-07-21T06:15:46Z

I think it would be useful to be able to define spillover rules: when one cluster is full, redirect allocation requests to another one.

markmandel · 2018-07-27T17:48:45Z

Ooh, that's an interesting idea. Kind of like a global fleet allocation load balancer. Nice 👍

markmandel · 2018-09-06T23:19:25Z

We will also need to consider how to deploy and manage fleets across multiple clusters around the globe. It isn't just a "deploy to all" - there is also gradual rollout, specific region first, etc.

Also, should multicluster have some kind of "cost" / "priority" value? i.e. if you run your own datacenter, you only want to burst into the cloud when necessary. How can you show that? (Can clusters have labels and annotations? That could well be an easy answer)
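
A rough illustration of the spillover and priority ideas discussed above, not an Agones API: the sketch below assumes each cluster carries a numeric priority (lower is preferred, for example 0 for your own datacenter) and a count of ready GameServers. `Cluster` and `pick_cluster` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Cluster:
    name: str
    priority: int       # assumption: lower value is preferred (0 = own datacenter)
    ready_servers: int  # GameServers currently available for allocation


def pick_cluster(clusters: Sequence[Cluster]) -> Optional[Cluster]:
    """Pick the most preferred cluster that still has capacity.

    A full cluster is skipped, so requests spill over to the next
    priority level. Ties are broken by the larger amount of free capacity.
    """
    candidates = [c for c in clusters if c.ready_servers > 0]
    if not candidates:
        return None  # every cluster is full
    return min(candidates, key=lambda c: (c.priority, -c.ready_servers))
```

With an on-premises cluster at priority 0 and a cloud cluster at priority 1, allocations stay on-premises until it has no ready GameServers, and only then burst into the cloud.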

maxpain · 2018-10-07T07:13:05Z

+1. I'm looking forward to it

cyriltovena · 2018-10-11T22:38:56Z

@markmandel Question: is cluster creation out of scope? I personally think so. It also makes it easier to handle multiple clusters through a single means of connection: the kubeconfig/service account. We should, however, add a tool that helps register a kubeconfig for each provider.

markmandel · 2018-10-11T22:40:17Z

@Kuqd yeah, I would agree that cluster creation is out of scope - but yes, we need some kind of tool for managing kubeconfig -- I wonder if cluster-registry will help?

cyriltovena · 2018-10-11T23:02:55Z

No, it's something else: basically endpoints and authinfo (a bearer token from a service account).

However, looking at the code base, there is nothing other than the CRD and the generated code for it, so this is very early. At the end of the day you need a rest.Config.

cyriltovena · 2018-10-30T13:26:55Z

https://kubernetes.io/docs/concepts/cluster-administration/federation/ this seems more mature.

markmandel · 2018-10-30T14:55:41Z

@Kuqd although nobody uses Federation anymore - it's pretty much been determined to be a failed experiment.

Although maybe there are pieces we can lift out of it that are useful.

Oleksii-Terekhov · 2018-11-14T16:43:49Z

We are also interested in multi-cluster Agones.
But selecting the best cluster is, in our project, the matchmaker's job: it weighs minimal ping for all users, costs, and similar factors to maximize UX and profit.
So any "logic to select a cluster for a GameServer" must be switchable - we want to explicitly select the fleet+cluster in the FleetAllocation manifest...
As a side effect, there may need to be a quick and trustworthy metrics API in the Agones controller about current load (free/allocated/total/unhealthy/available GameServers...).
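
As an illustration of matchmaker-side selection by ping, here is a minimal sketch assuming the matchmaker already holds each player's measured round trip time to every candidate cluster. The function name, the data layout, and the "minimize the worst player's RTT" rule are assumptions for the example; a real matchmaker would also weigh cost and capacity, as described above.

```python
from typing import Dict, Optional


def best_cluster(rtt_ms: Dict[str, Dict[str, float]]) -> Optional[str]:
    """Pick the cluster whose slowest player has the lowest RTT.

    rtt_ms maps cluster name -> {player id -> measured RTT in milliseconds}.
    Assumption: every player has a measurement for every cluster.
    """
    worst = {
        cluster: max(per_player.values())
        for cluster, per_player in rtt_ms.items()
        if per_player  # skip clusters nobody has measured
    }
    if not worst:
        return None
    return min(worst, key=worst.get)
```

For example, if one player sees 20 ms and another 140 ms to a European cluster, while both see under 100 ms to a US cluster, this rule picks the US cluster even though it is not the closest for the first player.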

markmandel · 2018-11-14T16:49:08Z

I've started researching this as the next big ticket item to tackle.

What I'm trying to decide is which item should be tackled first.

Maybe having a registry of clusters, and a standard way to ping each of them for round trip time? Would that be a good starting point?

EricFortin · 2018-11-15T11:58:52Z

I am not convinced about the registry. Unless we want to offer that as a separate project, where would this live? As stated before, allocation strategy is bound to change a lot based on multiple criteria. So if we only deliver a simple registry, I feel most people already have some form of configuration service where they could store that list.

That being said, having a setup to ping each cluster so we can send this data to our matchmaking service is definitely worth it. It also requires client code so we need to think about how we will deliver this since this code will need to run on consoles too.

markmandel · 2018-11-15T17:24:50Z

@EricFortin agreed. Doing the research, it looks like the cluster-registry project and also federation-v2 (linked above) will (at least eventually) take care of registering and tracking multiple clusters, as well as being able to deploy Agones CRDs, such as fleets across them - so while that is still alpha now, it's coming, and we should lean on that work (I'll likely start playing with both soon, just to get a feel).

Regarding a "pinger" - I have a couple of questions in regard to this (probably because my knowledge here is a bit thinner):

Is there any prior art we can leverage here? Is there a standard way of determining RTT, or an existing open source project we can leverage? (I had a hunt, but couldn't find anything - do we send some UDP packets and echo them back - maybe with an epoch timestamp?)
Client side code - I'm thinking that we both need to do a C# and a C++ sdk for this, but also define it as a standard. That way, if you can't use the supported client code, it can be integrated relatively easily (I hope) / can be developed in phases.

Does that make sense?

^ this should likely also be its own ticket at this point, it seems.

victor-prodan · 2018-11-16T11:23:17Z

Client side code - I'm thinking that we both need to do a C# and a C++ sdk for this

Do you mean game client code? If yes, I don't think it's necessary. I think that each game has its own way of pinging an HTTP (or maybe UDP) endpoint, so we just need a way to create those endpoints.

Here is how Amazon is doing it: https://www.cloudping.info/.
I imagine a similar thing for Agones.

markmandel · 2018-11-16T21:21:05Z

Also similar to: http://www.gcping.com/ 😄

Is an HTTP endpoint good enough, or does it have to be a UDP endpoint? (or maybe both?)

Here's an interesting question - should this be behind a load balancer? An LB will mean that we can do redundancy, and scale up the pinger (if it's needed), but does an LB introduce a layer that otherwise wouldn't be there?

victor-prodan · 2018-11-17T20:01:55Z

An LB is needed, yes, and this is why it sounds like an independent service hosted in the same cluster. Something like this already exists, maybe?

About UDP... Yes, it might be helpful for detecting packet loss, for example.

It also depends on the protocol used by the game... If it's based on something like WebSocket, then they wouldn't need UDP.

HTTP is easier to implement on both sides and it's a must. UDP is nice to have.

markmandel · 2018-11-17T20:14:42Z

That makes lots of sense.

For the HTTP request - I'm assuming this would need to return an "ok" and HTTP 200, and the client would want to track the round trip time themselves.

For UDP - would it return an empty packet back to the sender? Does it need to contain any information, like an id/hash? (or echo what has been sent) Since it's async, I would assume this would be required - unless there is a better way.

Another fun question - do we have any concerns about someone using this for a reflection DDOS attack of some kind? (spoofing the "from" address, to forward the attack to another location? Sounds like we should rate limit these requests).
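
One possible shape for the UDP probe discussed above, as a sketch rather than the Agones implementation: the client puts a sequence number and its own send timestamp in the packet, the server echoes the bytes back unchanged, and the client computes the round trip time from the echoed timestamp. The 12-byte layout and the names `make_probe`, `rtt_from_echo`, `serve_echo`, and `ping` are assumptions made for this example.

```python
import socket
import struct
import time
from typing import Optional, Tuple

# Hypothetical probe layout: sequence number (uint32) + send time in ns (uint64).
PROBE = struct.Struct("!IQ")


def make_probe(seq: int, now_ns: int) -> bytes:
    """Build the packet the client sends."""
    return PROBE.pack(seq, now_ns)


def rtt_from_echo(echo: bytes, now_ns: int) -> Tuple[int, float]:
    """Return (sequence number, round trip time in ms) for an echoed probe."""
    seq, sent_ns = PROBE.unpack(echo)
    return seq, (now_ns - sent_ns) / 1e6


def serve_echo(sock: socket.socket) -> None:
    """Server side: send well-formed probes straight back to their sender."""
    while True:
        data, addr = sock.recvfrom(64)
        if len(data) == PROBE.size:  # ignore anything that is not a probe
            sock.sendto(data, addr)


def ping(host: str, port: int, seq: int = 0, timeout: float = 1.0) -> Optional[float]:
    """Client side: RTT in ms, or None if no matching echo arrives in time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(make_probe(seq, time.monotonic_ns()), (host, port))
        try:
            while True:
                data, _ = sock.recvfrom(64)
                if len(data) != PROBE.size:
                    continue  # not one of our probes
                got_seq, rtt = rtt_from_echo(data, time.monotonic_ns())
                if got_seq == seq:
                    return rtt  # echoes with another sequence number are skipped
        except socket.timeout:
            return None
```

Because the timestamp travels inside the packet, the server stays stateless and the client needs no table of outstanding probes. The sequence number lets it tell a late echo from the current one, and a missing echo counts as a lost packet.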

victor-prodan · 2018-11-19T15:17:52Z

For the HTTP request - I'm assuming this would need to return an "ok" and HTTP 200, and the client would want to track the round trip time themselves.

👍

For UDP - would it return an empty packet back to the sender? Does it need to contain any information, like an id/hash?

UDP is tricky, as a production game would use its own protocol. Maybe it would be better to let the user supply the binary? They would also be responsible for any throttling in this case.

EricFortin · 2018-11-19T19:38:31Z

@markmandel

Another fun question - do we have any concerns about someone using this for a reflection DDOS attack of some kind? (spoofing the "from" address, to forward the attack to another location? Sounds like we should rate limit these requests).

For a reflection attack to be effective, you usually want to provoke a bigger response than the request you sent. If we simply echo back the packet we received, there is not much to gain from hitting us instead of the target directly.

Rate limiting is still a good thing though.
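
To illustrate the rate limiting mentioned above, here is a minimal token bucket sketch, assuming one bucket per source address and a caller-supplied clock. `TokenBucket`, `allow_packet`, and the rate and burst values are hypothetical; the thread does not specify how the limit would be implemented.

```python
from typing import Dict, Optional


class TokenBucket:
    """Allows `rate` packets per second on average, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # start full so a new client can probe at once
        self.updated: Optional[float] = None

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        if self.updated is not None:
            elapsed = now - self.updated
            self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


_buckets: Dict[str, TokenBucket] = {}


def allow_packet(source_ip: str, now: float,
                 rate: float = 5.0, burst: float = 10.0) -> bool:
    """Decide whether to echo a packet claiming to come from source_ip."""
    if source_ip not in _buckets:
        _buckets[source_ip] = TokenBucket(rate, burst)
    return _buckets[source_ip].allow(now)
```

Keying the bucket on the claimed source address means that even if an attacker spoofs a victim's address, the echo service sends that victim at most `rate` packets per second. A long-running service would also need to evict idle buckets, which this sketch leaves out.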

Context: googleforgames#301 This creates a simple HTTP endpoint and/or a rate limited UDP echo service to be able to easily do RTT latency tests from game clients, to multiple Agones installs.


markmandel · 2021-04-22T16:33:32Z

I think we can also close this one, since we have https://agones.dev/site/docs/advanced/multi-cluster-allocation/ @pooneh-m WDYT?

pooneh-m · 2021-04-22T16:35:37Z

Makes sense. I closed it.

markmandel added the kind/feature, kind/design, and area/user-experience labels Jul 20, 2018

markmandel self-assigned this Nov 14, 2018

markmandel mentioned this issue Dec 1, 2018

Pinger service for Multiple Cluster Latency Measurement. #434

Merged

markmandel removed their assignment Jan 31, 2019

markmandel added the stale label Apr 22, 2021

pooneh-m closed this as completed Apr 22, 2021

roberthbailey added the wontfix label Nov 16, 2021

WontonSam mentioned this issue Nov 29, 2023

[Snyk] Fix for 13 vulnerabilities WontonSam/Phiahplay-#7

Open

WontonSam mentioned this issue Jan 2, 2024

[Snyk] Security upgrade bundlewatch from 0.3.1 to 0.3.2 WontonSam/Phiahplay-#13

Open

WontonSam mentioned this issue Mar 15, 2024

[Snyk] Security upgrade bundlewatch from 0.3.1 to 0.3.2 WontonSam/Phiahplay-#16

Open
