Support Multiple Cluster spanning the globe #301

Closed
markmandel opened this issue Jul 20, 2018 · 21 comments
Labels
  • area/user-experience: Pertaining to developers trying to use Agones, e.g. SDK, installation, etc
  • kind/design: Proposal discussing new features / fixes and how they should be implemented
  • kind/feature: New features for Agones
  • stale: Pending closure unless there is a strong objection.
  • wontfix: Sorry, but we're not going to do that.

Comments

@markmandel
Member

markmandel commented Jul 20, 2018

Agones should make it easy to support multiple Agones clusters around the world, and provide tools for streamlining these processes.

I see this coming under two main feature sets:

Cluster Registries

  • Be able to add/remove clusters from a single registry
  • Be aware of the health of a registry
  • That registry should be queryable in some way as well.

There should be existing work, most likely coming out of sig-multicluster, that we can take advantage of - as many applications have this need.
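
As a rough illustration of the three registry operations listed above (add/remove, health tracking, querying), here is a minimal in-memory sketch in Go. The `Cluster` type, its fields, and the example endpoints are hypothetical; none of this is an Agones or sig-multicluster API.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Cluster is a hypothetical registry entry; the fields are illustrative only.
type Cluster struct {
	Name     string
	Endpoint string
	Healthy  bool
}

// Registry is a minimal in-memory cluster registry, safe for concurrent use.
type Registry struct {
	mu       sync.RWMutex
	clusters map[string]Cluster
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{clusters: make(map[string]Cluster)}
}

// Add registers a cluster, replacing any entry with the same name.
func (r *Registry) Add(c Cluster) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.clusters[c.Name] = c
}

// Remove deletes a cluster by name; removing an unknown name is a no-op.
func (r *Registry) Remove(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.clusters, name)
}

// SetHealth records the latest health state and reports whether the cluster exists.
func (r *Registry) SetHealth(name string, healthy bool) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	c, ok := r.clusters[name]
	if !ok {
		return false
	}
	c.Healthy = healthy
	r.clusters[name] = c
	return true
}

// Healthy returns the clusters currently marked healthy, sorted by name.
func (r *Registry) Healthy() []Cluster {
	r.mu.RLock()
	defer r.mu.RUnlock()
	var out []Cluster
	for _, c := range r.clusters {
		if c.Healthy {
			out = append(out, c)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

func main() {
	reg := NewRegistry()
	reg.Add(Cluster{Name: "us-west", Endpoint: "https://us-west.example.com", Healthy: true})
	reg.Add(Cluster{Name: "eu-west", Endpoint: "https://eu-west.example.com", Healthy: true})
	reg.SetHealth("eu-west", false)
	for _, c := range reg.Healthy() {
		fmt.Println(c.Name, c.Endpoint)
	}
}
```

A real registry would need persistence and an actual health probe; this only shows the shape of the operations.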

Research

Standard Ping/Latency tooling

We should integrate some tooling into each cluster and accompanying tooling that can be incorporated into game clients, such that determining ping time to clusters is an out of the box solution (or as out of the box as we can make it)

@markmandel markmandel added kind/feature New features for Agones kind/design Proposal discussing new features / fixes and how they should be implemented area/user-experience Pertaining to developers trying to use Agones, e.g. SDK, installation, etc labels Jul 20, 2018
@victor-prodan
Contributor

I think it would be useful to be able to define spillover rules - when one cluster is full to redirect allocation requests to another one.

@markmandel
Member Author

Ooh, that's an interesting idea. Kind of like a global fleet allocation load balancer. Nice 👍

@markmandel
Member Author

markmandel commented Sep 6, 2018

We will also need to consider how to deploy and manage fleets across multiple clusters around the globe. It isn't just a "deploy to all" - it could be a gradual rollout, a specific region first, etc.

Also, should multicluster have some kind of "cost" / "priority" value? i.e. if you run your own datacenter, you only want to burst into the cloud when necessary. How can you show that? (Can clusters have labels and annotations? That could well be an easy answer)

@maxpain
Contributor

maxpain commented Oct 7, 2018

+1. I'm looking forward to it

@cyriltovena
Collaborator

@markmandel Question: is cluster creation out of scope? I personally think so. It also makes it easier to handle multiple clusters through a single means of connection: the kubeconfig/service account. We should, however, add a tool that helps register a kubeconfig for each provider.

@markmandel
Member Author

@Kuqd yeah, I would agree that cluster creation is out of scope - but yes, we need some kind of tool for managing kubeconfig -- I wonder if cluster-registry will help?

@cyriltovena
Collaborator

No, it's something else: basically endpoints and authinfo (a bearer token from a service account).

However, looking at the code base, there is nothing other than the CRD and the generated code for it, so this is very early. At the end of the day you need a rest.Config.

@cyriltovena
Collaborator

@markmandel
Member Author

@Kuqd although nobody uses Federation anymore - it's pretty much been determined to be a failed experiment.

Although maybe there are pieces we can lift out of it that are useful.

@Oleksii-Terekhov

We are also interested in multi-cluster Agones.
But in our project, selecting the best cluster is the matchmaker's job: it weighs minimal ping for all users, costs and similar factors to maximize UX and profit.
So any "logic to select a cluster for a GameServer" must be switchable - we want to explicitly select the fleet + cluster in the FleetAllocation manifest...
Maybe, as a side effect, there should be a quick and trusted metrics API in the Agones controller that reports current load (free/allocated/total/unhealthy/available GameServers...).

@markmandel markmandel self-assigned this Nov 14, 2018
@markmandel
Member Author

I've started researching this as the next big ticket item to tackle.

The thing I'm trying to decide is what is the first item that should be tackled.

Maybe having a registry of clusters, and a standard way to ping each of them for round trip time? Would that be a good starting point?

@EricFortin
Collaborator

I am not convinced about the registry. Unless we want to offer that as a separate project, where would this live? As stated before, allocation strategy is bound to change a lot based on multiple criteria. So if we only deliver a simple registry, I feel most teams already have some form of service that returns configuration, where they could store that list.

That being said, having a setup to ping each cluster so we can send this data to our matchmaking service is definitely worth it. It also requires client code so we need to think about how we will deliver this since this code will need to run on consoles too.

@markmandel
Member Author

@EricFortin agreed. Doing the research, it looks like the cluster-registry project and also federation-v2 (linked above) will (at least eventually) take care of registering and tracking multiple clusters, as well as being able to deploy Agones CRDs, such as fleets across them - so while that is still alpha now, it's coming, and we should lean on that work (I'll likely start playing with both soon, just to get a feel).

Regarding a "pinger" - I have a couple of questions in regard to this (probably because my knowledge here is a bit thinner):

  1. Is there any prior art we can leverage here? Is there a standard way of determining RTT, or an existing open source project we can leverage? (I had a hunt, but couldn't find anything - do we send some UDP packets and echo them back - maybe with an epoch timestamp?)
  2. Client side code - I'm thinking that we need to do both a C# and a C++ SDK for this, but also define it as a standard. That way, if someone can't use the supported client code, it can be integrated relatively easily (I hope) / can be developed in phases.

Does that make sense?

^ this should likely also be its own ticket at this point, it seems.

@victor-prodan
Contributor

  1. Client side code - I'm thinking that we need to do both a C# and a C++ SDK for this

Do you mean game client code? If yes, I don't think it's necessary. I think that each game has its own way of pinging an HTTP (or maybe UDP) endpoint, so we just need a way to create those endpoints.

Here is how Amazon is doing it: https://www.cloudping.info/.
I imagine a similar thing for Agones.

@markmandel
Member Author

Also similar to: http://www.gcping.com/ 😄

Is an HTTP endpoint good enough, or does it have to be a UDP endpoint? (or maybe both?)

Here's an interesting question - should this be behind a load balancer? An LB will mean that we can do redundancy, and scale up the pinger (if it's needed), but does an LB introduce a layer that otherwise wouldn't be there?

@victor-prodan
Contributor

An LB is needed, yes, and this is why it sounds like an independent service hosted in the same cluster. Something like this already exists, maybe?

About UDP... Yes, it might be helpful, for example to detect packet loss.

It also depends on the protocol used by the game... If it's based on something like WebSocket, then they wouldn't need UDP.

HTTP is easier to implement on both sides and it's a must. UDP is nice to have.

@markmandel
Member Author

That makes lots of sense.

For the HTTP request - I'm assuming this would need to return an "ok" and HTTP 200, and the client would want to track the round trip time themselves.

For UDP - would it return an empty packet back to the sender? Does it need to contain any information, like an id/hash? (or echo what has been sent) Since it's async, I would assume this would be required - unless there is a better way.

  • Another fun question - do we have any concerns about someone using this for a reflection DDOS attack of some kind? (spoofing the "from" address, to forward the attack to another location? Sounds like we should rate limit these requests).

@victor-prodan
Contributor

For the HTTP request - I'm assuming this would need to return an "ok" and HTTP 200, and the client would want to track the round trip time themselves.

👍

For UDP - would it return an empty packet back to the sender? Does it need to contain any information, like an id/hash?

UDP is tricky, as the production game would use its own protocol. Maybe it would be better to let the user supply the binary? They would also be responsible for any throttling in this case.

@EricFortin
Collaborator

@markmandel

Another fun question - do we have any concerns about someone using this for a reflection DDOS attack of some kind? (spoofing the "from" address, to forward the attack to another location? Sounds like we should rate limit these requests).

For a reflection attack to be effective, you usually want to provoke a bigger response than the request you sent. If we simply echo back the packet we received, there is not much to gain from hitting us instead of the target directly.

Rate limiting is still a good thing though.

markmandel added a commit to markmandel/agones that referenced this issue Dec 1, 2018
Context: googleforgames#301

This creates a simple HTTP endpoint and/or a rate limited UDP echo service
to be able to easily do RTT latency tests from game clients, to multiple
Agones installs.
markmandel added a commit to markmandel/agones that referenced this issue Dec 5, 2018
Context: googleforgames#301

This creates a simple HTTP endpoint and/or a rate limited UDP echo service
to be able to easily do RTT latency tests from game clients, to multiple
Agones installs.
markmandel added a commit that referenced this issue Dec 6, 2018
Context: #301

This creates a simple HTTP endpoint and/or a rate limited UDP echo service
to be able to easily do RTT latency tests from game clients, to multiple
Agones installs.
@markmandel markmandel removed their assignment Jan 31, 2019
@markmandel
Member Author

I think we can also close this one, since we have https://agones.dev/site/docs/advanced/multi-cluster-allocation/ @pooneh-m WDYT?

@markmandel markmandel added the stale Pending closure unless there is a strong objection. label Apr 22, 2021
@pooneh-m
Contributor

Makes sense. I closed it.


8 participants