Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Breaking Change] Move GameServerAllocation to an API Extension Server. #600

Closed

Conversation

markmandel
Copy link
Member

This moves the implementation of GameServerAllocation (GSA) to a Kubernetes API Extension instead of using CRDs. This was essentially done for performance reasons, but to break it down:

  1. GSA is now create only. Since we had no need for GSA storage, and don't want the performance hit.
  2. This removes the mutation and validation webhooks, which have 30s timeout and are run in serial
    1. API Server still does cut off a response after 60s, but the api can continue processing (60s gives us enough time, I think for a SDK.Ack() on Allocate, which I don't think we had before)
    2. Validation now happens in the request.
  3. We can now do batching of requests for higher throughput (Use batching in GameServerAllocation controller to improve throughput. #536), since we control the entire http request.
  4. Sets us up if we decide we also want to have an alternative (http and/or gRPC) endpoint for allocation, basedn feedback from this implementation.

The breaking changes are:

  1. GameServerAllocation's group is now allocation.agones.dev rather than stable.agones.dev,
    because a CRD group can't overlap with a api server.
  2. Since there is only the create verb for GSA, there is no get/list/watch options for GameServerAllocations - so no informers/listers either. But this could be added at a later date, if needed.

This also includes some libraries for building further api server extension points.

@markmandel markmandel added kind/cleanup Refactoring code, fixing up documentation, etc area/user-experience Pertaining to developers trying to use Agones, e.g. SDK, installation, etc feature-freeze-do-not-merge Only eligible to be merged once we are out of feature freeze (next full release) kind/breaking Breaking change labels Feb 19, 2019
@markmandel markmandel added this to the 0.9.0 milestone Feb 19, 2019
@markmandel markmandel added the area/performance Anything to do with Agones being slow, or making it go faster. label Feb 19, 2019
@agones-bot
Copy link
Collaborator

Build Succeeded 👏

Build Id: ff60e4cd-6dc3-48f8-9f89-f2579f301f0b

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/600/head:pr_600 && git checkout pr_600
  • helm install install/helm/agones --namespace agones-system --name agones --set agones.image.tag=0.8.0-da6ebac

markmandel added a commit to markmandel/agones that referenced this pull request Feb 20, 2019
Since googleforgames#600 is in the wings for next release, which is another breaking change,
let's keep FleetAllocation stable for another release and mark
GameServerAllocation as experimental to manage expectations appropriately.
@@ -18,10 +18,40 @@
{{- $cert := genSignedCert $cn nil (list $altName1 $altName2) 3650 $ca }}
{{- if .Values.agones.registerWebhooks }}
---
apiVersion: apiregistration.k8s.io/v1beta1
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put the APIService here, because it uses the same cert - but I don't know if "registerWebhooks" is still accurate, or if the file name "admissionregistration" is still accurate.

Thoughts on what should be done here?

Iinitial idea: rename files to "extensions.yaml" or something similiar. Maybe add an extra flag for the APIService that is seperate from the webhooks? agones.registerAPIService` ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering why registerWebhooks is a parameter, if you don't Agones won't work ?

I think if you just rename the file to extensions.yaml that would do it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basically - because people sometimes have weird requirements for installations, and need an escape hatch 🤷‍♂️

The escape hatch it to disable a certain section, and then manually install the thing that's been turned off in the special way that their platform requires.

That's really about it -- have run into it a few times with some users I've worked with 1:1 where they have needed this for things i wouldn't otherwise have thought of - so better to have and not want, than not have and be blocked until next release.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed! moved to extensions.yaml, and made a new flag for the apiservice.

markmandel added a commit that referenced this pull request Feb 20, 2019
Since #600 is in the wings for next release, which is another breaking change,
let's keep FleetAllocation stable for another release and mark
GameServerAllocation as experimental to manage expectations appropriately.
@markmandel markmandel removed feature-freeze-do-not-merge Only eligible to be merged once we are out of feature freeze (next full release) labels Feb 21, 2019
@agones-bot
Copy link
Collaborator

Build Failed 😱

Build Id: c82c11ca-fb32-4e12-8a06-0e2f79bef8bd

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@agones-bot
Copy link
Collaborator

Build Succeeded 👏

Build Id: 85b3aac8-3ce5-4600-a96a-48e18431e7b6

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/600/head:pr_600 && git checkout pr_600
  • helm install install/helm/agones --namespace agones-system --name agones --set agones.image.tag=0.9.0-a387c95

@jkowalski
Copy link
Contributor

This is really the only API of the entire Agones that's going to be invoked at scale on a regular basis (up to 100s of QPS for popular games) and I think it makes sense to decouple it from Kubernetes API serving infra as much as possible.

Given that this API is primarily going to be consumed by match makers and since we're already rewriting it, can we make it explicitly gRPC based instead of Kubernetes-based?

This would allow us to:

I was under the impression that using K8s API serving infrastructure is not the most recommended approach, instead they suggest vanilla CRDs, which are not appropriate for us.

@markmandel
Copy link
Member Author

markmandel commented Feb 25, 2019

@jkowalski these are really good points - I'm definitely not against having both a k8s api (for nice in-development experience / people that want to reuse rbac, etc), and a grpc endpoint (for far higher throughput, wider scaling, etc).

It's my opinion that reusing the k8s api is the simpler option from an implementation perspective, and requires less work on our end (and less reinvention of the wheel, with security/rbac/etc) - so figure, start here, and see what kind of throughput we can get on it, and move forward from there.

We still get some of the things from above, but to break it down a little.

  • implement batching, pod affinity, prefetching, etc. for better allocation performance

We can definitely still do this - and share the logic between the gRPC implementation, and the k8s api one.

We could still do this as well, not seeing the blocker here either.

But like I said - start here, see how it goes, and iterate from there. I don't think we lose anything by having the k8s api, as we are not locked out from doing a gRPC based API as well.

Thoughts?

@agones-bot
Copy link
Collaborator

Build Failed 😱

Build Id: 94d53214-4bf0-4a9c-ba6c-18c6b3fe1502

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@agones-bot
Copy link
Collaborator

Build Succeeded 👏

Build Id: 82f7485d-6382-4794-80b5-64275b1f4d49

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/600/head:pr_600 && git checkout pr_600
  • helm install install/helm/agones --namespace agones-system --name agones --set agones.image.tag=0.9.0-35fa8f4

This moves the implementation of GameServerAllocation (GSA) to a
[Kubernetes API Extension](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/)
instead of using CRDs. This was essentially done for performance reasons, but to break it down:

1. GSA is now create only. Since we had no need for GSA storage, and don't want the performance hit.
1. This removes the mutation and validation webhooks, which have 30s timeout and are run in serial
    1. API Server still does cut off a response after 60s, but the api can
       continue processing (60s gives us enough time, I think for a SDK.Ack() on Allocate, which I don't think
       we had before)
    1. Validation now happens in the request.
1. We can now do batching of requests for higher throughput (googleforgames#536), since we control the entire http request.
1. Sets us up if we decide we also want to have an alternative (http and/or gRPC) endpoint for allocation, based
   on feedback from this implementation.

The breaking changes are:
1. GameServerAllocation's group is now `allocation.agones.dev` rather than `stable.agones.dev`,
because a CRD group can't overlap with a api server.
1. Since there is only the `create` verb for GSA, there is no get/list/watch options for GameServerAllocations - so no
   informers/listers either. But this could be added at a later date, if needed.

This also includes some libraries for building further api server extension points.
@agones-bot
Copy link
Collaborator

Build Succeeded 👏

Build Id: c851dc47-568d-4e41-8307-b96c0ef41f95

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/600/head:pr_600 && git checkout pr_600
  • helm install install/helm/agones --namespace agones-system --name agones --set agones.image.tag=0.9.0-307807b

@markmandel
Copy link
Member Author

Gentle bump on this 😃

@markmandel
Copy link
Member Author

I'm going to close this PR for now - and break it up into smaller chunks so it's easier to review. Stay tuned for newer, smaller PRs.

@markmandel markmandel closed this Mar 8, 2019
@markmandel markmandel deleted the feature/gsa-api-server branch March 28, 2019 21:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/performance Anything to do with Agones being slow, or making it go faster. area/user-experience Pertaining to developers trying to use Agones, e.g. SDK, installation, etc kind/breaking Breaking change kind/cleanup Refactoring code, fixing up documentation, etc
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants