-
Notifications
You must be signed in to change notification settings - Fork 402
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC][autoscaler] Add autoscaler container overrides and config options for scale behavior. #278
[RFC][autoscaler] Add autoscaler container overrides and config options for scale behavior. #278
Conversation
I am thinking whether we should bump the version to v1alpha2 this time. Adding new fields to existing CRD won't bring breaking issue so v1alpha1 works as well. Bumping version introduce additional work like cr mutation etc. what do you guys think? |
On the question of bumping, why don't we bump it when we make incompatible changes (which I'm pretty sure will happen in the future before KubeRay becomes generally available)? This way we can avoid people having to do CR mutation multiple times :) |
This PR looks good to me to make it more convenient for people to use Ray's autoscaler. It won't help people who want to bring their own autoscaler, but they can do so in the classical way (by providing a pod or even an external component that KubeRay is not aware of at all). |
description: UpscalineMode is "Default" or "Aggressive. | ||
enum: | ||
- Default | ||
- Aggressive |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am just wondering, where is the upscaling_speed
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The allowed values of "Default" and "Aggressive" will translate into an upscaling_speed
of 1 and 1000, respectively.
(1000 is a stand-in for infinity.)
The reason is that we've never seen anyone have a need for anything strictly between 1 and infinity :)
An upcoming Ray PR will have the autoscaler parse the new field.
Yep, this is the idea. There's also the option of embedding a custom sidecar yourself. Another possibility down the line is to expose additional fields container fields here, besides resource and image. I considered inserting a whole container spec's worth of overrides into the CRD (some CRDs do that)...then I started thinking about how to merge with the default autoscaler container...then I found myself in a Go reflection rabbit hole and decided to go with a simpler approach. |
Eh, yeah, +1 for the lazy bump policy |
…uberay/ray-operator/bin/controller-gen "crd:maxDescLen=100,trivialVersions=true,preserveUnknownFields=false,generateEmbeddedObjectMeta=true" rbac:roleName=manager-role webhook paths="./..." output:crd:artifacts:config=config/crd/bases /Users/dmitrigekhtman/code/kuberay/ray-operator/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..." go fmt ./... go vet ./... mkdir -p /Users/dmitrigekhtman/code/kuberay/ray-operator/testbin test -f /Users/dmitrigekhtman/code/kuberay/ray-operator/testbin/setup-envtest.sh || curl -sSLo /Users/dmitrigekhtman/code/kuberay/ray-operator/testbin/setup-envtest.sh https://raw.githubusercontent.com/kubernetes-sigs/controller-runtime/v0.7.2/hack/setup-envtest.sh source /Users/dmitrigekhtman/code/kuberay/ray-operator/testbin/setup-envtest.sh; fetch_envtest_tools /Users/dmitrigekhtman/code/kuberay/ray-operator/testbin; setup_envtest_env /Users/dmitrigekhtman/code/kuberay/ray-operator/testbin; go test ./... -coverprofile cover.out �[1;33mUsing cached envtest tools from /Users/dmitrigekhtman/code/kuberay/ray-operator/testbin�[0m �[1;33msetting up env vars�[0m ? github.com/ray-project/kuberay/ray-operator [no test files] ok github.com/ray-project/kuberay/ray-operator/apis/ray/v1alpha1 0.012s coverage: 0.7% of statements ok github.com/ray-project/kuberay/ray-operator/controllers/ray 13.268s coverage: 61.6% of statements ok github.com/ray-project/kuberay/ray-operator/controllers/ray/common 0.015s coverage: 75.6% of statements ok github.com/ray-project/kuberay/ray-operator/controllers/ray/utils 0.012s coverage: 32.8% of statements ? github.com/ray-project/kuberay/ray-operator/pkg/client/clientset/versioned [no test files] ? github.com/ray-project/kuberay/ray-operator/pkg/client/clientset/versioned/fake [no test files] ? github.com/ray-project/kuberay/ray-operator/pkg/client/clientset/versioned/scheme [no test files] ? github.com/ray-project/kuberay/ray-operator/pkg/client/clientset/versioned/typed/raycluster/v1alpha1 [no test files] ? github.com/ray-project/kuberay/ray-operator/pkg/client/clientset/versioned/typed/raycluster/v1alpha1/fake [no test files] ? github.com/ray-project/kuberay/ray-operator/pkg/client/informers/externalversions [no test files] ? github.com/ray-project/kuberay/ray-operator/pkg/client/informers/externalversions/internalinterfaces [no test files] ? github.com/ray-project/kuberay/ray-operator/pkg/client/informers/externalversions/raycluster [no test files] ? github.com/ray-project/kuberay/ray-operator/pkg/client/informers/externalversions/raycluster/v1alpha1 [no test files] ? github.com/ray-project/kuberay/ray-operator/pkg/client/listers/raycluster/v1alpha1 [no test files].
I manually tested to confirm the new fields work as expected. |
anyOf: | ||
- type: integer | ||
- type: string | ||
pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add an explanation for this pattern? Same for line 69
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, these are the auto-generated internals of a K8s container's resource field.
The readable source of truth is in raycluster_types.go
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pattern is a regex encoding the rules for specifying resource quantities, as described in the K8s docs:
https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#resource-units-in-kubernetes
…ns for scale behavior. (ray-project#278) This PR adds an autoscalerOptions field to the RayCluster CRD.
Overview
Closes #246. Takes a simpler and flatter config approach than the one suggested in the RFC.
This PR is a draft. Once we agree on the config changes, I'll add a simple test.Added a unit test.
Documentation will be added in a future PR -- after the autoscaler is modified to use the relevant new fields.
Details
This PR adds an
autoscalerOptions
field to the RayCluster CRD.autoscalerOptions
has the following subfields.upscalingMode
An upcoming Ray PR will have the autoscaler translate this field into
upscaling_speed
. The allowed values ofDefault
andAggressive
will translate into anupscaling_speed
of 1 and 1000, respectively.idleTimeoutSeconds
An upcoming Ray PR will have the autoscaler translate this field into
idle_timeout_minutes
. Seconds are used because timeouts of under a minute are sometimes useful (especially in test environments). On the other hand, there are issues with using floats in CRDs.resources
Override for the autoscaler container's resource requests. For large clusters, users should monitor the autoscaler's resource usage and set this as needed. (TODO: Run some benchmarks and document the results.)
image
Override for the image used by the autoscaler container. The main application I have in mind for this is testing autoscaler changes in the Ray repo's CI.
Related issue number
Checks