Tidy custom resources #336

Conversation

@DmitriGekhtman (Collaborator) commented Jul 3, 2022

Why are these changes needed?

Background

Closes #167

The ray start command has a resources option which allows users to specify custom resource capacities for a Ray node.
Custom resources serve as hints to the Ray scheduler and autoscaler: If a Ray node has custom resource {"BLAH": 10}, you can schedule 10 tasks decorated with @ray.remote(resources={"BLAH": 1}). Attempting to schedule an 11th such task will cause the autoscaler to bring up another Ray node to accommodate the request.

One way to use the resources field in the context of KubeRay is described in #167:

A use case for this might be to deploy a heterogeneous cluster with multiple worker groups, where each worker group uses a different image packaged with different third-party utilities. Some tasks that require specific utilities could then be marked as requiring the corresponding resource, and would only execute on the workers that provide it.

Problem

The value of resources must be supplied as a json string, which makes it awkward to specify correctly in a yaml file:

rayStartParams:
   resources: "'{\"customRes\": 1, \"anotherOne\": 2}'"
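Two layers of quoting are at work here: the JSON object is wrapped in single quotes so it reaches ray start as one argument, and the double quotes inside the JSON must then be escaped for the YAML string. A minimal Python sketch of how such a value could be generated (the helper name to_ray_start_resources is ours for illustration, not part of KubeRay):

```python
import json

def to_ray_start_resources(resources: dict) -> str:
    """Render a resource map as the string expected by
    rayStartParams["resources"]: a JSON object wrapped in
    single quotes so ray start receives it as one argument."""
    return "'" + json.dumps(resources) + "'"

print(to_ray_start_resources({"customRes": 1, "anotherOne": 2}))
# '{"customRes": 1, "anotherOne": 2}'
```

In a YAML file, the entire result must additionally be double-quoted with the inner double quotes backslash-escaped, which is exactly what makes the format awkward to write by hand.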

Solution

To provide a more convenient interface, this PR introduces an optional map field rayCustomResources.

Before:

rayStartParams:
   resources: "'{\"customRes\": 1, \"anotherOne\": 2}'"

After:

rayCustomResources:
    customRes: 1
    anotherOne: 2

Properties of the change in this PR

  • Purely cosmetic. Simply allows specifying Ray custom resources as a map rather than as a carefully quoted/escaped JSON string.
    • We can consider deeper integration with K8s extended resources later.
  • Backwards-compatible -- specifying rayStartParams["resources"] will continue to work.

An upcoming Ray PR would update the autoscaler to read the newly introduced field when determining node resource capacities.

Note on CPU/GPU/memory

The resources option for ray start does not support the keys CPU, GPU, or memory. These fields are inferred from the container spec and can be overridden with the optional ray start parameters num-cpus, num-gpus, memory.

Attempting to supply the keys CPU, GPU, or memory to resources will cause ray start to fail with an error message telling the user not to do that.
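For illustration, overriding the inferred capacities would look something like this sketch (the values are hypothetical; rayStartParams values are strings, and memory is given in bytes):

```yaml
rayStartParams:
   num-cpus: "2"
   num-gpus: "1"
   memory: "4000000000"
```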

I considered adding CRD validation to exclude these keys. However, Kubernetes does not support "patternProperties" in CRDs -- you can't use a regex to restrict the keys of a map/object. A potential workaround would be to use a list of key, value pairs and add validation for key. However, that would make the interface and the code more complicated.

We already do some validation of rayStartParams in the operator code.
TODO in a follow-up PR: Validate absence of CPU, GPU, memory keys in the same spot, after #334 is merged.
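The planned validation could look something like the following sketch. This is purely illustrative (the operator itself is written in Go, and the function name here is ours); the disallowed-key set matches the field's doc comment below.

```python
# Keys reserved by ray start; users should set num-cpus / num-gpus /
# memory in rayStartParams instead of listing them as custom resources.
DISALLOWED_KEYS = {"CPU", "GPU", "memory", "object_store_memory"}

def validate_ray_custom_resources(resources: dict) -> None:
    """Raise ValueError if the custom resource map contains a key
    that ray start reserves for built-in resource types."""
    bad = DISALLOWED_KEYS & set(resources)
    if bad:
        raise ValueError(
            f"rayCustomResources may not contain {sorted(bad)}; "
            "use rayStartParams num-cpus/num-gpus/memory instead."
        )
```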

Autoscaler

A follow-up Ray PR would have the autoscaler consider the rayCustomResources field when determining resource capacities.

Related issue number

Closes #167

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@@ -329,8 +331,6 @@ github.com/mitchellh/go-homedir v1.0.0/go.mod h1:SfyaCUpYCn1Vlf4IUYiD9fPX4A5wJrk
github.com/mitchellh/go-homedir v1.1.0/go.mod h1:SfyaCUpYCn1Vlf4IUYiD9fPX4A5wJrkLzIz1N1q0pr0=
github.com/mitchellh/go-testing-interface v1.0.0/go.mod h1:kRemZodwjscx+RGhAo8eIhFbs2+BFgRtFPeD/KE+zxI=
github.com/mitchellh/gox v0.4.0/go.mod h1:Sd9lOJ0+aimLBi73mGofS1ycjY8lL3uZM3JPS42BGNg=
github.com/mitchellh/hashstructure/v2 v2.0.2 h1:vGKWl0YJqUNxE8d+h8f6NJLcCJrgbhC4NcD46KavDd4=
github.com/mitchellh/hashstructure/v2 v2.0.2/go.mod h1:MG3aRVU/N29oo/V/IhBX8GR/zz4kQkprJgF2EVszyDE=
Collaborator Author

Looks like this one was unused and got removed by go mod tidy.

// capacities of Ray pods in this group.
// Equivalent to RayStartParams.resources, which must be specified as a json string.
// Disallowed keys: "GPU", "CPU", "memory", "object_store_memory". Instead, use rayStartParams["num-cpus"] etc.
RayCustomResources map[string]int64 `json:"rayCustomResources,omitempty"`
Collaborator

I am thinking that in real-world cases we normally need to claim the resource twice (at the Kubernetes level and at the Ray level). For example, if I need an inference chip like aws.amazon.com/neuron: 1:

Kubernetes-level configuration

resources:
    limits:
       aws.amazon.com/neuron: 1

Ray-level configuration

rayCustomResources:
     aws.amazon.com/neuron: 1

Is that expected? I am not sure whether the resource names are always the same between Kubernetes and Ray. If they are, we could probably use the Kubernetes resources as the source of truth and build the Ray resource configuration from them. If not, I think the separation is necessary. (I kind of forget our conclusion from the last community meeting.)

@DmitriGekhtman (Collaborator Author) commented Jul 4, 2022

For now, I'd prefer to keep the configuration separate -- this PR isn't introducing new behavior, just cleaning up the existing interface.
rayCustomResources is an alias for rayStartParams["resources"]: you use a map instead of a triply-quoted JSON string.

For the use case described in #167 (multiple image types), there is no need to set up a Kubernetes extended resource.
The Ray resource annotation is just used as a hint to the Ray scheduler to schedule certain tasks on workers using certain images.
For use-cases like this, it's too heavy-weight to require the user to patch nodes with extended resource capacities (or run a daemonset to do the same).

In the other direction, if your Ray pod already uses K8s extended resources, it could be useful for Ray to automatically detect those as "custom resources". Right now, we do that only with "/gpu" extended resources, but we might want to generalize the behavior. We could make this generalization later, if that's desired by users.

@DmitriGekhtman (Collaborator Author) commented Jul 5, 2022

Summarizing my opinion:

  • If a pod uses a K8s extended resource, it may be useful to automatically advertise it to Ray as a "custom resource". (But we're not doing that in this PR.)

  • Advertising a "custom resource" to Ray should not require labelling K8s nodes with extended resources.

@Jeffwan (Collaborator) commented Jul 5, 2022

The summary looks good to me. We can do it stage by stage.

@Jeffwan (Collaborator) commented Jul 5, 2022

The change looks good to me. Please rebase onto the latest master.

@akanso (Collaborator) commented Jul 5, 2022

@DmitriGekhtman do we have a use-case where the RayCustomResource is not a physical HW on the node (like a TPU or NPU)?

@DmitriGekhtman (Collaborator Author)

@DmitriGekhtman do we have a use-case where the RayCustomResource is not a physical HW on the node (like a TPU or NPU)?

Yeah, this one #167.

@DmitriGekhtman (Collaborator Author) commented Jul 5, 2022

Paraphrasing @pcmoritz from offline:
As we're a bit unsure where we're going with resource configuration, it may be best to leave things as they are and simply document the format "'{\"customRes\": 1, \"anotherOne\": 2}'".

I think this probably makes sense -- the configuration is a little awkward, but the functionality is already present.

@DmitriGekhtman (Collaborator Author)
@DmitriGekhtman do we have a use-case where the RayCustomResource is not a physical HW on the node (like a TPU or NPU)?

Basically, any time you have multiple worker groups and you want certain tasks to be scheduled on a specific worker group.
The reason for scheduling on a given node group may or may not be related to physical hardware.

@DmitriGekhtman (Collaborator Author)
Will close for now and consider reopening if there are further complaints about the format.

@DmitriGekhtman (Collaborator Author)
@akanso as a related item, we can consider automatically advertising K8s extended resources to Ray by inserting them into the relevant rayStartParams entry.

Successfully merging this pull request may close these issues.

[Feature] Users should be able to define custom resources in worker groups