Support Structured Parameters (DRA) in Kueue #2941
Comments
I'd like to start by discussing what has been done with DRA so far. I can help with a KEP once we have a path forward. @alculquicondor @tenzen-y, has either of you looked into how Kueue would work with DRA resources? https://github.com/kubernetes-sigs/dra-example-driver/tree/main/demo has a few examples of how structured parameters work, e.g.:
---
apiVersion: v1
kind: Pod
metadata:
  namespace: gpu-test1
  name: pod1
  labels:
    app: pod
spec:
  containers:
  - name: ctr0
    image: ubuntu:22.04
    command: ["bash", "-c"]
    args: ["export; sleep 9999"]
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: gpu.example.com

Each container now references a claim by name, and the ResourceClaimTemplate dictates exactly what is being requested. Since the actual device request lives in the template rather than in the Pod spec, Kueue would likely need an extra level of indirection to support this.
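To make that indirection concrete, here is a minimal Python sketch (Kueue itself is written in Go; Python is used here only for brevity). The dataclasses are hypothetical, simplified stand-ins, not the real Kubernetes API types. The sketch follows Pod -> claim -> namespaced ResourceClaimTemplate -> device requests and totals the requested devices per DeviceClass. The issue doesn't show the contents of the `gpu.example.com` template, so the usage example assumes it requests one device of class `gpu.example.com`. Claims that reference an existing ResourceClaim by name are not handled.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified stand-ins for the DRA objects in the manifest;
# these are NOT the real Kubernetes API types.
@dataclass
class DeviceRequest:
    name: str
    device_class_name: str
    count: int


@dataclass
class ResourceClaimTemplate:
    namespace: str
    name: str
    requests: list[DeviceRequest]


@dataclass
class PodResourceClaim:
    name: str
    resource_claim_template_name: str


@dataclass
class Pod:
    namespace: str
    name: str
    resource_claims: list[PodResourceClaim] = field(default_factory=list)


def devices_per_class(
    pod: Pod, templates: dict[tuple[str, str], ResourceClaimTemplate]
) -> dict[str, int]:
    """Resolve each pod claim through its template and total devices per class.

    Templates are keyed by (namespace, name) because ResourceClaimTemplates
    are namespace-scoped and a pod can only use templates from its namespace.
    """
    totals: dict[str, int] = {}
    for claim in pod.resource_claims:
        key = (pod.namespace, claim.resource_claim_template_name)
        template = templates.get(key)
        if template is None:
            raise KeyError(f"claim {claim.name!r}: template {key} not found")
        for req in template.requests:
            totals[req.device_class_name] = totals.get(req.device_class_name, 0) + req.count
    return totals


# Assumption: the template's contents are not in the issue; here it is taken
# to request a single device of class gpu.example.com.
templates = {
    ("gpu-test1", "gpu.example.com"): ResourceClaimTemplate(
        "gpu-test1", "gpu.example.com", [DeviceRequest("gpu", "gpu.example.com", 1)]
    )
}
pod1 = Pod("gpu-test1", "pod1", [PodResourceClaim("gpu", "gpu.example.com")])
print(devices_per_class(pod1, templates))  # {'gpu.example.com': 1}
```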
Nothing has been done for DRA yet :)
The only other related issue is #1538.
Okay. I'm going to spend some cycles thinking about this, then.
This looks great!
Classic DRA is likely getting removed in the next few versions, so I'd rather not invest in it in Kueue.
Hey @tenzen-y. According to kubernetes/enhancements#3063 (comment), classic DRA may be dropped in 1.32, so I will go with structured parameters.
I agree with you. It would be great if you could also consider future collaboration with ResourceSlices in this design.
/retitle Support Structured Parameters (DRA) in Kueue
I think what I need to research is which API to use. Retitling, since I don't have a clear direction yet.
So @tenzen-y, are you saying we should support something like this?
---
apiVersion: v1
kind: Namespace
metadata:
  name: gpu-test4
---
apiVersion: resource.k8s.io/v1alpha3
kind: ResourceClaimTemplate
metadata:
  namespace: gpu-test4
  name: multiple-gpus
spec:
  spec:
    devices:
      requests:
      - name: gpus
        deviceClassName: gpu.example.com
        allocationMode: ExactCount
        count: 4
---
apiVersion: v1
kind: Pod
metadata:
  namespace: gpu-test4
  name: pod0
  labels:
    app: pod
spec:
  containers:
  - name: ctr0
    image: ubuntu:22.04
    command: ["bash", "-c"]
    args: ["export; sleep 9999"]
    resources:
      claims:
      - name: gpus
  resourceClaims:
  - name: gpus
    resourceClaimTemplateName: multiple-gpus

In this case, because the ResourceClaimTemplate uses an allocationMode of ExactCount with an explicit count, it may be easier to support. @alculquicondor @tenzen-y I'm open to suggestions on how you want to tackle DRA support in Kueue. Should we implement it piece by piece, or write a KEP walking through the different options?
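A rough Python sketch of why ExactCount is the easier case: the number of devices can be read straight from the request, whereas allocationMode: All asks for every matching device in a pool, which is only known once the cluster's ResourceSlices are consulted. The types and the `static_device_count` helper are hypothetical; the defaults (ExactCount mode, count of 1) reflect my understanding of the resource.k8s.io/v1alpha3 API and should be double-checked.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeviceRequest:
    """Hypothetical, simplified view of one entry under spec.spec.devices.requests."""
    name: str
    device_class_name: str
    allocation_mode: str = "ExactCount"  # "ExactCount" or "All"
    count: int = 1  # assumed default when ExactCount has no explicit count


def static_device_count(req: DeviceRequest) -> Optional[int]:
    """Return how many devices the request needs, if knowable from the request alone."""
    if req.allocation_mode == "ExactCount":
        return req.count
    if req.allocation_mode == "All":
        # Depends on how many matching devices the cluster publishes,
        # so it cannot be charged against quota up front.
        return None
    raise ValueError(f"unknown allocationMode {req.allocation_mode!r}")


print(static_device_count(DeviceRequest("gpus", "gpu.example.com", "ExactCount", 4)))  # 4
print(static_device_count(DeviceRequest("all-gpus", "gpu.example.com", "All")))  # None
```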
@kannon92 I'm not sure which approach we should take here. I suspect DRA support is a big project, and we will need to support its features step by step.
So I am not exactly sure how we want to limit DRA claims in a ClusterQueue. I was thinking that we could add counts for the claims, but ResourceClaims are namespace-scoped while ClusterQueues are cluster-scoped. Could we require, and enforce, that the ResourceClaims Kueue needs to track in a ClusterQueue are cluster-scoped? ResourceSlices do appear to be cluster-scoped, so that could be an option.
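One possible shape for count-based limits, sketched in Python under loud assumptions: quota is keyed by DeviceClass name (DeviceClasses are cluster-scoped, which sidesteps the namespace problem with ResourceClaims), and each workload's claims have already been resolved to per-class device counts. `DeviceClassQuota` and its methods are hypothetical and not an existing Kueue API.

```python
from collections import Counter


class DeviceClassQuota:
    """Hypothetical ClusterQueue-level device quota keyed by DeviceClass name."""

    def __init__(self, nominal: dict[str, int]):
        self.nominal = dict(nominal)
        self.used: Counter = Counter()

    def fits(self, request: dict[str, int]) -> bool:
        # A class with no configured quota is treated as having zero quota.
        return all(
            self.used[cls] + n <= self.nominal.get(cls, 0)
            for cls, n in request.items()
        )

    def admit(self, request: dict[str, int]) -> bool:
        """Charge the request against quota if it fits; return whether it was admitted."""
        if not self.fits(request):
            return False
        self.used.update(request)
        return True

    def release(self, request: dict[str, int]) -> None:
        """Return a finished workload's devices to the quota."""
        self.used.subtract(request)


quota = DeviceClassQuota({"gpu.example.com": 8})
print(quota.admit({"gpu.example.com": 4}))  # True
print(quota.admit({"gpu.example.com": 4}))  # True
print(quota.admit({"gpu.example.com": 1}))  # False
```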
That makes sense. Let's discuss it in the KEP.
@tenzen-y I drafted my main idea in the KEP. Let's move these discussions there from now on.
What would you like to be added:
Workloads should be able to use Resource Claims in their specs and Kueue should be aware of this when doing quota management.
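Since Kueue charges quota per workload, per-pod device counts also have to be multiplied across identical pods. A ResourceClaimTemplate generates a separate ResourceClaim for every pod, so a rough Python sketch looks like the following. The `PodSet` type is hypothetical, loosely modeled on Kueue's pod sets. Claims shared across pods via resourceClaimName would need to be counted once instead, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class PodSet:
    """Hypothetical, simplified group of identical pods in a workload."""
    name: str
    count: int
    # Devices per DeviceClass requested by ONE pod through ResourceClaimTemplates;
    # each pod gets its own ResourceClaim generated from the template.
    per_pod_devices: dict[str, int]


def workload_devices(pod_sets: list[PodSet]) -> dict[str, int]:
    """Total devices per DeviceClass that a workload would charge against quota."""
    totals: dict[str, int] = {}
    for ps in pod_sets:
        for cls, n in ps.per_pod_devices.items():
            totals[cls] = totals.get(cls, 0) + n * ps.count
    return totals


pod_sets = [
    PodSet("workers", 3, {"gpu.example.com": 4}),
    PodSet("launcher", 1, {}),
]
print(workload_devices(pod_sets))  # {'gpu.example.com': 12}
```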
Why is this needed:
To support Dynamic Resource Allocation (DRA).
Completion requirements:
This enhancement requires the following artifacts:
The artifacts should be linked in subsequent comments.