
Support Structured Parameters (DRA) in Kueue #2941

Open
3 tasks done
kannon92 opened this issue Aug 30, 2024 · 16 comments
Labels: kind/feature (categorizes issue or PR as related to a new feature)


@kannon92
Contributor

What would you like to be added:
Workloads should be able to use Resource Claims in their specs and Kueue should be aware of this when doing quota management.
Why is this needed:
To support Dynamic Resource Allocation (DRA).
Completion requirements:

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.

kannon92 added the kind/feature label on Aug 30, 2024
@kannon92
Contributor Author

I'd like to first discuss what has already been done with DRA. I can help with a KEP once we have a path forward.

@alculquicondor @tenzen-y Have either of you looked into how Kueue would work with DRA resources?

https://github.com/kubernetes-sigs/dra-example-driver/tree/main/demo has a few examples of how structured parameters work.

---
apiVersion: v1
kind: Pod
metadata:
  namespace: gpu-test1
  name: pod1
  labels:
    app: pod
spec:
  containers:
  - name: ctr0
    image: ubuntu:22.04
    command: ["bash", "-c"]
    args: ["export; sleep 9999"]
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: gpu.example.com

Containers now reference a claim, and the ResourceClaimTemplate dictates exactly what is being requested. It seems Kueue would need to follow this indirection (from the container's claim, to the pod's resourceClaims entry, to the template) to support this.
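To make that indirection concrete, here is a minimal Python sketch that walks it for a single pod: each container claim name is looked up in the pod's `resourceClaims` list, which in turn names either a ResourceClaimTemplate or an existing ResourceClaim. It assumes the Pod spec has already been parsed into a plain dict (for example by a YAML loader); `resolve_container_claims` is a hypothetical helper, not part of Kueue.

```python
def resolve_container_claims(pod_spec):
    """Map each container name to the (kind, name) claim sources it uses.

    Hypothetical helper: pod_spec is the Pod's .spec as a plain dict.
    """
    # Index the pod-level resourceClaims entries by their local name.
    pod_claims = {c["name"]: c for c in pod_spec.get("resourceClaims", [])}
    resolved = {}
    for ctr in pod_spec.get("containers", []):
        refs = []
        for claim in ctr.get("resources", {}).get("claims", []):
            entry = pod_claims.get(claim["name"])
            if entry is None:
                raise ValueError(
                    f"container {ctr['name']!r} references unknown claim {claim['name']!r}"
                )
            # A pod-level entry points at either a template (one claim is
            # generated per pod) or a pre-existing ResourceClaim.
            if "resourceClaimTemplateName" in entry:
                refs.append(("template", entry["resourceClaimTemplateName"]))
            else:
                refs.append(("claim", entry["resourceClaimName"]))
        resolved[ctr["name"]] = refs
    return resolved


# The pod1 example above, trimmed to the fields the sketch reads.
pod1_spec = {
    "containers": [
        {"name": "ctr0", "image": "ubuntu:22.04",
         "resources": {"claims": [{"name": "gpu"}]}},
    ],
    "resourceClaims": [
        {"name": "gpu", "resourceClaimTemplateName": "gpu.example.com"},
    ],
}

print(resolve_container_claims(pod1_spec))
# {'ctr0': [('template', 'gpu.example.com')]}
```

Once the template name is known, Kueue would still have to fetch that template to learn what is actually requested.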

@alculquicondor
Contributor

Nothing has been done for DRA :)

@alculquicondor
Contributor

The only other related issue is #1538

@kannon92
Contributor Author

Okay. I'm going to spend some cycles thinking about this then.
/assign

@tenzen-y
Member

This looks great!
@kannon92 Which DRA variant will you target: classic DRA or DRA with structured parameters?

@alculquicondor
Contributor

Classic DRA is likely getting removed in the next few versions, so I wouldn't want to invest in it in Kueue.

@kannon92
Contributor Author

Hey @tenzen-y. According to kubernetes/enhancements#3063 (comment), classic DRA may be dropped in 1.32, so I will go with structured parameters.

@tenzen-y
Member

I agree with you.
In that case, we could somehow take the exact resource amount from the ResourceSlice object.
That said, we may want to handle ResourceSlice objects as a separate feature.

It would be great if you could keep a future ResourceSlice integration in mind in this design.
In other words, I don't want this ResourceClaim feature to rule out a later integration between Kueue and ResourceSlice.
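As a rough illustration of reading amounts from ResourceSlices, the Python sketch below counts published devices per driver. It assumes slices are plain dicts trimmed to the fields it reads (`spec.driver`, `spec.pool.name`, `spec.pool.generation`, `spec.devices`), and it follows the API guidance that consumers should only consider slices from the highest generation of each pool. The helpers `devices_by_driver` and `make_slice` are hypothetical. Counting per DeviceClass instead of per driver would additionally require evaluating each class's CEL selectors, which this sketch does not attempt.

```python
def devices_by_driver(slices):
    """Count devices per driver, using only the newest generation of each pool."""
    # First pass: highest generation seen for each (driver, pool) pair.
    newest = {}
    for s in slices:
        spec = s["spec"]
        key = (spec["driver"], spec["pool"]["name"])
        gen = spec["pool"]["generation"]
        newest[key] = max(newest.get(key, gen), gen)
    # Second pass: count devices only from slices at that generation.
    counts = {}
    for s in slices:
        spec = s["spec"]
        key = (spec["driver"], spec["pool"]["name"])
        if spec["pool"]["generation"] == newest[key]:
            counts[spec["driver"]] = counts.get(spec["driver"], 0) + len(spec.get("devices", []))
    return counts


def make_slice(driver, pool, generation, n_devices):
    # Hypothetical helper building a ResourceSlice dict with only the fields used above.
    return {"spec": {
        "driver": driver,
        "pool": {"name": pool, "generation": generation, "resourceSliceCount": 1},
        "devices": [{"name": f"dev-{i}"} for i in range(n_devices)],
    }}


slices = [
    make_slice("gpu.example.com", "node-a", 1, 2),  # stale: superseded by generation 2
    make_slice("gpu.example.com", "node-a", 2, 8),
    make_slice("gpu.example.com", "node-b", 1, 8),
]
print(devices_by_driver(slices))  # {'gpu.example.com': 16}
```

Keeping this capacity discovery separate from claim accounting would match the suggestion to treat ResourceSlice handling as its own feature.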

@kannon92
Contributor Author

/retitle Support Structured Parameters (DRA) in Kueue

k8s-ci-robot changed the title from "Support Resource Claims in Kueue" to "Support Structured Parameters (DRA) in Kueue" on Aug 30, 2024
@kannon92
Contributor Author

I still need to research which API to use. Retitling, since I don't have a clear direction yet.

@kannon92
Contributor Author

So @tenzen-y, are you suggesting something like the following?

---
apiVersion: v1
kind: Namespace
metadata:
  name: gpu-test4

---
apiVersion: resource.k8s.io/v1alpha3
kind: ResourceClaimTemplate
metadata:
  namespace: gpu-test4
  name: multiple-gpus
spec:
  spec:
    devices:
      requests:
      - name: gpus
        deviceClassName: gpu.example.com
        allocationMode: ExactCount
        count: 4

---
apiVersion: v1
kind: Pod
metadata:
  namespace: gpu-test4
  name: pod0
  labels:
    app: pod
spec:
  containers:
  - name: ctr0
    image: ubuntu:22.04
    command: ["bash", "-c"]
    args: ["export; sleep 9999"]
    resources:
      claims:
      - name: gpus
  resourceClaims:
  - name: gpus
    resourceClaimTemplateName: multiple-gpus

In this case, where the ResourceClaimTemplate uses an allocationMode of ExactCount, the number of requested devices is stated explicitly, so that may be easier to support.
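Here is a minimal Python sketch of why ExactCount is the easier case: the device count can be read straight from the template, before anything is allocated. It assumes the template is a plain dict mirroring the `multiple-gpus` manifest above, relies on `ExactCount` being the default mode with `count` defaulting to 1, and rejects `allocationMode: All`, whose size depends on what is available at allocation time. Since a ResourceClaimTemplate produces one ResourceClaim per pod, the workload total scales with the pod count. `devices_per_pod` and `workload_devices` are hypothetical names.

```python
def devices_per_pod(template):
    """Return {deviceClassName: count} for one claim generated from the template."""
    totals = {}
    for req in template["spec"]["spec"]["devices"]["requests"]:
        # ExactCount is the default allocation mode; count defaults to 1.
        mode = req.get("allocationMode", "ExactCount")
        if mode != "ExactCount":
            # allocationMode: All takes every matching device, so the amount
            # is only known once the claim is actually allocated.
            raise ValueError(f"request {req['name']!r}: cannot size allocationMode {mode!r} up front")
        cls = req["deviceClassName"]
        totals[cls] = totals.get(cls, 0) + req.get("count", 1)
    return totals


def workload_devices(template, pod_count):
    # One ResourceClaim is generated per pod, so multiply the per-pod request.
    return {cls: n * pod_count for cls, n in devices_per_pod(template).items()}


# The multiple-gpus template above, as a dict.
multiple_gpus = {
    "spec": {"spec": {"devices": {"requests": [
        {"name": "gpus", "deviceClassName": "gpu.example.com",
         "allocationMode": "ExactCount", "count": 4},
    ]}}},
}

print(devices_per_pod(multiple_gpus))      # {'gpu.example.com': 4}
print(workload_devices(multiple_gpus, 3))  # {'gpu.example.com': 12}
```

Rejecting `All` outright is a deliberately conservative choice for a first iteration; a later design could estimate it from ResourceSlices instead.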

@alculquicondor @tenzen-y I'm open to suggestions on how you want to tackle DRA support in Kueue. Should we do it piece by piece, or create a KEP walking through the different options?

@alculquicondor
Contributor

cc @johnbelamaric @pohly

@tenzen-y
Member

@kannon92 I'm not sure which approach we should take here. I suspect DRA support is a big project, and we will need to support its features step by step.
So we should evaluate in the KEP which features are supported in which iteration.

@kannon92
Contributor Author

kannon92 commented Sep 3, 2024

So I am not exactly sure how we want to limit DRA claims in a ClusterQueue. I was thinking we could add counts for the claims, but ResourceClaims are namespace-scoped.

Could we require that the ResourceClaims Kueue needs to track in a ClusterQueue be cluster-scoped, and enforce that?

It does look like ResourceSlices are cluster-scoped, so that could be an option.
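One hypothetical way to get cluster-wide counters without making the claims themselves cluster-scoped is to charge usage against the claim's `deviceClassName`, since DeviceClass objects are cluster-scoped. The Python sketch below assumes that keying, which is not something the comment above settles on, and models quota, usage, and the incoming request as plain `{deviceClassName: devices}` maps; `fits` is a hypothetical helper.

```python
def fits(quota, usage, request):
    """Return True if the request fits within the remaining quota.

    Hypothetical model: all three arguments are {deviceClassName: devices} maps.
    DeviceClass names are cluster-scoped, so workloads from every namespace
    feeding this ClusterQueue are charged against the same counters.
    """
    for cls, n in request.items():
        if cls not in quota:
            return False  # no quota configured for this device class
        if usage.get(cls, 0) + n > quota[cls]:
            return False
    return True


quota = {"gpu.example.com": 8}
usage = {"gpu.example.com": 4}  # e.g. already admitted for a workload in another namespace
print(fits(quota, usage, {"gpu.example.com": 4}))  # True
print(fits(quota, usage, {"gpu.example.com": 5}))  # False
```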

@tenzen-y
Member

That makes sense. Let's discuss it in the KEP.

@kannon92
Contributor Author

@tenzen-y I drafted my main idea in the KEP. From now on, we can move those discussions there.
