[Hierarchical Cohorts] Define Cohort API #2693

Merged 1 commit on Jul 25, 2024

gabesaba commented Jul 25, 2024

What type of PR is this?

/kind feature
/kind api-change

What this PR does / why we need it:

Define Cohort API #79

Special notes for your reviewer:

Webhook and Reconciliation logic will be defined in follow-up PRs, to keep this PR small.

Does this PR introduce a user-facing change?

Hierarchical Cohorts, introduced with the v1alpha1 Cohorts API, allow users to group resources in an arbitrary tree structure. Additionally, quotas and limits can now be defined directly at the Cohort level. See #79 for more details.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 25, 2024
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 25, 2024

netlify bot commented Jul 25, 2024

Deploy Preview for kubernetes-sigs-kueue ready!

Name Link
🔨 Latest commit 28c785b
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/66a248cb1668a4000953f88c
😎 Deploy Preview https://deploy-preview-2693--kubernetes-sigs-kueue.netlify.app

//+kubebuilder:validation:MaxLength=253
//+kubebuilder:validation:Pattern="^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$"
//
Parent string `json:"parent,omitempty"`
Contributor

This probably should be a pointer, right?

Contributor Author

ClusterQueue (CQ) also defines it like this:

Cohort string `json:"cohort,omitempty"`

Contributor

I see, an empty string plays the role of no parent. That's fine, consistency is important.
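
To illustrate the convention agreed on in this thread, here is a minimal sketch (not code from this PR) of how a plain string `Parent` can be checked, with the empty string meaning "no parent". The pattern and length limit are copied from the kubebuilder markers quoted above; the helper names `hasParent` and `validateParent` are hypothetical, and treating `""` as valid is an assumption based on the `omitempty` tag.

```go
package main

import (
	"fmt"
	"regexp"
)

// parentPattern mirrors the kubebuilder Pattern marker quoted in the diff.
var parentPattern = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

// maxParentLength mirrors the MaxLength=253 marker.
const maxParentLength = 253

// hasParent reports whether a Cohort names a parent.
// An empty string plays the role of "no parent" (hypothetical helper).
func hasParent(parent string) bool {
	return parent != ""
}

// validateParent is a hypothetical helper that applies the same
// constraints as the markers to a non-empty parent name.
func validateParent(parent string) error {
	if !hasParent(parent) {
		return nil // root Cohort: nothing to validate (assumption)
	}
	if len(parent) > maxParentLength {
		return fmt.Errorf("parent name exceeds %d characters", maxParentLength)
	}
	if !parentPattern.MatchString(parent) {
		return fmt.Errorf("parent %q does not match the required pattern", parent)
	}
	return nil
}

func main() {
	for _, p := range []string{"", "team-a", "org.team-a", "Team_A"} {
		fmt.Printf("%q -> %v\n", p, validateParent(p))
	}
}
```

With a string field, callers compare against `""` instead of checking for nil, which is the same way the existing `Cohort` field on ClusterQueue is handled.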

//+kubebuilder:resource:scope=Cluster

// Cohort is the Schema for the cohorts API
type Cohort struct {
Contributor

I think I would prefer to have a prototype, or at least minimal functionality, implemented before merging the API; it would increase confidence in the API. We can still merge it as a separate PR, but it would be good to see the implementation on the horizon first. WDYT @tenzen-y?

Contributor Author

That is possible, but my intention was to keep the PRs as small as possible for reviewability. Also, this is in v1alpha1, and not yet cut into a minor release, so I'd argue we can change it freely for some time.

Member

I would like us to pick one of these approaches:

  1. a single big-bang PR containing all implementations and APIs.
  2. multiple small PRs, with the API-change PRs merged in the final phase.

TBH, I would prefer option 2, since an option 1 PR is challenging to review.
With option 2, we should expose the APIs in the last phase, since it is better not to expose unimplemented APIs.

Contributor Author

I will need this type for development of the rest of the features - in this case, should I just define it in pkg for now, and then move it to api later?

Contributor

... my intention was to keep the PRs as small as possible for reviewability.

That's for sure. I was also thinking about merging this PR separately, but only after seeing some PoC implementation, to increase confidence that the API can be released.

Also, this is in v1alpha1

Good point, and the first iteration of the API looks quite minimal

Contributor

The API was discussed and merged in https://github.com/kubernetes-sigs/kueue/tree/main/keps/79-hierarchical-cohorts. I'm not sure that adding a basic implementation to this PR (which has very little to do with the API; it is mostly about scheduling and fitting the workloads) would make it more convincing.

Contributor

I guess that you can learn how we should go here: #1714

Yeah, with the caveat that fair sharing only introduced new fields rather than a new API. For a feature that requires a new API, it might be harder to develop its logic without that API merged.

Contributor

The api was discussed and merged in https://github.com/kubernetes-sigs/kueue/tree/main/keps/79-hierarchical-cohorts.

Indeed. Given that, and given that this is still an alpha API, I would lean toward merging it to reduce the burden of rebases.

The only downside is that if we need to release 0.9 urgently, we will release a hollow alpha API. Are we good with this, @mwielgus @tenzen-y?

Member

The only downside is that if we need to release 0.9 urgently, we will release a hollow alpha API. Are we good with this, @mwielgus @tenzen-y?

That is my primary concern, as I mentioned above.

In the case of opt 2, we should expose APIs in the last phase since it is better not to expose the unimplemented APIs.

In the case of an urgent minor release, let's revert all PRs related to Hierarchical Cohorts...
I hope that we never face that situation...

Contributor

In the case of an urgent minor release, let's revert all PRs related to Hierarchical Cohorts... I hope that we never face that situation...

I guess it will depend on how complete the feature is at the time of releasing 0.9.

I synced with @gabesaba, and we are OK with rolling back the PRs related to the new API if the feature is still largely unfinished when we release 0.9.

// CohortSpec defines the desired state of Cohort
type CohortSpec struct {
// Parent references the name of the Cohort's parent, if
// any. It satisfies one of three cases:
Contributor

What about a cycle? What happens in that case?

Contributor Author

We disable all members of the Cohort graph. I updated documentation.
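
As a rough illustration of the behavior described in this reply, the sketch below finds every Cohort whose parent chain runs into a cycle, so that all of them could be disabled. It is not the reconciliation logic from the follow-up PRs: the function name `disabledByCycle`, the map-based input, and the choice to treat a parent that is not itself listed as a root are all assumptions.

```go
package main

import "fmt"

// disabledByCycle takes a map of Cohort name -> parent name ("" means no
// parent) and returns the Cohorts whose parent chain never reaches a root,
// i.e. every member of a graph that contains a cycle. Hypothetical helper.
func disabledByCycle(parents map[string]string) map[string]bool {
	const (
		unknown = iota
		visiting
		acyclic
		cyclic
	)
	state := make(map[string]int, len(parents))
	for start := range parents {
		var path []string // Cohorts visited during this walk
		result := acyclic
		cur := start
		for cur != "" {
			if _, ok := parents[cur]; !ok {
				break // parent not listed: treated as a root (assumption)
			}
			s := state[cur]
			if s == visiting {
				result = cyclic // walked back into the current path
				break
			}
			if s == acyclic || s == cyclic {
				result = s // reuse the verdict of an earlier walk
				break
			}
			state[cur] = visiting
			path = append(path, cur)
			cur = parents[cur]
		}
		for _, name := range path {
			state[name] = result
		}
	}
	disabled := make(map[string]bool)
	for name, s := range state {
		if s == cyclic {
			disabled[name] = true
		}
	}
	return disabled
}

func main() {
	parents := map[string]string{
		"a": "b", "b": "c", "c": "a", // cycle
		"d": "a", // hangs off the cycle
		"e": "", "f": "e", // healthy tree
	}
	fmt.Println(disabledByCycle(parents))
}
```

Because each Cohort has at most one parent, a Cohort belongs to a graph with a cycle exactly when walking up from it never terminates, which is why a single upward walk per Cohort is enough here.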

//
// BorrowingLimit limits how much members of this Cohort
// subtree can borrow from the parent subtree. This limit must
// only be set when the Cohort has a parent.
Contributor

What happens otherwise? Do we catch it at the validation phase, invalidate the Cohort, or let it be?

Contributor Author

This will be validated by the webhook, and we will reject the create/update. Updated the documentation.
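
The rule stated here (a `BorrowingLimit` may only be set when the Cohort has a parent) could be checked along the following lines. This is only a sketch under stated assumptions: the webhook itself is left to a follow-up PR, and the types `resourceQuota` and `cohortSpec`, the error value, and `validateBorrowingLimit` are simplified, hypothetical stand-ins rather than the real API types.

```go
package main

import (
	"errors"
	"fmt"
)

// resourceQuota is a simplified, hypothetical stand-in for a quota entry.
type resourceQuota struct {
	Name           string
	NominalQuota   int64
	BorrowingLimit *int64 // nil means "not set"
}

// cohortSpec is a simplified, hypothetical stand-in for CohortSpec.
type cohortSpec struct {
	Parent string // "" means the Cohort has no parent
	Quotas []resourceQuota
}

var errBorrowingLimitWithoutParent = errors.New(
	"borrowingLimit must only be set when the Cohort has a parent")

// validateBorrowingLimit rejects a spec that sets a borrowing limit on a
// Cohort without a parent, as a create/update webhook might do.
func validateBorrowingLimit(spec cohortSpec) error {
	if spec.Parent != "" {
		return nil // a parent exists, so borrowing limits are allowed
	}
	for _, q := range spec.Quotas {
		if q.BorrowingLimit != nil {
			return fmt.Errorf("resource %q: %w", q.Name, errBorrowingLimitWithoutParent)
		}
	}
	return nil
}

func main() {
	limit := int64(10)
	root := cohortSpec{Quotas: []resourceQuota{{Name: "cpu", NominalQuota: 20, BorrowingLimit: &limit}}}
	child := root
	child.Parent = "org"
	fmt.Println("root: ", validateBorrowingLimit(root))
	fmt.Println("child:", validateBorrowingLimit(child))
}
```

Using a pointer for the limit lets the check distinguish "not set" from an explicit limit of zero.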

}

// CohortStatus defines the observed state of Cohort
type CohortStatus struct {
Contributor

Do you expect it to be left empty? If yes - drop the struct for now. If not - please add the expected content.

Contributor Author

Added the Conditions field, and the CohortActive field. One note: I modified the condition from the KEP slightly, to match ClusterQueue:

KEP

CohortActive = "CohortActive"

This PR

CohortActive = "Active"
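
For readers unfamiliar with how such a condition type is used, the following sketch records an `Active` condition in a status list. It only assumes the constant value shown above; the `condition` struct is a simplified, hypothetical stand-in for the Kubernetes condition type, and the reason strings and `setCondition` helper are invented for illustration.

```go
package main

import "fmt"

// CohortActive uses the value chosen in this PR, matching ClusterQueue.
const CohortActive = "Active"

// condition is a simplified, hypothetical stand-in for a status condition.
type condition struct {
	Type    string
	Status  string // "True", "False" or "Unknown"
	Reason  string // reason strings below are made up for this example
	Message string
}

// setCondition replaces the condition with the same Type, or appends it
// when no condition of that Type exists yet.
func setCondition(conds []condition, c condition) []condition {
	for i := range conds {
		if conds[i].Type == c.Type {
			conds[i] = c
			return conds
		}
	}
	return append(conds, c)
}

func main() {
	var conds []condition
	conds = setCondition(conds, condition{Type: CohortActive, Status: "True", Reason: "Ready"})
	// A later problem, such as a parent cycle, flips the same condition.
	conds = setCondition(conds, condition{Type: CohortActive, Status: "False", Reason: "CycleDetected"})
	fmt.Printf("%+v\n", conds)
}
```

Keeping a single entry per condition type means a consumer can look up `Active` without scanning for duplicates.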

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 25, 2024
Contributor

mimowo commented Jul 25, 2024

/lgtm
/approve
Based on the discussion in #2693 (comment)
I believe all comments are addressed; if you have more remarks, @tenzen-y or @mwielgus, we can address them in a follow-up. I believe having the API merged will make it easier for @gabesaba to work on the changes on top of it in the meantime.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 25, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: ccee3899d6dcb722430ab44b1a0a1b7f90a9ffe5

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gabesaba, mimowo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 25, 2024
@k8s-ci-robot k8s-ci-robot merged commit 176e1dd into kubernetes-sigs:main Jul 25, 2024
16 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.9 milestone Jul 25, 2024
@gabesaba gabesaba deleted the cohort_api branch July 26, 2024 07:42
@gabesaba
Contributor Author

thanks for the reviews!

@gabesaba
Contributor Author

/release-note-edit

Hierarchical Cohorts, introduced with the v1alpha1 Cohorts API, allow users to group resources in an arbitrary tree structure. Additionally, quotas and limits can now be defined directly at the Cohort level. See #79 for more details.
