Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support gang scheduling with Apache YuniKorn #2396
Support gang scheduling with Apache YuniKorn #2396
Changes from 9 commits
b561e0a
d72d73a
708b86f
1ceada8
2e962f9
4ee2405
f9b68db
08cc7b3
6a06417
6a3d044
aae020e
10119ed
3ea3b64
31ade1c
d932703
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know why the label names need to be translated from
RayClusterApplicationIDLabelName
toYuniKornPodApplicationIDLabelName
and fromRayClusterQueueLabelName
toYuniKornPodQueueLabelName
.Does Yunikorn read the labels on resources other than Pods? If not, why not set
YuniKornPodApplicationIDLabelName
toyunikorn.apache.org/app-id
andYuniKornPodQueueLabelName
toyunikorn.apache.org/queue
and set them directly onRayCluster
, and then apply the same labels to all the Pods that controlled byRayCluster
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the caveat is, yunikorn depends on labels to identify apps, but they are currently using very simple format:
YuniKornPodApplicationIDLabelName
: applicationIdYuniKornPodQueueLabelName
: queuebut in Ray, I think it makes sense to add the domain name to the label, to indicate these lables are set for the scheduler.
the code logic is very simple, just translate these labels and set them on pods, without introducing any inconsistency in Ray cluster spec. Hope that makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I viewed the doc here
https://yunikorn.apache.org/docs/next/user_guide/labels_and_annotations_in_yunikorn
there are two labels
yunikorn.apache.org/app-id
andyunikorn.apache.org/queue
. AndapplicationId
andqueue
labels are legacy now.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, good catch. Seems like the community is moving to a domain scoped label approach recently. Let me take a look, and do some tests, will update the PR once confims these labels works. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @MortalHappiness I just found these labels are only supported in v1.6 which was released just a few weeks ago. For better compatibility with some older version of yunikorn deployment, I think it's better to continue to support the legacy label names. However, I have modified
RayClusterApplicationIDLabelName
andRayClusterQueueLabelName
in the package to be the same as yunikorn upstream. We can remove this translation layer and directly populate the labels to pods after a few releases. Does this sound good?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about add a TODO comment here to let us know we need to remove this in the future release?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yangwwei Would you mind briefly adding comments for the context you mentioned in https://github.com/ray-project/kuberay/pull/2396/files#r1777710831?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure