Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature][Docs] Explain how to specify container command for head pod #651

Closed
2 tasks done
kevin85421 opened this issue Oct 24, 2022 · 8 comments · Fixed by #912
Closed
2 tasks done

[Feature][Docs] Explain how to specify container command for head pod #651

kevin85421 opened this issue Oct 24, 2022 · 8 comments · Fixed by #912
Assignees
Labels
enhancement New feature or request P0 Critical issue that should be fixed ASAP

Comments

@kevin85421
Copy link
Member

kevin85421 commented Oct 24, 2022

Search before asking

  • I had searched in the issues and found no similar feature requirement.

Description

Some users want to run additional commands when RayClusters are started. For example, a user wants to run ray start && python3 something.py when the cluster is started.

Users can use headGroupSpec.template.spec.command to set the command running in the head pod. However, there is a tricky behavior that may cause users to be confused. If headGroupSpec.template.spec.command does not contain the substring ray start, KubeRay will replace the command specified by users. See the following code snippet for more details.

if !strings.Contains(cmd, "ray start") {
cont := concatenateContainerCommand(rayNodeType, rayStartParams, pod.Spec.Containers[rayContainerIndex].Resources)
// replacing the old command
pod.Spec.Containers[rayContainerIndex].Command = []string{"/bin/bash", "-lc", "--"}
if cmd != "" {
// If 'ray start' has --block specified, commands after it will not get executed.
// so we need to put cmd before cont.
args = fmt.Sprintf("%s && %s", cmd, cont)
} else {
args = cont
}
if !isRayStartWithBlock(rayStartParams) {
// sleep infinity is used to keep the pod `running` after the last command exits, and not go into `completed` state
args = args + " && sleep infinity"
}
pod.Spec.Containers[rayContainerIndex].Args = []string{args}
}

This software design will introduce more "unknown unknowns" for users, and thus we need:
(1) Update the document (this issue)
(2) Discuss whether remove this behavior and find its replacement.

Use case

No response

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
@kevin85421 kevin85421 added the enhancement New feature or request label Oct 24, 2022
@kevin85421 kevin85421 self-assigned this Oct 24, 2022
@kevin85421
Copy link
Member Author

cc @DmitriGekhtman

@tgaddair
Copy link
Contributor

tgaddair commented Dec 3, 2022

@kevin85421 I wonder if this could be added in a more explicit way. The current behavior of command is confusing, as it changes based on the presence or absence of a substring, and if the command is overwritten, the rayStartParams essentially become useless (and the user needs to implement this logic themselves).

Maybe there's room for something alongside the rayStartParams like a postStartCommand (which would be invalid if block is set) or similar?

@DmitriGekhtman
Copy link
Collaborator

There's some discussion about how to get something approximating the desired behavior with a post-start hook:
https://ray-distributed.slack.com/archives/C02GFQ82JPM/p1669986058864489?thread_ts=1669647595.429959&cid=C02GFQ82JPM

@DmitriGekhtman DmitriGekhtman added the P1 Issue that should be fixed within a few weeks label Dec 3, 2022
@kevin85421
Copy link
Member Author

kevin85421 commented Dec 4, 2022

@kevin85421 I wonder if this could be added in a more explicit way. The current behavior of command is confusing, as it changes based on the presence or absence of a substring, and if the command is overwritten, the rayStartParams essentially become useless (and the user needs to implement this logic themselves).

Maybe there's room for something alongside the rayStartParams like a postStartCommand (which would be invalid if block is set) or similar?

I totally agreed with your arguments. In addition, I found a post-start hook (commands execute when the RayCluster is READY) is very important recently. The hook can combine both RayCluster creation and other commands as an atomic operation, and thus make KubeRay more idempotent. One of the use cases is to avoid #756.

cc @architkulkarni

@kevin85421 kevin85421 added this to the v0.5.0 release milestone Dec 4, 2022
@kevin85421
Copy link
Member Author

Add this issue to v0.5.0 release.

@DmitriGekhtman
Copy link
Collaborator

Yeah, moving the job submission into the entry point makes sense to me. I don't really see the point of decoupling cluster creation and job submission...
Except that it could be nice to submit a new job to an already running cluster...
But that doesn't work so well anyway due to lack of clean isolation between subsequent jobs.

@kevin85421
Copy link
Member Author

@kevin85421 kevin85421 added P0 Critical issue that should be fixed ASAP and removed P1 Issue that should be fixed within a few weeks labels Feb 16, 2023
@kevin85421
Copy link
Member Author

Related discussion: https://ray-distributed.slack.com/archives/C02GFQ82JPM/p1676506490994229

Upgrade the priority to P0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request P0 Critical issue that should be fixed ASAP
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants