[Feature][Docs] Explain how to specify container command for head pod #651

kevin85421 · 2022-10-24T19:51:03Z

Search before asking

I searched the issues and found no similar feature request.

Description

Some users want to run additional commands when a RayCluster starts. For example, a user may want to run ray start && python3 something.py when the cluster starts.

Users can set the command that runs in the head pod through the Ray container's command field (headGroupSpec.template.spec.containers[].command). However, one behavior is easy to miss and can confuse users: if that command does not contain the substring ray start, KubeRay rewrites it. The container command becomes /bin/bash -lc --, and the user's command is moved into the args, chained in front of the generated ray start command. See the following code snippet for details.

kuberay/ray-operator/controllers/ray/common/pod.go

Lines 307 to 325 in e77b095

    
    if !strings.Contains(cmd, "ray start") {
        cont := concatenateContainerCommand(rayNodeType, rayStartParams, pod.Spec.Containers[rayContainerIndex].Resources)
        // replacing the old command
        pod.Spec.Containers[rayContainerIndex].Command = []string{"/bin/bash", "-lc", "--"}
        if cmd != "" {
            // If 'ray start' has --block specified, commands after it will not get executed.
            // so we need to put cmd before cont.
            args = fmt.Sprintf("%s && %s", cmd, cont)
        } else {
            args = cont
        }
        if !isRayStartWithBlock(rayStartParams) {
            // sleep infinity is used to keep the pod `running` after the last command exits, and not go into `completed` state
            args = args + " && sleep infinity"
        }
        pod.Spec.Containers[rayContainerIndex].Args = []string{args}
    }
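
To make the rewrite easier to reason about, here is a minimal standalone sketch of the same branching. It is not KubeRay code: buildContainerArgs is a hypothetical helper, the generated string is a placeholder for whatever concatenateContainerCommand returns, and the blocking flag stands in for isRayStartWithBlock(rayStartParams). The quoted snippet does not show what happens when the user command already contains ray start, so the sketch assumes it is passed through unchanged.

```go
package main

import (
	"fmt"
	"strings"
)

// buildContainerArgs is a hypothetical, simplified stand-in for the branch
// quoted from pod.go. It returns the resulting args string and whether the
// user command was rewritten.
func buildContainerArgs(userCmd, generated string, blocking bool) (string, bool) {
	if strings.Contains(userCmd, "ray start") {
		// Assumption: a command that already starts Ray is left as-is.
		return userCmd, false
	}
	args := generated
	if userCmd != "" {
		// The user command goes first, because a blocking `ray start`
		// never returns and anything after it would not run.
		args = fmt.Sprintf("%s && %s", userCmd, generated)
	}
	if !blocking {
		// Keep the container alive after the last command exits.
		args += " && sleep infinity"
	}
	return args, true
}

func main() {
	// Placeholder only; the real generated command depends on rayStartParams.
	generated := "ray start --head --block"
	userCmds := []string{
		"",
		"python3 something.py",
		"ray start --head && python3 something.py",
	}
	for _, cmd := range userCmds {
		args, rewritten := buildContainerArgs(cmd, generated, true)
		fmt.Printf("user=%q rewritten=%v args=%q\n", cmd, rewritten, args)
	}
}
```

With these placeholder values, a user command of python3 something.py becomes python3 something.py && ray start --head --block, so the script runs before Ray has started. That ordering is probably the opposite of what most users expect.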

This design introduces more "unknown unknowns" for users, so we need to:
(1) Update the documentation (this issue).
(2) Discuss whether to remove this behavior and find a replacement for it.

Use case

No response

Related issues

No response

Are you willing to submit a PR?

Yes I am willing to submit a PR!


kevin85421 · 2022-10-24T19:52:49Z

cc @DmitriGekhtman

tgaddair · 2022-12-03T00:48:35Z

@kevin85421 I wonder if this could be added in a more explicit way. The current behavior of command is confusing, as it changes based on the presence or absence of a substring, and if the command is overwritten, the rayStartParams essentially become useless (and the user needs to implement this logic themselves).

Maybe there's room for something alongside the rayStartParams like a postStartCommand (which would be invalid if block is set) or similar?
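
One way to picture this proposal is the sketch below. Everything in it is hypothetical: HeadStartConfig and PostStartCommand are not KubeRay types or fields, and hasBlock assumes that blocking is requested through a "block" key set to "true" in rayStartParams. It only illustrates the two rules suggested above: the post-start command runs after ray start, and it is rejected when block is set.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// HeadStartConfig is a hypothetical shape used only for illustration.
type HeadStartConfig struct {
	RayStartParams   map[string]string
	PostStartCommand string // hypothetical field, not part of KubeRay
}

// hasBlock assumes blocking is expressed as rayStartParams["block"] == "true".
func hasBlock(params map[string]string) bool {
	return strings.EqualFold(params["block"], "true")
}

// composeArgs chains the post-start command after ray start, and rejects
// the combination of a post-start command with a blocking ray start.
func composeArgs(cfg HeadStartConfig, rayStart string) (string, error) {
	blocking := hasBlock(cfg.RayStartParams)
	if cfg.PostStartCommand != "" && blocking {
		return "", errors.New("postStartCommand is invalid when block is set: a blocking ray start never returns")
	}
	args := rayStart
	if cfg.PostStartCommand != "" {
		args += " && " + cfg.PostStartCommand
	}
	if !blocking {
		// Without --block, something must keep the container running.
		args += " && sleep infinity"
	}
	return args, nil
}

func main() {
	ok := HeadStartConfig{PostStartCommand: "python3 something.py"}
	args, err := composeArgs(ok, "ray start --head")
	fmt.Println(args, err)

	bad := HeadStartConfig{
		RayStartParams:   map[string]string{"block": "true"},
		PostStartCommand: "python3 something.py",
	}
	args, err = composeArgs(bad, "ray start --head --block")
	fmt.Println(args, err)
}
```

Unlike the substring check, this keeps rayStartParams meaningful whether or not the user supplies an extra command.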

DmitriGekhtman · 2022-12-03T01:11:49Z

There's some discussion about how to get something approximating the desired behavior with a post-start hook:
https://ray-distributed.slack.com/archives/C02GFQ82JPM/p1669986058864489?thread_ts=1669647595.429959&cid=C02GFQ82JPM

kevin85421 · 2022-12-04T03:12:11Z

> @kevin85421 I wonder if this could be added in a more explicit way. The current behavior of command is confusing, as it changes based on the presence or absence of a substring, and if the command is overwritten, the rayStartParams essentially become useless (and the user needs to implement this logic themselves).
>
> Maybe there's room for something alongside the rayStartParams like a postStartCommand (which would be invalid if block is set) or similar?

I totally agree with your arguments. In addition, I recently found that a post-start hook (commands that execute once the RayCluster is READY) is very important. The hook can combine RayCluster creation and other commands into a single atomic operation, which makes KubeRay more idempotent. One use case is avoiding #756.

cc @architkulkarni

kevin85421 · 2022-12-04T03:15:51Z

Adding this issue to the v0.5.0 release.

DmitriGekhtman · 2022-12-04T05:58:28Z

Yeah, moving the job submission into the entry point makes sense to me. I don't really see the point of decoupling cluster creation and job submission...
Except that it could be nice to submit a new job to an already running cluster...
But that doesn't work so well anyway due to lack of clean isolation between subsequent jobs.

kevin85421 · 2023-02-04T15:35:20Z

Related discussion: https://ray-distributed.slack.com/archives/C02GFQ82JPM/p1675378764037199

kevin85421 · 2023-02-16T18:47:06Z

Related discussion: https://ray-distributed.slack.com/archives/C02GFQ82JPM/p1676506490994229

Upgrading the priority to P0.

kevin85421 added the enhancement New feature or request label Oct 24, 2022

kevin85421 self-assigned this Oct 24, 2022

kevin85421 mentioned this issue Nov 3, 2022

Remove ray-cluster.without-block.yaml #675

Merged

4 tasks

DmitriGekhtman added the P1 Issue that should be fixed within a few weeks label Dec 3, 2022

kevin85421 added this to the v0.5.0 release milestone Dec 4, 2022

kevin85421 added P0 Critical issue that should be fixed ASAP and removed P1 Issue that should be fixed within a few weeks labels Feb 16, 2023

This was referenced Feb 17, 2023

[Feature][Docs] Explain how to specify container command for head pod #912

Merged

[Feature] Update the logic of specifying container commands for head Pod #917

Closed

kevin85421 closed this as completed in #912 Feb 22, 2023
