Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Doc] Customize KubeRay container commands #41651

Merged
merged 3 commits into from
Dec 8, 2023

Conversation

kevin85421
Copy link
Member

@kevin85421 kevin85421 commented Dec 6, 2023

Why are these changes needed?

Update doc for ray-project/kuberay#1704.

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@kevin85421 kevin85421 force-pushed the custom-ray-start-cmd branch 2 times, most recently from 2a51fdd to 6698130 Compare December 7, 2023 16:37
Signed-off-by: Kai-Hsun Chen <[email protected]>
Copy link
Contributor

@architkulkarni architkulkarni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just a few comments and questions

## Current KubeRay operator behavior for container commands
* The current behavior for container commands is not finalized, and **may be updated in the future**.
* See [code](https://github.com/ray-project/kuberay/blob/47148921c7d14813aea26a7974abda7cf22bbc52/ray-operator/controllers/ray/common/pod.go#L301-L326) for more details.
Starting with KubeRay v1.1.0, if users add the annotation `ray.io/overwrite-container-cmd: "true"` to a RayCluster, KubeRay respects the container `command` and `args` as provided by the users, without including any generated command.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Starting with KubeRay v1.1.0, if users add the annotation `ray.io/overwrite-container-cmd: "true"` to a RayCluster, KubeRay respects the container `command` and `args` as provided by the users, without including any generated command.
Starting with KubeRay v1.1.0, if users add the annotation `ray.io/overwrite-container-cmd: "true"` to a RayCluster, KubeRay respects the container `command` and `args` as provided by the users, without including the generated `ray start` command.

Nit. Just to be a bit more explicit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would also be helpful to include the definition/explanation of KUBERAY_GEN_RAY_START_CMD here, in the text, instead of just in the code comment (users might not expect to see a key definition in the code comment instead of the main text)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not precise enough. The generated commands include ulimit and ray start commands. I add more details about both of them.


* `metadata.annotations.ray.io/overwrite-container-cmd: "true"`: This annotation tells KubeRay to respect the container `command` and `args` as provided by the users, without including any generated command.

* `ulimit -n 65536`: This command is necessary to avoid Ray scalability issues caused by running out of file descriptors.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot (and the reader might be curious), why didn't we just make ulimit -n 65536 part of the KUBERAY_GEN_RAY_START_CMD? Is it because some users might want to delete the ulimit part?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's an interesting explanation for the above question it might be helpful to write it in the doc, maybe here or at the line below where we say "Note that this environment variable doesn't include the ulimit command."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why didn't we just make ulimit -n 65536 part of the KUBERAY_GEN_RAY_START_CMD? Is it because some users might want to delete the ulimit part?

Some users want to run some custom commands between ulimit and ray start. ray-project/kuberay#1560 (comment) is an example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, I forgot. Makes sense

@@ -63,91 +113,3 @@ Currently, for timing (1), we can set the container's `Command` and `Args` in Ra
# Args:
# echo 123 456 && ulimit -n 65536; ray start --head --dashboard-host=0.0.0.0 --num-cpus=1 --block --metrics-export-port=8080 --memory=2147483648
```


## Timing 2: After `ray start` (RayCluster is ready)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to think a bit to remember that the reason we don't need this section anymore is because we can use Part 1 to just directly add post-start commands after the KUBERAY_GEN_RAY_START_COMMAND.

I think it would help to make this really obvious and explicit in the doc for part 1, maybe just with a quick sentence or example. Or maybe just put a post start command in the existing example.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an over-explanation for me. I will update it only if you feel strongly about its necessity.

kevin85421 and others added 2 commits December 7, 2023 17:15
Co-authored-by: Archit Kulkarni <[email protected]>
Signed-off-by: Kai-Hsun Chen <[email protected]>
Signed-off-by: Kai-Hsun Chen <[email protected]>
@architkulkarni architkulkarni merged commit 25d16d3 into ray-project:master Dec 8, 2023
9 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants