[RayJob] Set Resources field for job submitter pod #1198

Closed
architkulkarni opened this issue Jun 28, 2023 · 1 comment · Fixed by #1319

@architkulkarni
Contributor

Currently, the Resources field is not set for the pod that calls ray job submit. We should set reasonable defaults for this field.

Original context reproduced below:

> If we reduce the memory and CPU limit of the submitter pod, I think it will make the CI test less flaky, but it might be worse for users. Thoughts?

a38a22a => Oh, the GitHub Actions free plan runner only has 2 CPUs, and these 2 CPUs are not fully dedicated to Kubernetes. When we set the CPU resource request to 1 CPU, it makes sense for the test to fail.

Workaround solution:

(1) Remove CPU request and limit from this PR
(2) Open issues to make sure: (a) add CPU configs back before v0.6.0 (b) move the RayJob tests to Buildkite before v0.6.0

Originally posted by @kevin85421 in #1177 (comment)
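
The scheduling failure described in the quoted comment comes down to simple arithmetic: a pod can be scheduled on a node only if its resource requests fit within the node's allocatable capacity after the requests of pods already placed there are subtracted. The Python sketch below illustrates this for CPU. The function names and the list of existing requests are hypothetical and chosen only for illustration; they are not measurements from the actual CI runner.

```python
def parse_cpu_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as "500m" or "1" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def cpu_request_fits(allocatable: str, existing_requests: list[str], new_request: str) -> bool:
    """Return True if new_request fits in the CPU left over on the node."""
    used = sum(parse_cpu_millicores(q) for q in existing_requests)
    remaining = parse_cpu_millicores(allocatable) - used
    return parse_cpu_millicores(new_request) <= remaining


# Hypothetical example: a 2-CPU node where other pods (system components,
# the Ray head, and so on) already request 1250m in total.
existing = ["250m", "250m", "500m", "250m"]

one_cpu_fits = cpu_request_fits("2", existing, "1")      # 1000m needed, 750m left
half_cpu_fits = cpu_request_fits("2", existing, "500m")  # 500m needed, 750m left
```

With these assumed numbers, a 1 CPU request does not fit while a 500m request does, which matches the reasoning that a full-CPU request is too large for a shared 2-CPU runner.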

architkulkarni added the v0.6.0 (Must be included in v0.6.0.) label Jun 28, 2023
@architkulkarni
Contributor Author

This issue is blocked by #1199.

kevin85421 added the 1.0 label and removed the v0.6.0 (Must be included in v0.6.0.) label Sep 15, 2023
architkulkarni added a commit that referenced this issue Oct 5, 2023
Adds a default CPU and memory resource requirement for the pod that submits the RayJob.

[Image: memory and CPU usage of the job submitter pod]
Here's the memory and CPU usage of the job submitter pod with a job entrypoint script that prints "hello world" in a tight loop from Python. I tested it on a local kind cluster. The job submitter pod just calls ray job submit, which runs the job remotely and streams the job output to stdout on the submitter pod, so this kind of log-streaming job should be the most stressful from the perspective of the job submitter pod.

Based on this, we set the requests to 500m CPU and 200MiB memory, and set the limits to 1 CPU and 1GiB memory.
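
As a rough illustration of these defaults, the Python sketch below expresses them in the shape of a Kubernetes container `resources` block (where 200MiB is written `200Mi` and 1GiB is written `1Gi`) and applies them to a container spec. This is not the KubeRay implementation, which is written in Go. The names `DEFAULT_SUBMITTER_RESOURCES` and `with_default_resources`, the container name and image, and the choice to apply defaults only when the user has set no resources are all assumptions made for this example.

```python
import copy

# Defaults taken from the description above: requests of 500m CPU and
# 200MiB memory, limits of 1 CPU and 1GiB memory.
DEFAULT_SUBMITTER_RESOURCES = {
    "requests": {"cpu": "500m", "memory": "200Mi"},
    "limits": {"cpu": "1", "memory": "1Gi"},
}


def with_default_resources(container: dict) -> dict:
    """Return a copy of the container spec, filling in default resources.

    Assumption: defaults are applied only when the spec has no resources
    set, so user-provided values are left untouched.
    """
    result = copy.deepcopy(container)
    if not result.get("resources"):
        result["resources"] = copy.deepcopy(DEFAULT_SUBMITTER_RESOURCES)
    return result


# Hypothetical submitter container without any resources set.
submitter = {"name": "job-submitter", "image": "example/ray:latest"}
defaulted = with_default_resources(submitter)

# Hypothetical container where the user already chose their own requests.
custom = {"name": "job-submitter", "resources": {"requests": {"cpu": "100m"}}}
unchanged = with_default_resources(custom)
```

Leaving user-specified resources untouched keeps the defaults from overriding a deliberate choice, for example on small CI runners where a 500m CPU request may be too large.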

Why are these changes needed?
It's good to provide reasonable defaults to reduce the burden on the user.

Related issue number
Closes #1198

---------

Signed-off-by: Archit Kulkarni <[email protected]>
kevin85421 pushed a commit to kevin85421/kuberay that referenced this issue Oct 17, 2023
…t#1319)
