[RayJob] Set `Resources` field for job submitter pod #1198
This issue is blocked by #1199.
architkulkarni added a commit that referenced this issue on Oct 5, 2023
Adds a default CPU and memory resource requirement for the pod that submits the RayJob.

[image]

Here's the memory and CPU usage of the job submitter pod with a job entrypoint script that prints "hello world" in a tight loop from Python. I tested it on a local kind cluster.

Since the job submitter pod just calls `ray job submit`, which runs the job remotely and streams the job output to stdout on the submitter pod, this kind of log-streaming job should be the most stressful from the perspective of the job submitter pod. Based on this, we set the requests to 500m CPU and 200MiB memory, and set the limits to 1 CPU and 1GiB memory.

Why are these changes needed?

It's good to provide reasonable defaults to reduce the burden on the user.

Related issue number

Closes #1198

---------

Signed-off-by: Archit Kulkarni <[email protected]>
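The defaults described in this commit message can be sketched as a plain Python dictionary shaped like the `resources` field of a Kubernetes container. This is only an illustration, not KubeRay's actual code: the function names are hypothetical, and the behaviour of leaving user-supplied resources untouched is an assumption about what "default" means here. The values 200MiB and 1GiB are written in Kubernetes quantity notation as `200Mi` and `1Gi`.

```python
import copy

# Values taken from the commit message; the dict layout mirrors the
# Kubernetes container `resources` field. Names below are hypothetical.
DEFAULT_SUBMITTER_RESOURCES = {
    "requests": {"cpu": "500m", "memory": "200Mi"},
    "limits": {"cpu": "1", "memory": "1Gi"},
}

_MEMORY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '500m' or '1' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '200Mi' or '1Gi' to bytes."""
    for suffix, factor in _MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain byte count


def with_default_resources(container: dict) -> dict:
    """Return a copy of the container spec, filling in the defaults
    only when no resources are set (assumed behaviour)."""
    result = dict(container)
    if not result.get("resources"):
        result["resources"] = copy.deepcopy(DEFAULT_SUBMITTER_RESOURCES)
    return result


def requests_within_limits(resources: dict) -> bool:
    """Check that each request does not exceed its limit."""
    req, lim = resources["requests"], resources["limits"]
    return (
        cpu_to_millicores(req["cpu"]) <= cpu_to_millicores(lim["cpu"])
        and memory_to_bytes(req["memory"]) <= memory_to_bytes(lim["memory"])
    )
```

With these helpers, `with_default_resources({"name": "submitter"})` yields a container spec carrying the defaults, and `requests_within_limits` confirms that 500m CPU and 200Mi memory sit below the 1 CPU and 1Gi limits.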
kevin85421 pushed a commit to kevin85421/kuberay that referenced this issue on Oct 17, 2023
…t#1319)
Currently the `Resources` field for the pod which calls `ray job submit` is not set. We should set reasonable defaults for this field.

Original context reproduced below:
a38a22a => Oh, the GitHub Actions free plan runner only has 2 CPUs, and these 2 CPUs are not fully dedicated to Kubernetes. When we set the CPU resource request to 1 CPU, it makes sense for the test to fail.
Workaround solution:
(1) Remove CPU request and limit from this PR
(2) Open issues to make sure: (a) add CPU configs back before v0.6.0 (b) move the RayJob tests to Buildkite before v0.6.0
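The scheduling failure described above comes down to simple arithmetic: a pod fits on a node only if its CPU request is no larger than the node's allocatable CPU minus what other pods have already requested. A minimal sketch follows. The 2-CPU node size comes from the comment, while the 1100m figure for CPU already requested by other pods is a hypothetical number chosen for illustration, and the function name is made up.

```python
def fits_on_node(allocatable_millicores: int,
                 already_requested_millicores: int,
                 new_request_millicores: int) -> bool:
    """Simplified CPU fit check: the new request must not exceed
    the CPU that is still unrequested on the node."""
    free = allocatable_millicores - already_requested_millicores
    return new_request_millicores <= free


# 2-CPU runner (from the comment above).
NODE_ALLOCATABLE = 2000
# Hypothetical: CPU already requested by system pods and the Ray cluster.
ALREADY_REQUESTED = 1100

# A submitter pod requesting 1 CPU does not fit in the remaining 900m.
one_cpu_fits = fits_on_node(NODE_ALLOCATABLE, ALREADY_REQUESTED, 1000)

# With the CPU request removed (the workaround), the request counts as zero.
no_request_fits = fits_on_node(NODE_ALLOCATABLE, ALREADY_REQUESTED, 0)

print(one_cpu_fits, no_request_fits)
```

Under these assumed numbers the 1 CPU request is rejected while the pod without a CPU request is schedulable, which matches the reasoning behind workaround (1).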
Originally posted by @kevin85421 in #1177 (comment)