[RayJob] Set the timeout of the HTTP client from 2 mins to 2 seconds #1910

Merged
merged 1 commit into from
Feb 6, 2024

Conversation

kevin85421
Member

@kevin85421 kevin85421 commented Feb 6, 2024

Why are these changes needed?

Before #1733, the Ray dashboard needed approximately 5 seconds to be ready to serve requests after the Ray head was running and ready. As a hotfix, #1000 increased the timeout of the HTTP client to 2 mins. However, if the Ray dashboard crashes, the KubeRay operator gets stuck waiting for up to 2 mins, which is not acceptable.

After #1733, the readiness probe will verify the readiness of the Ray dashboard. Consequently, if the head Pod is ready, it implies that the Ray dashboard is also prepared to serve requests. Hence, we can set the timeout from 2 mins back to 2 seconds.

Related issue number

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

Deploy RayJob with this YAML 25 times.

Screen Shot 2024-02-05 at 8 28 14 PM

@kevin85421 kevin85421 marked this pull request as ready for review February 6, 2024 04:51
@kevin85421
Member Author

cc @andrewsykim

@kevin85421
Member Author

Thanks to @Irvingwangjr for reminding me of this issue!

@Irvingwangjr

Irvingwangjr commented Feb 6, 2024

Thanks to @Irvingwangjr for reminding me of this issue!

image
We tested it using memory_benchmark_utils.py with slight modifications, using HTTP mode to submit the job after the head Pod's readiness probe passes.
We set RayJobConcurrency to 5 and RayClusterConcurrency to 15, and submitted 1000 RayJobs at 5-second intervals.
Everything looks great.

@kevin85421 kevin85421 merged commit f9b2cb1 into ray-project:master Feb 6, 2024
23 checks passed
4 participants