[Feature] Reduce race condition between sequential job submission #592
Comments
I think this PR could help with queuing and gang dispatching: #598
This is more of a Ray issue than a KubeRay issue, but we can definitely discuss it here. If I understand correctly, the concern is Ray-internal: it's hard to tell whether the first Ray job is completely done before sending the second one to the same cluster.
@DmitriGekhtman Yup, that's me. Ray Jobs supports running jobs concurrently, and internally there's no notion of waiting for one job to finish before scheduling the next. To do this with the Ray Jobs SDK, you'd need to check the status in a loop until the first job returns a terminal status, as in the code sample here: https://docs.ray.io/en/latest/cluster/running-applications/job-submission/sdk.html#submitting-a-ray-job. If using the Ray jobs CLI,
@Jeffwan do you think polling for job completion is enough to enable sequential job submission?
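For reference, the polling approach discussed above could be sketched roughly as follows. This is a minimal sketch, not Ray's or KubeRay's implementation: `wait_until_terminal` and `run_sequentially` are hypothetical helper names, and the `submit` and `get_status` callables are stand-ins for whatever client is used. The terminal status names are assumed to match the Ray Jobs `JobStatus` values `SUCCEEDED`, `FAILED`, and `STOPPED`.

```python
import time
from typing import Callable, Iterable, List, Tuple

# Assumed terminal states, mirroring Ray's JobStatus enum values.
TERMINAL_STATUSES = frozenset({"SUCCEEDED", "FAILED", "STOPPED"})


def wait_until_terminal(
    get_status: Callable[[], object],
    poll_interval_s: float = 1.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it reports a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        # Accept either a plain string or an enum-like object with a .value.
        name = str(getattr(status, "value", status))
        if name in TERMINAL_STATUSES:
            return name
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in status {name!r} after {timeout_s}s")
        sleep(poll_interval_s)


def run_sequentially(
    entrypoints: Iterable[str],
    submit: Callable[[str], str],
    get_status: Callable[[str], object],
    **wait_kwargs,
) -> List[Tuple[str, str]]:
    """Submit each entrypoint only after the previous job reached a terminal status."""
    results = []
    for entrypoint in entrypoints:
        job_id = submit(entrypoint)
        final = wait_until_terminal(lambda: get_status(job_id), **wait_kwargs)
        results.append((job_id, final))
    return results


# Possible wiring with the Ray Jobs SDK (the address and script names are assumptions):
#   from ray.job_submission import JobSubmissionClient
#   client = JobSubmissionClient("http://127.0.0.1:8265")
#   run_sequentially(
#       ["python job1.py", "python job2.py"],
#       submit=lambda e: client.submit_job(entrypoint=e),
#       get_status=client.get_job_status,
#   )
```

Note that this only waits for a terminal status to be reported; whether that is sufficient to guarantee job1 is fully cleaned up before job2 starts is exactly the open question in this thread.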
This seems to be a Ray issue rather than a KubeRay issue. Closing this issue.
Search before asking
Description
When we submit jobs to an existing cluster, job1 might not be fully deleted from the cluster by the time job2 is created.
As a result, two jobs may be running in the cluster at the same time. As a user, I want to submit job2 only once job1 has fully terminated.
/cc @Basasuya
Are you willing to submit a PR?