[Core] Ray start/stop/start does not work when using a custom temporary folder #27021
Labels
bug: Something that is supposed to be working; but isn't
core: Issues that should be addressed in Ray Core
release-blocker: P0 Issue that blocks the release
Comments
jbedorf added the bug and triage (Needs triage, e.g. priority, bug/not-bug, and owning component) labels on Jul 26, 2022
@scv119 @stephanie-wang this seems to be a release blocker since it's a regression.
jjyao added the core and release-blocker labels and removed the triage label on Aug 1, 2022
stephanie-wang added a commit that referenced this issue on Aug 9, 2022
Signed-off-by: Stephanie Wang [email protected]
Cluster address is now written to a temp file. Previously we raised an error if ray start --head tried to reuse the old cluster address in the temp file, even if Ray was no longer running. This PR allows ray start --head to continue if it can't find any GCS process associated with the recorded cluster address.
Related issue number: Closes #27021.
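The behaviour described in this commit message can be illustrated with a minimal sketch. It is not Ray's actual implementation: the function names are hypothetical, the address file is assumed to sit directly inside the temp dir, and a plain TCP connection attempt stands in for the real "is there a GCS process at this address" lookup.

```python
import os
import socket
from typing import Callable, Optional

# File name taken from the issue; its location inside the temp dir is an assumption.
ADDRESS_FILE_NAME = "ray_current_cluster"


def read_recorded_address(temp_dir: str) -> Optional[str]:
    """Return the recorded "host:port" string, or None if nothing is recorded."""
    path = os.path.join(temp_dir, ADDRESS_FILE_NAME)
    try:
        with open(path) as f:
            return f.read().strip() or None
    except FileNotFoundError:
        return None


def is_listening(address: str, timeout: float = 0.5) -> bool:
    """Stand-in liveness check (assumption): try to open a TCP connection."""
    host, _, port = address.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError, OverflowError):
        return False


def check_can_start_head(
    temp_dir: str,
    new_address: str,
    is_alive: Callable[[str], bool] = is_listening,
) -> None:
    """Raise only if the recorded address is reused AND something is alive there."""
    recorded = read_recorded_address(temp_dir)
    if recorded is None or recorded != new_address:
        return  # nothing recorded, or a different address: no conflict
    if is_alive(recorded):
        raise ConnectionError(
            f"Ray is trying to start at {new_address}, "
            f"but is already running at {recorded}."
        )
    # Stale record: the file survived, but nothing is running, so continue.
```

The key point is the last branch: a leftover file alone no longer blocks startup. Passing `is_alive` as a parameter keeps the liveness check swappable, which also makes the decision logic easy to test without a running cluster.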
scv119 pushed two commits that referenced this issue on Aug 10, 2022
Stefan-1313 pushed a commit to Stefan-1313/ray_mod that referenced this issue on Aug 18, 2022
…ect#27666) Signed-off-by: Stephanie Wang [email protected]
Cluster address is now written to a temp file. Previously we raised an error if ray start --head tried to reuse the old cluster address in the temp file, even if Ray was no longer running. This PR allows ray start --head to continue if it can't find any GCS process associated with the recorded cluster address.
Related issue number: Closes ray-project#27021.
Signed-off-by: Stefan van der Kleij <[email protected]>
What happened + What you expected to happen
The changes introduced here have the side effect that the following sequence no longer works:
Start a cluster with a custom temporary folder:
ray start --head --temp-dir /tmp/bla
Now stop any active ray cluster using the typical:
ray stop
Now there is nothing running anymore, but when you next run:
ray start --head --temp-dir /tmp/bla
You get this error:
ConnectionError: Ray is trying to start at 192.168.XXX.XXX:6379, but is already running at 192.168.XXX.XXX:6379.
This is due to the file ray_current_cluster not being deleted when using a non-default temporary directory. The stop command does not allow you to specify the temp-dir, so it will not find the ray_current_cluster file. When you then try to start Ray again, it fails because the file is still there. You can work around this by manually deleting the file or specifying a different port.
Versions / Dependencies
Nightly
Reproduction script
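No script was supplied by the reporter. The three commands from the description could be driven from Python roughly as follows; this is a sketch only, the helper names are hypothetical, and the `ray` commands and flags are exactly the ones quoted in the issue.

```python
import subprocess
from typing import Callable, List, Sequence

TEMP_DIR = "/tmp/bla"  # custom temp dir used in the issue

# The start / stop / start sequence from the description.
REPRO_COMMANDS: List[List[str]] = [
    ["ray", "start", "--head", "--temp-dir", TEMP_DIR],
    ["ray", "stop"],
    ["ray", "start", "--head", "--temp-dir", TEMP_DIR],
]


def run_with_subprocess(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_repro(run: Callable[[Sequence[str]], int] = run_with_subprocess) -> List[int]:
    """Run the three steps in order and return their exit codes."""
    return [run(cmd) for cmd in REPRO_COMMANDS]
```

Calling `run_repro()` on an affected build is expected to show the third step failing with the ConnectionError quoted above, while the first two succeed.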
Issue Severity
Medium: It is a significant difficulty but I can work around it.
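The manual workaround from the description (deleting the leftover file) could be scripted along these lines. This is a sketch under one assumption: that ray_current_cluster sits directly inside the directory passed to --temp-dir. The helper name is hypothetical and not part of Ray.

```python
import os


def remove_stale_address_file(
    temp_dir: str, file_name: str = "ray_current_cluster"
) -> bool:
    """Delete the leftover address file; return True if a file was removed.

    Assumption: the file lives directly inside temp_dir. Only call this
    after `ray stop`, when no Ray cluster is using that directory.
    """
    path = os.path.join(temp_dir, file_name)
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False  # nothing to clean up
```

For the example in this issue, that would be `remove_stale_address_file("/tmp/bla")` between `ray stop` and the next `ray start --head --temp-dir /tmp/bla`.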