-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prepare socket file when start ray #3925
Conversation
Test FAILed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
python/ray/node.py
Outdated
Args: | ||
socket_path (string): the socket file to prepare. | ||
""" | ||
if not os.path.isfile(socket_path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if not os.path.exist(socket_path)
will be more explicit?
test/runtest.py
Outdated
@@ -2607,3 +2607,20 @@ def test_pandas_parquet_serialization(): | |||
pd.DataFrame({"col1": [0, 1], "col2": [0, 1]}).to_parquet(filename) | |||
# Clean up | |||
shutil.rmtree(tempdir) | |||
|
|||
|
|||
def test_socket_directory_none_existant(shutdown_only): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo, existent
. or maybe shorter test_socket_dir_not_existing
test/runtest.py
Outdated
def test_socket_directory_none_existant(shutdown_only): | ||
level1_name = ray.ObjectID(_random_string()).hex() | ||
level2_name = ray.ObjectID(_random_string()).hex() | ||
temp_raylet_socket_dir = "/tmp/{}/{}".format(level1_name, level2_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, let's put the socket under something like /tmp/ray/tests
to avoid creating too many files in /tmp
python/ray/node.py
Outdated
if not os.path.isdir(path): | ||
try_to_create_directory(path) | ||
else: | ||
os.remove(socket_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if it's a good idea to remove the existing socket file. If a raylet is still running and user starts ray again, this will break the previous raylet.
We can check if this socket file is still used by any processes before removing it. Or maybe simply raise an error if it already exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this function is used in raylet socket and plasma socket, it not easy to connect to it to check whether it is in use. I raise exception here. It is better than nothing. If we don't do this and start cluster using ray start
, there will be no error after this command. We need to check it in raylet.err. With this exception, ray start
will stop immediately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, raising the error earlier is better.
BTW, we can detect if a socket file is in use with the lsof
command. But raising an error is okay as well.
Test PASSed. |
Test PASSed. |
@robertnishihara The Linux Wheel test fails in all PRs. Shall we change the number of expecting wheels to 4 to unblock the test? |
What do these changes do?
Sometimes, users want to specify their own socket file name. However, there are two cases that they could fail.
For the first case, raylet will crash with:
For the second case, raylet will crash with:
Related issue number