-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core] Make ray able to connect to redis without pip redis. #25875
Conversation
@wilsonwang371 Once we get this PR in, we don't need |
I'll remove enable_sharding_conn_ in next PR. |
@@ -751,14 +751,24 @@ test_minimal() { | |||
# shellcheck disable=SC2086 | |||
bazel test --test_output=streamed --config=ci --test_env=RAY_MINIMAL=1 ${BAZEL_EXPORT_OPTIONS} python/ray/tests/test_basic | |||
# shellcheck disable=SC2086 | |||
bazel test --test_output=streamed --config=ci --test_env=RAY_MINIMAL=1 --test_env=TEST_EXTERNAL_REDIS=1 ${BAZEL_EXPORT_OPTIONS} python/ray/tests/test_basic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is external Redis worth a separate test function and build step?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel it's probably is ok because redis is also what we should support in minimal install. We only run basic test which does't require that much time.
assert len(cur_config_list) == 12 | ||
cur_config_list[8:] = ["pubsub", "134217728", "134217728", "60"] | ||
redis_client.config_set("client-output-buffer-limit", " ".join(cur_config_list)) | ||
# Put a time stamp in Redis to indicate when it was started. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not keep setting"redis_start_time"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redis is provided by the user, not Ray, so it doesn't make sense for ray to write anything like this into db.
python/ray/node.py
Outdated
@@ -1086,9 +1044,7 @@ def start_head_processes(self): | |||
# Redis, when external Redis address is specified. | |||
# TODO(mwtian): after GCS bootstrapping is default and stable, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This TODO can be removed now.
python/ray/node.py
Outdated
@@ -96,10 +96,6 @@ def __init__( | |||
if len(external_redis) == 1: | |||
external_redis.append(external_redis[0]) | |||
[primary_redis_ip, port] = external_redis[0].split(":") | |||
ray._private.services.wait_for_redis_to_start( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we remove wait_for_redis_to_start()
here and in other places?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
The user should provide a Redis cluster and make sure it's up, so this logic doesn't fit here.
-
We have retry in gcs&raylet, so it should be ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the remaining usages of wait_for_redis_to_start()
, and can they be moved to a test-only util or removed as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel like it is still good to have some health checking before it starts. Does Redis have any API for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to @rkooo567. I didn't find the right way to phrase it earlier, but I think giving a good error message if Redis is unavailable is important. Having a dedicated "wait and check Redis connection" logic seems the easiest way to achieve that. Maybe we don't need all the "wait for Redis connection" calls though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since all the logic related to the storage are moved to GCS, we probably should just pass that information from GCS to the driver.
I don't see any benefit of detecting it here in the python script especially given that it'll need the user to pip install redis
to use this.
python/ray/node.py
Outdated
@@ -96,10 +96,6 @@ def __init__( | |||
if len(external_redis) == 1: | |||
external_redis.append(external_redis[0]) | |||
[primary_redis_ip, port] = external_redis[0].split(":") | |||
ray._private.services.wait_for_redis_to_start( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the remaining usages of wait_for_redis_to_start()
, and can they be moved to a test-only util or removed as well?
python/ray/node.py
Outdated
@@ -96,10 +96,6 @@ def __init__( | |||
if len(external_redis) == 1: | |||
external_redis.append(external_redis[0]) | |||
[primary_redis_ip, port] = external_redis[0].split(":") | |||
ray._private.services.wait_for_redis_to_start( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel like it is still good to have some health checking before it starts. Does Redis have any API for that?
redis_client.config_set("client-output-buffer-limit", " ".join(cur_config_list)) | ||
# Put a time stamp in Redis to indicate when it was started. | ||
redis_client.set("redis_start_time", time.time()) | ||
# Construct the command to start the Redis server. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this method at all? Why don't we move it to test utils?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should. Let me update the PR.
@rkooo567 @mwtian I agree with what you said and in high-level I think what we should do is that pop up useful messages when something is not working. But this kind of message shouldn't be put into python layer given that ray doesn't depend on redis and it'll require the user to @rkooo567 right now if redis failed to start, GCS will crash and pop some error like redis failed to start, for case like this (component crash), do we have a way to expose this message via observability features you built to the user? |
Discuss it with @rkooo567
Besides this, I'll also check whether it's possible to push message from GCS to core worker before GCS crashed (stretch goal) for better observability. |
Signed-off-by: Yi Cheng <[email protected]>
…ect#25875) Signed-off-by: Yi Cheng <[email protected]> ## Why are these changes needed? Right now, only cpp layer in ray is connecting to redis which means we don't need pip redis to connect to a redis db. The blocking part is that we are doing some sharding in redis right now. But this feature is not actually used and the shard is always 1. So to make things simple, this feature is just disabled. Test is added to make sure we can start ray with a redis db without pip redis. Signed-off-by: Rohan138 <[email protected]>
…ect#25875) Signed-off-by: Yi Cheng <[email protected]> ## Why are these changes needed? Right now, only cpp layer in ray is connecting to redis which means we don't need pip redis to connect to a redis db. The blocking part is that we are doing some sharding in redis right now. But this feature is not actually used and the shard is always 1. So to make things simple, this feature is just disabled. Test is added to make sure we can start ray with a redis db without pip redis. Signed-off-by: Stefan van der Kleij <[email protected]>
Why are these changes needed?
Right now, only cpp layer in ray is connecting to redis which means we don't need pip redis to connect to a redis db.
The blocking part is that we are doing some sharding in redis right now. But this feature is not actually used and the shard is always 1. So to make things simple, this feature is just disabled.
Test is added to make sure we can start ray with a redis db without pip redis.
Related issue number
Checks
scripts/format.sh
to lint the changes in this PR.