Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] Make ray able to connect to redis without pip redis. #25875

Merged
merged 1 commit into from
Jul 24, 2022

Conversation

fishbone
Copy link
Contributor

Why are these changes needed?

Right now, only cpp layer in ray is connecting to redis which means we don't need pip redis to connect to a redis db.

The blocking part is that we are doing some sharding in redis right now. But this feature is not actually used and the shard is always 1. So to make things simple, this feature is just disabled.

Test is added to make sure we can start ray with a redis db without pip redis.

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@fishbone
Copy link
Contributor Author

@wilsonwang371 Once we get this PR in, we don't need pip redis in KubeRay.

@fishbone
Copy link
Contributor Author

I'll remove enable_sharding_conn_ in next PR.

@ericl ericl removed their assignment Jun 17, 2022
@@ -751,14 +751,24 @@ test_minimal() {
# shellcheck disable=SC2086
bazel test --test_output=streamed --config=ci --test_env=RAY_MINIMAL=1 ${BAZEL_EXPORT_OPTIONS} python/ray/tests/test_basic
# shellcheck disable=SC2086
bazel test --test_output=streamed --config=ci --test_env=RAY_MINIMAL=1 --test_env=TEST_EXTERNAL_REDIS=1 ${BAZEL_EXPORT_OPTIONS} python/ray/tests/test_basic
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is external Redis worth a separate test function and build step?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel it's probably is ok because redis is also what we should support in minimal install. We only run basic test which does't require that much time.

assert len(cur_config_list) == 12
cur_config_list[8:] = ["pubsub", "134217728", "134217728", "60"]
redis_client.config_set("client-output-buffer-limit", " ".join(cur_config_list))
# Put a time stamp in Redis to indicate when it was started.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not keep setting"redis_start_time"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redis is provided by the user, not Ray, so it doesn't make sense for ray to write anything like this into db.

@@ -1086,9 +1044,7 @@ def start_head_processes(self):
# Redis, when external Redis address is specified.
# TODO(mwtian): after GCS bootstrapping is default and stable,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This TODO can be removed now.

@@ -96,10 +96,6 @@ def __init__(
if len(external_redis) == 1:
external_redis.append(external_redis[0])
[primary_redis_ip, port] = external_redis[0].split(":")
ray._private.services.wait_for_redis_to_start(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we remove wait_for_redis_to_start() here and in other places?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The user should provide a Redis cluster and make sure it's up, so this logic doesn't fit here.

  2. We have retry in gcs&raylet, so it should be ok.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the remaining usages of wait_for_redis_to_start(), and can they be moved to a test-only util or removed as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel like it is still good to have some health checking before it starts. Does Redis have any API for that?

Copy link
Member

@mwtian mwtian Jun 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to @rkooo567. I didn't find the right way to phrase it earlier, but I think giving a good error message if Redis is unavailable is important. Having a dedicated "wait and check Redis connection" logic seems the easiest way to achieve that. Maybe we don't need all the "wait for Redis connection" calls though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since all the logic related to the storage are moved to GCS, we probably should just pass that information from GCS to the driver.

I don't see any benefit of detecting it here in the python script especially given that it'll need the user to pip install redis to use this.

@fishbone
Copy link
Contributor Author

Any other comments? @rkooo567 @scv119 @mwtian

@@ -96,10 +96,6 @@ def __init__(
if len(external_redis) == 1:
external_redis.append(external_redis[0])
[primary_redis_ip, port] = external_redis[0].split(":")
ray._private.services.wait_for_redis_to_start(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the remaining usages of wait_for_redis_to_start(), and can they be moved to a test-only util or removed as well?

@@ -96,10 +96,6 @@ def __init__(
if len(external_redis) == 1:
external_redis.append(external_redis[0])
[primary_redis_ip, port] = external_redis[0].split(":")
ray._private.services.wait_for_redis_to_start(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel like it is still good to have some health checking before it starts. Does Redis have any API for that?

redis_client.config_set("client-output-buffer-limit", " ".join(cur_config_list))
# Put a time stamp in Redis to indicate when it was started.
redis_client.set("redis_start_time", time.time())
# Construct the command to start the Redis server.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this method at all? Why don't we move it to test utils?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should. Let me update the PR.

@fishbone
Copy link
Contributor Author

fishbone commented Jun 21, 2022

@rkooo567 @mwtian I agree with what you said and in high-level I think what we should do is that pop up useful messages when something is not working.

But this kind of message shouldn't be put into python layer given that ray doesn't depend on redis and it'll require the user to pip install redis to use this and for the user's image, sometimes it's rebuilt and this will require them to update their environment to use this redis storage which is very inconvenient given that this storage is supported by ray right now (not through plugin mode).
I'm also not comfortable to put it into core worker given that core worker (cpp) also doesn't depend on this.

@rkooo567 right now if redis failed to start, GCS will crash and pop some error like redis failed to start, for case like this (component crash), do we have a way to expose this message via observability features you built to the user?

@fishbone fishbone added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Jun 21, 2022
@fishbone
Copy link
Contributor Author

Discuss it with @rkooo567
I'll do

  1. add log in driver about failed to connect to GCS and ask the user to check GCS log for details
  2. print detailed message about GCS failed to reach redis

Besides this, I'll also check whether it's possible to push message from GCS to core worker before GCS crashed (stretch goal) for better observability.

@fishbone fishbone added the do-not-merge Do not merge this PR! label Jun 22, 2022
Signed-off-by: Yi Cheng <[email protected]>
@fishbone fishbone removed the do-not-merge Do not merge this PR! label Jul 24, 2022
@fishbone fishbone merged commit 0c16619 into ray-project:master Jul 24, 2022
Rohan138 pushed a commit to Rohan138/ray that referenced this pull request Jul 28, 2022
…ect#25875)

Signed-off-by: Yi Cheng <[email protected]>

## Why are these changes needed?
Right now, only cpp layer in ray is connecting to redis which means we don't need pip redis to connect to a redis db.

The blocking part is that we are doing some sharding in redis right now. But this feature is not actually used and the shard is always 1. So to make things simple, this feature is just disabled.

Test is added to make sure we can start ray with a redis db without pip redis. 

Signed-off-by: Rohan138 <[email protected]>
Stefan-1313 pushed a commit to Stefan-1313/ray_mod that referenced this pull request Aug 18, 2022
…ect#25875)

Signed-off-by: Yi Cheng <[email protected]>

## Why are these changes needed?
Right now, only cpp layer in ray is connecting to redis which means we don't need pip redis to connect to a redis db.

The blocking part is that we are doing some sharding in redis right now. But this feature is not actually used and the shard is always 1. So to make things simple, this feature is just disabled.

Test is added to make sure we can start ray with a redis db without pip redis.

Signed-off-by: Stefan van der Kleij <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants