[Bug][RLlib] Gym environment registration does not work when using Ray Client and ray.init #21734
Comments
@ericl |
@mwtian See above. Can you help clarify the behavior of gcs client or point me to someone? |
If this is about gcs kv client (for get / put etc), @iycheng will be the most knowledgeable. Thanks for making the fix, and feel free to assign both of us to the PR! |
For |
I'm unable to reproduce this bug. @xwjiang2010 did you produce a fix for this, and can this issue be closed? |
@mwtian Thanks for the response. Minimal reproduction:
|
@xwjiang2010 , just to make sure, @iycheng, do you want to take a look? |
@mwtian that's my assumption about gcs client protocol. Maybe @iycheng can clarify? |
@mwtian @iycheng Do you have any update on this? It seems we have hit the same issue in our application. |
This is a P0 issue from our side. @ericl CC |
@jovany-wang just to confirm, you are receiving empty bytes when calling |
@mwtian I believe it's exactly the same issue, according to my stack trace:
---------------------------------------------------------------------------
EOFError Traceback (most recent call last)
/tmp/ipykernel_4689/1080049057.py in <module>
47
48 ray.client('100.88.148.29:38159').connect()
---> 49 main()
/tmp/ipykernel_4689/1080049057.py in main()
33
34 # Create our RLlib Trainer.
---> 35 trainer = PPOTrainer(config=config)
36
37 # Run it for n training iterations. A training iteration includes
~/.local/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py in __init__(self, config, env, logger_creator)
121
122 def __init__(self, config=None, env=None, logger_creator=None):
--> 123 Trainer.__init__(self, config, env, logger_creator)
124
125 def _init(self, config: TrainerConfigDict,
~/.local/lib/python3.7/site-packages/ray/rllib/agents/trainer.py in __init__(self, config, env, logger_creator)
546 logger_creator = default_logger_creator
547
--> 548 super().__init__(config, logger_creator)
549
550 @classmethod
~/.local/lib/python3.7/site-packages/ray/tune/trainable.py in __init__(self, config, logger_creator)
96
97 start_time = time.time()
---> 98 self.setup(copy.deepcopy(self.config))
99 setup_time = time.time() - start_time
100 if setup_time > SETUP_TIME_THRESHOLD:
~/.local/lib/python3.7/site-packages/ray/rllib/agents/trainer.py in setup(self, config)
640 # An already registered env.
641 if _global_registry.contains(ENV_CREATOR, env):
--> 642 self.env_creator = _global_registry.get(ENV_CREATOR, env)
643 # A class specifier.
644 elif "." in env:
~/.local/lib/python3.7/site-packages/ray/tune/registry.py in get(self, category, key)
138 "Registry value for {}/{} doesn't exist.".format(
139 category, key))
--> 140 return pickle.loads(value)
141 else:
142 return pickle.loads(self._to_flush[(category, key)])
EOFError: Ran out of input |
@mwtian FYI, we are using 1.4 or 1.2 I believe |
Sorry, it still uses
def get(self, category, key):
    if _internal_kv_initialized():
        value = _internal_kv_get(_make_key(category, key))
        if value is None:
            raise ValueError(
                "Registry value for {}/{} doesn't exist.".format(
                    category, key))
        return pickle.loads(value) |
Will try to take a look tomorrow. By the way, the fix is very unlikely to be backported. |
@mwtian Do we have any update? |
Let's see if #24058 can fix the issue. |
Ray Component
RLlib
What happened + What you expected to happen
When using RLlib with Ray Client, you receive an error (see below) when connecting via:
ray.init(f"ray://127.0.0.1:10001")
whereas things work when using:
export RAY_ADDRESS="ray://127.0.0.1:10001"
In particular, this error only happens when using the default Gym-registered environment strings. With a custom registration, the code runs as expected.
So:
Versions / Dependencies
Ray 1.10.0-py38 Docker image with TensorFlow installed.
Reproduction script
Anything else
Happens always.
Are you willing to submit a PR?