[RLlib; RELEASE BLOCKER] Fix Policy server/client (currently broken and not caught by tests!) #30526
Signed-off-by: sven1977 <[email protected]>
…cy_server_enhancements
kwargs["config"] = kwargs["config"].copy(copy_frozen=False)
config = kwargs["config"]
config.output = None
config.input_ = "sampler"
lgtm. I was thinking that users might need access to other input types, but then why would anyone use policy_client if they weren't doing environment sampling?
Yeah, that, too. But also note that this only affects the extra RolloutWorker that we create either on the client (inference_mode=local) or on the server (inside the PolicyServerInput object!) for inference_mode=remote.
The whole design is completely flawed imo, but we'll have to fix that separately; it's beyond the scope of this quick-fix PR. This PR does NOT touch the original (bad) design.
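For readers following the thread, the copy-then-override pattern in the snippet above can be sketched as below. This is only an illustration under stated assumptions: `ToyConfig` and `make_sampler_config` are hypothetical stand-ins, not RLlib classes, and only the `copy(copy_frozen=False)` call shape and the `output`/`input_` overrides are taken from the diff.

```python
class ToyConfig:
    """Hypothetical freezable config (NOT RLlib's AlgorithmConfig)."""

    def __init__(self):
        self.input_ = "dataset"
        self.output = "/tmp/out"
        self._is_frozen = False

    def freeze(self):
        self._is_frozen = True

    def __setattr__(self, key, value):
        # Reject writes once frozen; the flag itself stays writable.
        if key != "_is_frozen" and getattr(self, "_is_frozen", False):
            raise AttributeError(f"Cannot set {key!r} on a frozen config")
        super().__setattr__(key, value)

    def copy(self, copy_frozen=True):
        # Build a new object and copy attributes without going through __setattr__.
        new = ToyConfig.__new__(ToyConfig)
        new.__dict__.update(self.__dict__)
        if not copy_frozen:
            new._is_frozen = False  # the copy is editable again
        return new


def make_sampler_config(config):
    """Return an unfrozen copy that samples from the env and writes no output."""
    cfg = config.copy(copy_frozen=False)
    cfg.output = None
    cfg.input_ = "sampler"
    return cfg
```

The point of copying first is that the caller's (possibly frozen) config is left untouched while the extra worker gets its own overridden settings.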
rllib/env/policy_server_input.py
Outdated
@@ -64,7 +65,7 @@ class PolicyServerInput(ThreadingMixIn, HTTPServer, InputReader):
"""

@PublicAPI
def __init__(self, ioctx, address, port, idle_timeout=3.0):
def __init__(self, ioctx, address, port, idle_timeout=3.0, use_json=False):
You've added this as an option, but is it used anywhere in the tests/examples now?
Good catch, will remove.
Leftover from my other work, which made me discover this bug in the first place :)
+1
You mention that "The current tests don't even notice if one or more client processes crash due to a failed connection/failed server.", but I'm not sure how the tests, as reworked here, handle this issue.
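One way a test harness could notice a crashed client is to poll each client process and fail as soon as any of them exits with a non-zero code. The sketch below only illustrates that idea; it is not how this PR's tests are implemented, and `spawn_client` and `wait_for_clients` are hypothetical helper names.

```python
import subprocess
import sys
import time


def spawn_client(code):
    """Start a child Python process running `code` (stand-in for a client script)."""
    return subprocess.Popen([sys.executable, "-c", code])


def _kill_all(procs):
    # Terminate anything still running and reap every process.
    for proc in procs:
        if proc.poll() is None:
            proc.kill()
        proc.wait()


def wait_for_clients(procs, timeout_s=60.0, poll_interval_s=0.1):
    """Wait for all processes; raise as soon as one exits with a non-zero code."""
    deadline = time.monotonic() + timeout_s
    pending = list(procs)
    while pending:
        for proc in list(pending):
            code = proc.poll()
            if code is None:
                continue  # still running
            if code != 0:
                _kill_all(pending)
                raise RuntimeError(
                    f"Client process {proc.pid} exited with code {code}"
                )
            pending.remove(proc)
        if not pending:
            break
        if time.monotonic() > deadline:
            _kill_all(pending)
            raise TimeoutError("Client processes did not finish in time")
        time.sleep(poll_interval_s)
```

With a check like this, a client that dies on a failed connection turns into a test failure instead of passing silently.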
left a couple of questions
rllib/env/policy_server_input.py
Outdated
setup_child_rollout_worker()
assert inference_thread.is_alive()
response["episode_id"] = child_rollout_worker.env.start_episode(
    args["episode_id"], args["training_enabled"]
)
elif command == Commands.GET_ACTION:
elif (
This is very hacky :). Aren't command and Commands.GET_ACTION always strings?
use StrEnum instead?
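A minimal sketch of that suggestion, assuming the commands are plain string constants: mixing `str` into an `Enum` makes each member compare equal to its raw string, so a single `==` check covers both cases. The member values and the `dispatch` helper below are illustrative assumptions, not the actual `Commands` definition in RLlib. On Python 3.11+, `enum.StrEnum` gives the same comparison behaviour.

```python
from enum import Enum


class Commands(str, Enum):
    # Values are assumptions for illustration only.
    START_EPISODE = "START_EPISODE"
    GET_ACTION = "GET_ACTION"


def dispatch(command):
    """Hypothetical handler lookup that accepts a member or a raw string."""
    if command == Commands.START_EPISODE:
        return "start_episode"
    elif command == Commands.GET_ACTION:
        return "get_action"
    raise ValueError(f"Unknown command: {command!r}")
```

Because members are real `str` instances, a command that arrives as a plain string (for example after deserialization) matches the same branch as the enum member.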
@@ -457,6 +455,8 @@ def __init__(
global _global_worker
_global_worker = self

from ray.rllib.algorithms.algorithm_config import AlgorithmConfig
We need to kill the cyclic dependency chain after the release to avoid these kinds of function-local imports.
if args.as_test:
    print("Checking if learning goals were achieved")
    check_learning_achieved(results, args.stop_reward)
Is this the main fix for the unit test, i.e. checking whether the minimum reward threshold was actually reached?
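The kind of check being asked about can be sketched as follows: the test fails unless some iteration reached the reward threshold. This is a standalone illustration, not the body of RLlib's `check_learning_achieved`; `check_min_reward` and its `reward_history` argument are hypothetical.

```python
def check_min_reward(reward_history, min_reward):
    """Raise if no entry in `reward_history` reaches `min_reward`.

    `reward_history` is assumed to be a list of per-iteration mean rewards.
    """
    best = max(reward_history, default=float("-inf"))
    if best < min_reward:
        raise ValueError(
            f"Learning goal not reached: best reward {best} < {min_reward}"
        )
    return best
```

A run that never learns (for example because every client failed to connect) then produces an explicit error rather than a passing test.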
approved.
…nd not caught by tests) (ray-project#30526) Signed-off-by: Weichen Xu <[email protected]>