Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RLlib; RELEASE BLOCKER] Fix Policy server/client (currently broken and not caught by tests!) #30526

Merged
merged 10 commits into from
Nov 21, 2022

Conversation

sven1977
Copy link
Contributor

@sven1977 sven1977 commented Nov 21, 2022

Fix Policy server/client (currently broken and not caught by tests!)

  • Fixes broken local inference RolloutWorker creation on a) client side (mode "local") or b) server side (mode "remote").
  • Enhances test cases to test for learning success. The current tests don't even notice if one or more client processes crash due to a failed connection/failed server.

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
@zhe-thoughts zhe-thoughts added release-blocker P0 Issue that blocks the release P0 Issues that should be fixed in short order labels Nov 21, 2022
kwargs["config"] = kwargs["config"].copy(copy_frozen=False)
config = kwargs["config"]
config.output = None
config.input_ = "sampler"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. Was thinking that users might need access to other input types but I guess why would anyone use policy_client if they weren't doing environment sampling

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that, too. But also note this is only for the extra RolloutWorker that we create on either the client (inference-mode=local) or on the server (inside the PolicyServerInput object!) for inference mode=remote.
The whole design is completely flawed imo, but we'll have to fix this separately, it's beyond the scope of this quick fix PR. This PR does NOT touch the original (bad) design.

@@ -64,7 +65,7 @@ class PolicyServerInput(ThreadingMixIn, HTTPServer, InputReader):
"""

@PublicAPI
def __init__(self, ioctx, address, port, idle_timeout=3.0):
def __init__(self, ioctx, address, port, idle_timeout=3.0, use_json=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you've added this as an option, but is this used at any point now in the tests/examples?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, will remove.
Leftover from my other work, which made me discover this bug in the first place :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@avnishn
Copy link
Member

avnishn commented Nov 21, 2022

you mention that "The current tests don't even notice if one or more client processes crash due to a failed connection/failed server."

but I'm not sure how the tests as they are rebuffed handle this issue.
Can you comment on the lines where this is handled in your pr?

Copy link
Contributor

@kouroshHakha kouroshHakha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a couple of questions

@@ -64,7 +65,7 @@ class PolicyServerInput(ThreadingMixIn, HTTPServer, InputReader):
"""

@PublicAPI
def __init__(self, ioctx, address, port, idle_timeout=3.0):
def __init__(self, ioctx, address, port, idle_timeout=3.0, use_json=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

setup_child_rollout_worker()
assert inference_thread.is_alive()
response["episode_id"] = child_rollout_worker.env.start_episode(
args["episode_id"], args["training_enabled"]
)
elif command == Commands.GET_ACTION:
elif (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very hacky :). Isn't command and Commands.GET_ACTION always string?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use StrEnum instead?

@@ -457,6 +455,8 @@ def __init__(
global _global_worker
_global_worker = self

from ray.rllib.algorithms.algorithm_config import AlgorithmConfig
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to kill the cyclic dependency chain after release to avoid these types of imports.

@@ -457,6 +455,8 @@ def __init__(
global _global_worker
_global_worker = self

from ray.rllib.algorithms.algorithm_config import AlgorithmConfig
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to kill the cyclic dependency chain after release to avoid these types of imports.


if args.as_test:
print("Checking if learning goals were achieved")
check_learning_achieved(results, args.stop_reward)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the main fix for the unittest? check whether learning of min_reward_trheshold was achieved?

Copy link
Contributor

@kouroshHakha kouroshHakha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a couple of questions

Copy link
Contributor

@kouroshHakha kouroshHakha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved.

@kouroshHakha kouroshHakha added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Nov 21, 2022
@sven1977 sven1977 merged commit ba18004 into ray-project:master Nov 21, 2022
WeichenXu123 pushed a commit to WeichenXu123/ray that referenced this pull request Dec 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P0 Issues that should be fixed in short order release-blocker P0 Issue that blocks the release tests-ok The tagger certifies test failures are unrelated and assumes personal liability.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants