[Rllib] Add timeout to filter synchronization #25959

Merged

ArturNiederfahrenhorst
Contributor

Why are these changes needed?

The filters that Rollout Workers apply to metrics are updated regularly as part of `training_step()`.
This operation includes a `ray.get()` call that has no timeout.
Without yet knowing why this call hangs on my local machine when running IMPALA, I propose setting a timeout here so that we do not get stuck on it unnoticed.
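The idea of bounding the wait can be illustrated with a rough, stdlib-only analogue. This is a sketch under stated assumptions: it uses `concurrent.futures` threads in place of Ray actors, and `gather_with_timeout`, `fast_filter` and `stuck_filter` are hypothetical names, not RLlib's API. It also differs from Ray in one respect: `ray.get(refs, timeout=...)` raises `ray.exceptions.GetTimeoutError` when the timeout expires, whereas this sketch warns and returns only the results that arrived in time.

```python
import concurrent.futures
import time
import warnings
from typing import Any, Callable, List, Optional


def gather_with_timeout(
    tasks: List[Callable[[], Any]],
    timeout_seconds: Optional[float] = None,
) -> List[Any]:
    """Hypothetical helper (not RLlib's API): run tasks in parallel and
    return the results of those that finish within `timeout_seconds`.

    timeout_seconds=None waits indefinitely, mirroring ray.get()'s default.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=max(1, len(tasks)))
    futures = [pool.submit(task) for task in tasks]
    done, not_done = concurrent.futures.wait(futures, timeout=timeout_seconds)
    if not_done:
        # Surface the problem loudly instead of hanging silently.
        warnings.warn(
            f"{len(not_done)} of {len(futures)} filter fetches did not finish "
            f"within {timeout_seconds}s; skipping them this round."
        )
    # Return immediately; straggler threads keep running in the background.
    pool.shutdown(wait=False)
    # Keep the original task order for the results that did arrive.
    return [f.result() for f in futures if f in done]


# Usage: one fast "worker" and one that is stuck longer than the timeout.
def fast_filter():
    return {"mean": 0.0, "std": 1.0}


def stuck_filter():
    time.sleep(0.5)  # simulates a worker that does not answer in time
    return {"mean": 1.0, "std": 2.0}


results = gather_with_timeout([fast_filter, stuck_filter], timeout_seconds=0.1)
print(results)  # [{'mean': 0.0, 'std': 1.0}], plus a warning on stderr
```

Typing the timeout as `Optional[float]` keeps `None` available for the old block-forever behaviour, while a float also allows sub-second bounds.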

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@gjoliver
Member

Oh, I was going to use connectors to implement these filters... that seems like extra complexity if they need to be synced.

Contributor

@sven1977 left a comment

Looks great, just the one note about int -> Optional[float], then we can merge.

@sven1977 merged commit bed9083 into ray-project:master Jun 24, 2022
@sven1977
Contributor

Hey @gjoliver, yeah, filters may still have some features (syncing) that we don't support with connectors yet. But I'm not sure it's really that important. Even if you use something like MeanStdFilter and workers don't sync, you'd probably still be able to learn properly, as long as the distribution of states covered by each worker's trajectories is roughly the same across workers.
We could run some experiments to find out.
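To make concrete what "syncing" a mean/std filter involves: each worker keeps its own running statistics, and syncing means merging them without exchanging raw observations. The following is a minimal sketch, assuming a Welford-style running mean/variance combined with the pairwise merge formula of Chan et al.; `RunningStat`, `push` and `merge` are hypothetical names for illustration and are not claimed to match RLlib's filter classes.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStat:
    """Hypothetical per-worker running mean/variance (Welford's algorithm)."""

    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean

    def push(self, x: float) -> None:
        # Welford's single-sample update.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunningStat") -> "RunningStat":
        # Chan et al.'s pairwise combination: no raw samples needed.
        n = self.n + other.n
        if n == 0:
            return RunningStat()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningStat(n, mean, m2)

    @property
    def std(self) -> float:
        # Population standard deviation.
        return math.sqrt(self.m2 / self.n) if self.n > 0 else 0.0


# Two workers see different slices of the observation stream.
worker_a, worker_b = RunningStat(), RunningStat()
for x in [1.0, 2.0, 3.0]:
    worker_a.push(x)
for x in [10.0, 11.0]:
    worker_b.push(x)

synced = worker_a.merge(worker_b)
print(synced.n, round(synced.mean, 6), round(synced.std, 4))  # 5 5.4 4.2237
```

In this toy case the unsynced workers would normalize with means of 2.0 and 10.5, which is the situation where syncing matters; if the workers' state distributions are roughly the same, their local statistics converge to similar values even without syncing.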

truelegion47 pushed a commit to truelegion47/ray that referenced this pull request Jun 30, 2022
* master: (35 commits)
  [tune/structure] Refactor `suggest` into `search` package (ray-project#26074)
  Add back ray.state in deprecation wrapper; print stack trace on warning (ray-project#26086)
  Enable isort for base directory (ray-project#26085)
  [AIR] Add __init__.py to ray.air.callbacks (ray-project#26088)
  [Serve] [Docs] Create end-to-end documentation example for Serve REST API and CLI (ray-project#25936)
  [AIR] Remove unnecessary pandas from examples (ray-project#26009)
  [Datasets] [Hotfix] Update `ds.to_pandas()` limit error to reflect current limit API (ray-project#26081)
  [Serve] [Docs] Add Serve REST API Schema to Serve API Docs (ray-project#25786)
  [Core][Doc] remove cython section from advanced doc. ray-project#26062
  [Core] Fix check failure from incorrect death cause (ray-project#26007)
  [hotfix] Fix linkcheck (ray-project#26070)
  [RLlib] Add timeout to filter synchronization. (ray-project#25959)
  [tune/structure] Introduce logger package (ray-project#26049)
  [RLlib] introduce serialization for our custom gym space types. (ray-project#25923)
  Fix unit test test_check_env.py and est_check_multi_agent.py. (ray-project#25993)
  [RLlib] Make QMix use the ReplayBufferAPI (ray-project#25560)
  [CI] deflake test_multi_node_3 by increasing its timeout
  [CI] Use BUILDKITE_JOB_ID for better navigation for flaky tracker (ray-project#26021)
  [AIR/Docs] Improve user guide gallery (ray-project#26016)
  🎨 Update type annotations to include options in `ray.remote()` (ray-project#25999)
  ...
@ArturNiederfahrenhorst deleted the filtersynchronization branch September 21, 2022 10:20
3 participants