
[RLlib] PolicyMap LRU cache enhancements: Swap out policies (instead of GC'ing and recreating) + use Ray object store (instead of file system). #29513

Merged

Conversation

@sven1977 sven1977 commented Oct 20, 2022

PolicyMap LRU cache enhancements:

  • A new AlgorithmConfig.multi_agent(policies_swappable=True) setting allows "state-swapping out" policies (instead of the least recently used one being GC'd and recreated when needed again). Swapping works very simply: s = A.get_state(); B.set_state(s), where A and B are policies in the map.
  • To stash away policies that have been least recently used (once we are at capacity), we now use the Ray object store (instead of the file system, as previously) as the caching mechanism. The object store serves as additional (reserved) memory and already has a spillover mechanism (to the file system) that kicks in if we run out of memory.

This should make policy caching much faster (~15-20x).
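A rough sketch of the new config surface (the setting names below are taken from this PR's description and may differ slightly on master; the env and policy IDs are placeholders):

from ray.rllib.algorithms.appo import APPOConfig

config = (
    APPOConfig()
    .environment("CartPole-v1")  # placeholder env
    .multi_agent(
        policies={"pol_0", "pol_1"},  # placeholder policy IDs
        policy_map_capacity=100,  # keep at most 100 policies in memory
        policies_swappable=True,  # swap states instead of GC'ing + recreating
    )
)

# Under the hood, a swap is simply:
#   s = A.get_state()
#   B.set_state(s)  # B now behaves like A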

Three new test cases have been added:

  • The swapping mechanism (policies_swappable=True) is faster than non-swapping (re-creating policies evicted from the LRU cache).
  • Swapping works on GPUs as well.
  • Learning test with 1000(!) policies (100 of which are trainable); see the sketch right after this list.
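As a hedged illustration of the 1000-policies setup (the mapping function and ID scheme below are assumptions, not the actual test code; only the "pol{i}" naming is taken from the test's stop criteria):

def policy_mapping_fn(agent_id, episode, worker, **kwargs):
    # Illustrative only: map each (integer) agent ID to one of 1000 policies.
    return f"pol{agent_id % 1000}"

config = config.multi_agent(
    policies={f"pol{i}" for i in range(1000)},
    policy_mapping_fn=policy_mapping_fn,
    # Only the first 100 policies get trained; the rest stay frozen.
    policies_to_train=[f"pol{i}" for i in range(100)],
)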

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@sven1977 sven1977 changed the title [WIP; RLlib] Policy map LRU cache enhancements. [RLlib] PolicyMap LRU cache enhancements: Swap out policies (instead of GC'ing and recreating) + use Ray object store (instead of file system). Oct 27, 2022
policies_to_train=None,
observation_fn=None,
count_steps_by=None,
policy_map_capacity: Optional[int] = None,
sven1977 (Contributor, Author):
Fixed docstring. Deprecated the policy_map_cache arg.

TIME_SWAPS: Optional[float] = None


class TestPolicyMap(unittest.TestCase):
sven1977 (Contributor, Author):
New test case checking whether swapping (a = A.get_state() -> B.set_state(a) -> B becomes A) is much faster than non-swapping (re-creating policies).
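A minimal sketch of the kind of timing comparison meant here (an illustrative harness, not the actual test code):

import time

def time_accesses(policy_map, policy_ids, num_accesses=1000):
    # Touch policies round-robin; once the map is at capacity, every
    # access to an evicted policy triggers either a state swap or a
    # full re-creation, depending on the policies_swappable setting.
    start = time.perf_counter()
    for i in range(num_accesses):
        _ = policy_map[policy_ids[i % len(policy_ids)]]
    return time.perf_counter() - start

# The test then asserts that the swapping variant's time is a small
# fraction of the non-swapping variant's time.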

from ray.rllib.utils.tf_utils import get_tf_eager_cls_if_necessary


class TestPolicyStateSwapping(unittest.TestCase):
sven1977 (Contributor, Author):
New test proving that state swaps work as expected, even with GPU policies.

@@ -247,6 +250,8 @@ class for.
raise ImportError("Could not import tensorflow!")

if framework == "tf2":
if not tf1.executing_eagerly():
sven1977 (Contributor, Author):
Add this here in case we are using this utility function outside of "regular" RLlib classes (such as RolloutWorker, which handles this itself).

Member:
This likely is not going to work in a util function:
enable_eager_execution() needs to be called before any TF operations, and throws exceptions otherwise.
This usually needs to be enabled in main().

sven1977 (Contributor, Author):
True, but then it would fail on the next line anyway, which asserts that eager is enabled :)
I added this so that this util function can be used more safely in test cases, where we switch between eager, traced, and graph execution to prove that something works for all frameworks.
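For context, a simplified sketch of the guard under discussion (the helper name is made up; the real check lives in get_tf_eager_cls_if_necessary in rllib/utils/tf_utils.py):

import tensorflow.compat.v1 as tf1

def ensure_eager_for_tf2(framework: str) -> None:
    # Hypothetical helper mirroring the diff above. Caveat from the
    # review: enable_eager_execution() must run before any other TF op
    # in the process, otherwise TF raises an error.
    if framework == "tf2" and not tf1.executing_eagerly():
        tf1.enable_eager_execution()
    assert framework != "tf2" or tf1.executing_eagerly()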

@@ -34,29 +24,25 @@ class PolicyMap(dict):

def __init__(
self,
worker_index: int,
num_workers: int,
*,
sven1977 (Contributor, Author):
Simplified PolicyMap a lot, making it a very thin wrapper around a dict and removing the "heavy" add_policy/create_policy APIs. Users should create policies outside the PolicyMap and just use:

policy_obj = ctor(a=.., b=.., c=..)
policy_map[some_id] = policy_obj

instead.
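For illustration, this could look as follows end to end (a sketch only: the capacity kwarg is inferred from this PR's diff, and MyPolicy, obs_space, act_space, and config are placeholders):

from ray.rllib.policy.policy_map import PolicyMap

policy_map = PolicyMap(worker_index=0, num_workers=0, capacity=2)

policy = MyPolicy(obs_space, act_space, config)  # built outside the map
policy_map["pol_0"] = policy  # plain dict-style insertion
restored = policy_map["pol_0"]  # may swap the state back in if evicted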

Contributor:
Awesome!

@@ -660,6 +660,7 @@ def _init_optimizers(self):
def maybe_initialize_optimizer_and_loss(self):
# We don't need to initialize loss calculation for MultiGPUTowerStack.
if self._is_tower:
self.get_session().run(tf1.global_variables_initializer())
Contributor:
Why do we need to call Session.run() during policy init? / Why did we not need to do this before?

# to succeed (to speed things up a little; some trainable policies may receive
# more or less data and may thus learn more or less quickly).
f"policy_reward_mean/pol{i}": 50.0 for i in range(num_trainable)
}, **{"timesteps_total": 400000})
Contributor:
Where is the actual execution of these configs?

Contributor:
Looks to me like these configs are only set up, but nothing is tested.

sven1977 (Contributor, Author):
Ah, great question. Meet our new and improved YAML-replacement config files, written purely in Python and consumable by the RLlib CLI and our CI learning script (tests/run_regression_test.py) :)

You only need to define a config var and, optionally, a stop var in those scripts; you can then pass them on the command line to the respective script.
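A minimal sketch of such a script (the file name, env, and stop criteria here are illustrative):

# my_experiment.py -- pass this path to the RLlib CLI or to
# tests/run_regression_test.py on the command line.
from ray.rllib.algorithms.appo import APPOConfig

config = APPOConfig().environment("CartPole-v1")  # placeholder env

stop = {"timesteps_total": 400000}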
Contributor:
Wow, that's awesome! I've been living under a rock for the past couple of days.

@ArturNiederfahrenhorst (Contributor) left a comment:
The only "real" thing I want to add to Kourosh's review is that multi-agent-cartpole-w-1000-policies-appo.py seems to be missing some things! The rest is not blocking 👍

@ArturNiederfahrenhorst (Contributor) left a comment:
Sorry, meant to comment instead of approve!

sven1977 (Contributor, Author):
Thanks a ton for your reviews @kouroshHakha and @ArturNiederfahrenhorst !
All problems have been addressed and all questions answered. Please take another look.

@kouroshHakha (Contributor) left a comment:
Best PR of the week :)

@ArturNiederfahrenhorst (Contributor) left a comment:
Thanks! :)

@sven1977 sven1977 merged commit ed3f3c0 into ray-project:master Nov 30, 2022
@stephanie-wang (Contributor):
Hey @sven1977, this PR appears to add a flaky test, tracked here. Can you revert this PR or skip the test?

jeicher pushed a commit to tweag/ray that referenced this pull request Dec 1, 2022
…of GC'ing and recreating) + use Ray object store (instead of file system). (ray-project#29513)
avnishn added a commit to avnishn/ray that referenced this pull request Dec 1, 2022
…instead of GC'ing and recreating) + use Ray object store (instead of file system). (ray-project#29513)"

This reverts commit ed3f3c0.
WeichenXu123 pushed a commit to WeichenXu123/ray that referenced this pull request Dec 19, 2022
…of GC'ing and recreating) + use Ray object store (instead of file system). (ray-project#29513)

Signed-off-by: Weichen Xu <[email protected]>
tamohannes pushed a commit to ju2ez/ray that referenced this pull request Jan 25, 2023
…of GC'ing and recreating) + use Ray object store (instead of file system). (ray-project#29513)

Signed-off-by: tmynn <[email protected]>