[RLlib] PolicyMap LRU cache enhancements: Swap out policies (instead of GC'ing and recreating) + use Ray object store (instead of file system). #29513
Conversation
rllib/algorithms/algorithm_config.py
Outdated
policies_to_train=None,
observation_fn=None,
count_steps_by=None,
policy_map_capacity: Optional[int] = None,
Fixed docstring. Deprecated the policy_map_cache arg.
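For context, here is a hedged usage sketch of the capacity setting touched in this hunk; the algorithm class, env, and value are illustrative and not taken from the PR:

from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    .multi_agent(
        policy_map_capacity=100,  # keep at most 100 policies in memory (LRU)
    )
)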
TIME_SWAPS: Optional[float] = None

class TestPolicyMap(unittest.TestCase):
New test case checking whether swapping (a = A.get_state() -> B.set_state(a) -> B becomes A) is much faster than no-swapping (re-creating the policies).
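To make the swap semantics concrete, here is a tiny self-contained sketch; ToyPolicy is a made-up stand-in that only shares the get_state()/set_state() pattern with real policies:

import unittest


class ToyPolicy:
    # Stand-in for a policy whose entire state is its weights.
    def __init__(self, weights):
        self.weights = weights

    def get_state(self):
        return {"weights": list(self.weights)}

    def set_state(self, state):
        self.weights = list(state["weights"])


class TestToySwap(unittest.TestCase):
    def test_b_becomes_a(self):
        a = ToyPolicy([1.0, 2.0])
        b = ToyPolicy([0.0, 0.0])
        b.set_state(a.get_state())  # B now behaves like A; no new object is built.
        self.assertEqual(b.weights, a.weights)


if __name__ == "__main__":
    unittest.main()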
from ray.rllib.utils.tf_utils import get_tf_eager_cls_if_necessary

class TestPolicyStateSwapping(unittest.TestCase):
New test proving that state swaps work as expected, even with GPU policies.
@@ -247,6 +250,8 @@
        raise ImportError("Could not import tensorflow!")

    if framework == "tf2":
        if not tf1.executing_eagerly():
Add this here in case we are using this utility function outside of "regular" RLlib classes (such as RolloutWorker, which handles this itself).
This likely is not going to work in a util function: enable_eager_execution() needs to be called before any TF operations and throws exceptions otherwise. It usually needs to be enabled in main().
True, but then it would fail on the next line anyway, which asserts that eager is enabled :)
I added this so that this util function can be used more safely in test cases, where we switch between eager, traced, and graph execution to prove that something works for all frameworks.
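A minimal sketch of the ordering constraint being discussed, assuming the tf.compat.v1 API; this is illustrative, not code from the PR:

import tensorflow.compat.v1 as tf1


def main():
    # Must run before any TF op is created; enabling it later raises.
    if not tf1.executing_eagerly():
        tf1.enable_eager_execution()
    x = tf1.constant([1.0, 2.0])
    print(x * 2.0)


if __name__ == "__main__":
    main()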
@@ -34,29 +24,25 @@ class PolicyMap(dict):

    def __init__(
        self,
        worker_index: int,
        num_workers: int,
        *,
Simplified PolicyMap a lot, making it a very thin wrapper around a dict and removing the "heavy" add_policy/create_policy APIs. Users should create policies outside the PolicyMap and just use policy_obj = ctor(a=.., b=.., c=..) followed by policy_map[some_id] = policy_obj instead.
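A hedged sketch of that usage pattern; DummyPolicy and its constructor arguments are made up, and a plain dict stands in for the map's dict-like interface:

class DummyPolicy:
    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c


policy_map = {}                             # PolicyMap acts as a thin dict wrapper
policy_obj = DummyPolicy(a=1, b=2, c=3)     # build the policy outside the map
policy_map["some_policy_id"] = policy_obj   # plain dict-style insertion
assert policy_map["some_policy_id"] is policy_obj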
Awesome!
@@ -660,6 +660,7 @@ def _init_optimizers(self):
    def maybe_initialize_optimizer_and_loss(self):
        # We don't need to initialize loss calculation for MultiGPUTowerStack.
        if self._is_tower:
            self.get_session().run(tf1.global_variables_initializer())
Why do we need to call Session.run() during policy init? / Why did we not need to do this before?
# to succeed (to speed things up a little; some trainable policies may receive
# more or less data and may thus learn more or less quickly).
f"policy_reward_mean/pol{i}": 50.0 for i in range(num_trainable)
}, **{"timesteps_total": 400000})
Where is the actual execution of these configs?
Looks to me like these configs are only set up, but nothing is tested.
Ah, great question. Meet our new and improved yaml-replacement config files: written purely in Python and consumable by the RLlib CLI and our CI learning script (tests/run_regression_test.py) :)
You only need to define a config var and, optionally, a stop var in those scripts; then you can pass them on the command line to the respective script.
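A hedged sketch of what such a Python config file might look like; the algorithm, env, and stop values are illustrative, not the ones used in this PR:

from ray.rllib.algorithms.appo import APPOConfig

# Module-level `config` that the CLI / regression script picks up.
config = (
    APPOConfig()
    .environment("CartPole-v1")
    .rollouts(num_rollout_workers=2)
)

# Optional module-level `stop` criteria.
stop = {
    "timesteps_total": 400000,
    "episode_reward_mean": 150.0,
}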
Wow, that's awesome! I must have been living under a rock for a couple of days.
The only "real" thing I want to add to Kourosh's review is that multi-agent-cartpole-w-1000-policies-appo.py seems to be missing some things! The rest is not blocking 👍
Sorry, meant to comment instead of approve!
Thanks a ton for your reviews @kouroshHakha and @ArturNiederfahrenhorst !
Best PR of the week :)
Thanks! :)
PolicyMap LRU cache enhancements:
The new AlgorithmConfig.multi_agent(policies_swappable=True) setting allows "state-swapping out" policies (instead of the least recently used one being GC'd and recreated when used again). Swapping works very simply as s = A.get_state(); B.set_state(s), where A and B are policies in the map. This should allow for much faster (~15-20x) policy caching.
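As a rough, self-contained illustration of why swapping can beat re-creation (the sleep stands in for building a TF/Torch graph; the numbers are illustrative and not the ~15-20x measured for this PR):

import time


class ExpensivePolicy:
    def __init__(self, weights):
        time.sleep(0.01)            # pretend graph/model construction is costly
        self.weights = weights

    def get_state(self):
        return {"weights": self.weights}

    def set_state(self, state):
        self.weights = state["weights"]


# Re-create: pay the construction cost every time a policy re-enters the cache.
t0 = time.perf_counter()
for i in range(20):
    ExpensivePolicy(weights=[i])
recreate_s = time.perf_counter() - t0

# Swap: build one cached object once, then only exchange its state.
cached = ExpensivePolicy(weights=[0])
t0 = time.perf_counter()
for i in range(20):
    cached.set_state({"weights": [i]})
swap_s = time.perf_counter() - t0

print(f"re-create: {recreate_s:.3f}s  vs  swap: {swap_s:.3f}s")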
3 new test cases have been added:
Why are these changes needed?
Related issue number
Checks
I've signed off every commit (by using git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.