[RLlib] PolicyMap LRU cache enhancements: Swap out policies (instead of GC'ing and recreating) + use Ray object store (instead of file system). #29513
Conversation
rllib/algorithms/algorithm_config.py
Outdated
policies_to_train=None,
observation_fn=None,
count_steps_by=None,
policy_map_capacity: Optional[int] = None,
Fixed docstring. Deprecated the policy_map_cache arg.
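For context, here is a hedged usage sketch of the capacity setting touched in this hunk; the algorithm class, env, and value are illustrative and not taken from the PR:

from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    .multi_agent(
        policy_map_capacity=100,  # keep at most 100 policies in memory (LRU)
    )
)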
TIME_SWAPS: Optional[float] = None

class TestPolicyMap(unittest.TestCase):
New test case checking whether swapping (a = A.get_state() -> B.set_state(a) -> B becomes A) is much faster than no-swapping (re-creating the policies).
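To make the swap semantics concrete, here is a tiny self-contained sketch; ToyPolicy is a made-up stand-in that only shares the get_state()/set_state() pattern with real policies:

import unittest


class ToyPolicy:
    # Stand-in for a policy whose entire state is its weights.
    def __init__(self, weights):
        self.weights = weights

    def get_state(self):
        return {"weights": list(self.weights)}

    def set_state(self, state):
        self.weights = list(state["weights"])


class TestToySwap(unittest.TestCase):
    def test_b_becomes_a(self):
        a = ToyPolicy([1.0, 2.0])
        b = ToyPolicy([0.0, 0.0])
        b.set_state(a.get_state())  # B now behaves like A; no new object is built.
        self.assertEqual(b.weights, a.weights)


if __name__ == "__main__":
    unittest.main()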
from ray.rllib.utils.tf_utils import get_tf_eager_cls_if_necessary

class TestPolicyStateSwapping(unittest.TestCase):
New test proving that state swaps work as expected, even with GPU policies.
@@ -247,6 +250,8 @@
        raise ImportError("Could not import tensorflow!")

    if framework == "tf2":
        if not tf1.executing_eagerly():
Add this here in case we are using this utility function outside of "regular" RLlib classes (such as RolloutWorker, which handles this itself).
This likely is not going to work in a util function: enable_eager_execution() needs to be called before any TF operations and throws exceptions otherwise. It usually needs to be enabled in main().
True, but then it would fail on the next line anyway, which asserts that eager is enabled :)
I added this so that this util function can be used more safely in test cases, where we switch between eager, traced, and graph execution to prove that something works for all frameworks.
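A minimal sketch of the ordering constraint being discussed, assuming the tf.compat.v1 API; this is illustrative, not code from the PR:

import tensorflow.compat.v1 as tf1


def main():
    # Must run before any TF op is created; enabling it later raises.
    if not tf1.executing_eagerly():
        tf1.enable_eager_execution()
    x = tf1.constant([1.0, 2.0])
    print(x * 2.0)


if __name__ == "__main__":
    main()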
@@ -34,29 +24,25 @@ class PolicyMap(dict):

    def __init__(
        self,
        worker_index: int,
        num_workers: int,
        *,
Simplified PolicyMap a lot, making it a very thin wrapper around a dict and removing the "heavy" add_policy/create_policy APIs. Users should create policies outside the PolicyMap and just use policy_obj = ctor(a=.., b=.., c=..) followed by policy_map[some_id] = policy_obj instead.
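A hedged sketch of that usage pattern; DummyPolicy and its constructor arguments are made up, and a plain dict stands in for the map's dict-like interface:

class DummyPolicy:
    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c


policy_map = {}                             # PolicyMap acts as a thin dict wrapper
policy_obj = DummyPolicy(a=1, b=2, c=3)     # build the policy outside the map
policy_map["some_policy_id"] = policy_obj   # plain dict-style insertion
assert policy_map["some_policy_id"] is policy_obj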
Awesome!
@@ -660,6 +660,7 @@ def _init_optimizers(self):
    def maybe_initialize_optimizer_and_loss(self):
        # We don't need to initialize loss calculation for MultiGPUTowerStack.
        if self._is_tower:
            self.get_session().run(tf1.global_variables_initializer())
Why do we need to call Session.run() during policy init? / Why did we not need to do this before?
# to succeed (to speed things up a little; some trainable policies may receive
# more or less data and may thus learn more or less quickly).
f"policy_reward_mean/pol{i}": 50.0 for i in range(num_trainable)
}, **{"timesteps_total": 400000})
Where is the actual execution of these configs?
Looks to me like these configs are only set up, but nothing is tested.
Ah, great question. Meet our new and improved yaml-replacement config files: written purely in Python and consumable by the RLlib CLI and our CI learning script (tests/run_regression_test.py) :)
You only need to define a config var and, optionally, a stop var in those scripts; then you can pass them on the command line to the respective script.
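A hedged sketch of what such a Python config file might look like; the algorithm, env, and stop values are illustrative, not the ones used in this PR:

from ray.rllib.algorithms.appo import APPOConfig

# Module-level `config` that the CLI / regression script picks up.
config = (
    APPOConfig()
    .environment("CartPole-v1")
    .rollouts(num_rollout_workers=2)
)

# Optional module-level `stop` criteria.
stop = {
    "timesteps_total": 400000,
    "episode_reward_mean": 150.0,
}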
Wow, that's awesome! I must have been living under a rock for a couple of days.
The only "real" thing I want to add to Kourosh's review is that multi-agent-cartpole-w-1000-policies-appo.py seems to be missing some things! The rest is not blocking 👍
Sorry, meant to comment instead of approve!
Thanks a ton for your reviews @kouroshHakha and @ArturNiederfahrenhorst !
Best PR of the week :)
Thanks! :)
PolicyMap LRU cache enhancements:
The new AlgorithmConfig.multi_agent(policies_swappable=True) setting allows "state-swapping out" policies (instead of the least recently used one being GC'd and recreated when used again). Swapping works very simply as s = A.get_state(); B.set_state(s), where A and B are policies in the map. This should allow for much faster (~15-20x) policy caching.
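As a rough, self-contained illustration of why swapping can beat re-creation (the sleep stands in for building a TF/Torch graph; the numbers are illustrative and not the ~15-20x measured for this PR):

import time


class ExpensivePolicy:
    def __init__(self, weights):
        time.sleep(0.01)            # pretend graph/model construction is costly
        self.weights = weights

    def get_state(self):
        return {"weights": self.weights}

    def set_state(self, state):
        self.weights = state["weights"]


# Re-create: pay the construction cost every time a policy re-enters the cache.
t0 = time.perf_counter()
for i in range(20):
    ExpensivePolicy(weights=[i])
recreate_s = time.perf_counter() - t0

# Swap: build one cached object once, then only exchange its state.
cached = ExpensivePolicy(weights=[0])
t0 = time.perf_counter()
for i in range(20):
    cached.set_state({"weights": [i]})
swap_s = time.perf_counter() - t0

print(f"re-create: {recreate_s:.3f}s  vs  swap: {swap_s:.3f}s")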
3 new test cases have been added:
Why are these changes needed?
Related issue number
Checks
I've signed off every commit (by using git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.