[RLlib] PolicyMap LRU cache enhancements: Swap out policies (instead of GC'ing and recreating) + use Ray object store (instead of file system). #29513
Merged: sven1977 merged 59 commits into ray-project:master from sven1977:policy_map_lru_cache_enhancements on Nov 30, 2022.
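In short, this PR removes the file-system/S3 `policy_map_cache` option (overflowing, least-recently used policies now go through the Ray object store, which has its own spillover mechanism) and adds a `policy_states_are_swappable` flag to `AlgorithmConfig.multi_agent()`, so cached policies can be swapped in place instead of garbage-collected and recreated. A minimal sketch of how the new config surface might be used, based only on the signature visible in the diff below; the algorithm choice, policy IDs, and capacity value are illustrative assumptions:

```python
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.policy.policy import PolicySpec

config = (
    PPOConfig()
    .multi_agent(
        # Hypothetical league of 200 identically structured policies.
        policies={f"policy_{i}": PolicySpec() for i in range(200)},
        # Signature per the docstring: (agent_id, episode, worker, **kwargs) -> PolicyID.
        # The mapping itself is arbitrary and for illustration only.
        policy_mapping_fn=lambda aid, episode, worker, **kw: f"policy_{hash(aid) % 200}",
        # Keep at most 20 Policy objects alive in the PolicyMap ...
        policy_map_capacity=20,
        # ... and swap their states in place instead of GC'ing and recreating them.
        # Only valid because all 200 policies share the same network/optimizer structure.
        policy_states_are_swappable=True,
    )
)

# The old file-system/S3 cache option is gone; passing it now triggers a hard
# deprecation error (see the `deprecation_warning(..., error=True)` hunk below):
# config.multi_agent(policy_map_cache="s3://my-bucket/policies")
```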
Changes from all commits: 59 commits (all by sven1977)
b59d74e  wip
76b3c1c  Merge branch 'master' into policy_map_lru_cache_enhancements
d3c2214  wip
e9f2901  wip
3f256d4  wip
ea12a2e  wip
a9c441a  Merge branch 'master' of https://github.com/ray-project/ray into poli…
9c1be5b  wip
e5def7d  LINT
be5c5bb  Merge branch 'master' of https://github.com/ray-project/ray into poli…
3c77434  wip
763b150  wip
ec6dbed  wip
a42b7df  Merge branch 'master' of https://github.com/ray-project/ray into poli…
492ca66  wip
b9bb2a4  wip
0d6100a  wip
4e10e5b  wip
f10fa48  Merge branch 'master' of https://github.com/ray-project/ray into only…
fec7127  wip
6abd07a  LINT
b0995e2  Merge branch 'master' of https://github.com/ray-project/ray into poli…
1e70273  wip
c605dcb  wip
7918fd3  wip
d98fd28  wip
a8cc1ea  wip
a0575a8  LINT.
717d1c4  wip
963d8a5  Merge branch 'run_regression_test_should_handle_py_files' into policy…
0863f0c  Merge branch 'only_sync_updated_policy_weights' into policy_map_lru_c…
988c943  wip
b29bb06  Merge branch 'master' of https://github.com/ray-project/ray into poli…
035bd59  wip
85fe5b5  wip
7f872b3  wip
b411c7e  Merge branch 'master' of https://github.com/ray-project/ray into poli…
98bbfac  LINT
28bb8e3  Merge branch 'master' of https://github.com/ray-project/ray into poli…
bb8acaa  wip
1a2c28f  Merge branch 'master' of https://github.com/ray-project/ray into poli…
7f657ec  fixes
3165b3b  fixes
ac0e706  fixes
439ec75  wip
7c8838a  wip
9ce7997  fix
352da04  wip
3a37b97  wip
0979b53  wip
8dbc53e  Merge branch 'master' of https://github.com/ray-project/ray into poli…
d16d855  wip
7125de0  Merge branch 'master' of https://github.com/ray-project/ray into poli…
9cfd910  wip
7eed7c0  wip
3efb710  wip
e52d67b  wip
e39237d  wip
e5d1362  wip
Diff:
@@ -3,7 +3,17 @@
from gym.spaces import Space
import logging
import math
from typing import TYPE_CHECKING, Any, Callable, Dict, Optional, Tuple, Type, Union
from typing import (
Any,
Callable,
Container,
Dict,
Optional,
Tuple,
Type,
TYPE_CHECKING,
Union,
)

import ray
from ray.rllib.evaluation.rollout_worker import RolloutWorker

@@ -12,6 +22,7 @@
from ray.rllib.env.multi_agent_env import MultiAgentEnv
from ray.rllib.evaluation.collectors.sample_collector import SampleCollector
from ray.rllib.evaluation.collectors.simple_list_collector import SimpleListCollector
from ray.rllib.evaluation.episode import Episode
from ray.rllib.models import MODEL_DEFAULTS
from ray.rllib.policy.policy import Policy, PolicySpec
from ray.rllib.policy.sample_batch import DEFAULT_POLICY_ID

@@ -28,6 +39,7 @@
from ray.rllib.utils.from_config import from_config
from ray.rllib.utils.policy import validate_policy_id
from ray.rllib.utils.typing import (
AgentID,
AlgorithmConfigDict,
EnvConfigDict,
EnvType,

@@ -267,11 +279,11 @@ def __init__(self, algo_class=None):
# `self.multi_agent()`
self.policies = {DEFAULT_POLICY_ID: PolicySpec()}
self.policy_map_capacity = 100
self.policy_map_cache = None
self.policy_mapping_fn = (
lambda aid, episode, worker, **kwargs: DEFAULT_POLICY_ID
)
self.policies_to_train = None
self.policy_states_are_swappable = False
self.observation_fn = None
self.count_steps_by = "env_steps"

@@ -344,6 +356,12 @@ def __init__(self, algo_class=None):
self.timesteps_per_iteration = DEPRECATED_VALUE
self.min_iter_time_s = DEPRECATED_VALUE
self.collect_metrics_timeout = DEPRECATED_VALUE
self.min_time_s_per_reporting = DEPRECATED_VALUE
self.min_train_timesteps_per_reporting = DEPRECATED_VALUE
self.min_sample_timesteps_per_reporting = DEPRECATED_VALUE
self.input_evaluation = DEPRECATED_VALUE
self.policy_map_cache = DEPRECATED_VALUE

# The following values have moved because of the new ReplayBuffer API
self.buffer_size = DEPRECATED_VALUE
self.prioritized_replay = DEPRECATED_VALUE

@@ -358,7 +376,6 @@ def __init__(self, algo_class=None):
self.min_time_s_per_reporting = DEPRECATED_VALUE
self.min_train_timesteps_per_reporting = DEPRECATED_VALUE
self.min_sample_timesteps_per_reporting = DEPRECATED_VALUE
self.input_evaluation = DEPRECATED_VALUE
self.horizon = DEPRECATED_VALUE
self.soft_horizon = DEPRECATED_VALUE

@@ -458,9 +475,9 @@ def update_from_dict(
for k in [
"policies",
"policy_map_capacity",
"policy_map_cache",
"policy_mapping_fn",
"policies_to_train",
"policy_states_are_swappable",
"observation_fn",
"count_steps_by",
]

@@ -1601,13 +1618,21 @@ def multi_agent(
self,
*,
policies=NotProvided,
policy_map_capacity=NotProvided,
policy_map_cache=NotProvided,
policy_mapping_fn=NotProvided,
policies_to_train=NotProvided,
observation_fn=NotProvided,
count_steps_by=NotProvided,
policy_map_capacity: Optional[int] = NotProvided,
[Review comment on this line: "Fixed type annotations." Reply: "nice."]
policy_mapping_fn: Optional[
Callable[[AgentID, "Episode"], PolicyID]
] = NotProvided,
policies_to_train: Optional[
Union[Container[PolicyID], Callable[[PolicyID, SampleBatchType], bool]]
] = NotProvided,
policy_states_are_swappable: Optional[bool] = NotProvided,
observation_fn: Optional[Callable] = NotProvided,
count_steps_by: Optional[str] = NotProvided,
# Deprecated args:
replay_mode=DEPRECATED_VALUE,
# Now done via Ray object store, which has its own cloud-supported
# spillover mechanism.
policy_map_cache=DEPRECATED_VALUE,
) -> "AlgorithmConfig":
"""Sets the config's multi-agent settings.

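Judging from the new annotation above, `policies_to_train` accepts either a container of policy IDs or a callable taking `(policy_id, batch)`. A hedged sketch of both forms; the policy names are made up, and whether a sample batch is actually passed to the callable depends on the caller:

```python
from ray.rllib.algorithms.algorithm_config import AlgorithmConfig
from ray.rllib.policy.policy import PolicySpec

config = AlgorithmConfig().multi_agent(
    policies={"main_policy": PolicySpec(), "opponent": PolicySpec()},
)

# Container form: only the listed policy IDs are ever updated.
config = config.multi_agent(policies_to_train=["main_policy"])

# Callable form: decide per policy (and per sample batch, when one is supplied)
# whether that policy should be trained right now.
def train_main_only(policy_id, batch=None):
    # Illustrative criterion only.
    return policy_id == "main_policy"

config = config.multi_agent(policies_to_train=train_main_only)
```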
@@ -1622,9 +1647,6 @@ def multi_agent(
observation- and action spaces of the policies, and any extra config.
policy_map_capacity: Keep this many policies in the "policy_map" (before
writing least-recently used ones to disk/S3).
policy_map_cache: Where to store overflowing (least-recently used) policies?
Could be a directory (str) or an S3 location. None for using the
default output dir.
policy_mapping_fn: Function mapping agent ids to policy ids. The signature
is: `(agent_id, episode, worker, **kwargs) -> PolicyID`.
policies_to_train: Determines those policies that should be updated.

@@ -1636,6 +1658,19 @@ def multi_agent(
or not, given the particular batch). This allows you to have a policy
trained only on certain data (e.g. when playing against a certain
opponent).
policy_states_are_swappable: Whether all Policy objects in this map can be
"swapped out" via a simple `state = A.get_state(); B.set_state(state)`,
where `A` and `B` are policy instances in this map. You should set
this to True for significantly speeding up the PolicyMap's cache lookup
times, iff your policies all share the same neural network
architecture and optimizer types. If True, the PolicyMap will not
have to garbage collect old, least recently used policies, but instead
keep them in memory and simply override their state with the state of
the most recently accessed one.
For example, in a league-based training setup, you might have 100s of
the same policies in your map (playing against each other in various
combinations), but all of them share the same state structure
(are "swappable").
observation_fn: Optional function that can be used to enhance the local
agent observations to include more state. See
rllib/evaluation/observation_function.py for more info.

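The swap contract that the new `policy_states_are_swappable` docstring describes (`state = A.get_state(); B.set_state(state)`) is easiest to see with a small, self-contained toy. The class below merely stands in for an RLlib `Policy` whose `get_state()`/`set_state()` carry framework-specific weights; it is an illustration, not the real implementation:

```python
import copy

class ToyPolicy:
    """Stand-in for a Policy whose state is just a dict of weights."""

    def __init__(self, weights):
        self.weights = dict(weights)

    def get_state(self):
        return {"weights": copy.deepcopy(self.weights)}

    def set_state(self, state):
        self.weights = copy.deepcopy(state["weights"])

# Two "policies" with the same structure (same keys and shapes) are swappable:
a = ToyPolicy({"layer1": [0.1, 0.2], "layer2": [0.3]})
b = ToyPolicy({"layer1": [9.9, 9.9], "layer2": [9.9]})

# Instead of destroying `b` and rebuilding a fresh policy object, a swappable
# PolicyMap can simply overwrite `b`'s state in place with `a`'s state:
b.set_state(a.get_state())
assert b.weights == a.weights
```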
@@ -1681,9 +1716,6 @@ def multi_agent(
if policy_map_capacity is not NotProvided:
self.policy_map_capacity = policy_map_capacity

if policy_map_cache is not NotProvided:
self.policy_map_cache = policy_map_cache

if policy_mapping_fn is not NotProvided:
# Attempt to create a `policy_mapping_fn` from config dict. Helpful
# is users would like to specify custom callable classes in yaml files.

@@ -1694,6 +1726,12 @@ def multi_agent(
if observation_fn is not NotProvided:
self.observation_fn = observation_fn

if policy_map_cache != DEPRECATED_VALUE:
deprecation_warning(
old="AlgorithmConfig.multi_agent(policy_map_cache=..)",
error=True,
)

if replay_mode != DEPRECATED_VALUE:
deprecation_warning(
old="AlgorithmConfig.multi_agent(replay_mode=..)",

@@ -1730,6 +1768,9 @@ def multi_agent(
)
self.policies_to_train = policies_to_train

if policy_states_are_swappable is not None:
self.policy_states_are_swappable = policy_states_are_swappable

return self

def is_multi_agent(self) -> bool:

Review comment: "Cleaned this up for a new test that swaps out policy weights on GPU."