[RLlib] Make policies_to_train more flexible via callable option. #20735
Conversation
…cies_to_train_by_callable # Conflicts: # rllib/agents/trainer.py # rllib/evaluation/rollout_worker.py # rllib/evaluation/worker_set.py
a few questions. thanks.
doc/source/rllib-env.rst
Outdated
A multi-agent environment is one which has multiple acting entities per step, e.g., in a traffic simulation, there may be multiple "car"- and "traffic light" agents in the environment. The model for multi-agent in RLlib is as follows: (1) as a user, you define the number of policies available up front, and (2) a function that maps agent ids to policy ids. This is summarized by the below figure:
In a multi-agent environment, there are more than one "agent" acting simultaneously, in a turn-based fashion, or in a combination of these two.

For example, in a traffic simulation, there may be multiple "car"- and "traffic light" agents in the environment, acting simultaneously.
typo "car"-
done
doc/source/rllib-env.rst
Outdated
config={"gamma": 0.85}, # use main config plus <- this override here | ||
), # alternatively, simply do: `PolicySpec(config={"gamma": 0.85})` | ||
|
||
# Deprecated way: Specify class, obs-/action-spaces, config-overrides |
sorry, is "class is None" a deprecated way of specifying things or not?
It should no longer be used. Better to use the PolicySpec namedtuple. Changed comment.
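For reference, here is a minimal sketch of the recommended form, assuming the PolicySpec namedtuple from ray.rllib.policy.policy; the policy names and the mapping function are made up for illustration and are not part of this PR:

from ray.rllib.policy.policy import PolicySpec

config = {
    "multiagent": {
        "policies": {
            # Let RLlib infer class and obs-/action-spaces from the env;
            # only override parts of the main config here.
            "policy_a": PolicySpec(config={"gamma": 0.85}),
            # All defaults: class, spaces, and config come from the trainer.
            "policy_b": PolicySpec(),
        },
        # Made-up mapping function; assumes integer agent IDs.
        "policy_mapping_fn": (
            lambda agent_id, *args, **kwargs:
                "policy_a" if agent_id % 2 == 0 else "policy_b"
        ),
    },
}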
rllib/agents/trainer.py
Outdated
# Specifies those policies that should be updated.
# Options are:
# - None for all policies.
# - An iterable of PolicyIDs to be updated.
nit nit nit
Maybe just "An iterable of PolicyIDs".
When reading this, I am not sure whether "to be updated" applies to the list of IDs or to the policies specified via these IDs.
Fixed and clarified.
rllib/evaluation/rollout_worker.py
Outdated
""" | ||
if policies_to_train is not None: | ||
self.policies_to_train = policies_to_train | ||
if is_policy_to_train is not None: |
Should None reset self.is_policy_to_train to an empty list?
I also feel like we shouldn't allow is_policy_to_train to have a default value, which does nothing. E.g., why would someone call
self.set_is_policy_to_train()
by itself ... ?
Hmm, actually, you are right. Not sure why this should be supported. I'll fix.
Made this non-Optional.
# Set of IDs of those policies, which should be trained. This property
# is optional and mainly used for backward compatibility.
self.policies_to_train: Optional[Set[PolicyID]] = None
self.is_policy_to_train: Callable[[PolicyID, SampleBatchType], bool]
I feel like self._should_train_policy may be a better name than is_policy_to_train.
Also, if these variables are private-ish, we should name them _policies_to_train and _should_train_policy.
Last thing: I feel like we may not want to keep a copy of policies_to_train on self. That just confuses things, since self.is_policy_to_train is the real source of truth.
Yeah, this is just for backward compatibility. Some users may still use this property somewhere, and those users are unlikely to ever change the policies_to_train list. So for them, it wouldn't matter whether self.is_policy_to_train is the actual source of truth.
But aren't functions that return bools always named "is_...()"? Or "isPolicyToTrain()", etc.?
I see. ok, will live with this then.
depending on the style guide :) I found our naming a bit less consistent sometimes, but that's a minor thing.
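To make the annotated signature above concrete, here is a minimal sketch of a callable matching Callable[[PolicyID, SampleBatchType], bool]; the policy IDs ("main", "opponent") and the batch check are made-up examples, not part of this PR:

from typing import Optional

from ray.rllib.policy.sample_batch import MultiAgentBatch
from ray.rllib.utils.typing import PolicyID, SampleBatchType


def is_policy_to_train(policy_id: PolicyID,
                       batch: Optional[SampleBatchType] = None) -> bool:
    # Always train the (made-up) "main" policy.
    if policy_id == "main":
        return True
    # Train "opponent" only if the collected batch actually contains
    # samples for it.
    if policy_id == "opponent":
        return (isinstance(batch, MultiAgentBatch)
                and "opponent" in batch.policy_batches)
    # Everything else (e.g. frozen league policies) is never trained.
    return False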
rllib/evaluation/rollout_worker.py
Outdated
@@ -843,7 +851,7 @@ def learn_on_batch(self, samples: SampleBatchType) -> Dict:
builders = {}
to_fetch = {}
for pid, batch in samples.policy_batches.items():
    if pid not in self.policies_to_train:
    if not self.policies_to_train(pid, samples):
We should call self.is_policy_to_train() or self._should_train_policy(), right?
There are quite a few places below that also use self.policies_to_train(), which is a list now.
Great catch, thanks!
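For clarity, a sketch of the corrected check inside learn_on_batch, assuming the attribute ends up being named is_policy_to_train as discussed above (this is a fragment of the method, not standalone code):

for pid, batch in samples.policy_batches.items():
    # Ask the callable; don't call the (backward-compat) set of IDs.
    if not self.is_policy_to_train(pid, samples):
        continue
    # ... build and apply the update for this policy's batch ...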
rllib/evaluation/rollout_worker.py
Outdated
# By default (None), use the set of all policies found in the
# policy_dict.
if policies_to_train is None:
    policies_to_train = set(self.policy_dict.keys())
should be self.policies_to_train = set(...)?
nvm
rllib/evaluation/rollout_worker.py
Outdated
@@ -1132,7 +1146,7 @@ def add_policy(
self.observation_filter, new_policy.observation_space.shape)

self.set_policy_mapping_fn(policy_mapping_fn)
self.set_policies_to_train(policies_to_train)
self.set_is_policy_to_train(policies_to_train)
same comments as above.
fixed.
self.local_worker = self.policy_ids = None
if local_worker:
    self.local_worker = local_worker
else:
why else here?
why only set self.policy_ids if local_worker is None?
Backward compatibility again :)
- New: If the local_worker is given -> use its is_policy_to_train() method.
- Old: Use the given PolicyID list to figure out whether a policy is trainable or not.
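Roughly, a sketch of the two paths; only the lines shown in the diff above come from the PR, while the parameter name policies and the fallback logic in the else branch are assumptions:

self.local_worker = self.policy_ids = None
if local_worker:
    # New path: defer the "should this policy be trained?" question to
    # the local worker's is_policy_to_train() callable.
    self.local_worker = local_worker
else:
    # Old, backward-compatible path: rely on a fixed list of trainable
    # PolicyIDs passed in by the caller (parameter name assumed here).
    self.policy_ids = policies or []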
haha, it will take me forever to adjust ... 😭
…cies_to_train_by_callable # Conflicts: # rllib/agents/trainer.py # rllib/evaluation/worker_set.py
cool. have some random comments left, but this looks good. feel free to merge.
@@ -1150,7 +1165,8 @@ def remove_policy(
    *,
    policy_id: PolicyID = DEFAULT_POLICY_ID,
    policy_mapping_fn: Optional[Callable[[AgentID], PolicyID]] = None,
    policies_to_train: Optional[List[PolicyID]] = None,
    policies_to_train: Optional[Union[Container[PolicyID], Callable[
I don't know if it's worth it to define some names for the 2 func types here; they're kinda long and used multiple times.
Totally up to you.
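For example, a shared alias could look roughly like this; the alias name and where it would live are hypothetical and not part of this PR:

from typing import Callable, Container, Optional, Union

from ray.rllib.utils.typing import PolicyID, SampleBatchType

# Hypothetical shared alias; name and location (e.g. rllib/utils/typing.py)
# are assumptions for illustration.
PoliciesToTrain = Optional[Union[
    Container[PolicyID],
    Callable[[PolicyID, Optional[SampleBatchType]], bool],
]]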
…cies_to_train_by_callable
This PR introduces the option to set config.multiagent.policies_to_train to a callable (as an alternative to providing a list/set of PolicyIDs). This callable takes the PolicyID and an optional SampleBatch|MultiAgentBatch as args and returns a bool (trainable or not?). This will allow for more fine-grained control over which policies need to be updated, for example in multi-agent scenarios where policy A should be trained when playing against policy B, but not when playing against policy C.
Why are these changes needed?
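As an illustration of the motivation, here is a minimal sketch of what the new option could look like in a multi-agent config; the policy names and the match-up logic are made up, and only the fact that policies_to_train accepts a callable is what this PR adds:

from ray.rllib.policy.policy import PolicySpec


def train_a_only_against_b(pid, batch=None):
    # Made-up match-up rule: update policy_A only when the collected
    # (multi-agent) batch also contains data from policy_B, i.e. A was
    # actually playing against B; never update it against policy_C.
    if pid != "policy_A":
        return False
    return batch is not None and "policy_B" in getattr(
        batch, "policy_batches", {})


config = {
    "multiagent": {
        "policies": {
            "policy_A": PolicySpec(),
            "policy_B": PolicySpec(),
            "policy_C": PolicySpec(),
        },
        # New in this PR: a callable instead of a list/set of PolicyIDs.
        "policies_to_train": train_a_only_against_b,
    },
}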
Related issue number
Checks
I've run scripts/format.sh to lint the changes in this PR.