
[RLlib] Make policies_to_train more flexible via callable option. #20735

Merged · 12 commits · Jan 27, 2022

Conversation

@sven1977 (Contributor) commented on Nov 26, 2021:

This PR introduces the option to set config.multiagent.policies_to_train to a callable (as an alternative to providing a list/set of PolicyIDs). The callable takes a PolicyID and an optional SampleBatch|MultiAgentBatch as args and returns a bool (trainable or not?). This allows for more fine-grained control over which policies need to be updated, for example in multi-agent scenarios where policy A should be trained when playing against policy B, but not when playing against policy C.
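For illustration, a minimal sketch of the new callable option; the policy IDs, the agent-to-policy mapping, and the matchup rule below are hypothetical, not taken from the PR:

```python
def is_policy_to_train(policy_id, batch=None):
    # Always train policy_B and policy_C.
    if policy_id != "policy_A":
        return True
    # Train policy_A only when the (multi-agent) batch also contains
    # policy_B experience, i.e. A was playing against B; skip A-vs-C data.
    return batch is None or "policy_B" in getattr(batch, "policy_batches", {})

config = {
    "multiagent": {
        "policies": {"policy_A", "policy_B", "policy_C"},
        "policy_mapping_fn": (
            lambda agent_id, **kwargs: "policy_A"
            if agent_id == "agent_0" else "policy_B"
        ),
        # New: a callable instead of a list/set of PolicyIDs.
        "policies_to_train": is_policy_to_train,
    },
}
```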

Why are these changes needed?

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

…cies_to_train_by_callable

# Conflicts:
#	rllib/agents/trainer.py
#	rllib/evaluation/rollout_worker.py
#	rllib/evaluation/worker_set.py
@gjoliver (Member) left a comment:

a few questions. thanks.

A multi-agent environment is one which has multiple acting entities per step, e.g., in a traffic simulation, there may be multiple "car"- and "traffic light" agents in the environment. The model for multi-agent in RLlib is as follows: (1) as a user, you define the number of policies available up front, and (2) a function that maps agent ids to policy ids. This is summarized by the below figure:
In a multi-agent environment, there are more than one "agent" acting simultaneously, in a turn-based fashion, or in a combination of these two.

For example, in a traffic simulation, there may be multiple "car"- and "traffic light" agents in the environment, acting simultaneously.
@gjoliver (Member):

typo "car"-

@sven1977 (Contributor, Author):

done

config={"gamma": 0.85}, # use main config plus <- this override here
), # alternatively, simply do: `PolicySpec(config={"gamma": 0.85})`

# Deprecated way: Specify class, obs-/action-spaces, config-overrides
@gjoliver (Member):

sorry, is "class is None" a deprecated way of specifying things or not?

@sven1977 (Contributor, Author):

It should no longer be used. Better to use the PolicySpec namedtuple. Changed the comment.
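For reference, a minimal sketch of the preferred style; the policy IDs are hypothetical, and the gamma override is the one from the diff above:

```python
from ray.rllib.policy.policy import PolicySpec

policies = {
    # All-None fields mean: infer policy class and spaces from the main config.
    "policy_A": PolicySpec(config={"gamma": 0.85}),  # main config plus this override
    "policy_B": PolicySpec(),  # use the main config unchanged
}
```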

# Specifies those policies that should be updated.
# Options are:
# - None for all policies.
# - An iterable of PolicyIDs to be updated.
@gjoliver (Member):

nit nit nit
maybe just "An iterable of PolicyIDs"
when reading this, I am not sure if "to be updated" applies to the list of IDs or to the policies specified via these IDs.

@sven1977 (Contributor, Author):

Fixed and clarified.

"""
if policies_to_train is not None:
self.policies_to_train = policies_to_train
if is_policy_to_train is not None:
@gjoliver (Member):

should None reset self.is_policy_to_train to an empty list?
I also feel like we shouldn't allow is_policy_to_train to have a default value, which does nothing. e.g., why would someone do

self.set_is_policy_to_train()

by itself ... ?

@sven1977 (Contributor, Author):

Hmm, actually, you are right. Not sure why this should be supported. I'll fix.

@sven1977 (Contributor, Author):

Made this non-Optional.
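A sketch of what the resulting non-Optional setter signature could look like; the exact signature in the PR may differ:

```python
from typing import Callable, Container, Optional, Union

from ray.rllib.utils.typing import PolicyID, SampleBatchType

class RolloutWorker:  # abbreviated; only the relevant method shown
    def set_is_policy_to_train(
        self,
        is_policy_to_train: Union[
            Container[PolicyID],
            Callable[[PolicyID, Optional[SampleBatchType]], bool],
        ],
    ) -> None:
        # No default value: callers must explicitly pass a container of
        # PolicyIDs or a trainability predicate.
        ...
```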

# Set of IDs of those policies, which should be trained. This property
# is optional and mainly used for backward compatibility.
self.policies_to_train: Optional[Set[PolicyID]] = None
self.is_policy_to_train: Callable[[PolicyID, SampleBatchType], bool]
@gjoliver (Member):

I feel like self._should_train_policy may be a better name than is_policy_to_train.
also, if these variables are private-ish, we should name them _policies_to_train and _should_train_policy.
last thing: I feel like we may not want to keep a copy of policies_to_train on self. that just confuses things, since self.is_policy_to_train is the real source of truth.

@sven1977 (Contributor, Author):

Yeah, this is just for backward compatibility. Some users may still use this property somewhere, and those users are unlikely to ever change the policies_to_train list. So for them, it wouldn't matter whether self.is_policy_to_train is the actual source of truth.

@sven1977 (Contributor, Author):

But aren't functions that return bools always named "is_...()"? Or "isPolicyToTrain()", etc.?

@gjoliver (Member):

I see. ok, will live with this then.
depending on the style guide :) I found our naming a bit less consistent sometimes, but that's a minor thing.

@@ -843,7 +851,7 @@ def learn_on_batch(self, samples: SampleBatchType) -> Dict:
        builders = {}
        to_fetch = {}
        for pid, batch in samples.policy_batches.items():
-            if pid not in self.policies_to_train:
+            if not self.policies_to_train(pid, samples):
@gjoliver (Member):

we should call self.is_policy_to_train() or self._should_train_policy(), right?
there are quite a few places below that also use self.policies_to_train(), which is a list now.

@sven1977 (Contributor, Author):

Great catch, thanks!
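For clarity, a sketch of how the fixed check would read, per the diff above (surrounding code abbreviated):

```python
for pid, batch in samples.policy_batches.items():
    # Ask the trainability callable, not the former PolicyID list.
    if not self.is_policy_to_train(pid, samples):
        continue
    # ... build and apply the update for `pid` ...
```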

# By default (None), use the set of all policies found in the
# policy_dict.
if policies_to_train is None:
policies_to_train = set(self.policy_dict.keys())
@gjoliver (Member):

should be self.policies_to_train = set(...)?

@gjoliver (Member):

nvm

@@ -1132,7 +1146,7 @@ def add_policy(
                self.observation_filter, new_policy.observation_space.shape)

        self.set_policy_mapping_fn(policy_mapping_fn)
-        self.set_policies_to_train(policies_to_train)
+        self.set_is_policy_to_train(policies_to_train)
@gjoliver (Member):

same comments as above.

@sven1977 (Contributor, Author):

fixed.

self.local_worker = self.policy_ids = None
if local_worker:
self.local_worker = local_worker
else:
@gjoliver (Member):

why else here?
why only set self.policy_ids if local_worker is None?

@sven1977 (Contributor, Author):

Backward compatibility again :) Roughly as sketched below:

  • New: If the local_worker is given -> use its is_policy_to_train() method.
  • Old: Use the given PolicyID list to figure out whether a policy is trainable or not.
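A hedged sketch of that dual path; the helper name here is hypothetical:

```python
def _is_trainable(self, policy_id, batch=None):
    if self.local_worker is not None:
        # New: delegate to the local worker's is_policy_to_train() callable.
        return self.local_worker.is_policy_to_train(policy_id, batch)
    # Old (backward compatible): consult the static PolicyID list.
    return self.policy_ids is None or policy_id in self.policy_ids
```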

@gjoliver (Member):

haha, it will take me forever to adjust ... 😭

…cies_to_train_by_callable

# Conflicts:
#	rllib/agents/trainer.py
#	rllib/evaluation/worker_set.py
@gjoliver (Member) left a comment:

cool. have some random comments left, but this looks good. feel free to merge.



@@ -1150,7 +1165,8 @@ def remove_policy(
        *,
        policy_id: PolicyID = DEFAULT_POLICY_ID,
        policy_mapping_fn: Optional[Callable[[AgentID], PolicyID]] = None,
-        policies_to_train: Optional[List[PolicyID]] = None,
+        policies_to_train: Optional[Union[Container[PolicyID], Callable[
@gjoliver (Member):

I don't know if it's worth it to define some names for the 2 func types here; kinda long, and used multiple times.
totally up to you.
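One way the naming could look, as a sketch (the alias name is hypothetical):

```python
from typing import Callable, Container, Optional, Union

from ray.rllib.utils.typing import PolicyID, SampleBatchType

# "A container of PolicyIDs, or a trainability predicate, or None."
PoliciesToTrain = Optional[Union[
    Container[PolicyID],
    Callable[[PolicyID, Optional[SampleBatchType]], bool],
]]
```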


@sven1977 merged commit 371fbb1 into ray-project:master on Jan 27, 2022