[RLlib] Algorithm.add_policy() should alternatively accept an already instantiated policy object. #28637
Conversation
thanks for the really nice UX change.
rllib/algorithms/algorithm.py (Outdated)
```python
        policy_mapping_fn=policy_mapping_fn,
    )
    # Then add a new instance to each remote worker.
    ray.get([w.apply.remote(fn) for w in self.workers.remote_workers()])
```
should we simply do
```python
ray.get([w.add_policy.remote(**kwargs) for w in self.workers.remote_workers()])
```
?
rllib/algorithms/algorithm.py (Outdated)
```python
        policy_mapping_fn=policy_mapping_fn,
    )
else:
    fn(worker)
```
why not simply:
```python
worker.add_policy(**kwargs)
```
?
cleaned this up a little.
rllib/algorithms/algorithm.py (Outdated)
```python
        f"{list(local_worker.policy_map.keys())}"
    )

if policy_cls is not None and policy is not None:
```
probably need to check if both are None?
great catch! :)
done
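A minimal sketch of the check this thread converged on (the helper name and the exact error message are assumptions, not the PR's actual wording):
```python
from typing import Optional, Type

def _validate_policy_args(policy_cls: Optional[Type] = None, policy=None) -> None:
    # Exactly one of `policy_cls` / `policy` must be given: this catches both
    # the "both None" case flagged above and the "both provided" case.
    if (policy_cls is None) == (policy is None):
        raise ValueError("Provide exactly one of `policy_cls` or `policy`!")
```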
rllib/algorithms/algorithm.py (Outdated)
```python
    # Run foreach_worker fn on all workers.
    self.workers.foreach_worker(fn)
else:
    self.workers.foreach_worker(fn)

# Update evaluation workers, if necessary.
if evaluation_workers and self.evaluation_workers is not None:
```
one thing that feels a bit weird is how come we can handle this for eval_workers with one line of code, but we need this many if/elses for rollout workers.
I wonder if we can do something similar for rollout workers, basically `self.workers.foreach_worker(fn)`, and WorkerSet.foreach_worker() will apply the fn locally on the local worker, or apply it remotely on all the remote workers.
that way, the logic about individual workers is encapsulated behind the WorkerSet abstraction.
does that work??
Eval workers don't have a local worker :)
For the (only one!) local worker, we should insert the policy directly into its policy_map; no re-creation of a new instance is required. That's the whole point of this PR, I guess.
The `foreach_worker` utility is actually fine (it handles the local worker properly) and has nothing to do with this.
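For context, a minimal sketch of the foreach_worker() semantics being referenced (simplified; not the actual RLlib implementation):
```python
import ray

def foreach_worker(self, fn):
    """Apply fn to the local worker (in-process) and all remote workers (via ray)."""
    results = []
    if self.local_worker() is not None:
        # Local worker: call fn directly, no actor round-trip needed.
        results.append(fn(self.local_worker()))
    # Remote workers: ship fn to each actor and block on all results.
    results.extend(ray.get([w.apply.remote(fn) for w in self.remote_workers()]))
    return results
```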
Let me try to simplify the rest ...
Cleaned up a little (removed the helper function entirely, not needed).
wait, eval workers can have a local worker too??
I often set evaluation_num_workers=0 for OPE, since we do OPE on the trainer node anyways. will that cause the evaluation_workers to use only a local worker?
also, one thing I am always a bit confused about: if we already have a WorkerSet abstraction, why should Algorithm still manipulate individual local and remote workers itself? feels like it's better to have WorkerSet handle the underlying details.
for this specific case, I wonder if we should simply do something like:
```python
rollout_workers.local_worker().add_policy(policy=policy)
rollout_workers.remote_worker().add_policy(policy_cls=type(policy), policy_state=...)
evaluation_workers.local_worker().add_policy(policy_cls=type(policy), policy_state=...)
evaluation_workers.remote_worker().add_policy(policy_cls=type(policy), policy_state=...)
```
then, the policy is only claimed by the local rollout worker.
The evaluation worker set only has a local worker if evaluation_num_workers=0. Otherwise, we'll skip generating it.
If you do Algorithm.evaluate(), it will:
- first try to use the evaluation worker set (be it with a local worker (evaluation_num_workers=0) or without one).
- then, if there is NO evaluation worker set at all, use the regular local worker. Note that this only works if that local worker has an env (by default, we don't create one on the regular local worker). See the sketch below.
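Roughly, the fallback order described above (the `_evaluate_with*` helper names are hypothetical; this is a simplification, not the actual Algorithm.evaluate()):
```python
def evaluate(self):
    if self.evaluation_workers is not None:
        # Evaluation worker set exists: use it, whether it holds a local
        # worker (evaluation_num_workers=0) or only remote workers.
        return self._evaluate_with(self.evaluation_workers)
    # No evaluation worker set at all -> fall back to the regular local
    # worker. This only works if that worker was created with an env.
    local_worker = self.workers.local_worker()
    assert local_worker.env is not None, "Regular local worker needs an env!"
    return self._evaluate_with_worker(local_worker)
```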
I think you are right and we should create a WorkerSet.add/remove_policy API. Then we can move all the code that's currently in Algorithm.add_policy into the WorkerSet, and in the algo, simply do:
```python
def add_policy(...):
    self.workers.add_policy()
    self.evaluation_workers.add_policy()
    return
```
done
rllib/connectors/util.py (Outdated)
```diff
@@ -78,8 +78,10 @@ def create_connectors_for_policy(policy: "Policy", config: TrainerConfigDict):
     """
     ctx: ConnectorContext = ConnectorContext.from_policy(policy)
 
-    policy.agent_connectors = get_agent_connectors_from_config(ctx, config)
-    policy.action_connectors = get_action_connectors_from_config(ctx, config)
+    if policy.agent_connectors is None:
```
hmm, I wonder why we need to check this.
are we calling this function for an existing policy that already has connectors restored??
What if you have recovered this policy here from a checkpoint? Then you would also already have the connectors inside this policy, correct?
In this case, you wouldn't want to re-create the connectors. Let me know if this chain of thought is wrong.
we shouldn't be calling this util if we are recovering a policy. this is only used when a policy is constructed from scratch.
do you mind removing these checks? if things fail somehow, I'd rather get an explicit signal than have an actual problem concealed by these.
Got it, so you are saying we can assume 100% that every time we are recovering a policy, say from a checkpoint, the connectors should already be in there? In this case, I added an assert to the utility and fixed the add_policy method to NOT call this utility iff `policy` was provided as an already instantiated one.
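In other words, the utility now asserts it is only used for freshly constructed policies. A sketch of the agreed-upon shape (names taken from the diff above; the exact control flow is an assumption):
```python
def create_connectors_for_policy(policy: "Policy", config: TrainerConfigDict):
    # This util must only be called for policies built from scratch; a policy
    # restored from a checkpoint already carries its connectors.
    assert policy.agent_connectors is None and policy.action_connectors is None
    ctx: ConnectorContext = ConnectorContext.from_policy(policy)
    policy.agent_connectors = get_agent_connectors_from_config(ctx, config)
    policy.action_connectors = get_action_connectors_from_config(ctx, config)
```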
rllib/evaluation/rollout_worker.py (Outdated)
```python
    " Policy IDs that are already in your policy map: "
    f"{list(self.workers.local_worker().policy_map.keys())}"
)
if policy_cls is not None and policy is not None:
```
same, one of these needs to be not None?
done
```diff
     policy_id: PolicyID,
-    policy_cls: Type[Policy],
+    policy_cls: Optional[Type[Policy]] = None,
+    policy: Optional[Policy] = None,
```
just want to mention a debate I had with myself while looking at this.
if we are willing to sacrifice a little bit of efficiency for the local worker, we can actually make PolicySpec the narrow waist of all this. then we won't need to change rollout_worker or policy_map. and in Algorithm.add_policy(), we would simply get the policy_spec and policy_state if we get passed a policy instead of a policy_cls. a sketch of the idea is below.
I understand that will cause the policy to be created again for the local worker, wasting compute and memory, but it seems like we can greatly simplify all this logic if we can have a narrow waist for add_policy().
just some thoughts that I figured I'd mention, since I can't make up my mind either. let me know what you think.
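A hedged sketch of this "narrow waist" alternative (hypothetical; this is the reviewer's proposal, not what the PR ultimately implements):
```python
def add_policy(self, policy_id, policy_cls=None, policy=None, **kwargs):
    # Narrow waist: reduce both call styles to (policy_cls, policy_state).
    if policy is not None:
        policy_cls = type(policy)
        policy_state = policy.get_state()  # weights + connector/view-req state
    else:
        policy_state = None
    # All workers (incl. the local one) re-create the policy from class +
    # state; the passed-in instance itself is never inserted anywhere.
    self.workers.foreach_worker(
        lambda w: w.add_policy(
            policy_id, policy_cls=policy_cls, policy_state=policy_state, **kwargs
        )
    )
```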
Hmm, I'm not sure about this. Was thinking about this, too :)
But the problem is that the expectation (mental model of the user) of doing my_algo.add_policy(my_policy_instance) is that my_policy_instance is actually incorporated as-is into the algorithm.
yeah maybe.
if we were writing C++, we could make this super clear by declaring the argument `const Policy& policy` if the policy is meant to be duplicated, or `Policy* policy` if the ownership of the policy is supposed to be transferred.
python is 🤷♂️
🤷♂️
:)
```diff
@@ -231,6 +244,204 @@ def sync_weights(
         elif self.local_worker() is not None and global_vars is not None:
             self.local_worker().set_global_vars(global_vars)
 
+    def add_policy(
```
New API for WorkerSet (a possible call pattern is sketched below):
- WorkerSet.add_policy(): Similar to Algorithm.add_policy().
- WorkerSet.add_policy_to_workers(): New static helper utility for adding a new policy (by instance or options) to a list of (local and/or remote) workers.
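A possible call pattern for the new API (argument names assumed from the discussion; not copied from the final diff):
```python
# Add by already instantiated policy object ...
workers.add_policy("new_pol", policy=my_policy_instance)
# ... or, as before, by class plus (optional) config overrides.
workers.add_policy("new_pol", policy_cls=MyPolicyClass, config=my_config)
```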
```python
    policy_spec.config,  # overrides.
    merged_conf,
)
if policy is not None:
```
Depending on whether the policy is given as an already instantiated object or not, use either create_policy() or insert_policy(). Note that create_policy also uses insert_policy internally now.
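The described branching, as a rough sketch (method signatures simplified; the real argument lists may differ):
```python
if policy is not None:
    # Instance given -> insert it into the policy map as-is.
    self.policy_map.insert_policy(policy_id, policy)
else:
    # Class given -> build a new instance; create_policy() itself ends by
    # calling insert_policy() on the freshly built object.
    self.policy_map.create_policy(
        policy_id, policy_cls, obs_space, action_space,
        policy_spec.config,  # overrides.
        merged_conf,
    )
```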
```python
    )
):
    create_connectors_for_policy(self.policy_map[policy_id], self.policy_config)
# Create connectors for the new policy, if necessary.
```
Simplified this if-block here. Some checks were superfluous.
```python
    # Change the list of policies to train.
    policies_to_train=[f"p{i}", f"p{i-1}"],
)
print(f"Adding policy {pid} ...")
```
Also test adding a new policy by instance now.
```diff
@@ -1547,7 +1547,8 @@ def set_weights(self, weights: Dict[PolicyID, dict]):
     def add_policy(
         self,
         policy_id: PolicyID,
-        policy_cls: Type[Policy],
+        policy_cls: Optional[Type[Policy]] = None,
```
Simplified the Algorithm.add_policy() method by only using the new WorkerSet APIs. No more micro-handling of individual workers' policy_maps here. A rough sketch of the resulting structure is below.
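The resulting structure, roughly (a sketch with abbreviated parameters, not the exact final signature):
```python
def add_policy(self, policy_id, policy_cls=None, policy=None, *,
               evaluation_workers=True, **kwargs):
    # Delegate all per-worker handling to the WorkerSet.
    self.workers.add_policy(policy_id, policy_cls=policy_cls, policy=policy, **kwargs)
    # Optionally mirror the change onto the evaluation WorkerSet.
    if evaluation_workers and self.evaluation_workers is not None:
        self.evaluation_workers.add_policy(
            policy_id, policy_cls=policy_cls, policy=policy, **kwargs
        )
```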
```diff
@@ -231,7 +231,7 @@ class directly. Note that this arg can also be specified via
     """
 
     # User provided (partial) config (this may be w/o the default
-    # Trainer's Config object). Will get merged with AlgorithmConfig()
+    # Algorithm's Config object). Will get merged with AlgorithmConfig()
```
Found a few of these old `Trainer` mentions in the comments.
rllib/algorithms/algorithm.py (Outdated)
```diff
@@ -1588,46 +1596,69 @@ def add_policy(
             returns False) will not be updated.
         evaluation_workers: Whether to add the new policy also
             to the evaluation WorkerSet.
-        workers: A list of RolloutWorker/ActorHandles (remote
+        worker_list: A list of RolloutWorker/ActorHandles (remote
```
I have a random request: do you mind staying with `workers` for now, since this is a simple name change?
as part of the elastic training PR, I am getting rid of all these places where we are accessing underlying RolloutWorkers outside of WorkerSet. so if we deprecate this today, in a few days I am gonna have to deprecate worker_list too, and we will have 2 deprecated fields here.
rllib/algorithms/algorithm.py (Outdated)
```python
# Worker list is explicitly provided -> Use only those workers (local or remote)
# specified.
if worker_list is not None:
    RolloutWorker.add_policy_to_workers(
```
I like this change; it actually works well with my fault tolerance PR. I will make WorkerSet.add_policy_to_workers() the only way to go about this in the future, actually.
one problem though: this should be `WorkerSet.`, not `RolloutWorker.`?
Oh, yeah, great catch!
Please make sure all the tests pass! Thanks.
Why are these changes needed?
Algorithm.add_policy() should alternatively accept an already instantiated policy object. Same for RolloutWorker.add_policy().
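As a hedged illustration of the new UX (the policy class name and constructor arguments here are simplified placeholders):
```python
# Before: only a class (plus config etc.) could be passed in.
algo.add_policy(policy_id="new_agent", policy_cls=MyTorchPolicy)

# After this PR: a ready-made instance is accepted as well and inserted
# as-is into the local worker's policy_map.
my_policy = MyTorchPolicy(obs_space, act_space, config)
algo.add_policy(policy_id="new_agent", policy=my_policy)
```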
Related issue number
Checks
- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.