
[RLlib] Fix policy_map always loading all policies from disk due to (not always needed) global_vars update. #22010

Merged: 9 commits into ray-project:master on Apr 29, 2022

Conversation

@PavelCz (Contributor) commented Jan 31, 2022

Why are these changes needed?

  • If you use more policies than the policy_map_capacity, the in-memory cache should contain the frequently used policies, while the less frequently used ones are stashed to disk (a toy model of this cache behavior is sketched after this list).
  • Due to the implementation of get_weights and foreach_trainable_policy, all policies are loaded from disk during training. This also fills the in-memory cache with arbitrary policies, instead of only the policies that are actually needed.
  • My changes should be equivalent, with the difference that only the items that are needed are accessed.
  • I am less sure whether the change to set_global_vars is equivalent; however, it is necessary:
    • During training, the timestep is updated for all policies, causing all policies to be loaded from disk and making the cache useless. My change ensures that only policies that are being trained need to be loaded from disk.
  • A script to reproduce this problem can be found here: https://gist.github.com/PavelCz/f9356eb7a3b343099c3dfee0dfad0742
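
To make the cache behavior concrete, here is a small, self-contained toy model. It is not RLlib's actual PolicyMap implementation (compare rllib/policy/policy_map.py); TinyDiskBackedMap and its counters are purely illustrative. It shows why a full sweep over the map forces constant evict/reload cycles once the map holds more policies than its capacity:

```python
from collections import OrderedDict

class TinyDiskBackedMap:
    """Toy stand-in for a policy map with a bounded in-memory cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()  # in-memory, least-recently-used first
        self.disk = {}              # stand-in for policies pickled to disk
        self.disk_reads = 0

    def put(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        if len(self.cache) > self.capacity:
            evicted_key, evicted_val = self.cache.popitem(last=False)
            self.disk[evicted_key] = evicted_val  # stash LRU entry to "disk"

    def __getitem__(self, key):
        if key not in self.cache:
            self.disk_reads += 1    # the expensive path
            self.put(key, self.disk.pop(key))
        self.cache.move_to_end(key)
        return self.cache[key]

    def keys(self):
        return list(self.cache) + list(self.disk)

policy_map = TinyDiskBackedMap(capacity=2)
for i in range(5):
    policy_map.put(f"policy_{i}", object())

# A full sweep, which is effectively what get_weights() and
# foreach_trainable_policy did, reloads every stashed policy:
for pid in policy_map.keys():
    _ = policy_map[pid]
print(policy_map.disk_reads)  # -> 3 (one read per policy that was on disk)
```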

Related issue number

n/a

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@PavelCz PavelCz changed the title Fix policy map always loading all policies from disk [RLlib] Fix policy map always loading all policies from disk Jan 31, 2022
stale bot commented Apr 21, 2022

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

  • If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@stale stale bot added the stale label Apr 21, 2022
@gjoliver (Member):

Sorry to miss this PR until now.
I think that although some policies are not trainable, they need to be there for the purpose of training the trainable policies; think of multi-agent cases where they are the opponents, for example.
So could this change break those use cases?

@stale stale bot removed the stale label Apr 21, 2022
@PavelCz (Contributor, Author) commented Apr 21, 2022

If those policies are necessary, they are still accessible and can be loaded from disk. However, they should only be loaded when they are actually needed.
The current behavior is: all policies are always loaded from disk, whether they are used or not.
With my proposed changes: trainable policies are always loaded from disk, and if other policies are needed, they will also be loaded from disk.
Essentially, the current behavior is: if I want to access policy n, all policies are loaded from disk, for no reason. The new behavior: only policy n is loaded. This does not change the fact that RLlib loads any policy that is needed.

Without this change, having more policies than the policy_map_capacity is unusable, as the disk will constantly be accessed. (A hypothetical config that runs into this scenario is sketched below.)
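
For illustration, a config along these lines triggers the scenario (compare the reproduction gist linked in the PR description). The key names follow RLlib's multiagent config of this era; the specific values, the None entries in the policy specs (assuming the spaces can be inferred from the environment), and the mapping-function signature are illustrative assumptions, not taken from the PR:

```python
# Hypothetical reproduction sketch; values are illustrative.
config = {
    "multiagent": {
        # 20 policies, but in-memory room for only 5: most live on disk.
        "policies": {
            f"policy_{i}": (None, None, None, {}) for i in range(20)
        },
        "policy_map_capacity": 5,
        # Only one policy is actually trained.
        "policies_to_train": ["policy_0"],
        "policy_mapping_fn": lambda agent_id, **kwargs: "policy_0",
    },
}
```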

@gjoliver (Member) left a comment:

OK, a couple more questions.
Thanks a lot for the contribution!

```diff
-    func(policy, pid, **kwargs)
-    for pid, policy in self.policy_map.items()
+    func(self.policy_map[pid], pid, **kwargs)
+    for pid in self.policy_map.keys()
```
@gjoliver (Member):

I can't really see the difference between the current and updated versions here.
Can you say a few words about the intention?
Same for the get_weights() function below.

@PavelCz (Contributor, Author) commented Apr 25, 2022:

Yes, from the view of the rollout_worker the behavior shouldn't change.
The difference is that iterating over items() accesses both the key and the policy, i.e. internally PolicyMap calls __getitem__, and __getitem__ reads from disk if the selected policy is not in the cache. So every policy is accessed, and only afterwards does the if statement on the next line check whether we are actually interested in that policy. With my changes, we only call __getitem__ for those policies that pass the if statement. This reduces the number of unnecessary policy accesses and, in turn, the number of unnecessary _read_from_disk() calls (compare rllib/policy/policy_map.py).
The same goes for get_weights().

In short: don't do policy_map[pid] if we skip this pid anyway due to the if statement. This is important because policy_map[pid] might trigger disk reads. (A small self-contained demonstration follows.)
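
To make this concrete, here is a toy demonstration. CountingMap is an illustrative stand-in, not the real PolicyMap (whose __getitem__ may call _read_from_disk()); the items() override mirrors the fact that PolicyMap's items() goes through __getitem__:

```python
class CountingMap(dict):
    """Toy stand-in for PolicyMap: counts __getitem__ calls, each of
    which in the real class can trigger a disk read."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.getitem_calls = 0

    def __getitem__(self, key):
        self.getitem_calls += 1  # in PolicyMap: possible _read_from_disk()
        return super().__getitem__(key)

    def items(self):
        # Like PolicyMap.items(): materializes every value via __getitem__.
        return ((k, self[k]) for k in super().keys())

policy_map = CountingMap({f"policy_{i}": object() for i in range(100)})
policies_to_train = {"policy_0", "policy_1"}

# Old pattern: every policy is fetched, then most are filtered away.
_ = [p for pid, p in policy_map.items() if pid in policies_to_train]
print(policy_map.getitem_calls)  # -> 100

policy_map.getitem_calls = 0

# New pattern: the `if` clause runs before the output expression, so only
# the two surviving ids are ever fetched.
_ = [policy_map[pid] for pid in policy_map.keys() if pid in policies_to_train]
print(policy_map.getitem_calls)  # -> 2
```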

@gjoliver (Member):

🤯 This is some highly optimized stuff.
Can you please add a big comment above these two places, explaining what you just said, so we don't come in and accidentally "clean up" these loops some day?
Appreciate the thoughtful changes.

@gjoliver (Member):

Understood! This is great.
I think global_vars are mostly for training purposes, so it should be fine to skip a policy that is not being trained.

```diff
@@ -1561,7 +1561,7 @@ def set_global_vars(self, global_vars: dict) -> None:
         Examples:
             >>> global_vars = worker.set_global_vars({"timestep": 4242})
         """
-        self.foreach_policy(lambda p, _: p.on_global_var_update(global_vars))
+        self.foreach_policy_to_train(lambda p, _: p.on_global_var_update(global_vars))
```
@gjoliver (Member):

I see. So this PR doesn't really change what policies get loaded into the map; it only changes what policies get updated. Is my understanding correct?
Thanks.

@PavelCz (Contributor, Author):

Yes, this change only affects which policies get their global vars updated.

The problem:

  • In order to update a policy that isn't in the memory cache, it has to be loaded from disk.
  • Looking through the project, it seems that every time set_global_vars() is called, _get_global_vars() is passed as the argument.
  • I'm not sure what kinds of global vars there usually are, but judging from rllib/execution/common.py, _get_global_vars() basically always returns an updated timestep.
  • Consequently, if any policies are stashed to disk, they are read from disk all the time just to update their timesteps, and we have constant disk access again, which defeats the point of putting them on disk in the first place.

The solution:

  • What at least helps with this problem is updating global vars only for policies that are being trained (see the sketch after this list).
  • If all policies that are being trained fit into the in-memory cache, the disk won't constantly be accessed.
  • If they don't fit, then I see no way to stop constant disk access anyway.
  • I'm not 100% sure this has no other consequences; someone with better knowledge of the code base would have to judge that. However, I see no reason why policies that are not being trained would need their vars updated (are there even other vars besides the timestep?).
  • At any rate, without this change we are again at a point where any policy that is offloaded to disk is constantly being accessed, making it basically unusable to have more policies than the policy_map_capacity.
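
For illustration, here is a hedged sketch of the behavioral difference, paraphrasing the one-line diff above with a simplified stand-in for RolloutWorker. The real foreach_policy / foreach_policy_to_train helpers are more involved; the names below are illustrative:

```python
class SketchWorker:
    """Simplified stand-in for a RolloutWorker holding a disk-backed map."""

    def __init__(self, policy_map, policies_to_train):
        self.policy_map = policy_map              # may stash entries on disk
        self.policies_to_train = policies_to_train

    def set_global_vars_old(self, global_vars):
        # foreach_policy semantics: touch every policy in the map, so any
        # stashed policy is read from disk just to bump its timestep.
        for pid in self.policy_map.keys():
            self.policy_map[pid].on_global_var_update(global_vars)

    def set_global_vars_new(self, global_vars):
        # foreach_policy_to_train semantics: touch only trained policies;
        # if they all fit in the in-memory cache, no disk access happens.
        for pid in self.policy_map.keys():
            if pid in self.policies_to_train:
                self.policy_map[pid].on_global_var_update(global_vars)
```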

@sven1977 (Contributor) left a comment:

Makes perfect sense! Thanks so much for digging into this problem, @PavelCz, and for all your help with the PR, @gjoliver.
This is a really valuable fix. Merging now ...

@gjoliver (Member):

Hey @PavelCz, one thing though: can you rebase to the latest master so we can make sure all the tests pass?
You are on a very broken commit right now (our fault), so all the RL tests are broken.

@PavelCz (Contributor, Author) commented Apr 25, 2022

I added comments to the two changes and also a short comment to the third change. Let me know if those are OK.

@sven1977 sven1977 changed the title [RLlib] Fix policy map always loading all policies from disk [RLlib] Fix policy_map always loading all policies from disk due to (not always needed) global_vars update. Apr 29, 2022
@sven1977 sven1977 merged commit de0c6f6 into ray-project:master Apr 29, 2022
PavelCz added a commit to HumanCompatibleAI/ray that referenced this pull request Jun 6, 2022
… (not always needed) `global_vars` update. (ray-project#22010)

(cherry picked from commit de0c6f6)