
[RLlib] Fix policy_map always loading all policies from disk due to (not always needed) global_vars update. #22010

Merged: 9 commits into ray-project:master on Apr 29, 2022

Conversation

@PavelCz (Contributor) commented Jan 31, 2022

Why are these changes needed?

  • If you use more policies than the policy_map_capacity, the in-memory cache should contain the frequently used policies, while the less frequently used ones are stashed to disk (a toy model of this cache behavior is sketched after this list).
  • Due to the implementation of get_weights and foreach_trainable_policy, all policies are loaded from disk during training. This also fills the in-memory cache with arbitrary policies, instead of only the policies that are actually needed.
  • My changes should be equivalent, with the difference that only the items that are needed are accessed.
  • I am less sure whether the change to set_global_vars is equivalent; however, it is necessary:
    • During training, the timestep is updated for all policies, causing all policies to be loaded from disk and making the cache useless. My change ensures that only policies that are being trained need to be loaded from disk.
  • A script to reproduce this problem can be found here: https://gist.github.com/PavelCz/f9356eb7a3b343099c3dfee0dfad0742
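
To make the cache behavior concrete, here is a small, self-contained toy model. It is not RLlib's actual PolicyMap implementation (compare rllib/policy/policy_map.py); TinyDiskBackedMap and its counters are purely illustrative. It shows why a full sweep over the map forces constant evict/reload cycles once the map holds more policies than its capacity:

```python
from collections import OrderedDict

class TinyDiskBackedMap:
    """Toy stand-in for a policy map with a bounded in-memory cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()  # in-memory, least-recently-used first
        self.disk = {}              # stand-in for policies pickled to disk
        self.disk_reads = 0

    def put(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        if len(self.cache) > self.capacity:
            evicted_key, evicted_val = self.cache.popitem(last=False)
            self.disk[evicted_key] = evicted_val  # stash LRU entry to "disk"

    def __getitem__(self, key):
        if key not in self.cache:
            self.disk_reads += 1    # the expensive path
            self.put(key, self.disk.pop(key))
        self.cache.move_to_end(key)
        return self.cache[key]

    def keys(self):
        return list(self.cache) + list(self.disk)

policy_map = TinyDiskBackedMap(capacity=2)
for i in range(5):
    policy_map.put(f"policy_{i}", object())

# A full sweep, which is effectively what get_weights() and
# foreach_trainable_policy did, reloads every stashed policy:
for pid in policy_map.keys():
    _ = policy_map[pid]
print(policy_map.disk_reads)  # -> 3 (one read per policy that was on disk)
```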

Related issue number

n/a

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@PavelCz PavelCz changed the title Fix policy map always loading all policies from disk [RLlib] Fix policy map always loading all policies from disk Jan 31, 2022
stale bot commented Apr 21, 2022

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

  • If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@stale stale bot added the stale label Apr 21, 2022
@gjoliver (Member):

Sorry to miss this PR until now.
I think that although some policies are not trainable, they need to be there for the purpose of training the trainable policies; think of multi-agent cases where they are the opponents, for example.
So could this change break those use cases?

@stale stale bot removed the stale label Apr 21, 2022
@PavelCz (Contributor, Author) commented Apr 21, 2022

If those policies are necessary, they are still accessible and can be loaded from disk. However, they should only be loaded when they are actually needed.
The current behavior is: all policies are always loaded from disk, whether they are used or not.
With my proposed changes: trainable policies are always loaded from disk, and if other policies are needed, they will also be loaded from disk.
Essentially, the current behavior is: if I want to access policy n, all policies are loaded from disk, for no reason. The new behavior: only policy n is loaded. This does not change the fact that RLlib loads any policy that is needed.

Without this change, having more policies than the policy_map_capacity is unusable, as the disk will constantly be accessed. (A hypothetical config that runs into this scenario is sketched below.)
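
For illustration, a config along these lines triggers the scenario (compare the reproduction gist linked in the PR description). The key names follow RLlib's multiagent config of this era; the specific values, the None entries in the policy specs (assuming the spaces can be inferred from the environment), and the mapping-function signature are illustrative assumptions, not taken from the PR:

```python
# Hypothetical reproduction sketch; values are illustrative.
config = {
    "multiagent": {
        # 20 policies, but in-memory room for only 5: most live on disk.
        "policies": {
            f"policy_{i}": (None, None, None, {}) for i in range(20)
        },
        "policy_map_capacity": 5,
        # Only one policy is actually trained.
        "policies_to_train": ["policy_0"],
        "policy_mapping_fn": lambda agent_id, **kwargs: "policy_0",
    },
}
```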

@gjoliver (Member) left a comment:

OK, a couple more questions.
Thanks a lot for the contribution!

```diff
-    func(policy, pid, **kwargs)
-    for pid, policy in self.policy_map.items()
+    func(self.policy_map[pid], pid, **kwargs)
+    for pid in self.policy_map.keys()
```
@gjoliver (Member):

I can't really see the difference between the current and updated versions here.
Can you say a few words about the intention?
Same for the get_weights() function below.

@PavelCz (Contributor, Author) commented Apr 25, 2022:

Yes, from the view of the rollout_worker the behavior shouldn't change.
The difference is that iterating over items() accesses both the key and the policy, i.e. internally PolicyMap calls __getitem__, and __getitem__ reads from disk if the selected policy is not in the cache. So every policy is accessed, and only afterwards does the if statement on the next line check whether we are actually interested in that policy. With my changes, we only call __getitem__ for those policies that pass the if statement. This reduces the number of unnecessary policy accesses and, in turn, the number of unnecessary _read_from_disk() calls (compare rllib/policy/policy_map.py).
The same goes for get_weights().

In short: don't do policy_map[pid] if we skip this pid anyway due to the if statement. This is important because policy_map[pid] might trigger disk reads. (A small self-contained demonstration follows.)
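
To make this concrete, here is a toy demonstration. CountingMap is an illustrative stand-in, not the real PolicyMap (whose __getitem__ may call _read_from_disk()); the items() override mirrors the fact that PolicyMap's items() goes through __getitem__:

```python
class CountingMap(dict):
    """Toy stand-in for PolicyMap: counts __getitem__ calls, each of
    which in the real class can trigger a disk read."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.getitem_calls = 0

    def __getitem__(self, key):
        self.getitem_calls += 1  # in PolicyMap: possible _read_from_disk()
        return super().__getitem__(key)

    def items(self):
        # Like PolicyMap.items(): materializes every value via __getitem__.
        return ((k, self[k]) for k in super().keys())

policy_map = CountingMap({f"policy_{i}": object() for i in range(100)})
policies_to_train = {"policy_0", "policy_1"}

# Old pattern: every policy is fetched, then most are filtered away.
_ = [p for pid, p in policy_map.items() if pid in policies_to_train]
print(policy_map.getitem_calls)  # -> 100

policy_map.getitem_calls = 0

# New pattern: the `if` clause runs before the output expression, so only
# the two surviving ids are ever fetched.
_ = [policy_map[pid] for pid in policy_map.keys() if pid in policies_to_train]
print(policy_map.getitem_calls)  # -> 2
```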

@gjoliver (Member):

🤯 This is some highly optimized stuff.
Can you please add a big comment above these two places, explaining what you just said, so we don't come in and accidentally "clean up" these loops some day?
Appreciate the thoughtful changes.

@gjoliver (Member):

Understood! This is great.
I think global_vars are mostly for training purposes, so it should be fine to skip a policy that is not being trained.

```diff
@@ -1561,7 +1561,7 @@ def set_global_vars(self, global_vars: dict) -> None:
         Examples:
             >>> global_vars = worker.set_global_vars({"timestep": 4242})
         """
-        self.foreach_policy(lambda p, _: p.on_global_var_update(global_vars))
+        self.foreach_policy_to_train(lambda p, _: p.on_global_var_update(global_vars))
```
@gjoliver (Member):

I see. So this PR doesn't really change what policies get loaded into the map; it only changes what policies get updated. Is my understanding correct?
Thanks.

@PavelCz (Contributor, Author):

Yes, this change only affects which policies get their global vars updated.

The problem:

  • In order to update a policy that isn't in the memory cache, it has to be loaded from disk.
  • Looking through the project, it seems that every time set_global_vars() is called, _get_global_vars() is passed as the argument.
  • I'm not sure what kinds of global vars there usually are, but judging from rllib/execution/common.py, _get_global_vars() basically always returns an updated timestep.
  • Consequently, if any policies are stashed to disk, they are read from disk all the time just to update their timesteps, and we have constant disk access again, which defeats the point of putting them on disk in the first place.

The solution:

  • What at least helps with this problem is updating global vars only for policies that are being trained (see the sketch after this list).
  • If all policies that are being trained fit into the in-memory cache, the disk won't constantly be accessed.
  • If they don't fit, then I see no way to stop constant disk access anyway.
  • I'm not 100% sure this has no other consequences; someone with better knowledge of the code base would have to judge that. However, I see no reason why policies that are not being trained would need their vars updated (are there even other vars besides the timestep?).
  • At any rate, without this change we are again at a point where any policy that is offloaded to disk is constantly being accessed, making it basically unusable to have more policies than the policy_map_capacity.
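
For illustration, here is a hedged sketch of the behavioral difference, paraphrasing the one-line diff above with a simplified stand-in for RolloutWorker. The real foreach_policy / foreach_policy_to_train helpers are more involved; the names below are illustrative:

```python
class SketchWorker:
    """Simplified stand-in for a RolloutWorker holding a disk-backed map."""

    def __init__(self, policy_map, policies_to_train):
        self.policy_map = policy_map              # may stash entries on disk
        self.policies_to_train = policies_to_train

    def set_global_vars_old(self, global_vars):
        # foreach_policy semantics: touch every policy in the map, so any
        # stashed policy is read from disk just to bump its timestep.
        for pid in self.policy_map.keys():
            self.policy_map[pid].on_global_var_update(global_vars)

    def set_global_vars_new(self, global_vars):
        # foreach_policy_to_train semantics: touch only trained policies;
        # if they all fit in the in-memory cache, no disk access happens.
        for pid in self.policy_map.keys():
            if pid in self.policies_to_train:
                self.policy_map[pid].on_global_var_update(global_vars)
```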

@sven1977 (Contributor) left a comment:

Makes perfect sense! Thanks so much for digging into this problem, @PavelCz, and for all your help with the PR, @gjoliver.
This is a really valuable fix. Merging now ...

@gjoliver (Member):

Hey @PavelCz, one thing though: can you rebase to the latest master so we can make sure all the tests pass?
You are on a very broken commit right now (our fault), so all the RL tests are broken.

@PavelCz (Contributor, Author) commented Apr 25, 2022

I added comments to the two changes and also a short comment to the third change. Let me know if those are OK.

@sven1977 sven1977 changed the title [RLlib] Fix policy map always loading all policies from disk [RLlib] Fix policy_map always loading all policies from disk due to (not always needed) global_vars update. Apr 29, 2022
@sven1977 sven1977 merged commit de0c6f6 into ray-project:master Apr 29, 2022
PavelCz added a commit to HumanCompatibleAI/ray that referenced this pull request Jun 6, 2022
… (not always needed) `global_vars` update. (ray-project#22010)

(cherry picked from commit de0c6f6)