[RLlib] Refactor: All tf static graph code should reside inside Policy class. #17169

Merged

@sven1977 sven1977 commented Jul 17, 2021

This PR contains a major refactor of the TFPolicy/DynamicTFPolicy and multi-GPU execution code.
To be able to add and remove TFPolicies efficiently on the fly in the future, we need to make sure that our TFPolicies are the only objects holding references into the tf static graph. Currently, when doing multi-GPU training, the Trainer objects also hold parts of the graph via the multi-GPU execution components (TFMultiGPUTrainer and TFMultiGPULearner).

This PR:

  • Moves the multi-GPU components that contain parts of the static graph into DynamicTFPolicy.
  • Moves the code for LocalSyncParallelOptimizer into DynamicTFPolicy and renames the class to TFMultiGPUTowerStack.
    • Instances of TFMultiGPUTowerStack are now held directly by the DynamicTFPolicy. This will make it possible to cleanly remove graphs from memory (and close the per-policy sessions holding these graphs) in the future.
    • TODO: A framework agnostic base-class (MultiGPUTowerStack) should be created to unify tf, tf-eager and torch.
  • Renames (for clarity):
    • Config parameter: num_data_loader_buffers to num_multi_gpu_tower_stacks.
    • Class: TrainTFMultiGPU to MultiGPUTrainOneStep (vs TrainOneStep).
    • Class: MultiGPULearner to MultiGPULearnerThread (vs LearnerThread).
  • Makes sure that tower copies do not build any action-computing or exploration parts of the graph. This functionality is not needed on towers.
  • Adds two new Policy API methods, load_batch_into_buffer and learn_on_loaded_batch, for use by the now framework-agnostic multi-GPU execution classes.
  • Adds an IMPALA fake multi-GPU learning test (w/ LSTM).
  • Adds tf to the existing multi-GPU SAC learning test.
  • Removes 1 unnecessary policy graph copy (for the total loss) from TFMultiGPUTowerStack.
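
To illustrate how a framework-agnostic execution class might use the two new Policy methods, here is a minimal pure-Python sketch. Only the method names load_batch_into_buffer and learn_on_loaded_batch and the config key num_multi_gpu_tower_stacks come from this PR; the ToyPolicy class, the argument names (offset, buffer_index), the return values, and the toy loss are assumptions made for illustration and are not RLlib's actual implementation.

```python
from typing import Dict, List


class ToyPolicy:
    """Hypothetical stand-in for a policy. Not RLlib code."""

    def __init__(self, num_multi_gpu_tower_stacks: int = 1,
                 num_towers: int = 2, per_tower_batch_size: int = 2,
                 lr: float = 0.5):
        self.num_towers = num_towers
        self.per_tower_batch_size = per_tower_batch_size
        self.lr = lr
        # One buffer per tower stack (assumed meaning of the config key).
        self.buffers: List[List[float]] = [
            [] for _ in range(num_multi_gpu_tower_stacks)
        ]
        self.weight = 0.0  # single toy parameter

    def load_batch_into_buffer(self, batch: List[float],
                               buffer_index: int = 0) -> int:
        """Store the batch and return the number of samples per tower."""
        per_tower = len(batch) // self.num_towers
        # Drop trailing samples so the batch splits evenly across towers.
        self.buffers[buffer_index] = list(batch[:per_tower * self.num_towers])
        return per_tower

    def learn_on_loaded_batch(self, offset: int = 0,
                              buffer_index: int = 0) -> Dict[str, float]:
        """Run one minibatch step per tower and apply the averaged gradient."""
        data = self.buffers[buffer_index]
        per_tower = len(data) // self.num_towers
        if offset >= per_tower:
            raise ValueError("offset lies beyond the loaded per-tower data")
        grads = []
        for tower in range(self.num_towers):
            start = tower * per_tower + offset
            end = min(start + self.per_tower_batch_size,
                      (tower + 1) * per_tower)
            minibatch = data[start:end]
            # Toy loss 0.5 * (weight - x)^2, so the gradient is (weight - x).
            grads.append(sum(self.weight - x for x in minibatch) / len(minibatch))
        mean_grad = sum(grads) / len(grads)
        self.weight -= self.lr * mean_grad
        return {"mean_grad": mean_grad}


def train_one_step(policy: ToyPolicy, batch: List[float],
                   num_sgd_iter: int = 1, buffer_index: int = 0) -> Dict[str, float]:
    """Load the batch once, then sweep over it in per-tower minibatches."""
    per_tower = policy.load_batch_into_buffer(batch, buffer_index=buffer_index)
    stats: Dict[str, float] = {}
    for _ in range(num_sgd_iter):
        for offset in range(0, per_tower, policy.per_tower_batch_size):
            stats = policy.learn_on_loaded_batch(
                offset=offset, buffer_index=buffer_index)
    return stats
```

The point of the split is that the execution class never touches the graph itself: it asks the policy to load data once and then requests updates by offset, so all framework-specific state stays inside the policy.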

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@sven1977 sven1977 added tests-ok The tagger certifies test failures are unrelated and assumes personal liability. and removed tests-ok The tagger certifies test failures are unrelated and assumes personal liability. labels Jul 19, 2021
@sven1977 sven1977 merged commit 5a313ba into ray-project:master Jul 20, 2021
@sven1977 sven1977 deleted the refactor_tf_dynamic_policy_multi_gpu_optim branch June 2, 2023 20:15