[RLlib] Refactor: All tf static graph code should reside inside Policy class. #17169

sven1977 · 2021-07-17T11:06:28Z

This PR is a major refactor of the TFPolicy/DynamicTFPolicy and multi-GPU execution code.
To add and remove TFPolicies efficiently on the fly in the future, the TFPolicies must be the only objects that hold references into the tf static graph. Currently, in multi-GPU setups, the Trainer objects also hold parts of the graph via the multi-GPU execution components (TFMultiGPUTrainer and TFMultiGPULearner).
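The ownership rule behind this refactor can be pictured with a small framework-free toy model. This is only a sketch: `GraphHandle`, `ToyPolicy`, and `ToyTrainer` are hypothetical names invented for illustration and are not RLlib classes. The point is that the trainer holds policies only, so removing a policy releases everything that touches its graph.

```python
class GraphHandle:
    """Hypothetical stand-in for a static graph plus the session running it."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True


class ToyPolicy:
    """Owns every graph-backed object, including its multi-GPU tower stacks."""

    def __init__(self, policy_id, num_tower_stacks=1):
        self.policy_id = policy_id
        self.graph = GraphHandle(f"{policy_id}/main")
        self.tower_stacks = [
            GraphHandle(f"{policy_id}/stack_{i}") for i in range(num_tower_stacks)
        ]

    def close(self):
        # Everything that references the graph lives here, so one call frees it all.
        for handle in [self.graph, *self.tower_stacks]:
            handle.close()


class ToyTrainer:
    """Holds policies only; it never keeps a graph handle of its own."""

    def __init__(self):
        self.policies = {}

    def add_policy(self, policy_id, **kwargs):
        self.policies[policy_id] = ToyPolicy(policy_id, **kwargs)

    def remove_policy(self, policy_id):
        self.policies.pop(policy_id).close()
```

If the trainer also held graph-backed objects, as the multi-GPU components did before this PR, `remove_policy` could not guarantee that the graph is fully released.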

This PR:

- Moves the multi-GPU components that contain static-graph code into DynamicTFPolicy.
- Moves the code for LocalSyncParallelOptimizer into DynamicTFPolicy and renames the class to TFMultiGPUTowerStack.
  - Instances of TFMultiGPUTowerStack are now held directly by the DynamicTFPolicy. This will make it possible to cleanly remove graphs from memory (and close the per-policy sessions holding these graphs) in the future.
  - TODO: A framework-agnostic base class (MultiGPUTowerStack) should be created to unify tf, tf-eager, and torch.
- Renames (for clarity):
  - Config parameter: num_data_loader_buffers to num_multi_gpu_tower_stacks.
  - Class: TrainTFMultiGPU to MultiGPUTrainOneStep (vs TrainOneStep).
  - Class: MultGPULearner to MultiGPULearnerThread (vs LearnerThread).
- Makes sure that tower copies do not build any action-computing or exploration parts of the graph; this functionality is not needed on towers.
- Adds two new Policy API methods, load_batch_into_buffer and learn_on_loaded_batch, to be used by the now framework-agnostic multi-GPU execution classes.
- Adds an IMPALA fake multi-GPU learning test (w/ LSTM).
- Adds tf to the existing multi-GPU SAC learning test.
- Removes one unnecessary policy graph copy (for the total loss) from TFMultiGPUTowerStack.
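To illustrate how a framework-agnostic execution class might drive the two new Policy methods, here is a minimal pure-Python sketch. Only the method names `load_batch_into_buffer` and `learn_on_loaded_batch` come from this PR; the class `ToyMultiGPUPolicy`, the helper `train_one_step`, the argument names (`buffer_index`, `offset`), the return values, and the toy squared-error update are all assumptions made for illustration and may differ from the actual RLlib signatures.

```python
from typing import Dict, List


class ToyMultiGPUPolicy:
    """Hypothetical policy; only the two method names are taken from the PR."""

    def __init__(self, num_devices: int = 2, num_tower_stacks: int = 1,
                 minibatch_size: int = 2, lr: float = 0.25):
        self.num_devices = num_devices
        self.minibatch_size = minibatch_size  # per-device minibatch size
        self.lr = lr
        # One buffer per tower stack (cf. num_multi_gpu_tower_stacks).
        self.buffers: List[List[float]] = [[] for _ in range(num_tower_stacks)]
        self.weight = 0.0

    def load_batch_into_buffer(self, batch: List[float], buffer_index: int = 0) -> int:
        """Store the batch; return the number of samples per device (assumed)."""
        per_device = len(batch) // self.num_devices
        # Truncate so the batch splits evenly across devices.
        self.buffers[buffer_index] = list(batch[: per_device * self.num_devices])
        return per_device

    def learn_on_loaded_batch(self, offset: int = 0, buffer_index: int = 0) -> Dict[str, float]:
        """Run one toy update on the minibatch starting at `offset` in each shard."""
        data = self.buffers[buffer_index]
        per_device = len(data) // self.num_devices
        tower_means = []
        for d in range(self.num_devices):
            shard = data[d * per_device:(d + 1) * per_device]
            minibatch = shard[offset:offset + self.minibatch_size]
            tower_means.append(sum(minibatch) / len(minibatch))
        # Average the per-tower results, then take one gradient step on
        # the squared error between the weight and that average.
        target = sum(tower_means) / len(tower_means)
        loss = (self.weight - target) ** 2
        self.weight += self.lr * 2 * (target - self.weight)
        return {"loss": loss}


def train_one_step(policy: ToyMultiGPUPolicy, batch: List[float],
                   buffer_index: int = 0) -> List[Dict[str, float]]:
    """Framework-agnostic driver: load once, then learn minibatch by minibatch."""
    per_device = policy.load_batch_into_buffer(batch, buffer_index)
    return [
        policy.learn_on_loaded_batch(offset, buffer_index)
        for offset in range(0, per_device, policy.minibatch_size)
    ]
```

The driver never touches anything framework-specific: it only passes a batch, a buffer index, and an offset, which is what lets the same execution code serve different policy implementations.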

Why are these changes needed?

Related issue number

Checks

I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

sven1977 added 13 commits July 12, 2021 11:09

wip.

9d3b27b

wip.

ae97ece

wip.

f78d196

Merge branch 'master' of https://github.com/ray-project/ray into refactor_tf_dynamic_policy_multi_gpu_optim

accc23b

wip.

b64330c

Merge branch 'master' of https://github.com/ray-project/ray into refactor_tf_dynamic_policy_multi_gpu_optim

a7aa1de

Merge branch 'master' of https://github.com/ray-project/ray into refactor_tf_dynamic_policy_multi_gpu_optim

feafa83

wip.

0ac0970

wip.

bc34fb9

wip.

40e85ce

wip.

af61493

wip.

a3204e6

wip.

f97b393

sven1977 requested a review from michaelzhiluo July 17, 2021 11:06

sven1977 assigned michaelzhiluo Jul 17, 2021

sven1977 added 13 commits July 17, 2021 07:45

wip.

3edac65

wip.

b423e65

wip.

427f73d

Merge branch 'master' of https://github.com/ray-project/ray into refactor_tf_dynamic_policy_multi_gpu_optim

18f04d0

fixes.

0a2de28

Merge branch 'master' of https://github.com/ray-project/ray into refactor_tf_dynamic_policy_multi_gpu_optim

18f911b

Merge branch 'master' of https://github.com/ray-project/ray into refactor_tf_dynamic_policy_multi_gpu_optim

bc5ac0e

wip

c78136e

wip

363aafe

wip

3d67210

fixes and LINT.

45675f9

fix.

a74e4a8

fix.

3fe297f

sven1977 added and removed the tests-ok label ("The tagger certifies test failures are unrelated and assumes personal liability.") Jul 19, 2021

michaelzhiluo approved these changes Jul 19, 2021

sven1977 added 3 commits July 19, 2021 13:23

merge

1dcd890

Merge branch 'master' of https://github.com/ray-project/ray into refactor_tf_dynamic_policy_multi_gpu_optim

4e2f0a9

fixes and lint.

4efd496

sven1977 merged commit 5a313ba into ray-project:master Jul 20, 2021

XuehaiPan mentioned this pull request Jul 28, 2021

[Rllib] Fix multi-GPU discovery for Torch/TF policies #17398

Closed

6 tasks

sven1977 deleted the refactor_tf_dynamic_policy_multi_gpu_optim branch June 2, 2023 20:15
