[RLlib] Torch algos use now-framework-agnostic MultiGPUTrainOneStep execution op (~33% speedup for PPO-torch + GPU). #17371

sven1977 · 2021-07-27T20:26:37Z

This PR makes all torch algos use the execution op formerly named TrainTFMultiGPU, which is now framework-agnostic and has been renamed to MultiGPUTrainOneStep.
This allows batches to be pre-loaded into GPU memory, so they no longer have to be re-loaded on every SGD iteration.
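
To illustrate why pre-loading helps, here is a minimal, framework-free sketch. It is not RLlib's implementation: `FakeDevice`, `train_reload_every_iter`, and `train_preloaded` are hypothetical names, and the "device" only counts host-to-device copies instead of moving real tensors. The minibatch size (500) and number of SGD iterations (30) in the usage part are assumptions chosen for illustration; only the 5k batch size comes from the benchmark setup in this PR.

```python
from typing import List, Sequence


class FakeDevice:
    """Hypothetical stand-in for a GPU; it only counts host-to-device copies."""

    def __init__(self) -> None:
        self.num_copies = 0

    def to_device(self, minibatch: Sequence[float]) -> List[float]:
        self.num_copies += 1
        return list(minibatch)  # pretend this copy now lives in GPU memory


def sgd_step(minibatch: Sequence[float]) -> float:
    """Placeholder for one forward/backward pass; returns a dummy loss."""
    return sum(minibatch) / len(minibatch)


def split(batch: Sequence[float], minibatch_size: int) -> List[Sequence[float]]:
    """Cut the train batch into consecutive minibatches."""
    return [batch[i:i + minibatch_size]
            for i in range(0, len(batch), minibatch_size)]


def train_reload_every_iter(device: FakeDevice, batch: Sequence[float],
                            minibatch_size: int, num_sgd_iter: int) -> int:
    """Copy each minibatch to the device again on every SGD iteration."""
    for _ in range(num_sgd_iter):
        for mb in split(batch, minibatch_size):
            sgd_step(device.to_device(mb))
    return device.num_copies


def train_preloaded(device: FakeDevice, batch: Sequence[float],
                    minibatch_size: int, num_sgd_iter: int) -> int:
    """Copy each minibatch to the device once, then reuse it in every iteration."""
    loaded = [device.to_device(mb) for mb in split(batch, minibatch_size)]
    for _ in range(num_sgd_iter):
        for mb in loaded:
            sgd_step(mb)
    return device.num_copies


if __name__ == "__main__":
    batch = [float(i) for i in range(5000)]  # 5k batch size, as in the benchmark
    print(train_reload_every_iter(FakeDevice(), batch, 500, 30))  # 300 copies
    print(train_preloaded(FakeDevice(), batch, 500, 30))          # 10 copies
```

Under these assumptions, re-loading performs one copy per minibatch per SGD iteration, while pre-loading performs one copy per minibatch in total, so the saving grows with the number of SGD iterations.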

First experiments suggest a ~33% speedup on a single GPU for PPO torch on Atari, and a ~60% speedup on 2 GPUs.

Benchmarks for measuring PPO torch speedups (1 GPU + 11 workers on Breakout-v4 on a p3dn.24xlarge machine; batch size 5k; lr=0.00005):

This PR:
== Status ==
Memory usage on this node: 17.4/747.8 GiB
Using FIFO scheduling algorithm.
Resources requested: 11.0/96 CPUs, 1.0/8 GPUs, 0.0/550.03 GiB heap, 0.0/186.26 GiB objects (0.0/1.0 accelerator_type:V100)
Result logdir: /home/ray/ray_results/atari-ppo
Number of trials: 1/1 (1 RUNNING)
+----------------------------------------+----------+-------------------+--------+------------------+--------+----------+----------------------+----------------------+--------------------+
| Trial name                             | status   | loc               |   iter |   total time (s) |     ts |   reward |   episode_reward_max |   episode_reward_min |   episode_len_mean |
|----------------------------------------+----------+-------------------+--------+------------------+--------+----------+----------------------+----------------------+--------------------|
| PPO_BreakoutNoFrameskip-v4_47de3_00000 | RUNNING  | 172.31.80.77:1289 |     51 |          210.334 | 255000 |      1.1 |                    5 |                    0 |             723.76 |
+----------------------------------------+----------+-------------------+--------+------------------+--------+----------+----------------------+----------------------+--------------------+


Ray 1.4:
== Status ==
Memory usage on this node: 17.4/747.8 GiB
Using FIFO scheduling algorithm.
Resources requested: 11.0/96 CPUs, 1.0/8 GPUs, 0.0/549.92 GiB heap, 0.0/186.26 GiB objects (0.0/1.0 accelerator_type:V100)
Result logdir: /home/ray/ray_results/atari-ppo
Number of trials: 1/1 (1 RUNNING)
+----------------------------------------+----------+-------------------+--------+------------------+--------+----------+----------------------+----------------------+--------------------+
| Trial name                             | status   | loc               |   iter |   total time (s) |     ts |   reward |   episode_reward_max |   episode_reward_min |   episode_len_mean |
|----------------------------------------+----------+-------------------+--------+------------------+--------+----------+----------------------+----------------------+--------------------|
| PPO_BreakoutNoFrameskip-v4_17f4f_00000 | RUNNING  | 172.31.80.77:8721 |     51 |          276.744 | 255000 |     3.49 |                   14 |                    0 |             999.73 |
+----------------------------------------+----------+-------------------+--------+------------------+--------+----------+----------------------+----------------------+--------------------+

For 2 GPUs, the results were as follows (batch size doubled to 10k and lr doubled to 0.0001):

This PR:

== Status ==
Memory usage on this node: 18.9/747.8 GiB
Using FIFO scheduling algorithm.
Resources requested: 11.0/96 CPUs, 2.0/8 GPUs, 0.0/549.94 GiB heap, 0.0/186.26 GiB objects (0.0/1.0 accelerator_type:V100)
Result logdir: /home/ray/ray_results/atari-ppo
Number of trials: 1/1 (1 RUNNING)
+----------------------------------------+----------+--------------------+--------+------------------+--------+----------+----------------------+----------------------+--------------------+
| Trial name                             | status   | loc                |   iter |   total time (s) |     ts |   reward |   episode_reward_max |   episode_reward_min |   episode_len_mean |
|----------------------------------------+----------+--------------------+--------+------------------+--------+----------+----------------------+----------------------+--------------------|
| PPO_BreakoutNoFrameskip-v4_f3219_00000 | RUNNING  | 172.31.80.77:90505 |     51 |          318.461 | 510000 |     9.53 |                   21 |                    3 |            1703.92 |
+----------------------------------------+----------+--------------------+--------+------------------+--------+----------+----------------------+----------------------+--------------------+


Ray 1.4:

== Status ==
Memory usage on this node: 18.8/747.8 GiB
Using FIFO scheduling algorithm.
Resources requested: 11.0/96 CPUs, 2.0/8 GPUs, 0.0/549.89 GiB heap, 0.0/186.26 GiB objects (0.0/1.0 accelerator_type:V100)
Result logdir: /home/ray/ray_results/atari-ppo
Number of trials: 1/1 (1 RUNNING)
+----------------------------------------+----------+--------------------+--------+------------------+--------+----------+----------------------+----------------------+--------------------+
| Trial name                             | status   | loc                |   iter |   total time (s) |     ts |   reward |   episode_reward_max |   episode_reward_min |   episode_len_mean |
|----------------------------------------+----------+--------------------+--------+------------------+--------+----------+----------------------+----------------------+--------------------|
| PPO_BreakoutNoFrameskip-v4_ff193_00000 | RUNNING  | 172.31.80.77:26690 |     51 |          502.618 | 510000 |     8.35 |                   17 |                    2 |            1565.59 |
+----------------------------------------+----------+--------------------+--------+------------------+--------+----------+----------------------+----------------------+--------------------+
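
The speedups can be recomputed from the "total time (s)" column of the four status tables above. The short sketch below does that arithmetic; the `speedup` helper and the `RUNS` dictionary are names introduced here for illustration, and the speedup is taken as the gain in throughput (baseline time divided by new time, minus one) for the same number of timesteps, which is an assumption about how the headline figures were derived.

```python
def speedup(baseline_s: float, new_s: float) -> float:
    """Relative throughput gain when the same work takes new_s instead of baseline_s."""
    return baseline_s / new_s - 1.0


# Wall-clock times at iteration 51, copied from the status tables above.
RUNS = {
    "1 GPU (255k ts)": {"ray_1_4_s": 276.744, "this_pr_s": 210.334},
    "2 GPUs (510k ts)": {"ray_1_4_s": 502.618, "this_pr_s": 318.461},
}

if __name__ == "__main__":
    for name, times in RUNS.items():
        gain = speedup(times["ray_1_4_s"], times["this_pr_s"])
        print(f"{name}: {gain:.1%} faster than Ray 1.4")
```

With these numbers the gain comes out at roughly 32% for 1 GPU and roughly 58% for 2 GPUs, in line with the approximate figures quoted in the description. Each configuration was run once, so these are indicative rather than statistically robust.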

Why are these changes needed?

Related issue number

Checks

- I've run scripts/format.sh to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(

sven1977 added 30 commits July 12, 2021 11:09

wip.

9d3b27b

wip.

ae97ece

wip.

f78d196

Merge branch 'master' of https://github.com/ray-project/ray into refa…

accc23b

…ctor_tf_dynamic_policy_multi_gpu_optim

wip.

b64330c

Merge branch 'master' of https://github.com/ray-project/ray into refa…

a7aa1de

…ctor_tf_dynamic_policy_multi_gpu_optim

Merge branch 'master' of https://github.com/ray-project/ray into refa…

feafa83

…ctor_tf_dynamic_policy_multi_gpu_optim

wip.

0ac0970

wip.

bc34fb9

wip.

40e85ce

wip.

af61493

wip.

a3204e6

wip.

f97b393

wip.

3edac65

wip.

b423e65

wip.

427f73d

Merge branch 'master' of https://github.com/ray-project/ray into refa…

18f04d0

…ctor_tf_dynamic_policy_multi_gpu_optim

fixes.

0a2de28

Merge branch 'master' of https://github.com/ray-project/ray into refa…

18f911b

…ctor_tf_dynamic_policy_multi_gpu_optim

Merge branch 'master' of https://github.com/ray-project/ray into refa…

bc5ac0e

…ctor_tf_dynamic_policy_multi_gpu_optim

wip

c78136e

wip

363aafe

wip

3d67210

fixes and LINT.

45675f9

wip.

a0853d2

Merge branch 'master' of https://github.com/ray-project/ray into torc…

6d14f0b

…h_uses_multi_gpu_train_one_step # Conflicts: # rllib/agents/trainer.py # rllib/utils/tf_ops.py

wip

926b45f

wip

51f6e86

Merge branch 'master' of https://github.com/ray-project/ray into torc…

f9ec33a

…h_uses_multi_gpu_train_one_step

wip

d3c5470

sven1977 added 2 commits July 27, 2021 14:39

Merge branch 'master' of https://github.com/ray-project/ray into torc…

a607703

…h_uses_multi_gpu_train_one_step

wip

5620d32

sven1977 requested a review from michaelzhiluo July 27, 2021 20:26

sven1977 assigned michaelzhiluo Jul 27, 2021

sven1977 added 4 commits July 27, 2021 20:19

wip

b79ca90

wip

8f32faf

Merge branch 'master' of https://github.com/ray-project/ray into torc…

00ceb55

…h_uses_multi_gpu_train_one_step

LINT

13b7de2

sven1977 added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Jul 28, 2021

sven1977 changed the title ~~[RLlib] Torch algos use now-framework-agnostic MultiGPUTrainOneStep execution op.~~ [RLlib] Torch algos use now-framework-agnostic MultiGPUTrainOneStep execution op (~33% speedup for PPO-torch). Jul 28, 2021

sven1977 changed the title ~~[RLlib] Torch algos use now-framework-agnostic MultiGPUTrainOneStep execution op (~33% speedup for PPO-torch).~~ [RLlib] Torch algos use now-framework-agnostic MultiGPUTrainOneStep execution op (~33% speedup for PPO-torch + GPU). Jul 28, 2021

sven1977 added 9 commits July 30, 2021 14:55

fixed prio. replay for SAC/DQN multi-GPU torch.

db7e169

wip

a1c800b

wip

bc68b12

wip

d23b8f6

Merge branch 'master' of https://github.com/ray-project/ray into torc…

65b113f

…h_uses_multi_gpu_train_one_step

Merge branch 'master' of https://github.com/ray-project/ray into torc…

8b658cd

…h_uses_multi_gpu_train_one_step

fix.

3853dfb

fix.

7d7fb3e

merge

69364c3

michaelzhiluo approved these changes Aug 3, 2021

sven1977 merged commit 924f11c into ray-project:master Aug 3, 2021

sven1977 deleted the torch_uses_multi_gpu_train_one_step branch June 2, 2023 20:15
