[RLlib] Fix SAC/DQN/CQL GPU and multi-GPU. #47179

sven1977 · 2024-08-16T19:21:26Z

Fix DQN/SAC/CQL GPU and multi-GPU.

- Fix a bug in DQN/SAC/CQL where the target networks are not updated because timestep information is missing on the Learner side (only happens with one or more remote Learners).
- For torch DDP to work with a complex setup like SAC's (where the same network, the q-net, runs two forward passes, but one of these passes should NOT record gradients): Implement "straight-through" gradients for the q-net forward pass with the resampled actions (computed by the policy net). In other words, make sure that for this forward pass, the q-net does NOT get its gradients recorded, but the policy net does.
- Add CI learning tests for all combinations of [DQN | SAC] x [single-agent | multi-agent] x [CPU Learner | GPU Learner | 2 CPU Learners | 2 GPU Learners].
- Add a release test for SAC on HalfCheetah-v4.
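To illustrate the first item: a minimal sketch of a timestep-gated target network sync. The `TargetUpdater` class and the fallback of a missing timestep to 0 are assumptions made for illustration only; they are not RLlib's actual Learner code, and the description above only states that the timestep information was missing.

```python
from dataclasses import dataclass


@dataclass
class TargetUpdater:
    """Hypothetical helper, not part of RLlib."""

    update_freq: int  # Sync the target net every `update_freq` timesteps.
    last_update_ts: int = 0
    num_updates: int = 0

    def maybe_update(self, timestep: int) -> bool:
        # Only sync once enough timesteps have passed since the last sync.
        if timestep - self.last_update_ts >= self.update_freq:
            self.last_update_ts = timestep
            self.num_updates += 1
            return True
        return False


# The current timestep is passed with every update call: periodic syncs happen.
with_ts = TargetUpdater(update_freq=500)
for ts in range(0, 2001, 100):
    with_ts.maybe_update(ts)

# The timestep is missing and (by assumption) falls back to 0: no sync ever happens.
without_ts = TargetUpdater(update_freq=500)
for _ in range(0, 2001, 100):
    without_ts.maybe_update(0)

print(with_ts.num_updates, without_ts.num_updates)
```

Under these assumptions, the gate never opens when the counter does not advance, which would match the symptom of a target net that is never updated.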

Why are these changes needed?

Related issue number

Checks

- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
  - I've added any new APIs to the API Reference. For example, if I added a
    method in Tune, I've added it in doc/source/tune/api/ under the
    corresponding .rst file.
- I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(

Signed-off-by: sven1977 <[email protected]>

simonsays1980

LGTM. Great PR with a big achievement. Multi-GPU on SAC is awesome!

simonsays1980 · 2024-08-19T09:12:01Z

rllib/BUILD

+    tags = ["team:rllib", "exclusive", "learning_tests", "torch_only", "learning_tests_discrete", "learning_tests_pytorch_use_all_core", "gpu"],
+    size = "large",
+    srcs = ["tuned_examples/dqn/cartpole_dqn.py"],
+    args = ["--as-test", "--enable-new-api-stack", "--num-gpus=1"]


Does num-gpus=1 use a local or remote learner? Imo, we should test with both. What do you think @sven1977 ?

For IMPALA/APPO, we should add a validation that these should never be run with a local Learner, b/c these are async algos that suffer tremendously from having the Learner not-async. Will add this check/error in a separate PR ...

simonsays1980 · 2024-08-19T09:13:05Z

rllib/BUILD

+    tags = ["team:rllib", "exclusive", "learning_tests", "torch_only", "learning_tests_discrete", "learning_tests_pytorch_use_all_core", "gpu"],
+    size = "large",
+    srcs = ["tuned_examples/dqn/multi_agent_cartpole_dqn.py"],
+    args = ["--as-test", "--enable-new-api-stack", "--num-agents=2", "--num-cpus=4", "--num-gpus=1"]


Interesting, I thought this did not work with both --num-gpus > 0 and --num-cpus > 0 :)

Good point. We need to get rid of this confusion some time soon. Note that these are command line options, which do not translate directly to Algo config properties. Here:
- --num-cpus is the number of CPUs that Ray provides for the entire cluster.
- --num-gpus is the number of Learner workers; note that if no GPUs are available, --num-gpus still sets the number of Learner workers, but then each worker gets one CPU (instead of 1 GPU). :|
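The mapping described above can be sketched as follows. `learner_resources` and its return keys are hypothetical names chosen for illustration; this is not the code of the example scripts, and the `--num-gpus=0` case is not covered by the comment above, so it is left out here.

```python
def learner_resources(num_gpus_arg: int, gpus_available: int) -> dict:
    """Hypothetical helper mirroring the described meaning of --num-gpus."""
    # --num-gpus always sets the number of Learner workers ...
    num_learners = num_gpus_arg
    # ... and each worker gets 1 GPU if any are available, else 1 CPU.
    device = "gpu" if gpus_available > 0 else "cpu"
    return {"num_learners": num_learners, "device_per_learner": device}


# On a machine with 2 GPUs: 2 Learner workers with 1 GPU each.
print(learner_resources(2, gpus_available=2))
# On a CPU-only machine: still 2 Learner workers, but with 1 CPU each.
print(learner_resources(2, gpus_available=0))
```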

simonsays1980 · 2024-08-19T09:15:14Z

rllib/BUILD

+    main = "tuned_examples/sac/multi_agent_pendulum_sac.py",
+    tags = ["team:rllib", "exclusive", "learning_tests", "torch_only", "learning_tests_continuous"],
+    size = "large",
+    srcs = ["tuned_examples/sac/multi_agent_pendulum_sac.py"],


Do we actually need the srcs for files that can be executed directly via python?

simonsays1980 · 2024-08-19T09:20:30Z

rllib/algorithms/dqn/dqn.py

            # Reduce EnvRunner metrics over the n EnvRunners.
            self.metrics.merge_and_log_n_dicts(
                env_runner_results, key=ENV_RUNNER_RESULTS
            )

+            # Add the sampled experiences to the replay buffer.
+            with self.metrics.log_time((TIMERS, REPLAY_BUFFER_ADD_DATA_TIMER)):


simonsays1980 · 2024-08-19T09:29:01Z

rllib/algorithms/sac/torch/sac_torch_rl_module.py

+        # here). This is different from doing `.detach()` or `with torch.no_grad()`,
+        # as these two methods would fully block all gradient recordings, including
+        # the needed policy ones.
+        all_params = (
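
The distinction drawn in the comment above can be shown with a small standalone sketch. It assumes two tiny stand-in networks (`policy`, `qf`) and temporarily toggles `requires_grad` on the q-net parameters; the snippet above is truncated, so this may differ from the code in this PR.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the policy net and the q-net.
policy = nn.Linear(3, 2)  # obs (3) -> action (2)
qf = nn.Linear(3 + 2, 1)  # [obs, action] -> Q-value
obs = torch.randn(4, 3)

# Temporarily freeze the q-net parameters, so that this forward pass does not
# record gradients for them.
q_params = list(qf.parameters())
for p in q_params:
    p.requires_grad_(False)

resampled_actions = policy(obs)
q_resampled = qf(torch.cat([obs, resampled_actions], dim=-1))

# Unfreeze again, so that a separate critic loss could still train the q-net.
for p in q_params:
    p.requires_grad_(True)

# Actor-style loss: gradients flow through the q-net into the policy net only.
(-q_resampled.mean()).backward()
policy_has_grads = all(p.grad is not None for p in policy.parameters())
qf_has_grads = any(p.grad is not None for p in qf.parameters())

# For contrast: `torch.no_grad()` blocks all gradient recording, so the policy
# net could not be trained through this output.
with torch.no_grad():
    q_no_grad = qf(torch.cat([obs, policy(obs)], dim=-1))

print(policy_has_grads, qf_has_grads, q_no_grad.requires_grad)
```

In this sketch the policy parameters receive gradients while the q-net parameters do not, which is the behavior the PR description asks for on the resampled-actions pass.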



sven1977 added 4 commits August 16, 2024 13:00

wip

9a7df3d

Signed-off-by: sven1977 <[email protected]>

wip

f0419de

Signed-off-by: sven1977 <[email protected]>

wip

b66780c

Signed-off-by: sven1977 <[email protected]>

wip

35849c5

Signed-off-by: sven1977 <[email protected]>

sven1977 requested review from ArturNiederfahrenhorst and simonsays1980 as code owners August 16, 2024 19:21

sven1977 enabled auto-merge (squash) August 16, 2024 19:21

sven1977 assigned simonsays1980 Aug 16, 2024

github-actions bot added the "go" label (add ONLY when ready to merge, run all tests) Aug 16, 2024

wip

ab11b56

Signed-off-by: sven1977 <[email protected]>

sven1977 requested review from maxpumperla and a team as code owners August 16, 2024 19:23

github-actions bot disabled auto-merge August 16, 2024 19:24

sven1977 added 4 commits August 16, 2024 21:32

wip

aa3909b

Signed-off-by: sven1977 <[email protected]>

wip

5468711

Signed-off-by: sven1977 <[email protected]>

wip

2f60378

Signed-off-by: sven1977 <[email protected]>

wip

ae4d29d

Signed-off-by: sven1977 <[email protected]>

sven1977 enabled auto-merge (squash) August 16, 2024 20:04

sven1977 added 2 commits August 16, 2024 22:34

wip

0038a22

Signed-off-by: sven1977 <[email protected]>

wip

67a42ee

Signed-off-by: sven1977 <[email protected]>

github-actions bot disabled auto-merge August 17, 2024 05:00

wip

fb998f2

Signed-off-by: sven1977 <[email protected]>

sven1977 added the "tests-ok" label (The tagger certifies test failures are unrelated and assumes personal liability.) Aug 17, 2024

sven1977 enabled auto-merge (squash) August 17, 2024 07:57

wip

ce88849

Signed-off-by: sven1977 <[email protected]>

github-actions bot disabled auto-merge August 17, 2024 10:32

sven1977 added 3 commits August 17, 2024 12:33

wip

9865ef1

Signed-off-by: sven1977 <[email protected]>

wip

bd090c6

Signed-off-by: sven1977 <[email protected]>

wip

79b1f8f

Signed-off-by: sven1977 <[email protected]>

sven1977 enabled auto-merge (squash) August 18, 2024 17:41

wip

59605bf

Signed-off-by: sven1977 <[email protected]>

github-actions bot disabled auto-merge August 18, 2024 17:45

sven1977 added 3 commits August 18, 2024 20:16

wip

cb842d9

Signed-off-by: sven1977 <[email protected]>

wip

7e98ba3

Signed-off-by: sven1977 <[email protected]>

wip

3040e03

Signed-off-by: sven1977 <[email protected]>

simonsays1980 approved these changes Aug 19, 2024


simonsays1980 mentioned this pull request Aug 19, 2024

[RLlib; testing] - Added tests for different combinations of learners, agents and gpus. #47175

Closed

8 tasks

wip

0e083be

Signed-off-by: sven1977 <[email protected]>

sven1977 merged commit b040318 into ray-project:master Aug 19, 2024
5 checks passed

sven1977 deleted the fix_sac_cql_multi_gpu branch August 19, 2024 13:34
