[RLlib] DQN Rainbow on new API stack (w/ EnvRunner): training_step implementation. #43198
Conversation
…PI'. Furthermore, changed a couple of configurations to make DQN Rainbow run with the new stack (spec. 'RLModule'). In addition I made minor changes to the prioritized replay buffer. Signed-off-by: Simon Zehnder <[email protected]>
Signed-off-by: Simon Zehnder <[email protected]>
…and implemented the latter into the training steps for PPO and SAC when using the new 'EnvRunner API'. Signed-off-by: Simon Zehnder <[email protected]>
@@ -276,7 +276,12 @@ def validate(self) -> None:
        # Call super's validation method.
        super().validate()

        if self.exploration_config["type"] == "ParameterNoise":
            # TODO (simon): Find a clean solution to deal with
Just a note: We were going to move SimpleQ into rllib_contrib, but never did b/c of some remaining dependencies. But we will not move this one into the new stack anyways.
Alright - I was just referring to the problem on the new stack when exploration happens via a dedicated exploration strategy rather than via an action distribution. In DQN we have to deal with this to somehow get the epsilon into the RLModule.
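For illustration, a minimal sketch of the idea, assuming a linear epsilon schedule and PyTorch; the function name, schedule endpoints, and signature are placeholders, not the actual RLlib API:

```python
import torch


def epsilon_greedy_actions(
    q_values: torch.Tensor,
    t: int,
    eps_start: float = 1.0,
    eps_end: float = 0.05,
    eps_timesteps: int = 10_000,
) -> torch.Tensor:
    # Linearly anneal epsilon from the global env-step count `t` that the
    # EnvRunner passes into `forward_exploration()`.
    eps = max(eps_end, eps_start - (eps_start - eps_end) * t / eps_timesteps)
    greedy_actions = q_values.argmax(dim=-1)
    random_actions = torch.randint(0, q_values.shape[-1], greedy_actions.shape)
    # With probability epsilon, replace the greedy action by a random one.
    explore_mask = torch.rand(greedy_actions.shape) < eps
    return torch.where(explore_mask, random_actions, greedy_actions)
```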
@@ -0,0 +1,34 @@
from ray.rllib.algorithms.dqn import DQNConfig
Awesomeness!! Do we have some results already vs the old stack?
Nope, not yet. There were still some nits remaining here and there. I was just making sure that the algorithm runs - and it does :)
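For context, a rough sketch of what such a tuned example might set up. The flag for enabling the new stack (here `_enable_new_api_stack`) varies across Ray versions and is an assumption, not a verbatim copy of the file added in this PR:

```python
from ray.rllib.algorithms.dqn import DQNConfig

config = (
    DQNConfig()
    .environment("CartPole-v1")
    # Assumed flag for switching to the new stack (RLModule + EnvRunner);
    # check the actual example file for the exact setting in this Ray version.
    .experimental(_enable_new_api_stack=True)
    .training(
        # Episode-based prioritized replay, as used by DQN on the new stack.
        replay_buffer_config={"type": "PrioritizedEpisodeReplayBuffer"},
    )
)

algo = config.build()
print(algo.train())
```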
@@ -307,13 +341,59 @@ def validate(self) -> None:
                " used at the same time!"
            )

        # Validate that we use the corresponding `EpisodeReplayBuffer` when using
Can we add a TODO here (and also in SAC in the respective line) that we need to implement a MultiAgentEpisodeReplayBuffer to enable SAC/DQN for multi-agent cases?
Let's do this in an extra PR. I already made an issue for this some time ago: #42872
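For illustration, the kind of check being discussed could look roughly like this; the helper name and buffer names are assumptions, the actual validation lives in the `validate()` method shown in the hunk above:

```python
def _validate_replay_buffer(replay_buffer_config: dict, uses_new_stack: bool) -> None:
    # On the new stack, sampling returns episodes, so only episode-based
    # replay buffers (e.g. `EpisodeReplayBuffer`, `PrioritizedEpisodeReplayBuffer`)
    # can store them. A `MultiAgentEpisodeReplayBuffer` would still be needed
    # for multi-agent setups (see #42872).
    buffer_type = str(replay_buffer_config.get("type", ""))
    if uses_new_stack and "EpisodeReplayBuffer" not in buffer_type:
        raise ValueError(
            "When using the new API stack (EnvRunner), DQN/SAC require an "
            "episode-based replay buffer such as `PrioritizedEpisodeReplayBuffer`."
        )
```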
@@ -374,6 +454,141 @@ def training_step(self) -> ResultDict:
        Returns:
            The results dict from executing the training iteration.
        """
        # New API stack (RLModule, Learner, EnvRunner, ConnectorV2).
Cool!
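As a rough outline of the new-stack branch of `training_step` (a sketch with assumed method names, not the actual RLlib code): sample episodes with the EnvRunners, add them to the episode replay buffer, update the Learner on replayed batches, and sync weights back:

```python
def dqn_training_step_sketch(env_runners, replay_buffer, learner_group, train_batch_size):
    # 1. Sample new episodes from all EnvRunners (synchronously).
    episodes = env_runners.sample()
    # 2. Store the episodes in the (prioritized) episode replay buffer.
    replay_buffer.add(episodes)
    # 3. Pull a train batch of timesteps from the buffer.
    train_batch = replay_buffer.sample(train_batch_size)
    # 4. Update the RLModule via the LearnerGroup.
    results = learner_group.update(train_batch)
    # 5. Refresh the buffer's priorities from the update results (assumed to
    #    carry per-sample TD errors), then sync the new weights back to the
    #    EnvRunners.
    replay_buffer.update_priorities(results)
    env_runners.sync_weights()
    return results
```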
Signed-off-by: Sven Mika <[email protected]>
rllib/algorithms/dqn/dqn.py
Outdated
        # Run multiple sampling iterations.
        for _ in range(store_weight):
            with self._timers[SAMPLE_TIMER]:
                # TODO (simon): Use `synchronous_parallel_sample()` here.
We continue kicking this can down the road :D
Can we do this in this PR - fix the synchronous_parallel_sample function? I think it's really just a few lines that would have to be changed in there to make it work with episodes.
Then we can also - in this same PR - fix SAC and PPO for good and remove all these TODOs.
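A minimal sketch of what an episode-aware synchronous sampling helper could do; the real `synchronous_parallel_sample()` in `ray.rllib.execution.rollout_ops` has a different, richer signature, so treat the function name and arguments below as assumptions:

```python
def synchronous_parallel_sample_episodes(worker_set):
    # Ask every remote EnvRunner for one rollout; `foreach_worker` blocks
    # until all workers have returned their lists of episodes.
    episode_lists = worker_set.foreach_worker(
        lambda worker: worker.sample(), local_worker=False
    )
    # Flatten the per-worker lists of episodes into a single list that the
    # caller can hand to an episode replay buffer.
    return [episode for episodes in episode_lists for episode in episodes]
```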
One merge away :)
The speed with which you crank out these algo implementations on the new stack is breathtaking :) Great work.
- A few comment-related nits.
- One bigger item still to complete: Could we fix the synchronous sample utility to work on the new stack in this PR? This would close all these open TODOs for good.
Signed-off-by: Simon Zehnder <[email protected]>
…erroring out when the list was empty and key '0' not available. Made multiple tests with SAC and PPO, which both learn now. Signed-off-by: Simon Zehnder <[email protected]>
Signed-off-by: Simon Zehnder <[email protected]>
Signed-off-by: Simon Zehnder <[email protected]>
Signed-off-by: Simon Zehnder <[email protected]>
…t only for 'num_atoms=1'. Signed-off-by: Simon Zehnder <[email protected]>
Signed-off-by: Simon Zehnder <[email protected]>
…rainbow-training-step
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: Simon Zehnder <[email protected]>
…ray into dqn-rainbow-training-step Signed-off-by: Simon Zehnder <[email protected]>
…ouble_q' b/c backpropagation does not work otherwise. Furthermore, adapted tuned examples for new and old stack. Signed-off-by: Simon Zehnder <[email protected]>
Signed-off-by: Simon Zehnder <[email protected]>
…rioritizedEpisodeReplayBuffer' and run some experiments with new stack against old stack. Signed-off-by: Simon Zehnder <[email protected]>
Signed-off-by: Simon Zehnder <[email protected]>
…t_encoder_config') and 'TorchDQNRainbowModule' for the double_q case (outputs need to be chunked). Signed-off-by: Simon Zehnder <[email protected]>
…rn. Added updating the 'global_num_env_steps_sampled' in the 'SingleAgentEnvRunner' to avoid syncing after each sampling loop. Tested on 'FrozenLake-v1'; all combinations are running, but 'noisy=True' and 'double_q=True' have not been tested, yet. Signed-off-by: Simon Zehnder <[email protected]>
Signed-off-by: Simon Zehnder <[email protected]>
Signed-off-by: Simon Zehnder <[email protected]>
Signed-off-by: Simon Zehnder <[email protected]>
Signed-off-by: Simon Zehnder <[email protected]>
Signed-off-by: Simon Zehnder <[email protected]>
…ng step using noisy networks, dueling, double-Q, and distributional learning. Some performance improvements were made and, in case noisy networks are used, no epsilon greedy is used. A build test was added to use this new stack together with the 'SingleAgentEnvRunner'. Signed-off-by: Simon Zehnder <[email protected]>
…NRainbowRLModule' and added docstrings. Signed-off-by: Simon Zehnder <[email protected]>
)

stop = {
    "evaluation/sampler_results/episode_reward_mean": 500.0,
Oh, wow! This is very good!
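For reference, a stop criterion like this is typically wired into a tuned example roughly as follows; the boilerplate around the `stop` dict is assumed, not copied from the file in this PR:

```python
from ray import air, tune
from ray.rllib.algorithms.dqn import DQNConfig

config = DQNConfig().environment("CartPole-v1")

# Stop once the evaluation episode reward mean reaches 500 (CartPole's max).
stop = {"evaluation/sampler_results/episode_reward_mean": 500.0}

tuner = tune.Tuner(
    "DQN",
    param_space=config,
    run_config=air.RunConfig(stop=stop),
)
results = tuner.fit()
```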
@@ -255,7 +255,7 @@ def _sample_timesteps(
                # RLModule forward pass: Explore or not.
                if explore:
                    to_env = self.module.forward_exploration(
-                       to_module, t=self.global_num_env_steps_sampled + ts
+                       to_module, t=self.global_num_env_steps_sampled
Perfect! Thanks for fixing this logic. Now each EnvRunner is more robust in itself, keeping these up to date, at least until a new (and better) global count arrives.
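To illustrate the bookkeeping (class and attribute names are simplified stand-ins, not the actual `SingleAgentEnvRunner` code): each EnvRunner accumulates its own global step count after every rollout, so exploration schedules keep advancing without a sync after each sampling loop:

```python
class EnvStepCounterSketch:
    """Simplified stand-in for the step-counter logic inside an EnvRunner."""

    def __init__(self) -> None:
        self.global_num_env_steps_sampled = 0

    def after_rollout(self, num_env_steps: int) -> None:
        # Accumulate locally after each rollout; an algorithm-side sync may
        # still overwrite this with the true global count later.
        self.global_num_env_steps_sampled += num_env_steps

    def exploration_timestep(self) -> int:
        # This is the `t` handed to `forward_exploration()` in the diff above.
        return self.global_num_env_steps_sampled
```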
Signed-off-by: Sven Mika <[email protected]>
LGTM! Thanks for the PR @simonsays1980 !
Waiting for tests ... then merge.
Why are these changes needed?
We are moving the standard algorithms to our new stack (i.e. RLModule API and EnvRunner API). This PR is one part of moving DQN Rainbow into our new stack. With it comes a training step that enables using the EnvRunner API together with RLModule.
See #43196 for the corresponding learners for DQN Rainbow.
Related issue number
Closes #37777
Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.