[RLlib] Added benchmark experiment for SAC with MuJoCo, PPO with MuJoCo and DQN with Atari. #44262
Conversation
… for all 5 environments and uses the best configuration to run a final trial. Signed-off-by: Simon Zehnder <[email protected]>
… for all 7 environments and uses the best configuration to run a final trial. In addition added a benchmark run using parameters from the paper. Signed-off-by: Simon Zehnder <[email protected]>
…noisy encoders. Furthermore, finished benchmark script for Atari with DQN Rainbow. Signed-off-by: Simon Zehnder <[email protected]>
# CleanRL: https://wandb.ai/cleanrl/cleanrl.benchmark/reports/Mujoco--VmlldzoxODE0NjE
# AgileRL: https://github.com/AgileRL/AgileRL?tab=readme-ov-file#benchmarks

benchmark_envs = {
Awesome PR @simonsays1980! Did we actually run these on RLlib and get these results? Or how did you get to these episode-return numbers?
Thanks ;) No, we did not - I collected the results from the papers. Take a look at my Slack comment about it, too.
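For context, a minimal hedged sketch of what such a mapping from env name to target return could look like (the env names and numbers below are purely illustrative, not the values collected from the papers):

benchmark_envs = {
    # Illustrative targets only; the PR takes the actual values from the papers.
    "HalfCheetah-v4": {"sampler_results/episode_reward_mean": 12000.0},
    "Hopper-v4": {"sampler_results/episode_reward_mean": 3500.0},
    "Walker2d-v4": {"sampler_results/episode_reward_mean": 5000.0},
}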
# Stop training if the mean reward is reached.
if (
    result["sampler_results/episode_reward_mean"]
    >= self.benchmark_envs[result["env"]]["sampler_results/episode_reward_mean"]
Had no idea "env" is always part of the results dict?
I think it is only there when it is chosen by Tune's grid_search.
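A hedged sketch of how the environment can be made part of the Tune search space via grid_search (the algorithm, env names, and metric below are illustrative):

from ray import tune
from ray.rllib.algorithms.sac import SACConfig

# Hedged sketch: selecting the env via tune.grid_search makes it part of each
# trial's config, which is what allows a per-env stopping threshold (as in the
# snippet above) to be looked up per trial.
config = SACConfig().environment(env=tune.grid_search(["Hopper-v4", "Walker2d-v4"]))
tuner = tune.Tuner(
    "SAC",
    param_space=config,
    tune_config=tune.TuneConfig(
        metric="sampler_results/episode_reward_mean", mode="max"
    ),
)
results = tuner.fit()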
# See the following links for benchmark results of other libraries:
# Original paper: https://arxiv.org/abs/1812.05905
# CleanRL: https://wandb.ai/cleanrl/cleanrl.benchmark/reports/Mujoco--VmlldzoxODE0NjE
These are MuJoCo benchmarks. Are there respective results reported for CleanRL for Atari?
Yes there are - but not for all of the Atari games I think.
@@ -0,0 +1,366 @@
import gymnasium as gym
from gymnasium.wrappers import AtariPreprocessing
Could we do a quick comparison of this wrapper (including the settings we use below, dim=84, noop=0, grayscale=True) vs our own Atari wrapper func in rllib.env.wrappers.atari_wrappers::wrap_atari_for_new_api_stack()?
RLlib nowadays (new API stack) uses a DreamerV3 style Atari setup:
dim=64
grayscale=True
noop=30
action_repeat_prob=0.0 (<- deterministic!)
small action space=True
I'd be totally happy with replacing our function with this gymnasium wrapper here, but I want to make sure we don't mess up our existing benchmarks.
We have to keep in mind that I took the benchmarks from the DQN Rainbow paper, which is already some years old - those are the settings used there. DreamerV3 is newer and has therefore - maybe - taken advantage of some developments since then.
The ultimate question is: what do we want to compare our algorithms against?
I think we should just be bullish and:
a) Use our well-proven hyperparams and network architectures (e.g. the DreamerV3 CNN stack mentioned above, which has proven to be strong on Atari tasks with e.g. PPO).
b) The reason users use RLlib is not b/c it runs 20% faster or 2% slower than SB3 or another library. It's b/c it's scalable, multi-agent capable, multi-GPU capable, etc. So as long as we show proper benchmark results (with one set of hyperparams or another) and maybe how scaling (adding more workers) affects these results, we should be solid.
c) If we do this, then we won't have the problem of always having to "look back" and backward-support these quite old settings (DQN paper 2015 -> almost 10 years old). We can freely find new and better parameters and scaling settings as long as they improve the performance.
I agree with your arguments. I added the PB2 run files to find such parameters and perform as well as possible with our setup, and - as discussed - we should run it with 1, 2, 4, x GPUs to show scalability.
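A hedged sketch of how such a PB2 search could be wired up with Tune (the bounds, metric, hyperparameters, and env below are illustrative, not the PR's actual run-file values):

from ray import tune
from ray.tune.schedulers.pb2 import PB2

# Illustrative PB2 scheduler: perturbs lr and gamma within the given bounds.
pb2 = PB2(
    time_attr="training_iteration",
    metric="sampler_results/episode_reward_mean",
    mode="max",
    perturbation_interval=50,
    hyperparam_bounds={
        "lr": [1e-5, 1e-3],
        "gamma": [0.95, 0.999],
    },
)
tuner = tune.Tuner(
    "PPO",
    param_space={"env": "Hopper-v4", "lr": 1e-4, "gamma": 0.99},
    tune_config=tune.TuneConfig(scheduler=pb2, num_samples=4),
)
results = tuner.fit()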
In regard to the Atari preprocessing: let's make two single runs - one with our preprocessing and another with the gymnasium one - and see which performs better. It suffices to run for maybe 100-200 iterations IMO.
@sven1977 Tbh, I cannot find the settings for DreamerV3 using this wrap_atari_for_new_api_stack. The function also has no grayscale option to set. It uses a specific env_config:
# [2]: "We follow the evaluation protocol of Machado et al. (2018) with 200M
# environment steps, action repeat of 4, a time limit of 108,000 steps per
# episode that correspond to 30 minutes of game play, no access to life
# information, full action space, and sticky actions. Because the world model
# integrates information over time, DreamerV2 does not use frame stacking.
# The experiments use a single-task setup where a separate agent is trained
# for each game. Moreover, each agent uses only a single environment instance.
env_config={
# "sticky actions" but not according to Danijar's 100k configs.
"repeat_action_probability": 0.0,
# "full action space" but not according to Danijar's 100k configs.
"full_action_space": False,
# Already done by MaxAndSkip wrapper: "action repeat" == 4.
"frameskip": 1,
}
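For the comparison proposed above, a hedged sketch of the two preprocessing variants expressed with gymnasium's AtariPreprocessing wrapper (the game is illustrative, and the second variant only approximates the DreamerV3-style settings listed earlier, not the exact RLlib wrapper behavior):

import gymnasium as gym
from gymnasium.wrappers import AtariPreprocessing

# Variant 1: DQN/Rainbow-paper style settings as used in this PR's script.
# frameskip=1 in the base env because AtariPreprocessing applies the frame skip itself.
env_dqn = gym.make("ALE/Breakout-v5", frameskip=1)
env_dqn = AtariPreprocessing(
    env_dqn, screen_size=84, noop_max=0, grayscale_obs=True, frame_skip=4
)

# Variant 2: approximation of the DreamerV3-style setup mentioned above
# (dim=64, noop=30, deterministic actions via repeat_action_probability=0.0).
env_dreamer = gym.make("ALE/Breakout-v5", frameskip=1, repeat_action_probability=0.0)
env_dreamer = AtariPreprocessing(
    env_dreamer, screen_size=64, noop_max=30, grayscale_obs=True, frame_skip=4
)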
evaluation_num_workers=1,
evaluation_parallel_to_training=True,
evaluation_config={
    "explore": True,
Should this be False?
Shouldn't PPO - having a stochastic policy - explore in evaluation to keep performance? I remember such discussions on the forum about the old stack PPO where users had this set to False and encountered a drop in performance.
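A hedged sketch of the evaluation setup under discussion, mirroring the diff above (the algorithm and env are illustrative):

from ray.rllib.algorithms.ppo import PPOConfig

# "explore": True keeps the stochastic policy sampling actions during evaluation,
# as discussed above; setting it to False would evaluate the greedy policy instead.
config = (
    PPOConfig()
    .environment("Hopper-v4")
    .evaluation(
        evaluation_interval=1,
        evaluation_num_workers=1,
        evaluation_parallel_to_training=True,
        evaluation_config={"explore": True},
    )
)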
Just a few nits before we can merge:
Approved! Thanks @simonsays1980. Can be merged pending comments/nits.
… added resources to all benchmark scripts. Fixed minor bug with Tune stopper. Signed-off-by: Simon Zehnder <[email protected]>
Why are these changes needed?
This PR adds some benchmark runs for the following algorithms:
- SAC on MuJoCo with hyperparameter search via PB2
- PPO on MuJoCo (HalfCheetah-v4, Hopper-v4, Humanoid-v4, Ant-v4, Walker2d-v4) with hyperparameter search via PB2
- DQN Rainbow on Atari with hyperparameter search via PB2
Related issue number
Checks
- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.