[RLlib] Add separate learning rates for policy and `alpha` to SAC. #47078

simonsays1980 · 2024-08-12T14:30:33Z

Why are these changes needed?

It is common practice in SAC to use a smaller learning rate for the policy than for the critic, in order to:

- ensure that the value function provides good approximations for policy improvement, and
- keep both learning processes at a similar pace (the critic can usually tolerate a larger learning rate than the policy).
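
SAC implementations commonly give each component its own optimizer, and separate learning rates only change the step size each one takes. The toy sketch below uses plain SGD on made-up scalar "parameters". The dict keys and values are hypothetical; the 3e-5 / 3e-4 split matches the defaults in this PR's `sac.py` diff.

```python
# Hypothetical per-component learning rates (policy 10x smaller than critic).
LEARNING_RATES = {
    "critic": 3e-4,  # larger step: the critic should track values closely
    "policy": 3e-5,  # smaller step: the policy moves more slowly
    "alpha": 3e-4,   # step size for the entropy temperature
}


def sgd_step(params, grads, lrs):
    """Return updated parameters p - lr * g, using each component's own lr."""
    return {name: params[name] - lrs[name] * grads[name] for name in params}


# Same gradient for every component, so only the learning rate differs.
params = {"critic": 0.5, "policy": 0.5, "alpha": 0.0}
grads = {"critic": 1.0, "policy": 1.0, "alpha": 1.0}
new_params = sgd_step(params, grads, LEARNING_RATES)
```

With identical gradients, the critic moves ten times farther than the policy per step, which is the pacing argument above in its simplest form.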

This PR proposes separate learning rates for the policy, the critic, and alpha (the hyperparameter that guides entropy regularization over time). It does so by introducing two more arguments to the `SACConfig.training()` method, namely:

- `policy_lr`
- `alpha_lr`
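
For context on `alpha_lr`: SAC usually tunes alpha automatically by gradient descent on a temperature loss, so a dedicated `alpha_lr` is simply the step size of that update. Below is a minimal sketch. It assumes the common log-alpha parameterization and a hand-picked target entropy. The function name and numbers are hypothetical, and this is not RLlib's implementation.

```python
import math
from typing import Sequence


def update_log_alpha(
    log_alpha: float,
    log_probs: Sequence[float],
    target_entropy: float,
    alpha_lr: float,
) -> float:
    """One plain gradient-descent step on the SAC temperature loss.

    Loss (log_probs treated as constants):
        J(log_alpha) = -mean(log_alpha * (log_pi + target_entropy))
    Gradient w.r.t. log_alpha:
        dJ/dlog_alpha = -mean(log_pi + target_entropy)
    """
    grad = -sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    return log_alpha - alpha_lr * grad


# Hypothetical batch: the entropy estimate -mean(log_probs) = -1.5 is below
# the target entropy of -1.0, so the step should increase alpha.
new_log_alpha = update_log_alpha(0.0, [1.0, 2.0], target_entropy=-1.0, alpha_lr=3e-4)
alpha = math.exp(new_log_alpha)
```

When the policy's entropy falls below the target, alpha grows and strengthens the entropy bonus. When entropy is above the target, alpha shrinks.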

Furthermore, it adapts these learning rates in the tuned examples for SAC.

Note that this also enables separate learning rates for CQL, which inherits directly from SAC. This should improve learning for both SAC and CQL.
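
The CQL remark relies on ordinary class inheritance: any learning-rate field defined on the SAC config is automatically available on the CQL config. Here is a minimal sketch with hypothetical dataclasses, not RLlib's actual `SACConfig`/`CQLConfig`. The `conservative_weight` field and all defaults are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SACConfigSketch:
    """Hypothetical stand-in for SACConfig, reduced to its learning rates."""

    lr: Optional[float] = 3e-4  # base learning rate (used by the critic here)
    policy_lr: float = 3e-5
    alpha_lr: float = 3e-4


@dataclass
class CQLConfigSketch(SACConfigSketch):
    """Hypothetical stand-in for CQLConfig: inherits all SAC fields."""

    conservative_weight: float = 5.0  # hypothetical CQL-only field


# The CQL config accepts the SAC learning rates without redefining them.
cql_config = CQLConfigSketch(policy_lr=1e-4)
```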

Related issue number

Checks

- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've added any new APIs to the API Reference. For example, if I added a
  method in Tune, I've added it in doc/source/tune/api/ under the
  corresponding .rst file.
- I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(


sven1977 · 2024-08-12T16:58:50Z

rllib/algorithms/sac/sac.py

```diff
@@ -82,6 +82,8 @@ def __init__(self, algo_class=None):
            "critic_learning_rate": 3e-4,
            "entropy_learning_rate": 3e-4,
        }
+        self.policy_lr = 3e-5
```


Suggestions:

Let's add a third property, `self.critic_lr`, and set `self.lr` to `None` by default. This improves clarity and expressiveness. Otherwise, users (and we!) will forever have to open this `sac.py` file just to check which of the three learning rates is covered by the default `self.lr` and which two have their own properties.

In `validate()`, add a quick check: if the new stack is enabled, `self.lr` must be `None`; otherwise, raise an informative error explaining that there are three separate learning-rate properties and that `self.lr` should NOT be used.

Use `config.critic_lr` instead of `config.lr` in the respective `SACLearner` methods.
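
The three suggestions can be sketched in plain Python. This is a hypothetical stand-in, not RLlib's `SACConfig`: the class name, the `new_api_stack` flag, and the `optimizer_lrs()` helper are assumptions made for illustration. The field names follow the later commit that renamed `policy_lr` to `actor_lr`. The defaults come from the `sac.py` diff hunk in this review: 3e-5 for the policy and 3e-4 for the critic and entropy learning rates.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SACLearningRatesSketch:
    """Hypothetical config with one explicit learning rate per SAC component."""

    actor_lr: float = 3e-5
    critic_lr: float = 3e-4
    alpha_lr: float = 3e-4
    lr: Optional[float] = None  # generic lr, unused on the new stack
    new_api_stack: bool = True  # hypothetical flag standing in for "new stack"

    def validate(self) -> None:
        # On the new stack the generic `lr` must stay unset, so it is never
        # ambiguous which of the three components it would apply to.
        if self.new_api_stack and self.lr is not None:
            raise ValueError(
                "SAC uses three separate learning rates: `actor_lr`, "
                "`critic_lr`, and `alpha_lr`. Set `lr=None` and configure "
                "those properties instead."
            )

    def optimizer_lrs(self) -> Dict[str, float]:
        # What a learner would read when building its optimizers: the critic
        # uses `critic_lr`, never the generic `lr`.
        return {
            "actor": self.actor_lr,
            "critic": self.critic_lr,
            "alpha": self.alpha_lr,
        }
```

Keeping `lr=None` as the default makes a leftover generic learning rate fail loudly during validation instead of being silently applied to one of the three optimizers.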

@sven1977


sven1977

LGTM now! Thanks @simonsays1980 :)


…ses. (ray-project#47057) They were used to fetch / publish logs and errors, but now they are replaced by PythonGcsSubscriber Cython-bound classes. Signed-off-by: Ruiyang Wang <[email protected]> Signed-off-by: Ruiyang Wang <[email protected]>


…ay-project#47117) The current codebase includes `env_bool` and `env_integer` functions that directly convert environment variable strings into their respective types. To extend this functionality, we also need an `env_float` function to safely convert strings representing floating-point numbers into the `float` type. Signed-off-by: Hongpeng Guo <[email protected]>
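
Here is a minimal sketch of what such a helper could look like. It is hypothetical and not Ray's actual `env_float`. It assumes that "safely" means falling back to the default when the variable is unset or not a valid float.

```python
import os


def env_float(key: str, default: float) -> float:
    """Hypothetical sketch: read a float from an environment variable.

    Returns `default` when the variable is unset or cannot be parsed.
    """
    value = os.environ.get(key)
    if value is None:
        return default
    try:
        return float(value)
    except ValueError:
        # Malformed value (e.g. "abc" or ""): keep the default instead of failing.
        return default
```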

## Why are these changes needed?

Fix a wrong variable name for a feature introduced in ray-project#46699, which caused progress bars to not show % progress / render the bar itself. After the changes in this PR, the progress bar shows % progress as desired:

![Screenshot at Aug 13 14-48-08](https://github.com/user-attachments/assets/f5fc5188-f33e-468c-a460-d3f115293e36)

## Related issue number

## Checks

- [ ] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - [ ] Unit tests
  - [ ] Release tests
  - [ ] This PR is not tested :(

Signed-off-by: Scott Lee <[email protected]>


…ct#47121) Following up from ray-project#47082, we actually have 6 different data builds, with this matrix:

```
               python 3.9   python 3.12
arrow 6            X             X
arrow 17           X             X
arrow nightly      X             X
```

They all share the same build environment (https://github.com/ray-project/ray/blob/master/ci/docker/data.build.Dockerfile), but we have 6 configurations of these build environments given the above matrix. This PR updates other flavors to use arrow 17 as well.

Test:
- CI

Signed-off-by: can <[email protected]>
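
The count of 6 comes from a Cartesian product of two Python versions and three Arrow versions. The trivial sketch below enumerates them. The variable names are hypothetical, and this is not how Ray's CI actually defines its builds.

```python
from itertools import product

# Axes of the build matrix from the commit message above.
PYTHON_VERSIONS = ["3.9", "3.12"]
ARROW_VERSIONS = ["6", "17", "nightly"]

# Each (python, arrow) pair is one build configuration: 2 x 3 = 6.
BUILD_CONFIGS = [
    {"python": py, "arrow": arrow}
    for py, arrow in product(PYTHON_VERSIONS, ARROW_VERSIONS)
]
```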


…roject#47114) Add the rest of missing API references for rllib. We can also now enable the API policy lint checker for rllib, now that all missing references are documented.

Test:
- CI

<img width="1351" alt="Screenshot 2024-08-13 at 12 15 08 PM" src="https://github.com/user-attachments/assets/cc1d1c8e-763e-4d2e-a7d1-28243a7fdbab">

Signed-off-by: can <[email protected]>


Added separate learning rates for policy, critic, and alpha to SAC. T…

ffe5048

…his improves learning a bit. Signed-off-by: simonsays1980 <[email protected]>

simonsays1980 changed the title ~~[RLlib] - Add separate learnign rates for policy and alpha to SAC.~~ [RLlib] - Add separate learning rates for policy and alpha to SAC. Aug 12, 2024

sven1977 changed the title ~~[RLlib] - Add separate learning rates for policy and alpha to SAC.~~ [RLlib] Add separate learning rates for policy and alpha to SAC. Aug 12, 2024

sven1977 marked this pull request as ready for review August 12, 2024 16:48

sven1977 requested review from sven1977 and ArturNiederfahrenhorst as code owners August 12, 2024 16:48

Merge branch 'master' into add-actor-specific-learning-rate

3fc16cf

sven1977 reviewed Aug 12, 2024


simonsays1980 added 3 commits August 13, 2024 11:12

Added an additional 'critic_lr', changed 'policy_lr' to 'actor_lr', an…

cd5450c

…d set 'lr' to 'None' as requested in @sven1977's review. Furthermore, changed all examples and tuning scripts. Signed-off-by: simonsays1980 <[email protected]>

Merge branch 'master' into add-actor-specific-learning-rate

38cac91

Merge branch 'master' into add-actor-specific-learning-rate

8f64c7c

sven1977 reviewed Aug 14, 2024


rllib/tuned_examples/sac/benchmark_sac_mujoco.py (outdated, resolved)

sven1977 reviewed Aug 14, 2024


rllib/tuned_examples/sac/multi_agent_pendulum_sac.py (outdated, resolved)

sven1977 reviewed Aug 14, 2024


rllib/algorithms/sac/sac.py (outdated, resolved)

Apply suggestions from code review

15e2898

Signed-off-by: Sven Mika <[email protected]>

sven1977 approved these changes Aug 14, 2024


sven1977 enabled auto-merge (squash) August 14, 2024 10:09

github-actions bot added the go add ONLY when ready to merge, run all tests label Aug 14, 2024

simonsays1980 and others added 12 commits August 15, 2024 10:49

[RLlib; Offline RL] Implement twin-Q net option for CQL. (ray-project…

d0679b7

…#47105)

[Core] Fix a bug where we submit the actor creation task to the wrong…

fc0f1fe

… submitter (ray-project#47109) Signed-off-by: Jiajun Yao <[email protected]>

[doc][build] Update all changed files timestamp to latest (ray-projec…

387a083

…t#47115) So these source files serving as dependency for doc files always get rebuilt correctly. --------- Signed-off-by: khluu <[email protected]>

[serve] split test_proxy.py into unit and e2e tests (ray-project#47112

326eaae

) Split out `TestHTTPProxy` and `TestgRPCProxy` into a unit test file. Signed-off-by: Cindy Zhang <[email protected]>

[data] change data17 to datal (ray-project#47082)

ca98c7f

meaning it is tracking the latest version, so that we do not need to update the names of this one when we want to update the pyarrow version we are using. Signed-off-by: Lonnie Liu <[email protected]>

[doc][rllib] add missing public api references (ray-project#47111)

cbaad59

[Core] Clarify docstring for get_gpu_ids() that it is only called ins…

ce283ad

…ide a task or actor Signed-off-by: Peter Nguyen <[email protected]>

simonsays1980 requested review from scottjlee, bveeramani, stephanie-wang, omatthew98, a team, richardliaw and edoakes as code owners August 15, 2024 08:49

github-actions bot disabled auto-merge August 15, 2024 08:49

simonsays1980 added 3 commits August 15, 2024 12:05

Merge branch 'master' into add-actor-specific-learning-rate

8e6f9cc

Merge branch 'master' into add-actor-specific-learning-rate

8d98825

Turned off test 'self_play_with_policy_checkpoint' b/c it was failing…

6808cbb

… constantly with TF in CI tests. This is old stack. Signed-off-by: simonsays1980 <[email protected]>

aslonnie removed request for a team, bveeramani and woshiyyya August 16, 2024 03:03

simonsays1980 added 13 commits August 16, 2024 17:47

Uncommented 'pretrain_bc_single_agent_evaluate_as_multi_agent' b/c hy…

3a75d15

…brid stack is no longer supported. Signed-off-by: simonsays1980 <[email protected]>

Merge branch 'master' into add-actor-specific-learning-rate

f1dde3e

Merge branch 'master' into add-actor-specific-learning-rate

39f90a6

Switched test for old stack CQL to 'torch-only' b/c 'tf2' fails perma…

79f34ff

…nently. Signed-off-by: simonsays1980 <[email protected]>

Fixed a small bug with uninitialized learning rates on old stack SAC.

3944b31

Signed-off-by: simonsays1980 <[email protected]>

Merged master.

0a7aaa5

Signed-off-by: simonsays1980 <[email protected]>

Added actor- and critic-specific learning rates to HalfCheetah tests …

3d670c5

…and set 'lr' to 'None'. Furthermore, modified all learning rates to adapt to the number of learners. Signed-off-by: simonsays1980 <[email protected]>
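
The commit does not say which scaling rule it uses. One common heuristic is to scale a tuned base learning rate by the square root of the number of learners, because the effective batch grows when each learner processes its own batch. That rule is assumed here purely for illustration, and the helper name is hypothetical.

```python
import math
from typing import Optional


def scale_lr(base_lr: float, num_learners: Optional[int]) -> float:
    """Hypothetical helper: scale a learning rate by sqrt(num_learners).

    `num_learners` of 0 or None (local learner only) is treated as 1.
    """
    return base_lr * math.sqrt(num_learners or 1)


# With 4 learners the learning rate doubles; with none it is unchanged.
actor_lr = scale_lr(2e-4, 4)
critic_lr = scale_lr(3e-4, None)
```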

Merge branch 'master' into add-actor-specific-learning-rate

380e8e6

Fixed error in 'test_worker_failures' due to the base 'lr' not set to…

179fce4

… 'None' as needed for new stack SAC. Signed-off-by: simonsays1980 <[email protected]>

Fixed error in doc codes not implementing 'lr=None' and adapted learn…

d6d4d5a

…ing rates in multi-agent SAC Pendulum tuned example to number of GPUs. Signed-off-by: simonsays1980 <[email protected]>

Tuned learning rates on multi-agent SAC example.

44336ae

Signed-off-by: simonsays1980 <[email protected]>

Merge branch 'master' into add-actor-specific-learning-rate

2dbb93d

Added tuned learning rates to single agent SAC tuned example and Half…

351b0a8

…Cheetah example. Signed-off-by: simonsays1980 <[email protected]>

sven1977 enabled auto-merge (squash) August 20, 2024 18:21

sven1977 merged commit c50e3b6 into ray-project:master Aug 21, 2024
6 checks passed
