Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RLlib] Add separate learning rates for policy and alpha to SAC. #47078

Merged

Conversation

simonsays1980
Copy link
Collaborator

@simonsays1980 simonsays1980 commented Aug 12, 2024

Why are these changes needed?

It is commong practice to use a smaller learning rate for the policy than for the critic in SAC to

  • Ensure that the value function gives good approximations for policy improvement
  • To keep both learning processes at a similar pace (usually the critic can take larger values than the policy)

This PR proposes separate learning rates for policy, critic and alpha (the hyperparameter to guide entropy regularization over time). It does so by introducing two more arguments to the SACConfig.training method, namely:

  • policy_lr
  • alpha_lr

Furthermore, it adapts these learning rates in the tuned examples for SAC.

Note, this also enables different learning rates for CQL, which directly inherits from SAC. This should improve learning for both algorithms, SAC and CQL.

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

…his imrpoves learning a bit.

Signed-off-by: simonsays1980 <[email protected]>
@simonsays1980 simonsays1980 changed the title [RLlib] - Add separate learnign rates for policy and alpha to SAC. [RLlib] - Add separate learning rates for policy and alpha to SAC. Aug 12, 2024
@sven1977 sven1977 changed the title [RLlib] - Add separate learning rates for policy and alpha to SAC. [RLlib] Add separate learning rates for policy and alpha to SAC. Aug 12, 2024
@sven1977 sven1977 marked this pull request as ready for review August 12, 2024 16:48
@@ -82,6 +82,8 @@ def __init__(self, algo_class=None):
"critic_learning_rate": 3e-4,
"entropy_learning_rate": 3e-4,
}
self.policy_lr = 3e-5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestions:

  1. Let's add a third property self.critic_lr and set self.lr to None by default. This increases clarity and expressiveness. Otherwise, users (and us!) will forever have to open this sac.py file, just to quickly check, which one of the three lrs is the one covered by the default self.lr, and which 2 have their own property.
  2. Add to validate() a quick check for a) new stack and - if yes - b) self.lr must be None, otherwise raise an informative error explaining that there are 3 different learning rates properties and self.lr should NOT be used.
  3. Switch config.lr vs config.critic_lr in the respective SACLearner methods.

Copy link
Contributor

@sven1977 sven1977 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now! Thanks @simonsays1980 :)

@sven1977 sven1977 enabled auto-merge (squash) August 14, 2024 10:09
@github-actions github-actions bot added the go add ONLY when ready to merge, run all tests label Aug 14, 2024
simonsays1980 and others added 12 commits August 15, 2024 10:49
…ses. (ray-project#47057)

They were used to fetch / publish logs and errors, but now they are
replaced by PythonGcsSubscriber cython binded classes.

Signed-off-by: Ruiyang Wang <[email protected]>
Signed-off-by: Ruiyang Wang <[email protected]>
…t#47115)

So these source files serving as dependency for doc files always get
rebuilt correctly.

---------

Signed-off-by: khluu <[email protected]>
)

Split out `TestHTTPProxy` and `TestgRPCProxy` into a unit test file.

Signed-off-by: Cindy Zhang <[email protected]>
…ay-project#47117)

The current codebase includes `env_bool` and `env_integer` functions
that directly convert environment variable strings into their respective
types. To extend this functionality, we also need an `env_float`
function to safely convert strings representing floating-point numbers
into the `float` type."

Signed-off-by: Hongpeng Guo <[email protected]>
<!-- Thank you for your contribution! Please review
https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before
opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->

## Why are these changes needed?

<!-- Please give a short summary of the change and the problem this
solves. -->
Fix a wrong variable name for a feature introduced in
ray-project#46699, which caused progress
bars to not show % progress / render the bar itself.

After the changes in this PR, the progress bar shows % progress as
desired:
![Screenshot at Aug 13
14-48-08](https://github.com/user-attachments/assets/f5fc5188-f33e-468c-a460-d3f115293e36)


## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

Signed-off-by: Scott Lee <[email protected]>
meaning it is tracking the latest version, so that we do not need to
update the names of this one when we want to update the pyarrow version
we are using.

Signed-off-by: Lonnie Liu <[email protected]>
…ct#47121)

Following up from ray-project#47082, we
actually have 6 different data builds, with this matrix

```
                                  python 3.9           python 3.12
arrow 6                            X                            X   
arrow 17                           X                            X    
arrow nightly                      X                            X
```

They all share the same build environment
(https://github.com/ray-project/ray/blob/master/ci/docker/data.build.Dockerfile),
but we have 6 configurations of these build environments given the above
matrix

This PR updates other flavors to use arrow 17 as well

Test:
- CI

Signed-off-by: can <[email protected]>
…roject#47114)

Add the rest of missing API references for rllib. We can also now enable
the API policy lint checker for rllib, now that all missing references
are documented

Test:
- CI

<img width="1351" alt="Screenshot 2024-08-13 at 12 15 08 PM"
src="https://github.com/user-attachments/assets/cc1d1c8e-763e-4d2e-a7d1-28243a7fdbab">

Signed-off-by: can <[email protected]>
@aslonnie aslonnie removed request for a team, bveeramani and woshiyyya August 16, 2024 03:03
…brid stack is no longer supported.

Signed-off-by: simonsays1980 <[email protected]>
Signed-off-by: simonsays1980 <[email protected]>
…and set 'lr' to 'None'. Furthermore, modified all learning rates to adapt to the number of learners.

Signed-off-by: simonsays1980 <[email protected]>
… 'None' as needed for new stack SAC.

Signed-off-by: simonsays1980 <[email protected]>
…ing rates in multi-agent SAC Pendulum tuned example to number of GPUs.

Signed-off-by: simonsays1980 <[email protected]>
@sven1977 sven1977 enabled auto-merge (squash) August 20, 2024 18:21
@sven1977 sven1977 merged commit c50e3b6 into ray-project:master Aug 21, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
go add ONLY when ready to merge, run all tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.