[RLlib] Add separate learning rates for policy and alpha to SAC. #47078
Merged: sven1977 merged 43 commits into ray-project:master from simonsays1980:add-actor-specific-learning-rate on Aug 21, 2024
+135 −32
Conversation
…his improves learning a bit. Signed-off-by: simonsays1980 <[email protected]>
simonsays1980 changed the title from "[RLlib] - Add separate learnign rates for policy and alpha to SAC." to "[RLlib] - Add separate learning rates for policy and alpha to SAC." on Aug 12, 2024
sven1977 changed the title from "[RLlib] - Add separate learning rates for policy and alpha to SAC." to "[RLlib] Add separate learning rates for policy and alpha to SAC." on Aug 12, 2024
sven1977 requested review from sven1977 and ArturNiederfahrenhorst as code owners August 12, 2024 16:48
sven1977 reviewed Aug 12, 2024
rllib/algorithms/sac/sac.py (Outdated)
@@ -82,6 +82,8 @@ def __init__(self, algo_class=None):
            "critic_learning_rate": 3e-4,
            "entropy_learning_rate": 3e-4,
        }
        self.policy_lr = 3e-5
Suggestions:
- Let's add a third property `self.critic_lr` and set `self.lr` to None by default. This increases clarity and expressiveness. Otherwise, users (and us!) will forever have to open this sac.py file just to quickly check which one of the three lrs is the one covered by the default `self.lr`, and which 2 have their own property.
- Add to `validate()` a quick check for a) new stack and - if yes - b) `self.lr` must be None; otherwise raise an informative error explaining that there are 3 different learning rate properties and `self.lr` should NOT be used.
- Switch `config.lr` vs `config.critic_lr` in the respective `SACLearner` methods.
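A rough sketch of how these suggestions could translate into code is below. This is not the PR's actual implementation; the class name, the default values, and the new-stack flag (`enable_new_api_stack`) are illustrative assumptions only.

```python
# Rough, hypothetical sketch of the reviewer's suggestions -- NOT the PR's
# actual code. Names like `enable_new_api_stack` and the default values are
# illustrative assumptions.


class SACConfigSketch:
    def __init__(self):
        # Three explicit learning rates; the generic `lr` stays None so it
        # cannot silently shadow any of them.
        self.policy_lr = 3e-5
        self.critic_lr = 3e-4
        self.alpha_lr = 3e-4
        self.lr = None
        # Stand-in for the "new API stack" switch.
        self.enable_new_api_stack = True

    def validate(self):
        # On the new stack, `lr` must remain None; point users to the three
        # specific learning-rate properties instead.
        if self.enable_new_api_stack and self.lr is not None:
            raise ValueError(
                "`lr` should not be set for SAC on the new API stack. Use "
                "`policy_lr`, `critic_lr`, and `alpha_lr` instead."
            )


config = SACConfigSketch()
config.validate()  # Passes, since `lr` defaults to None.
```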
…d set 'lr' to 'None' as requested in @sven1977's review. Furthermore, changed all examples and tuning scripts. Signed-off-by: simonsays1980 <[email protected]>
sven1977 reviewed Aug 14, 2024
sven1977 reviewed Aug 14, 2024
sven1977 reviewed Aug 14, 2024
Signed-off-by: Sven Mika <[email protected]>
sven1977 approved these changes Aug 14, 2024
LGTM now! Thanks @simonsays1980 :)
…ses. (ray-project#47057) They were used to fetch / publish logs and errors, but now they are replaced by PythonGcsSubscriber Cython-bound classes. Signed-off-by: Ruiyang Wang <[email protected]> Signed-off-by: Ruiyang Wang <[email protected]>
… submitter (ray-project#47109) Signed-off-by: Jiajun Yao <[email protected]>
…t#47115) So these source files serving as dependency for doc files always get rebuilt correctly. --------- Signed-off-by: khluu <[email protected]>
) Split out `TestHTTPProxy` and `TestgRPCProxy` into a unit test file. Signed-off-by: Cindy Zhang <[email protected]>
…ay-project#47117) The current codebase includes `env_bool` and `env_integer` functions that directly convert environment variable strings into their respective types. To extend this functionality, we also need an `env_float` function to safely convert strings representing floating-point numbers into the `float` type. Signed-off-by: Hongpeng Guo <[email protected]>
## Why are these changes needed? Fix a wrong variable name for a feature introduced in ray-project#46699, which caused progress bars to not show % progress / render the bar itself. After the changes in this PR, the progress bar shows % progress as desired: ![Screenshot at Aug 13 14-48-08](https://github.com/user-attachments/assets/f5fc5188-f33e-468c-a460-d3f115293e36) ## Related issue number ## Checks - [ ] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( Signed-off-by: Scott Lee <[email protected]>
…meaning it is tracking the latest version, so that we do not need to update the names of this one when we want to update the pyarrow version we are using. Signed-off-by: Lonnie Liu <[email protected]>
…ct#47121) Following up from ray-project#47082, we actually have 6 different data builds, with this matrix ``` python 3.9 python 3.12 arrow 6 X X arrow 17 X X arrow nightly X X ``` They all share the same build environment (https://github.com/ray-project/ray/blob/master/ci/docker/data.build.Dockerfile), but we have 6 configurations of these build environments given the above matrix This PR updates other flavors to use arrow 17 as well Test: - CI Signed-off-by: can <[email protected]>
…ide a task or actor Signed-off-by: Peter Nguyen <[email protected]>
…roject#47114) Add the rest of missing API references for rllib. We can also now enable the API policy lint checker for rllib, now that all missing references are documented Test: - CI <img width="1351" alt="Screenshot 2024-08-13 at 12 15 08 PM" src="https://github.com/user-attachments/assets/cc1d1c8e-763e-4d2e-a7d1-28243a7fdbab"> Signed-off-by: can <[email protected]>
simonsays1980 requested review from scottjlee, bveeramani, stephanie-wang, omatthew98, a team, richardliaw and edoakes as code owners August 15, 2024 08:49
… constantly with TF in CI tests. This is old stack. Signed-off-by: simonsays1980 <[email protected]>
…brid stack is no longer supported. Signed-off-by: simonsays1980 <[email protected]>
…nently. Signed-off-by: simonsays1980 <[email protected]>
Signed-off-by: simonsays1980 <[email protected]>
Signed-off-by: simonsays1980 <[email protected]>
…and set 'lr' to 'None'. Furthermore, modified all learning rates to adapt to the number of learners. Signed-off-by: simonsays1980 <[email protected]>
… 'None' as needed for new stack SAC. Signed-off-by: simonsays1980 <[email protected]>
…ing rates in multi-agent SAC Pendulum tuned example to number of GPUs. Signed-off-by: simonsays1980 <[email protected]>
Signed-off-by: simonsays1980 <[email protected]>
…Cheetah example. Signed-off-by: simonsays1980 <[email protected]>
Why are these changes needed?
It is common practice in SAC to use a smaller learning rate for the policy than for the critic.
This PR proposes separate learning rates for policy, critic, and alpha (the hyperparameter that guides entropy regularization over time). It does so by introducing two more arguments to the `SACConfig.training` method, namely:
- `policy_lr`
- `alpha_lr`
Furthermore, it adapts these learning rates in the tuned examples for SAC.
Note that this also enables different learning rates for CQL, which directly inherits from SAC. This should improve learning for both algorithms, SAC and CQL.
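A minimal usage sketch, assuming the new arguments land on `SACConfig.training()` together with the `critic_lr` property discussed in the review; the concrete values are illustrative, not tuned recommendations:

```python
# Usage sketch only: assumes `policy_lr`, `critic_lr`, and `alpha_lr` are
# accepted by `SACConfig.training()` as described in this PR; the values
# below are illustrative, not tuned recommendations.
from ray.rllib.algorithms.sac import SACConfig

config = (
    SACConfig()
    .environment("Pendulum-v1")
    .training(
        policy_lr=3e-5,   # smaller learning rate for the actor/policy
        critic_lr=3e-4,   # learning rate for the critic (Q-networks)
        alpha_lr=3e-4,    # learning rate for the entropy coefficient alpha
        # The generic `lr` is left at its (new) None default for SAC.
    )
)

algo = config.build()
print(algo.train())
```

Since CQL's config inherits from SAC's, the same three learning-rate settings would carry over to CQL as well.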
Related issue number
Checks
- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.