Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gaussian Squashed Gaussian #7609

Closed
wants to merge 14 commits into from
Closed

Conversation

matthewearl
Copy link

@matthewearl matthewearl commented Mar 15, 2020

Why are these changes needed?

Currently when PPO is used with a bounded (continuous) action space, action samples are simply drawn from an unbounded normal distribution, and then clipped to the bounds. The entropy is calculated directly on the normal. Because PPO gives a reward for higher entropy, then there exists a failure mode where the algorithm can learn to push most of the mass outside of the action range and increase the variance, thus increasing entropy despite there being little change in selected actions.

The direct way to fix this, ie. calculating the entropy of the clipped distribution doesn't work since the clipped distribution actually has undefined entropy. Another way to fix it is to use a "soft clip" such as the existing SquashedGaussian distribution which maps samples through a (scaled) tanh function in order to ensure samples lie within the desired range. The problem here is that the entropy here is hard (impossible?) to compute analytically which is required by PPO when using a non-zero entropy_coeff.

In this PR I have implemented the GaussianSquashedGaussian which instead of mapping through tanh maps through the normal CDF. When scaled appropriately it closely approximates tanh:

image

However, it has the benefit that the entropy is analytically tractable. In fact, the entropy is just -KL(N1 || N2), where N1 is the normal being squashed, and N2 is the normal corresponding with the CDF used for squashing.

This should be considered a draft review for now, since I'd like to get a second opinion on how I've structured the catalog -> action space mapping, and I have also touched the existing SquashedGaussian and I'm unsure if these changes will break anything.

Related issue number

Checks

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/23216/
Test PASSed.

Copy link
Contributor

@sven1977 sven1977 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@matthewearl
Copy link
Author

Is there anything else that I should do before this can be merged?

@sven1977
Copy link
Contributor

Hey @matthewearl Any update on this? If all tests pass, I'm happy to merge it. We could then add another specific test for GaussianSquashedGaussian (entropy).

@matthewearl
Copy link
Author

Hi @sven1977 . I noticed a stability issue when the mean deviates too far either side and almost all mass concentrates around either limit. I have a quick fix though, which is to clip the mean value returned from the net which practically should have little effect. I'll upload that shortly.

On the topic of SquashedGaussian, I hadn't realised this was already being used but I think it is now used by SAC? If so could my changes have broken it, since they also touch SquashedGaussian? Aside from these two points I'm happy to merge.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24681/
Test FAILed.

@matthewearl matthewearl marked this pull request as ready for review April 14, 2020 10:21
@matthewearl
Copy link
Author

I've just added a numerical stability fix which bounds the loc between -3 and 3. Given the scale bounds this should always represent a pretty extreme distribution with mass concentrated around either the high or low bound so it shouldn't limit the behaviour space too much.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24690/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24798/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24820/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24827/
Test FAILed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24851/
Test PASSed.

@matthewearl
Copy link
Author

Hi @sven1977 I've added some unit tests, fixed linter issues, and fixed issues with the existing squashed gaussian test. I think the remaining issues from Travis are in the baseline (although I am not certain). Is there anything else required for getting this merged?

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24982/
Test FAILed.

@ericl ericl closed this Sep 21, 2020
@janblumenkamp
Copy link
Contributor

Just wondering, why was this closed? As I see it, in the meantime a squashed gaussian has been added, but it seems to be only usable in SAC as it is not automatically chosen if bounds are given in the model catalog, correct?

@ericl ericl reopened this Sep 22, 2020
@matthewearl
Copy link
Author

Actually, SquashedGaussian predates this PR but as you point out it's limited to certain algos specifically because it doesn't implement KL or entropy methods

@janblumenkamp
Copy link
Contributor

Just realized that there is no Torch implementation - is that the reason why thas wasn't merged? I'd be happy to give it a try.

@bveeramani
Copy link
Member

‼️ ACTION REQUIRED ‼️

We've switched our code formatter from YAPF to Black (see #21311).

To prevent issues with merging your code, here's what you'll need to do:

  1. Install Black
pip install -I black==21.12b0
  1. Format changed files with Black
curl -o format-changed.sh https://gist.githubusercontent.com/bveeramani/42ef0e9e387b755a8a735b084af976f2/raw/7631276790765d555c423b8db2b679fd957b984a/format-changed.sh
chmod +x ./format-changed.sh
./format-changed.sh
rm format-changed.sh
  1. Commit your changes.
git add --all
git commit -m "Format Python code with Black"
  1. Merge master into your branch.
git pull upstream master
  1. Resolve merge conflicts (if necessary).

After running these steps, you'll have the updated format.sh.

@stale
Copy link

stale bot commented Mar 2, 2022

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

  • If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@stale stale bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Mar 2, 2022
@stale
Copy link

stale bot commented Apr 18, 2022

Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for opening the issue!

@stale stale bot closed this Apr 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale The issue is stale. It will be closed within 7 days unless there are further conversation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants