Gaussian Squashed Gaussian #7609

matthewearl · 2020-03-15T16:36:07Z

Why are these changes needed?

Currently, when PPO is used with a bounded (continuous) action space, action samples are simply drawn from an unbounded normal distribution and then clipped to the bounds. The entropy is calculated directly on the normal. Because PPO rewards higher entropy, there is a failure mode in which the algorithm learns to push most of the mass outside the action range and increase the variance, thus increasing entropy with little change in the selected actions.
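
To make this failure mode concrete, here is a minimal sketch using only the Python standard library. The bound and the two (mean, standard deviation) pairs are hypothetical values chosen for illustration, not taken from RLlib. Both policies clip essentially every sample to the upper bound, yet the wider one reports a much larger entropy.

```python
import math


def normal_entropy(sigma):
    """Differential entropy of N(mu, sigma^2) in nats; independent of mu."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)


def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def frac_clipped_high(mu, sigma, high):
    """Probability that a sample from N(mu, sigma^2) is clipped to `high`."""
    return 1.0 - normal_cdf(high, mu, sigma)


HIGH = 1.0  # hypothetical upper action bound

# Two hypothetical policies with nearly all mass above HIGH: (mu, sigma).
narrow = (10.0, 1.0)
wide = (100.0, 10.0)

for mu, sigma in (narrow, wide):
    print(
        f"mu={mu}, sigma={sigma}: "
        f"clipped fraction={frac_clipped_high(mu, sigma, HIGH):.6f}, "
        f"entropy={normal_entropy(sigma):.4f}"
    )
```

The selected (clipped) actions are practically identical in both cases, so the entropy bonus is rewarding a change that has no effect on behaviour.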

The direct fix, i.e. calculating the entropy of the clipped distribution, doesn't work since the clipped distribution actually has undefined entropy. Another fix is to use a "soft clip" such as the existing SquashedGaussian distribution, which maps samples through a (scaled) tanh function to ensure that samples lie within the desired range. The problem is that its entropy is hard (impossible?) to compute analytically, and PPO requires the entropy when using a non-zero entropy_coeff.

In this PR I have implemented GaussianSquashedGaussian, which maps samples through the normal CDF instead of through tanh. When scaled appropriately, the normal CDF closely approximates tanh.
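
The closeness of the two squashing functions can be checked numerically. The sketch below is not the PR's code: the choice of scale is an assumption (it matches the slope of tanh at the origin), and `cdf_squash` is a hypothetical helper name.

```python
import math

# Assumption: pick the CDF's standard deviation so that the squashing
# function has slope 1 at x = 0, like tanh. This gives sqrt(2 / pi).
SCALE = math.sqrt(2.0 / math.pi)


def cdf_squash(x, scale=SCALE):
    """Map x into (-1, 1) via 2 * Phi(x / scale) - 1, Phi = standard normal CDF."""
    # 2 * Phi(z) - 1 equals erf(z / sqrt(2)).
    return math.erf(x / (scale * math.sqrt(2.0)))


# Compare against tanh on a grid over [-5, 5].
xs = [i / 100.0 for i in range(-500, 501)]
max_gap = max(abs(cdf_squash(x) - math.tanh(x)) for x in xs)
print(f"max |cdf_squash(x) - tanh(x)| on [-5, 5]: {max_gap:.3f}")
```

Other scalings (for example, one that minimises the maximum gap) are equally plausible; the slope-matching choice is only the simplest one to state.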

However, it has the benefit that the entropy is analytically tractable. In fact, the entropy is just -KL(N1 || N2), where N1 is the normal being squashed, and N2 is the normal whose CDF is used for squashing.
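
The identity follows from a change of variables: if x is drawn from N1 and y is the CDF of N2 evaluated at x, then the density of y is p1(x) / p2(x), so the entropy of y is -E[log p1(x) - log p2(x)] = -KL(N1 || N2). The sketch below is a standard-library illustration, not the PR's implementation. It compares the closed form with a histogram-based entropy estimate of the squashed samples. The parameter values are hypothetical, and the `low`/`high` rescaling term is an added assumption for actions mapped affinely to an arbitrary interval.

```python
import math
import random


def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def kl_normal(mu1, sigma1, mu2, sigma2):
    """KL(N(mu1, sigma1^2) || N(mu2, sigma2^2)) in nats."""
    return (
        math.log(sigma2 / sigma1)
        + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * sigma2 ** 2)
        - 0.5
    )


def squashed_entropy(mu1, sigma1, mu2, sigma2, low=0.0, high=1.0):
    """Entropy of low + (high - low) * CDF_N2(x) with x ~ N1.

    On the unit interval this is -KL(N1 || N2); an affine rescaling
    to [low, high] adds log(high - low).
    """
    return -kl_normal(mu1, sigma1, mu2, sigma2) + math.log(high - low)


def histogram_entropy(mu1, sigma1, mu2, sigma2, n=200_000, bins=100, seed=0):
    """Estimate the entropy of y = CDF_N2(x), x ~ N1, from a histogram on [0, 1]."""
    rng = random.Random(seed)
    counts = [0] * bins
    for _ in range(n):
        y = normal_cdf(rng.gauss(mu1, sigma1), mu2, sigma2)
        counts[min(int(y * bins), bins - 1)] += 1
    width = 1.0 / bins
    # Piecewise-constant density estimate: c / (n * width) in each bin.
    return -sum((c / n) * math.log(c / (n * width)) for c in counts if c)


# Hypothetical parameters: N1 = N(0.5, 0.7^2), N2 = N(0, 1).
analytic = squashed_entropy(0.5, 0.7, 0.0, 1.0)
estimate = histogram_entropy(0.5, 0.7, 0.0, 1.0)
print(f"analytic: {analytic:.4f}, histogram estimate: {estimate:.4f}")
```

The histogram estimate carries a small binning and sampling error, so the two numbers should agree only approximately.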

This should be considered a draft review for now, since I'd like to get a second opinion on how I've structured the catalog -> action space mapping, and I have also touched the existing SquashedGaussian and I'm unsure if these changes will break anything.

Related issue number

Checks

I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://ray.readthedocs.io/en/latest/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failure rates at https://ray-travis-tracker.herokuapp.com/.
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested (please justify below)

Still some bugs to fix

AmplabJenkins · 2020-03-15T16:37:42Z

Can one of the admins verify this patch?

AmplabJenkins · 2020-03-15T17:53:24Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/23216/
Test PASSed.

rllib/models/catalog.py

sven1977

Nice!

matthewearl · 2020-03-16T16:41:21Z

Is there anything else that I should do before this can be merged?

sven1977 · 2020-03-24T15:51:33Z

Hey @matthewearl Any update on this? If all tests pass, I'm happy to merge it. We could then add another specific test for GaussianSquashedGaussian (entropy).

matthewearl · 2020-03-26T15:06:57Z

Hi @sven1977. I noticed a stability issue when the mean deviates too far to either side and almost all of the mass concentrates around one of the limits. I have a quick fix, though: clipping the mean value returned from the net, which in practice should have little effect. I'll upload that shortly.

On the topic of SquashedGaussian, I hadn't realised this was already being used but I think it is now used by SAC? If so could my changes have broken it, since they also touch SquashedGaussian? Aside from these two points I'm happy to merge.

rllib/models/catalog.py

AmplabJenkins · 2020-04-14T07:28:58Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24681/
Test FAILed.

matthewearl · 2020-04-14T10:22:59Z

I've just added a numerical stability fix which bounds the loc between -3 and 3. Given the scale bounds this should always represent a pretty extreme distribution with mass concentrated around either the high or low bound so it shouldn't limit the behaviour space too much.
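
As a rough illustration of why bounding the loc helps, the sketch below assumes the squashing CDF is that of a standard normal and uses hypothetical helper names (`clip_loc`, `std_normal_cdf`); it is not the code in this PR. For a large enough input the CDF rounds to exactly 1.0 in floating point, so the squashed value sits on the bound and can no longer be inverted.

```python
import math

LOC_BOUND = 3.0  # the bound quoted in the comment above


def clip_loc(loc, bound=LOC_BOUND):
    """Clamp the network's mean output to [-bound, bound]."""
    return max(-bound, min(bound, loc))


def std_normal_cdf(x):
    """CDF of N(0, 1); assumed here to be the squashing function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


raw_loc = 12.0  # hypothetical extreme network output

unclipped = std_normal_cdf(raw_loc)          # saturates at the upper limit
clipped = std_normal_cdf(clip_loc(raw_loc))  # stays strictly inside (0, 1)

print(f"unclipped == 1.0: {unclipped == 1.0}")
print(f"clipped < 1.0: {clipped < 1.0}")
```

Samples can still land far from the clipped loc when the scale is large, so this only addresses saturation caused by the mean itself.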

AmplabJenkins · 2020-04-14T11:11:36Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24690/
Test FAILed.

AmplabJenkins · 2020-04-16T17:18:44Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24798/
Test FAILed.

AmplabJenkins · 2020-04-16T22:19:13Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24820/
Test FAILed.

AmplabJenkins · 2020-04-16T23:16:22Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24827/
Test FAILed.

AmplabJenkins · 2020-04-17T10:22:07Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24851/
Test PASSed.

matthewearl · 2020-04-17T17:17:24Z

Hi @sven1977 I've added some unit tests, fixed linter issues, and fixed issues with the existing squashed gaussian test. I think the remaining issues from Travis are in the baseline (although I am not certain). Is there anything else required for getting this merged?

AmplabJenkins · 2020-04-20T21:12:17Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24982/
Test FAILed.

janblumenkamp · 2020-09-22T07:30:53Z

Just wondering, why was this closed? As I see it, in the meantime a squashed gaussian has been added, but it seems to be only usable in SAC as it is not automatically chosen if bounds are given in the model catalog, correct?

matthewearl · 2020-09-22T15:58:26Z

Actually, SquashedGaussian predates this PR, but as you point out, it's limited to certain algos, specifically because it doesn't implement KL or entropy methods.

janblumenkamp · 2020-12-16T10:04:43Z

Just realized that there is no Torch implementation - is that the reason why this wasn't merged? I'd be happy to give it a try.

bveeramani · 2022-01-30T06:05:38Z

‼️ ACTION REQUIRED ‼️

We've switched our code formatter from YAPF to Black (see #21311).

To prevent issues with merging your code, here's what you'll need to do:

Install Black

pip install -I black==21.12b0

Format changed files with Black

curl -o format-changed.sh https://gist.githubusercontent.com/bveeramani/42ef0e9e387b755a8a735b084af976f2/raw/7631276790765d555c423b8db2b679fd957b984a/format-changed.sh
chmod +x ./format-changed.sh
./format-changed.sh
rm format-changed.sh

Commit your changes.

git add --all
git commit -m "Format Python code with Black"

Merge master into your branch.

git pull upstream master

Resolve merge conflicts (if necessary).

After running these steps, you'll have the updated format.sh.

stale · 2022-03-02T02:14:54Z

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

stale · 2022-04-18T05:57:08Z

Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for opening the issue!

matthewearl added 6 commits March 15, 2020 16:04

Implement GaussianSquashedGaussian. Still buggy

8e63d3c

fix bug in gsg logp

005c524

Fix bugs in KL and entropy methods

ba69bb7

Initial attempt at integrating GSG into catalog

113fc4f

Still some bugs to fix

Fix up the shapes returned by SG

c8e53ce

Reformatting according to scripts/format.sh

f4521f7

matthewearl commented Mar 15, 2020

rllib/models/catalog.py

sven1977 reviewed Mar 16, 2020

janblumenkamp reviewed Apr 10, 2020

rllib/models/catalog.py (outdated)

janblumenkamp mentioned this pull request Apr 11, 2020

[rllib] PPO: Continuous action is nan #7923

Closed

2 tasks

code review markup

b0c2323

Bound loc for numerical stability

0e161fc

matthewearl marked this pull request as ready for review April 14, 2020 10:21

matthewearl added 2 commits April 16, 2020 16:34

Merge branch 'master' of github.com:ray-project/ray into me/gsg

511eef6

Merge branch 'me/gsg' of github.com:matthewearl/ray into me/gsg

86527ec

matthewearl added 3 commits April 16, 2020 20:10

Fix squashed gaussian unit test

f226d2e

Fix gaussian squashed gaussian following the previous commit

3e1d345

add test for gaussian squashed gaussian

9c9b8bc

linter fixes

731afbd

ericl closed this Sep 21, 2020

ericl reopened this Sep 22, 2020

sven1977 mentioned this pull request Jan 8, 2021

[RLlib] Finish testing matthewearl's Gaussian squashed gaussian PR #13292

Closed

6 tasks

stale bot added the stale label Mar 2, 2022

stale bot closed this Apr 18, 2022
