paper/code conflict: using minimum Q in policy gradient #14

Open
jpreiss opened this issue Aug 16, 2018 · 1 comment

Comments


jpreiss commented Aug 16, 2018

The Soft Actor-Critic paper (arXiv v2) says, in the last paragraph on page 5:

"We then use the minimum of the Q-functions for the value gradient in Equation 6 and policy gradient in Equation 13"

However, the code in sac/algos/sac.py uses only one of the Q-functions in the policy gradient loss. It does use the minimum of the two in the value gradient loss.

Is there a reason for the discrepancy? Thanks!
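
To make the difference concrete, here is a minimal NumPy sketch of the two variants. It is not the code from sac/algos/sac.py: the function name `policy_loss`, its arguments, and the simplified loss form mean(log_pi - Q) are assumptions made only for illustration.

```python
import numpy as np

def policy_loss(log_pi, q1, q2, use_min=True):
    """Hypothetical SAC-style policy loss: mean(log_pi - Q).

    use_min=True  -> Q is the elementwise minimum of the two critics
                     (what the paper describes).
    use_min=False -> Q is the first critic only
                     (what the issue reports for the code).
    """
    log_pi = np.asarray(log_pi, dtype=float)
    q1 = np.asarray(q1, dtype=float)
    q2 = np.asarray(q2, dtype=float)
    q = np.minimum(q1, q2) if use_min else q1
    return float(np.mean(log_pi - q))

# Toy batch of two sampled actions (made-up numbers).
log_pi = [-1.0, -2.0]
q1 = [3.0, 1.0]
q2 = [2.0, 4.0]

loss_min = policy_loss(log_pi, q1, q2, use_min=True)      # -3.0
loss_single = policy_loss(log_pi, q1, q2, use_min=False)  # -3.5
```

Because min(Q1, Q2) is never larger than Q1, the minimum variant always gives a loss at least as large as the single-critic variant. The two only differ on samples where the second critic is the lower one.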

haarnoja (Owner) commented

Good catch! We actually tried both versions and did not find much difference between them. We'll fix the code in the next release.
