
continuous control in Gaussian policy gradient #110

Open
sufengniu opened this issue Oct 3, 2017 · 0 comments
Comments

@sufengniu

I found that in the PolicyGradients/ContinuousMountainCar example code, the author uses a network to output the mu and sigma of a Gaussian distribution, and the action is then sampled from it. Here is the part I find confusing:

    self.normal_dist = tf.contrib.distributions.Normal(self.mu, self.sigma)
    self.action = self.normal_dist._sample_n(1)
    self.action = tf.clip_by_value(self.action, env.action_space.low[0], env.action_space.high[0])

    # Loss and train op
    self.loss = -self.normal_dist.log_prob(self.action) * self.target

What confuses me is the concept: in the TensorFlow documentation, normal_dist.log_prob is the log of a probability density function, not of a probability mass function. Thus it is possible that normal_dist.log_prob(action) > 0 (equivalently, normal_dist.prob(action) > 1), whereas in the discrete-control case normal_dist.log_prob(action) is guaranteed to be <= 0. I ran the ContinuousMountainCar experiment and found that normal_dist.prob never exceeded 1.0, but is that true in all cases, since a pdf can be greater than 1?
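
To illustrate the last point, here is a minimal sketch (not from the repository; plain NumPy is assumed and normal_pdf is a hypothetical helper defined here) showing that a Gaussian density can exceed 1 once sigma is small enough, so prob(action) > 1 and log_prob(action) > 0 are at least possible in principle:

    import numpy as np

    # Density of N(mu, sigma) evaluated at x.
    def normal_pdf(x, mu, sigma):
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

    # At the mean the density is 1 / (sigma * sqrt(2 * pi)), so any
    # sigma < 1 / sqrt(2 * pi) ~ 0.3989 gives a density above 1.
    p = normal_pdf(0.0, mu=0.0, sigma=0.1)
    print(p)          # ~ 3.9894, i.e. prob > 1
    print(np.log(p))  # ~ 1.3836, i.e. log_prob > 0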

Thank you
