
continuous control in Gaussian policy gradient #110

Open
sufengniu opened this issue Oct 3, 2017 · 0 comments
Comments

@sufengniu

I found that in the PolicyGradients/ContinuousMountainCar example code, the author uses a network to output the mu and sigma of a Gaussian distribution, and the action is then sampled from it. Here is the part I find confusing:

    self.normal_dist = tf.contrib.distributions.Normal(self.mu, self.sigma)
    self.action = self.normal_dist._sample_n(1)
    self.action = tf.clip_by_value(self.action, env.action_space.low[0], env.action_space.high[0])

    # Loss and train op
    self.loss = -self.normal_dist.log_prob(self.action) * self.target

What confuses me is the concept: in the TensorFlow documentation, normal_dist.log_prob is the log of a probability density function, not of a probability mass function. Thus it is possible that normal_dist.log_prob(action) > 0 (equivalently, normal_dist.prob(action) > 1), whereas in the discrete-control case normal_dist.log_prob(action) is guaranteed to be <= 0. I ran the ContinuousMountainCar experiment and found that normal_dist.prob never exceeded 1.0, but is that true in all cases, since a pdf can be greater than 1?
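
To illustrate the last point, here is a minimal sketch (not from the repository; plain NumPy is assumed and normal_pdf is a hypothetical helper defined here) showing that a Gaussian density can exceed 1 once sigma is small enough, so prob(action) > 1 and log_prob(action) > 0 are at least possible in principle:

    import numpy as np

    # Density of N(mu, sigma) evaluated at x.
    def normal_pdf(x, mu, sigma):
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

    # At the mean the density is 1 / (sigma * sqrt(2 * pi)), so any
    # sigma < 1 / sqrt(2 * pi) ~ 0.3989 gives a density above 1.
    p = normal_pdf(0.0, mu=0.0, sigma=0.1)
    print(p)          # ~ 3.9894, i.e. prob > 1
    print(np.log(p))  # ~ 1.3836, i.e. log_prob > 0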

Thank you
