REINFORCE call in the decoder base class needs a .detach() call on the reward #13

prithv1 · 2018-10-10T00:01:36Z

visdial-rl/visdial/models/decoders/gen.py

Line 243 in 1fb7e88

loss += -1 * log_prob * (reward * (self.mask[:, t].float()))

Following the paper, the line above should be replaced by:
loss += -1 * log_prob * (reward.detach() * (self.mask[:, t].float()))

Without .detach() on the reward, the REINFORCE loss becomes a second source of gradients for the feature regression module, in addition to the feature loss. The only difference is that these extra gradients are scaled by the log-probabilities, which has no clear intuitive meaning.
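To make the extra gradient path concrete, here is a minimal PyTorch sketch (assuming PyTorch is installed). The scalar parameters feat_param and policy_param are hypothetical stand-ins for the feature regression module and the decoder, and the squared-error reward is an assumption for illustration, not the reward computed in visdial-rl. Without .detach(), feat_param receives a nonzero gradient from the REINFORCE term; with it, feat_param gets no gradient from this loss at all.

```python
import torch


def reinforce_grads(detach_reward: bool):
    """Return (feat_grad, policy_grad) for a one-step toy REINFORCE loss.

    Hypothetical toy setup: feat_param stands in for the feature regression
    module (the reward depends on it) and policy_param stands in for the
    decoder (the log-prob depends on it).
    """
    feat_param = torch.tensor(2.0, requires_grad=True)
    policy_param = torch.tensor(0.5, requires_grad=True)

    # Log-probability of a sampled token under a toy Bernoulli policy.
    log_prob = torch.log(torch.sigmoid(policy_param))
    # Assumed reward: negative squared error of the predicted feature.
    reward = -(feat_param - 1.0) ** 2
    # Single non-padding time step, mirroring self.mask[:, t] in the issue.
    mask = torch.tensor(1.0)

    # Detaching cuts the graph between the reward and feat_param.
    r = reward.detach() if detach_reward else reward
    loss = -1 * log_prob * (r * mask.float())
    loss.backward()
    return feat_param.grad, policy_param.grad


feat_g, pol_g = reinforce_grads(detach_reward=False)
print("no detach:", feat_g, pol_g)  # feat_param receives a REINFORCE gradient
feat_g, pol_g = reinforce_grads(detach_reward=True)
print("detach:   ", feat_g, pol_g)  # feat_param.grad is None
```

The policy gradient is identical in both cases; .detach() only removes the path from the REINFORCE term back into whatever produced the reward, which is the behavior the issue asks for.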


nirbhayjm added the "bug" label (Something isn't working) on Oct 10, 2018.
