
Distributional Reinforcement Learning with Quantile Regression #3

Open
yydxlv opened this issue Mar 31, 2018 · 6 comments

Comments


yydxlv commented Mar 31, 2018

Hi, what does "u" mean in the following code snippet? It seems that "u" is not defined anywhere in the code. Thanks!

huber_loss = 0.5 * u.abs().clamp(min=0.0, max=k).pow(2)
huber_loss += k * (u.abs() - u.abs().clamp(min=0.0, max=k))
quantile_loss = (tau - (u < 0).float()).abs() * huber_loss
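For anyone reading along: in the QR-DQN paper, u is the TD error between the target quantiles and the predicted quantiles, and the snippet above computes the quantile Huber loss from it. A minimal NumPy sketch of the same computation, with toy values (the function name and the example numbers are mine, not from the repo):

```python
import numpy as np

def quantile_huber_loss(u, tau, k=1.0):
    """Quantile Huber loss, mirroring the snippet above.

    u   -- TD errors, one per quantile
    tau -- quantile midpoints in (0, 1)
    k   -- Huber threshold
    """
    abs_u = np.abs(u)
    clipped = np.clip(abs_u, 0.0, k)          # u.abs().clamp(min=0.0, max=k)
    huber = 0.5 * clipped ** 2 + k * (abs_u - clipped)
    return np.abs(tau - (u < 0).astype(float)) * huber

num_quant = 4
tau = (np.arange(num_quant) + 0.5) / num_quant  # quantile midpoints
u = np.array([-0.5, 0.2, 1.5, -2.0])            # toy TD errors
loss = quantile_huber_loss(u, tau).sum() / num_quant
```

The asymmetric weight |tau - 1{u < 0}| is what pushes each output toward its own quantile rather than toward the mean.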

@hohoCode

I think it should probably be something like:

u = dist - expected_quant


angmc commented Apr 12, 2018

After adding u = dist - expected_quant, I get:

TypeError                                 Traceback (most recent call last)
in <module>()
     15
     16 if len(replay_buffer) > batch_size:
---> 17     loss = compute_td_loss(batch_size)
     18     losses.append(loss.data[0])
     19

in compute_td_loss(batch_size)
     17     huber_loss = 0.5 * u.abs().clamp(min=0.0, max=k).pow(2)
     18     huber_loss += k * (u.abs() - u.abs().clamp(min=0.0, max=k))
---> 19     quantile_loss = (tau - (u < 0).float()).abs() * huber_loss
     20     loss = quantile_loss.sum() / num_quant
     21

/home/--/anaconda2/envs/tensorflow4/lib/python2.7/site-packages/torch/tensor.pyc in __sub__(self, other)
    310
    311     def __sub__(self, other):
--> 312         return self.sub(other)
    313
    314     def __rsub__(self, other):

TypeError: sub received an invalid combination of arguments - got (Variable), but expected one of:

  • (float value)
    didn't match because some of the arguments have invalid types: (Variable)
  • (torch.FloatTensor other)
    didn't match because some of the arguments have invalid types: (Variable)
  • (float value, torch.FloatTensor other)


qfettes commented Jun 3, 2018

Should be something like:

u = expected_dist.t().unsqueeze(-1) - dist
loss = self.huber(u) * (self.tau.view(1, -1) - (u.detach() < 0).float()).abs()
loss = loss.mean(1).sum()
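To make the broadcasting in that snippet concrete: it forms the full pairwise matrix u[j, i] = theta_target[j] - theta[i] between target and predicted quantiles, rather than a one-to-one difference. A NumPy sketch with toy values (the axis convention and names here are my reading of the snippet, not taken from the repo):

```python
import numpy as np

num_quant = 3
theta = np.array([0.0, 1.0, 2.0])         # toy predicted quantiles
theta_target = np.array([0.5, 1.5, 2.5])  # toy target quantiles

# (num_quant, 1) - (1, num_quant) broadcasts to a (num_quant, num_quant)
# matrix of pairwise TD errors, like expected_dist.t().unsqueeze(-1) - dist.
u = theta_target[:, None] - theta[None, :]

tau = (np.arange(num_quant) + 0.5) / num_quant
k = 1.0
abs_u = np.abs(u)
clipped = np.clip(abs_u, 0.0, k)
huber = 0.5 * clipped ** 2 + k * (abs_u - clipped)

# tau varies along the predicted-quantile axis (columns here), matching
# tau.view(1, -1); mean over targets, sum over predicted quantiles.
loss = (np.abs(tau[None, :] - (u < 0)) * huber).mean(axis=0).sum()
```

Note the u.detach() in the original: the indicator weight should not receive gradients, only the Huber term should.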


angmc commented Jun 5, 2018

When I last looked at this it ran after converting to a variable:
u = expected_quant - dist
huber_loss = 0.5 * u.abs().clamp(min=0.0, max=k).pow(2)
huber_loss += k * (u.abs() - u.abs().clamp(min=0.0, max=k))
quantile_loss = (autograd.Variable(tau.cuda()) - ((u < 0).float())).abs() * (huber_loss)
loss = (quantile_loss.sum() / num_quant)
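A side note for anyone hitting this today: since PyTorch 0.4 Variable was merged into Tensor, so the autograd.Variable(tau.cuda()) wrapper (and hence the TypeError above) no longer applies. Here is the same batched computation sketched in NumPy with toy values (all numbers are mine, just to show the shapes):

```python
import numpy as np

batch_size, num_quant, k = 2, 4, 1.0
tau = (np.arange(num_quant) + 0.5) / num_quant       # shape (num_quant,)
expected_quant = np.array([[0.1, 0.6, 1.1, 1.9],
                           [0.2, 0.7, 1.2, 2.0]])    # toy targets, (batch, num_quant)
dist = np.array([[0.0, 0.5, 1.0, 2.0],
                 [0.0, 0.5, 1.0, 2.0]])              # toy predictions

u = expected_quant - dist                            # TD errors, (batch, num_quant)
huber = (0.5 * np.clip(np.abs(u), 0.0, k) ** 2
         + k * (np.abs(u) - np.clip(np.abs(u), 0.0, k)))
# tau broadcasts over the batch dimension, like tau.view(1, -1).
quantile_loss = np.abs(tau[None, :] - (u < 0)) * huber
loss = quantile_loss.sum() / num_quant
```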

@LRiver-wut

Friend, this is a question.

@LRiver-wut

It confused me.
