-
Notifications
You must be signed in to change notification settings - Fork 292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] A2C fails with functional=True and shifted=True for ValueEstimator #2265
Comments
Thanks for reporting |
No you're right the nightlies are broken. I will fix that # Adapt this if you need cuda, e.g. nightly/cu124
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu -U
pip3 install git+https://github.com/pytorch/tensordict -U
pip3 install git+https://github.com/pytorch/rl -U LMK if you can reprod after that! |
The nightly release should be available now (not for windows though) |
Thanks for the quick fix regarding the nightly builds 👍
I am a bit surprised that it works on your side, as the primary code-snippets are still the same on the main-branch. self.value_estimator(
tensordict,
params=self._cached_detach_critic_network_params,
target_params=self.target_critic_network_params,
) Which ultimately fails in if next_params is not None and next_params is not params:
raise ValueError(
"the value at t and t+1 cannot be retrieved in a single call without recurring to vmap when both params and next params are passed."
) Note that I am running completely on CPU without GPU support on the running machine, don't know if that makes any difference 🤷. |
Describe the bug
Not quite sure if this is supported behavior, but if I set
functional=True
for the A2C loss andshifted=True
forTD0Estimator
I get an internal error.To Reproduce
Expected behavior
The losses are calculated correctly and the value_network is only called once in the computation of the advantage.
System info
Reason and Possible fixes
The problem seems to be in this snippet, where detached parameter are used for
params
which makes them unequal.The text was updated successfully, but these errors were encountered: