[RLLib] Fix OneHotPreprocessor, use gym.spaces.utils.flatten #27540
Nice! Thanks a ton for the fix.
I've added a quick test just now, both for 2D and 3D multidiscretes!
Do we need to do anything else here?
thanks a ton for the fix.
I will ping @sven1977 to get this merged.
d5667ad to 65b77a7
Thanks for the approve!
Signed-off-by: Olaf Lipinski <[email protected]>
Signed-off-by: Olaf Lipinski <[email protected]>
5b4babc to 93cb3d0
Thanks for the PR @olipinski, could you make sure the build at https://buildkite.com/ray-project/ray-builders-pr/builds/43759#0182fa77-a584-4bef-a8ef-91f7e0ff4279 passes? The script that fails is
Signed-off-by: Olaf Lipinski <[email protected]>
This issue appears to be due to torch.nn.Linear supporting only float32 input, which the old code used to cast every one-hot encoded action into. I have added the cast to the preprocessor, though I'm uncertain whether it should be done there or whether the dtype should be specified in the environment.
I think this is fine. Thanks 👍
Perfect, should be all good then!
Thanks again @olipinski, and @gjoliver for the review!
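For readers following the dtype discussion above, here is a minimal sketch of the kind of cast being described. It assumes the one-hot vector arrives as an integer array and uses NumPy only. `to_float32` is a hypothetical helper for illustration, not the code added in this PR.

```python
import numpy as np


def to_float32(encoded):
    """Return the encoded vector as float32 (hypothetical helper, not RLlib code)."""
    # np.asarray with an explicit dtype converts integer one-hot vectors
    # without changing their values.
    return np.asarray(encoded, dtype=np.float32)


# An integer one-hot vector, as an encoder that keeps an integer dtype might return.
one_hot_int = np.array([0, 0, 1, 0], dtype=np.int64)
one_hot_f32 = to_float32(one_hot_int)
```

Doing the cast once in the preprocessor keeps the environment's declared dtype untouched. Whether that is the better place for it is the open question raised in the comment above.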
…oject#27540) Signed-off-by: ilee300a <[email protected]>
Why are these changes needed?
As per #27496, the current implementation of OneHotPreprocessor loses information when one-hot encoding. Using the gym implementation (gym.spaces.utils.flatten) should avoid this issue.
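To illustrate what a lossless one-hot encoding of a multi-dimensional MultiDiscrete value can look like, here is a rough NumPy-only sketch. It gives each sub-dimension its own one-hot segment and concatenates them, so the original indices can be recovered. `one_hot_multidiscrete` and `decode` are hypothetical helpers written for this example. They are not the gym or RLlib implementation, and the float32 output dtype is an assumption.

```python
import numpy as np


def one_hot_multidiscrete(nvec, x):
    """Concatenate one one-hot segment per sub-dimension (hypothetical helper)."""
    nvec = np.asarray(nvec, dtype=np.int64).flatten()
    x = np.asarray(x, dtype=np.int64).flatten()
    # Start index of each sub-dimension's segment in the flat vector.
    offsets = np.concatenate(([0], np.cumsum(nvec)))
    out = np.zeros(offsets[-1], dtype=np.float32)
    # Set exactly one entry per segment.
    out[offsets[:-1] + x] = 1.0
    return out


def decode(nvec, flat):
    """Recover the original indices from the flat encoding (hypothetical helper)."""
    nvec_arr = np.asarray(nvec, dtype=np.int64)
    offsets = np.concatenate(([0], np.cumsum(nvec_arr.flatten())))
    idx = [int(np.argmax(flat[a:b])) for a, b in zip(offsets[:-1], offsets[1:])]
    return np.asarray(idx).reshape(nvec_arr.shape)


# A 2D MultiDiscrete-style space: four sub-dimensions with 3, 2, 2 and 4 choices.
nvec_2d = [[3, 2], [2, 4]]
obs = [[2, 0], [1, 3]]
flat = one_hot_multidiscrete(nvec_2d, obs)
```

The round trip through `decode` returns the original observation, which is the property an encoding needs in order not to lose information.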
Related issue number
Closes #27496.
Checks
I've signed off every commit (git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.