[RLLib] Fix OneHotPreprocessor, use gym.spaces.utils.flatten #27540

Merged
merged 4 commits into from
Sep 9, 2022

Conversation

olipinski
Contributor

Why are these changes needed?

As per #27496, the current implementation of OneHotPreprocessor loses information when one-hot encoding MultiDiscrete spaces. Using the gym implementation (gym.spaces.utils.flatten) should avoid this issue.
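For context, gym's flatten for a MultiDiscrete space one-hot encodes each sub-dimension separately and concatenates the results, so no category information is lost. A minimal numpy-only sketch of that behavior (`flatten_multidiscrete` is an illustrative helper mirroring the idea, not the gym source):

```python
import numpy as np

def flatten_multidiscrete(nvec, x):
    """One-hot each sub-discrete dimension and concatenate, mirroring
    what gym.spaces.utils.flatten does for a MultiDiscrete space
    (illustrative sketch only)."""
    nvec = np.asarray(nvec).flatten()
    x = np.asarray(x).flatten()
    # offsets mark where each sub-dimension's one-hot segment starts
    offsets = np.concatenate(([0], np.cumsum(nvec)))
    onehot = np.zeros(offsets[-1], dtype=np.float32)
    onehot[offsets[:-1] + x] = 1.0
    return onehot

# MultiDiscrete([2, 3]) with sample [1, 2]:
# dim 0 one-hot over 2 values -> [0, 1]; dim 1 over 3 values -> [0, 0, 1]
print(flatten_multidiscrete([2, 3], [1, 2]))  # → [0. 1. 0. 0. 1.]
```

Note the output has sum(nvec) entries with exactly one hot slot per sub-dimension.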

Related issue number

Closes #27496.

Checks

  • I've signed off every commit (using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@olipinski olipinski changed the title Fix OneHotPreprocessor, use gym.spaces.utils.flatten [RLLib] Fix OneHotPreprocessor, use gym.spaces.utils.flatten Aug 5, 2022
@gjoliver
Member

gjoliver commented Aug 5, 2022

nice! thanks a ton for the fix.
can you help add a quick unit test in rllib/model/tests/test_preprocess.py for this behavior?
the team is really trying to push for better quality and testing in the long term, and a fix like this is perfect for adding a unit test so there won't be a regression.

@olipinski
Contributor Author

can you help add a quick unit test in rllib/model/tests/test_preprocess.py for this behavior?

I've added a quick test just now, for both 2D and 3D MultiDiscrete spaces!
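The lossless property such a test should pin down can be sketched as follows (`one_hot` and `decode` are hypothetical helpers illustrating the idea, not the actual RLlib test code): a correct one-hot of a MultiDiscrete sample must be recoverable by an argmax over each sub-space's segment.

```python
import numpy as np

def one_hot(nvec, x):
    # concatenated per-dimension one-hot, as gym's flatten produces
    nvec, x = np.asarray(nvec), np.asarray(x)
    offsets = np.concatenate(([0], np.cumsum(nvec)))
    out = np.zeros(offsets[-1], dtype=np.float32)
    out[offsets[:-1] + x] = 1.0
    return out

def decode(nvec, encoded):
    # recover the original sample: argmax over each sub-space's segment
    offsets = np.concatenate(([0], np.cumsum(nvec)))
    return [int(encoded[offsets[i]:offsets[i + 1]].argmax())
            for i in range(len(nvec))]

# 2D and 3D MultiDiscrete cases, mirroring what the added test covers
for nvec, sample in [([3, 4], [0, 3]), ([2, 3, 4], [1, 0, 2])]:
    enc = one_hot(nvec, sample)
    assert enc.size == sum(nvec)           # one slot per category
    assert decode(nvec, enc) == sample     # round-trips losslessly
```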

@olipinski
Contributor Author

Do we need to do anything else here?

@gjoliver
Member

@gjoliver gjoliver left a comment

thanks a ton for the fix.
I will ping @sven1977 to get this merged.

@olipinski
Contributor Author

Thanks for the approve!

@sven1977
Contributor

sven1977 commented Sep 7, 2022

Thanks for the PR @olipinski , could you make sure the autoregressive_action_dist test case passes? There seems to be a connection to your changes:

https://buildkite.com/ray-project/ray-builders-pr/builds/43759#0182fa77-a584-4bef-a8ef-91f7e0ff4279

The script that fails is rllib/examples/autoregressive_action_dist.py run with the --framework torch flag.

@olipinski
Contributor Author

https://buildkite.com/ray-project/ray-builders-pr/builds/43759#0182fa77-a584-4bef-a8ef-91f7e0ff4279

The script that fails is rllib/examples/autoregressive_action_dist.py run with the --framework torch flag.

This issue appears to be due to torch.nn.Linear supporting only float32, which the old code cast every one-hot encoded action to. I have added the cast to the preprocessor, though I'm uncertain whether it should be done there, or whether the dtype should be specified in the environment instead?
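A minimal sketch of the cast in question (`transform` is an illustrative stand-in for the preprocessor's transform step, assuming gym's flatten returns an integer-typed array, which it typically does for MultiDiscrete since the space's dtype defaults to int64):

```python
import numpy as np

def transform(flattened_obs):
    # torch.nn.Linear weights default to float32, so an int64 one-hot
    # from gym's flatten would raise a dtype mismatch inside the model;
    # cast it to float32 before it leaves the preprocessor
    return np.asarray(flattened_obs, dtype=np.float32)

obs = np.array([0, 1, 0, 0, 1])   # integer one-hot, as flatten returns it
print(transform(obs).dtype)       # → float32
```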

@gjoliver
Member

gjoliver commented Sep 8, 2022

I think this is fine. thanks 👍

@olipinski
Contributor Author

Perfect, should be all good then!

@sven1977 sven1977 merged commit 3dadc74 into ray-project:master Sep 9, 2022
@sven1977
Contributor

sven1977 commented Sep 9, 2022

Thanks again @olipinski , and @gjoliver for the review!

@olipinski olipinski deleted the preprocess-multi-discrete branch September 12, 2022 09:41
ilee300a pushed a commit to ilee300a/ray that referenced this pull request Sep 12, 2022
Successfully merging this pull request may close these issues.

[RLlib] OneHotPreprocessor one-hots MultiDiscrete Spaces incorrectly incurring information loss