Fix: Keep reward shape and dtype the same when resetting and stepping #6

Merged on Jan 16, 2024 (3 commits)

@RuanJohn RuanJohn commented Jan 16, 2024

What

Due to Jumanji's default behaviour, the reward was a single float value when the environment was reset, but stepping the environment returned num_agents integer rewards, so the reward's shape and dtype differed between reset and step. This PR fixes the mismatch by passing num_agents as the shape argument to Jumanji's restart, termination and transition methods.
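
The mismatch and the fix can be sketched as follows. This is a minimal illustration only: `TimeStep`, `restart` and `transition` below are simplified, hypothetical stand-ins that take a `shape` argument as described above. They are not Jumanji's actual implementations, and the reward values are made up.

```python
from typing import NamedTuple

import jax.numpy as jnp


class TimeStep(NamedTuple):
    # Hypothetical stand-in for an environment timestep (not Jumanji's class).
    reward: jnp.ndarray
    discount: jnp.ndarray


def restart(shape=()):
    # First timestep of an episode: zero reward, discount of one.
    return TimeStep(
        reward=jnp.zeros(shape, dtype=jnp.float32),
        discount=jnp.ones(shape, dtype=jnp.float32),
    )


def transition(reward, shape=()):
    # Mid-episode timestep: cast the reward to float so its dtype matches restart.
    return TimeStep(
        reward=jnp.asarray(reward, dtype=jnp.float32),
        discount=jnp.ones(shape, dtype=jnp.float32),
    )


num_agents = 3

# Default shape: the reset reward is a scalar, which does not match per-agent rewards.
scalar_reset = restart()

# With shape=(num_agents,), reset and step agree on shape and dtype.
fixed_reset = restart(shape=(num_agents,))
step = transition(jnp.array([1, 0, 2]), shape=(num_agents,))
```

Keeping shapes and dtypes identical across reset and step matters in JAX because constructs such as `jax.lax.scan` require the loop carry to keep the same structure, shapes and dtypes on every iteration.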

Extra

  • Added a new test that checks that all leaves in the timestep pytree have the same shapes and data types when resetting and stepping the environment.
  • Since discounts now also have shape (num_agents,), the test checking the discount shapes was updated as well.
  • Rewards are also explicitly cast to floats to ensure their data types are consistent between stepping and resetting the environment.
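
A check of the kind described in the first bullet could look roughly like this. It is a sketch under assumptions, not the test added in this PR: the helper name `assert_same_leaf_specs` is hypothetical, and plain dicts stand in for the real timestep objects.

```python
import jax
import jax.numpy as jnp


def assert_same_leaf_specs(tree_a, tree_b):
    # Hypothetical helper: both pytrees must have the same structure, and
    # each pair of corresponding leaves must agree on shape and dtype.
    leaves_a, treedef_a = jax.tree_util.tree_flatten(tree_a)
    leaves_b, treedef_b = jax.tree_util.tree_flatten(tree_b)
    assert treedef_a == treedef_b, "pytree structures differ"
    for a, b in zip(leaves_a, leaves_b):
        a, b = jnp.asarray(a), jnp.asarray(b)
        assert a.shape == b.shape, f"shape mismatch: {a.shape} vs {b.shape}"
        assert a.dtype == b.dtype, f"dtype mismatch: {a.dtype} vs {b.dtype}"


num_agents = 3

# Stand-ins for the timesteps returned by reset and step.
reset_timestep = {
    "reward": jnp.zeros((num_agents,), dtype=jnp.float32),
    "discount": jnp.ones((num_agents,), dtype=jnp.float32),
}
step_timestep = {
    "reward": jnp.array([1.0, 0.0, 2.0], dtype=jnp.float32),
    "discount": jnp.ones((num_agents,), dtype=jnp.float32),
}

# Passes silently when every leaf matches in shape and dtype.
assert_same_leaf_specs(reset_timestep, step_timestep)
```

In a real test, the two arguments would be the timesteps produced by the environment's reset and step calls rather than hand-built dicts.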

@RuanJohn RuanJohn added the bug (Something isn't working) label Jan 16, 2024
@RuanJohn RuanJohn self-assigned this Jan 16, 2024

@arnupretorius arnupretorius left a comment

Thanks @RuanJohn 👍

@arnupretorius arnupretorius merged commit 4c5d8aa into main Jan 16, 2024
3 checks passed
@RuanJohn RuanJohn deleted the fix/reset-step-reward-shape branch January 16, 2024 13:13