Fix: Keep reward shape and dtype the same when resetting and stepping #6
What
Due to default behaviour in Jumanji, the reward was set to a single `float` value when the environment was reset, but when stepping, `num_agents` `int` rewards are returned. This PR fixes this by passing in `num_agents` as the `shape` argument to the `restart`, `termination` and `transition` methods in Jumanji.

Extra
The `timestep` pytree now has the same shapes and data types when resetting and stepping the environment. A test checking the discount shapes was also updated to expect `(num_agents, )`.
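
For reviewers, here is a minimal sketch of the mismatch and the fix. It is not the code in this PR: `TimeStep`, `restart`, `transition` and `same_spec` below are simplified NumPy stand-ins written for illustration, assuming that a `shape` argument controls the shape of the default reward and discount. How the PR itself aligns the dtypes is not shown above, so the cast to `float32` is an assumption.

```python
from typing import NamedTuple, Tuple

import numpy as np


class TimeStep(NamedTuple):
    """Hypothetical stand-in for a Jumanji-style timestep (not the real class)."""

    reward: np.ndarray
    discount: np.ndarray


def restart(shape: Tuple[int, ...] = ()) -> TimeStep:
    # Stand-in for the reset-time constructor: with the default shape=()
    # the reward and discount are scalars.
    return TimeStep(
        reward=np.zeros(shape, dtype=np.float32),
        discount=np.ones(shape, dtype=np.float32),
    )


def transition(reward: np.ndarray, shape: Tuple[int, ...] = ()) -> TimeStep:
    # Stand-in for the step-time constructor: the reward is passed through
    # unchanged and the default discount is built from `shape`.
    return TimeStep(reward=reward, discount=np.ones(shape, dtype=np.float32))


def same_spec(a: TimeStep, b: TimeStep) -> bool:
    # True only when every field matches in both shape and dtype.
    return all(x.shape == y.shape and x.dtype == y.dtype for x, y in zip(a, b))


num_agents = 3

# Before: reset uses the default scalar shape, stepping returns one int
# reward per agent, so shapes and dtypes both differ.
int_rewards = np.zeros((num_agents,), dtype=np.int32)
mismatch = same_spec(restart(), transition(int_rewards, shape=(num_agents,)))

# After: pass the per-agent shape on reset too, and (as an assumption in this
# sketch) cast the step rewards to the same float dtype.
float_rewards = int_rewards.astype(np.float32)
match = same_spec(
    restart(shape=(num_agents,)),
    transition(float_rewards, shape=(num_agents,)),
)
```

Here `mismatch` is `False` and `match` is `True`. A field-by-field comparison like `same_spec` is the kind of check a shape and dtype test can make on the timestep returned by reset and by step.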