[RLlib] Fix calling of callback on_episode_created to conform to docstring (after reset). #45651
Signed-off-by: Simon Zehnder <[email protected]>
…fter the 'env.reset' instead before. The docstring in the callback clearly states, it should come before the reset. Signed-off-by: Simon Zehnder <[email protected]>
# Create a new multi-agent episode.
_episode = self._new_episode()
self._make_on_episode_callback("on_episode_created", _episode)
_shared_data = {
    "agent_to_module_mapping_fn": self.config.policy_mapping_fn,
}

# Reset the environment.
Hey @simonsays1980, thanks for looking into this problem. I think the issue in general is unfortunately more complex than meets the eye right now :( . Let me explain:
- On single-agent, users are currently not even allowed to override the on_episode_created callback :D . This is because in single-agent, we use gym's vector env, which resets envs automatically after a terminal is hit, which makes it impossible to call the on_episode_created callback before this auto-reset happens. See here and here.
- For multi-agent (where currently we don't use gym.vector) this does actually work and I therefore would suggest, we only fix this for now on the multi-agent env runner.
- In MultiAgentEnvRunner, however, we should then also fix it for _sample_episodes().
- We should update the docstring in callbacks.py to reflect that this callback is NOT currently valid for new API stack + single-agent.
- We should remove the on_episode_created callback call entirely from single-agent env runner.
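The ordering problem described in the points above can be illustrated with a small, dependency-free sketch. ToyEnv, AutoResetWrapper, run_manual and run_auto_reset are hypothetical stand-ins invented for this illustration. They are not RLlib or gym classes, and the wrapper only mimics the auto-reset behavior mentioned above in a simplified form.

```python
# Hypothetical stand-ins for illustration only; not RLlib or gym classes.
class ToyEnv:
    """Episode ends after `horizon` steps; every reset is written to `log`."""

    def __init__(self, log, horizon=2):
        self.log = log
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        self.log.append("reset")
        return 0

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.horizon
        return self.t, 0.0, terminated


class AutoResetWrapper:
    """Resets the wrapped env inside step() as soon as an episode terminates."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, terminated = self.env.step(action)
        if terminated:
            obs = self.env.reset()  # The caller gets no chance to act first.
        return obs, reward, terminated


def run_manual(env, log, num_episodes):
    """The runner controls reset, so the callback can fire before it."""
    for _ in range(num_episodes):
        log.append("on_episode_created")
        env.reset()
        done = False
        while not done:
            _, _, done = env.step(0)


def run_auto_reset(env, log, num_episodes):
    """With auto-reset, the callback for later episodes can only fire after reset."""
    log.append("on_episode_created")
    env.reset()
    finished = 0
    while finished < num_episodes:
        _, _, done = env.step(0)
        if done:
            finished += 1
            if finished < num_episodes:
                # The reset already happened inside step().
                log.append("on_episode_created")


manual_log = []
run_manual(ToyEnv(manual_log), manual_log, num_episodes=2)

auto_log = []
run_auto_reset(AutoResetWrapper(ToyEnv(auto_log)), auto_log, num_episodes=2)
```

In this sketch, manual_log alternates "on_episode_created" and "reset", while in auto_log the second "on_episode_created" entry necessarily follows a "reset" entry.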
Thanks for raising this issue @simonsays1980 , an important one.
I wrote my thoughts and the current problems with this particular callback below and suggested some changes. Then we can merge this. :)
We still need to:
- Add the on_episode_created callback call to MultiAgentEpisode._sample_episodes().
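One way to keep both sampling paths consistent is to route episode creation through a single helper that fires the callback before the reset. The sketch below is an assumption about how that could look, not the code merged in this PR. ToyRunner, _start_episode, sample_timesteps and sample_episodes are hypothetical names and do not match RLlib's actual method signatures.

```python
# Hypothetical runner for illustration only; not RLlib's MultiAgentEnvRunner.
class ToyRunner:
    def __init__(self, reset_fn, on_episode_created):
        self._reset_fn = reset_fn
        self._on_episode_created = on_episode_created
        self._next_id = 0

    def _start_episode(self):
        """Create the episode, fire the callback, and only then reset."""
        episode = {"id": self._next_id, "obs": None}
        self._next_id += 1
        # Callback first: the episode exists but has not started yet.
        self._on_episode_created(episode)
        episode["obs"] = self._reset_fn()
        return episode

    def sample_timesteps(self):
        # Timestep-based sampling starts one episode through the helper.
        return [self._start_episode()]

    def sample_episodes(self, num_episodes):
        # Episode-based sampling uses the same helper, so the order matches.
        return [self._start_episode() for _ in range(num_episodes)]


events = []
runner = ToyRunner(
    # list.append returns None, so the lambda logs the reset and returns 0.
    reset_fn=lambda: events.append("reset") or 0,
    on_episode_created=lambda ep: events.append(("created", ep["id"], ep["obs"])),
)
runner.sample_timesteps()
runner.sample_episodes(2)
```

Each ("created", id, None) entry in events is followed by a "reset" entry, and the recorded observation is still None when the callback runs, regardless of which sampling method started the episode.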
…cstring (after reset). (ray-project#45651) Signed-off-by: Richard Liu <[email protected]>
Why are these changes needed?
The docstring of the on_episode_created callback states clearly that this callback should be called before the env.reset.
Related issue number
Closes #45544
Checks
- Sign off every commit (git commit -s) in this PR.
- Run scripts/format.sh to lint the changes in this PR.
- If adding a method in Tune, add it in doc/source/tune/api/ under the corresponding .rst file.