Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RLlib] Fix calling of callback on_episode_created to conform to docstring (after reset). #45651

Merged
Merged
Show file tree
Hide file tree
Changes from 16 commits
Commits
Show all changes
18 commits
Select commit Hold shift + click to select a range
c748df8
Changed comment.
simonsays1980 May 10, 2024
6409007
Merge branch 'master' of https://github.com/ray-project/ray
simonsays1980 May 13, 2024
d2f9030
Merge branch 'master' of https://github.com/ray-project/ray
simonsays1980 May 14, 2024
a3416a8
Merge branch 'master' of https://github.com/ray-project/ray
simonsays1980 May 15, 2024
8582ad9
Merge branch 'master' of https://github.com/ray-project/ray
simonsays1980 May 16, 2024
b565f34
Merge branch 'master' of https://github.com/ray-project/ray
simonsays1980 May 21, 2024
c0eed1f
Merge branch 'master' of https://github.com/ray-project/ray
simonsays1980 May 22, 2024
341cb95
Merge branch 'master' of https://github.com/ray-project/ray
simonsays1980 May 22, 2024
b76807f
Merge branch 'master' of https://github.com/ray-project/ray
simonsays1980 May 24, 2024
af9c9e9
Merge branch 'master' of https://github.com/ray-project/ray
simonsays1980 May 27, 2024
e422c42
Merge branch 'master' of https://github.com/ray-project/ray
simonsays1980 May 28, 2024
26e0926
Merge branch 'master' of https://github.com/ray-project/ray
simonsays1980 May 29, 2024
562f586
Merge branch 'master' of https://github.com/ray-project/ray
simonsays1980 May 31, 2024
b95848c
Fixed a minor bug that was calling the calback 'on_episode_created' a…
simonsays1980 May 31, 2024
ca98704
Modified callback order in 'MultiAgentEnvRunner'.
simonsays1980 May 31, 2024
d8eee30
Implemented @sven1977's review.
simonsays1980 May 31, 2024
a694987
Merge branch 'master' of https://github.com/ray-project/ray
simonsays1980 Jun 5, 2024
4ba3105
Merge branch 'master' into fix-callback-on-episode-created
simonsays1980 Jun 5, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 7 additions & 4 deletions rllib/algorithms/callbacks.py
Original file line number Diff line number Diff line change
Expand Up @@ -245,19 +245,22 @@ def on_episode_created(
"""Callback run when a new episode is created (but has not started yet!).

This method gets called after a new Episode(V2) (old stack) or
SingleAgentEpisode/MultiAgentEpisode instance has been created.
MultiAgentEpisode instance has been created.
This happens before the respective sub-environment's (usually a gym.Env)
`reset()` is called by RLlib.

1) Episode(V2)/Single-/MultiAgentEpisode created: This callback is called.
Note, at the moment this callback does not get called in the new API stack
simonsays1980 marked this conversation as resolved.
Show resolved Hide resolved
and single-agent mode.

1) Episode(V2)/MultiAgentEpisode created: This callback is called.
2) Respective sub-environment (gym.Env) is `reset()`.
3) Callback `on_episode_start` is called.
4) Stepping through sub-environment/episode commences.

Args:
episode: The newly created episode. On the new API stack, this will be a
SingleAgentEpisode or MultiAgentEpisode object. On the old API stack,
this will be a Episode or EpisodeV2 object.
MultiAgentEpisode object. On the old API stack, this will be a
Episode or EpisodeV2 object.
This is the episode that is about to be started with an upcoming
`env.reset()`. Only after this reset call, the `on_episode_start`
callback will be called.
Expand Down
7 changes: 3 additions & 4 deletions rllib/env/multi_agent_env_runner.py
Original file line number Diff line number Diff line change
Expand Up @@ -414,17 +414,16 @@ def _sample_episodes(

done_episodes_to_return: List[MultiAgentEpisode] = []

# Reset the environment.
# TODO (simon): Check, if we need here the seed from the config.
obs, infos = self.env.reset()

# Create a new multi-agent episode.
_episode = self._new_episode()
self._make_on_episode_callback("on_episode_created", _episode)
_shared_data = {
"agent_to_module_mapping_fn": self.config.policy_mapping_fn,
}

# Reset the environment.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @simonsays1980 , thanks for looking into this problem. I think this issue in general here is unfortunately more complex that what meets the eye right now :( . Let me explain:

  • On single-agent, users are currently not even allowed to override the on_episode_created callback :D . This is because in single-agent, we use gym's vector env, which resets envs automatically after a terminal is hit, which makes it impossible to call the on_episode_created callback before this auto-reset happens. See here and here.
  • For multi-agent (where currently we don't use gym.vector) this does actually work and I therefore would suggest, we only fix this for now on the multi-agent env runner.
  • In MultiAgentEnvRunner, however, we should then also fix it for _sample_episodes().
  • We should update the docstring in callbacks.py to reflect that this callback is NOT currently valid for new API stack + single-agent.
  • We should remove the on_episode_created callback call entirely from single-agent env runner.

# TODO (simon): Check, if we need here the seed from the config.
obs, infos = self.env.reset()
# Set initial obs and infos in the episodes.
_episode.add_env_reset(observations=obs, infos=infos)
self._make_on_episode_callback("on_episode_start", _episode)
Expand Down
14 changes: 8 additions & 6 deletions rllib/env/single_agent_env_runner.py
Original file line number Diff line number Diff line change
Expand Up @@ -224,11 +224,12 @@ def _sample_timesteps(

# Have to reset the env (on all vector sub_envs).
if force_reset or self._needs_initial_reset:
# Create n new episodes and make the `on_episode_created` callbacks.
# Create n new episodes.
# TODO (sven): Add callback `on_episode_created` as soon as
# `gymnasium-v1.0.0a2` PR is coming.
self._episodes = []
for env_index in range(self.num_envs):
self._episodes.append(self._new_episode())
self._make_on_episode_callback("on_episode_created", env_index)
self._shared_data = {}

# Erase all cached ongoing episodes (these will never be completed and
Expand Down Expand Up @@ -437,15 +438,16 @@ def _sample_episodes(

done_episodes_to_return: List[SingleAgentEpisode] = []

# Reset the environment.
# TODO (simon): Check, if we need here the seed from the config.
obs, infos = self.env.reset()
episodes = []
for env_index in range(self.num_envs):
episodes.append(self._new_episode())
self._make_on_episode_callback("on_episode_created", env_index, episodes)
# TODO (sven): Add callback `on_episode_created` as soon as
# `gymnasium-v1.0.0a2` PR is coming.
_shared_data = {}

# Reset the environment.
# TODO (simon): Check, if we need here the seed from the config.
obs, infos = self.env.reset()
for env_index in range(self.num_envs):
episodes[env_index].add_env_reset(
observation=obs[env_index],
Expand Down
Loading