-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RLlib] Implementation of MultiAgentEpisodeReplayBuffer
for the new API stack.
#44450
[RLlib] Implementation of MultiAgentEpisodeReplayBuffer
for the new API stack.
#44450
Conversation
Signed-off-by: sven1977 <[email protected]>
…i_agent_episode_add_module_for_api
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
…i_agent_episode_add_slice_api
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
…d 'add' method. Fixed a minor bug in the 'MultiAgentEpisode' b/c the 'agent_module_ids' mapping was not assigned in the 'MultiAgentEpisode.__init__'. Furthermore, added some tests for the new buffer. Signed-off-by: Simon Zehnder <[email protected]>
… either 'independent' or 'synchronized'. There appears to be an inconsistency between the action getters of the SingleAgentEpisode' and the 'MultiAgentEpisode' - the latter retrieves the timestep one earlier than observations. Signed-off-by: Simon Zehnder <[email protected]>
MultiAgentEpisodeReplayBuffer
for the new API stack.
rllib/utils/replay_buffers/multi_agent_episode_replay_buffer.py
Outdated
Show resolved
Hide resolved
rllib/utils/replay_buffers/multi_agent_episode_replay_buffer.py
Outdated
Show resolved
Hide resolved
rllib/utils/replay_buffers/multi_agent_episode_replay_buffer.py
Outdated
Show resolved
Hide resolved
rllib/utils/replay_buffers/multi_agent_episode_replay_buffer.py
Outdated
Show resolved
Hide resolved
|
||
# Stores indices of module (single-agent) timesteps. | ||
self._module_indices: Dict[str, List[Tuple[int]]] = defaultdict(list) | ||
# Stores for each module the episode indices. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same: improve typehint to DefaultDict[ModuleID, Dict[EpisodeID, int]]
? Is int
correct here or should this be a 4-tuple as well?
Let's also above declare what "index" means throughout this entire class:
Something like:
"In this class, we use a global indexing system, where an individual index is a 4-tuple, consisting of 1) ...2)... 3)... 4) ..."
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
…i_agent_episode_add_slice_api
…api' into multi-agent-episode-replay-buffer Signed-off-by: Simon Zehnder <[email protected]>
… and 'SingleAgentEpisode'. Signed-off-by: Simon Zehnder <[email protected]>
Signed-off-by: Simon Zehnder <[email protected]>
…ays1980/ray into multi-agent-episode-replay-buffer Signed-off-by: Simon Zehnder <[email protected]>
Signed-off-by: Simon Zehnder <[email protected]>
…a buffer. Signed-off-by: Simon Zehnder <[email protected]>
Signed-off-by: Simon Zehnder <[email protected]>
…ts for getters and counters. Signed-off-by: Simon Zehnder <[email protected]>
…ixed. Signed-off-by: Simon Zehnder <[email protected]>
rllib/BUILD
Outdated
@@ -1524,6 +1524,13 @@ py_test( | |||
srcs = ["utils/replay_buffers/tests/test_episode_replay_buffer.py"] | |||
) | |||
|
|||
oy_test( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-> py_test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for the fixes.
I understand we'll uncomment those tests for the new buffer once I have fixed the getter API of MultiAgentEpisode.
Signed-off-by: Sven Mika <[email protected]>
Signed-off-by: Simon Zehnder <[email protected]>
…ays1980/ray into multi-agent-episode-replay-buffer Signed-off-by: Simon Zehnder <[email protected]>
Signed-off-by: Simon Zehnder <[email protected]>
Signed-off-by: Simon Zehnder <[email protected]>
Signed-off-by: Simon Zehnder <[email protected]>
…ts that depend on it. Signed-off-by: Simon Zehnder <[email protected]>
… the test for synchronized sampling'. Signed-off-by: Simon Zehnder <[email protected]>
Signed-off-by: Simon Zehnder <[email protected]>
Signed-off-by: Simon Zehnder <[email protected]>
Why are these changes needed?
With the new API stack and the
EnvRunner API
came the necessity to define new replay buffers that can work completely on episode objects and stores the latter. This PR proposes the multi-agent version of theseEpisodeReplayBuffer
s.Sampling works in
independent
andsynchronized
replay mode and can use differentn-step
for each item.Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.