[RLlib] MultiAgentEpisode: Fix various bugs in slice(). #44594
Signed-off-by: sven1977 <[email protected]>
LGTM. Great improvements in this tough field.
):
    terminateds[aid] = sa_episode.is_terminated
    truncateds[aid] = sa_episode.is_truncated
    # Determine this agent's t_started.
    if start < len(mapping):
        for i in range(start, len(mapping)):
`agent_t_started[aid] = next(item for item in mapping[start:] if item != self.SKIP_ENV_TS_TAG)` ?
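The suggestion above swaps an explicit index loop for `next()` over a generator expression. A minimal sketch of both forms, assuming `mapping` is a plain list and using a hypothetical sentinel value `"S"` in place of `self.SKIP_ENV_TS_TAG` (the function names are illustrative, not part of RLlib). Passing a default to `next()` avoids a `StopIteration` when every remaining entry is a skip tag:

```python
# Hypothetical stand-in for `self.SKIP_ENV_TS_TAG`; the real value may differ.
SKIP_ENV_TS_TAG = "S"


def first_agent_ts_loop(mapping, start):
    """Explicit loop: first non-skipped agent timestep at or after `start`."""
    for i in range(start, len(mapping)):
        if mapping[i] != SKIP_ENV_TS_TAG:
            return mapping[i]
    return None


def first_agent_ts_next(mapping, start):
    """Same lookup via `next()`; the default covers the 'nothing found' case."""
    return next(
        (item for item in mapping[start:] if item != SKIP_ENV_TS_TAG), None
    )


# Env-step -> agent-step mapping in which the agent skipped env steps 1 and 2.
mapping = [0, SKIP_ENV_TS_TAG, SKIP_ENV_TS_TAG, 1, 2]
print(first_agent_ts_loop(mapping, 1), first_agent_ts_next(mapping, 1))
```

Without the default argument, the one-liner raises `StopIteration` if the agent never steps again after `start`, so the two forms are only equivalent when that case is handled.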
@@ -1809,12 +1820,14 @@ def _init_single_agent_episodes(
    len(observations_per_agent[agent_id]) - 1
)

- # Those agents that did NOT step get None added to their mapping.
+ # Those agents that did NOT step get self.SKIP_ENV_TS_TAG added to their
Btw, `SKIP_ENV_TS_TAG` is super important and should be explicitly documented. At first I was wondering what it meant :)
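To illustrate what such a tag could stand for, here is a sketch under the assumption that each agent keeps one list entry per env step: the agent's own timestep if it stepped, and a skip marker otherwise. The helper `build_env_t_to_agent_t` and the value `"S"` are hypothetical and are not taken from the PR:

```python
# Hypothetical marker for "this agent did not step at this env timestep".
SKIP_ENV_TS_TAG = "S"


def build_env_t_to_agent_t(observations, agent_ids):
    """Map each env timestep to the agent's own timestep, or to the skip tag."""
    mapping = {aid: [] for aid in agent_ids}
    agent_t = {aid: 0 for aid in agent_ids}
    for obs in observations:
        for aid in agent_ids:
            if aid in obs:
                # The agent stepped: record its own (agent-local) timestep.
                mapping[aid].append(agent_t[aid])
                agent_t[aid] += 1
            else:
                # The agent did not step: record the skip tag instead.
                mapping[aid].append(SKIP_ENV_TS_TAG)
    return mapping


# One observation dict per env step; an agent is present only if it stepped.
observations = [{"a0": 0, "a1": 0}, {"a1": 1}, {"a0": 2}]
mapping = build_env_t_to_agent_t(observations, ["a0", "a1"])
print(mapping)
```

Using a dedicated tag rather than `None` keeps "agent skipped this step" distinguishable from a missing or unset value.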

# Extend ourselves. In case, episode_chunk is already terminated (and finalized)
# we need to convert to lists (as we are ourselves still filling up lists).
self.observations.extend(other.get_observations())
I am unsure here, but does `extend` with all the observations of `other` duplicate the one observation that is in both?
Yes, it does, but before that, we do:
self.observations.pop()
self.infos.pop()
so, it's fine :)
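The pop-then-extend pattern described in this reply can be sketched with plain lists. This assumes the last observation of the first chunk is the same as the first observation of the second chunk; `concat_observations` is a hypothetical helper, not the PR's `concat_episode`:

```python
def concat_observations(first, second):
    """Concatenate two chunks whose boundary observation is shared."""
    if not first or not second or first[-1] != second[0]:
        raise ValueError("Chunks must overlap in exactly one boundary observation.")
    merged = list(first)
    # Drop the shared boundary observation so it is not duplicated ...
    merged.pop()
    # ... then append ALL observations of the second chunk.
    merged.extend(second)
    return merged


chunk_a = [0, 1, 2]
chunk_b = [2, 3, 4]
merged = concat_observations(chunk_a, chunk_b)
print(merged)
```

The same reasoning would apply to `infos`, which in the reply above is popped alongside `observations`.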
@@ -637,6 +584,59 @@ def finalize(self) -> "SingleAgentEpisode":

        return self

    def concat_episode(self, other: "SingleAgentEpisode") -> None:
I like the `other` here :)
        {"a1": 7},
        {"a1": 8},
        {"a0": 9},
    ]
)
check(len(episode), 9)

# Slice the episode in different ways and check results.
# Empty slice.
slice_ = episode[100:100]
In relation to this: `InfiniteLookbackBuffer[start:stop]` results in a `list`. Do we want to keep it?
You mean to return a new `InfiniteLookbackBuffer` instead?
Yeah, that could be an option, too. I'm not sure. If we can safely change that, it would be better, I think. The good thing is that this API is not user-facing, so we can still change it later.
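The two options under discussion (slicing returns a `list` versus slicing returns a new buffer) can be contrasted with a toy class. This is a simplified, hypothetical stand-in and not RLlib's `InfiniteLookbackBuffer`; it assumes indices are counted from the end of the lookback region and it only supports slices with step 1:

```python
class ToyLookbackBuffer:
    """Hypothetical toy buffer: `lookback` leading items sit before index 0."""

    def __init__(self, data, lookback=0):
        self.data = list(data)
        self.lookback = lookback

    def __len__(self):
        # The lookback items do not count toward the buffer's length.
        return len(self.data) - self.lookback

    def __getitem__(self, key):
        """Option 1: slicing returns a plain `list`."""
        if not isinstance(key, slice):
            raise TypeError("This sketch only supports slices.")
        start, stop, step = key.indices(len(self))
        if step != 1:
            raise ValueError("This sketch only supports step 1.")
        # Shift by the lookback to address the underlying storage.
        return self.data[start + self.lookback : stop + self.lookback]

    def slice_as_buffer(self, key):
        """Option 2: slicing returns a new buffer (with no lookback)."""
        return ToyLookbackBuffer(self[key])


buf = ToyLookbackBuffer([10, 11, 12, 13, 14], lookback=2)
as_list = buf[0:2]
empty = buf[100:100]
as_buffer = buf.slice_as_buffer(slice(0, 2))
print(type(as_list).__name__, as_list, empty, type(as_buffer).__name__)
```

Returning a buffer keeps the type stable across slicing, while returning a list is simpler for callers that only read values; which trade-off fits the real class is the open question in this thread.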
check(a1.observations, [2, 3])
check(a1.actions, [2])
check(a1.rewards, [0.2])
check(a1.is_done, False)

# Test what happens if we have lookback buffers.
observations = [
Btw very good example to explain what a lookback buffer is and what it does.
If a `InfiniteLookbackBuffer` the data gets
concatenated. If a `list` the list is concatenated to the
`self.data`.
other: Another `InfiniteLookbackBuffer` or a `list` or a number.
Yup, that was missing. I am sorry :/
No worries! It takes a village ... :)
Such a complex API. It's not done yet. I also left some things open when I was working on this. Come time ...
Fix various bugs in `slice()`, mostly related to using a lookback buffer.
Why are these changes needed?