Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RLlib] MultiAgentEpisode: Fix various bugs in slice(). #44594

Merged
merged 4 commits into from
Apr 10, 2024

Conversation

sven1977
Copy link
Contributor

@sven1977 sven1977 commented Apr 9, 2024

  • MultiAgentEpisode (MAE): Fix various bugs in slice(), mostly related to using a lookback buffer.
  • Enhanced test cases for MAE slicing.

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
@sven1977 sven1977 marked this pull request as ready for review April 9, 2024 19:16
@sven1977 sven1977 added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Apr 9, 2024
Copy link
Collaborator

@simonsays1980 simonsays1980 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Great improvements in this tough field.

):
terminateds[aid] = sa_episode.is_terminated
truncateds[aid] = sa_episode.is_truncated
# Determine this agent's t_started.
if start < len(mapping):
for i in range(start, len(mapping)):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agent_t_started[aid] = next(item for item in mapping[start:] if item != self.SKIP_ENV_TS_TAG) 

?

@@ -1809,12 +1820,14 @@ def _init_single_agent_episodes(
len(observations_per_agent[agent_id]) - 1
)

# Those agents that did NOT step get None added to their mapping.
# Those agents that did NOT step get self.SKIP_ENV_TS_TAG added to their
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw SKIP_ENV_TS_TAG is super important to be explicitly documented. I was at first wondering what it meant :)


# Extend ourselves. In case, episode_chunk is already terminated (and finalized)
# we need to convert to lists (as we are ourselves still filling up lists).
self.observations.extend(other.get_observations())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure here, but does extend with all the observations of other duplicate the one observation that is in both?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it does, but before that, we do:

        self.observations.pop()
        self.infos.pop()

so, it's fine :)

@@ -637,6 +584,59 @@ def finalize(self) -> "SingleAgentEpisode":

return self

def concat_episode(self, other: "SingleAgentEpisode") -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the other here :)

{"a1": 7},
{"a1": 8},
{"a0": 9},
]
)
check(len(episode), 9)

# Slice the episode in different ways and check results.
# Empty slice.
slice_ = episode[100:100]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In relation to this: InfiniteLookbackBuffer[start:stop] results in a list. Do we want to keep it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean to return a new InfiniteLookbackBuffer instead?
Yeah, that could be an option, too. I'm not sure. If we can safely change that, it would be better, I think. The good thing is that this API is not user-facing, so we can still change it later.

check(a1.observations, [2, 3])
check(a1.actions, [2])
check(a1.rewards, [0.2])
check(a1.is_done, False)

# Test what happens if we have lookback buffers.
observations = [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw very good example to explain what a lookback buffer is and what it does.

If a `InfiniteLookbackBuffer` the data gets
concatenated. If a `list` the list is concatenated to the
`self.data`.
other: Another `InfiniteLookbackBuffer` or a `list` or a number.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, that was missing. I am sorry :/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries! It takes a village ... :)
Such a complex API. It's not done yet. I also left some things open when I was working on this. Come time ...

@sven1977 sven1977 merged commit 3fea138 into ray-project:master Apr 10, 2024
5 checks passed
@sven1977 sven1977 deleted the multi_agent_episode_fix_slice branch April 10, 2024 11:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tests-ok The tagger certifies test failures are unrelated and assumes personal liability.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants