[RLlib] MultiAgentEpisode: Fix/enhance cut() API. #44677

Merged
merged 12 commits into ray-project:master from the multi_agent_episode_fix_cut branch on Apr 12, 2024

Conversation

sven1977 (Contributor)

MultiAgentEpisode: Fix/enhance cut() API.

Currently, when cutting a multi-agent episode, we do not properly account for sections in which one (or more) agents are not receiving observations (and are accumulating hanging rewards and one hanging action). These hanging values should be added to a different cache (the "before" cache), rather than to the usual cache at the end of the episode. This helps later, once we need to concatenate different chunks, e.g. one chunk with an end-cache to the left of another chunk with a begin-cache: the caches have to match (actions) and/or get added together (rewards) to yield the correct original MultiAgentEpisode.
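As a minimal, hypothetical sketch of this bookkeeping (simplified stand-in names, not the actual MultiAgentEpisode implementation): on a cut, the hanging end-values of agents that are still waiting for their next observation are carried over into the begin-cache of the successor chunk.

```python
import copy


class EpisodeChunk:
    """Simplified stand-in for a multi-agent episode chunk (hypothetical)."""

    def __init__(self):
        # Per-agent values accumulated while an agent received no observation.
        self.hanging_actions_end = {}   # agent_id -> last (hanging) action
        self.hanging_rewards_end = {}   # agent_id -> rewards summed since that action
        self.hanging_actions_begin = {}
        self.hanging_rewards_begin = {}

    def cut(self):
        """Create the successor chunk that continues this episode.

        Agents still waiting for their next observation carry their hanging
        action/rewards over into the *begin* cache of the successor, so that a
        later concatenation can match them against this chunk's end-cache.
        """
        successor = EpisodeChunk()
        successor.hanging_actions_begin = copy.deepcopy(self.hanging_actions_end)
        successor.hanging_rewards_begin = self.hanging_rewards_end.copy()
        return successor
```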

NOTE: While this enhancement does help with replay buffers, in which we keep concatenating new chunks (with possible begin-caches) to already stored chunks (with possible end-caches), it still loses single-agent timesteps in the case where a) we do on-policy learning (no replay buffers) AND b) we cut() the MultiAgentEpisode (in the EnvRunner) at exactly a timestep at which one or more agents are not receiving observations. These single-agent timesteps are lost and cannot be learned from.
A possible solution for this problem (still to be discussed and implemented) would be to also store the most recent observation as a hanging value in the before-cache.
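To make the cache-matching rule from the description above concrete, here is a hedged sketch of the concatenation side (again with illustrative names, building on the EpisodeChunk stand-in above): hanging actions on both sides of a cut must be identical, while hanging rewards from both sides are summed.

```python
def concat_hanging_caches(left: "EpisodeChunk", right: "EpisodeChunk") -> None:
    """Merge the hanging caches of two consecutive chunks (hypothetical sketch).

    `left` is the earlier chunk (with an end-cache), `right` is the chunk that
    was started by `left.cut()` (with a begin-cache).
    """
    for agent_id, action in right.hanging_actions_begin.items():
        # The action cached at the begin of the right chunk must be the very
        # same hanging action recorded at the end of the left chunk.
        assert left.hanging_actions_end.get(agent_id) == action, (
            f"Hanging action mismatch for agent {agent_id}."
        )
        # Hanging rewards accumulated on both sides of the cut add up to the
        # total reward the agent collected while waiting for its next
        # observation.
        left.hanging_rewards_end[agent_id] = (
            left.hanging_rewards_end.get(agent_id, 0.0)
            + right.hanging_rewards_begin.get(agent_id, 0.0)
        )
```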

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: sven1977 <[email protected]>
@simonsays1980 (Collaborator) left a comment:

LGTM

successor._hanging_actions_end = copy.deepcopy(self._hanging_actions_end)
successor._hanging_rewards_end = self._hanging_rewards_end.copy()
successor._hanging_extra_model_outputs_end = copy.deepcopy(
# Copy over the hanging (end) values into the hanging (begin) chaches of the
Collaborator comment on the lines above:

"chaches" -> "caches". And can we leave a note here, why we need the _hanging_actions_begin cache here? Why not writing it into the successor._hanging_actions_end ones?

sven1977 merged commit 39c5fbe into ray-project:master on Apr 12, 2024
5 checks passed
sven1977 deleted the multi_agent_episode_fix_cut branch on April 12, 2024 at 07:12