[RLlib] MultiAgentEpisode: Fix/enhance cut() API. #44677
Conversation
Signed-off-by: sven1977 <[email protected]>
LGTM
successor._hanging_actions_end = copy.deepcopy(self._hanging_actions_end)
successor._hanging_rewards_end = self._hanging_rewards_end.copy()
successor._hanging_extra_model_outputs_end = copy.deepcopy(
# Copy over the hanging (end) values into the hanging (begin) chaches of the
"chaches" -> "caches". And can we leave a note here, why we need the _hanging_actions_begin
cache here? Why not writing it into the successor._hanging_actions_end
ones?
…i_agent_episode_fix_cut
Signed-off-by: sven1977 <[email protected]>
MultiAgentEpisode: Fix/enhance cut() API.

Currently, when cutting a multi-agent episode, we do not properly account for sections in which one (or more) agents are not receiving observations (and are accumulating hanging rewards and one hanging action). These hanging values should be added to a different cache (the "before" cache), rather than the usual cache at the end of the episode. This helps once we need to concatenate different chunks, e.g. one chunk with an end-cache to the left of another chunk with a begin-cache. The caches then have to match (actions) and/or get added (rewards) to yield the correct original MultiAgentEpisode.
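To make the cache mechanics concrete, here is a minimal toy sketch of the intended concat behavior. ToyChunk and toy_concat are hypothetical stand-ins for illustration only, not RLlib's actual MultiAgentEpisode API:

```python
# Toy sketch (hypothetical classes, NOT RLlib code): two episode chunks for a
# single agent that did NOT receive an observation at the cut point. The first
# chunk carries the agent's last action and partially accumulated reward in an
# "end" cache, the second chunk carries the same action plus the rewards that
# arrived after the cut in a "begin" cache. Concatenation must match the actions
# and add up the rewards to reconstruct the original episode.
from dataclasses import dataclass, field


@dataclass
class ToyChunk:
    observations: list = field(default_factory=list)
    actions: list = field(default_factory=list)
    rewards: list = field(default_factory=list)
    # Hanging values at the end of this chunk (agent acted, no obs received yet).
    hanging_action_end: object = None
    hanging_reward_end: float = 0.0
    # Hanging values carried over from the predecessor chunk at cut() time.
    hanging_action_begin: object = None
    hanging_reward_begin: float = 0.0


def toy_concat(first: ToyChunk, second: ToyChunk) -> ToyChunk:
    """Concatenates two chunks, resolving the hanging caches at the cut point."""
    # The action cached at the end of `first` must be the same one cached at the
    # begin of `second`; otherwise the chunks do not belong together.
    assert first.hanging_action_end == second.hanging_action_begin
    return ToyChunk(
        observations=first.observations + second.observations,
        actions=first.actions + [first.hanging_action_end] + second.actions,
        # Rewards accumulated while the agent had no obs get added across the cut.
        rewards=first.rewards
        + [first.hanging_reward_end + second.hanging_reward_begin]
        + second.rewards,
        hanging_action_end=second.hanging_action_end,
        hanging_reward_end=second.hanging_reward_end,
    )


# The agent observed 0, acted 5 (reward 1.0), observed 1, acted 3, then received
# reward 0.1 before the cut and 0.2 after the cut without a new observation.
chunk1 = ToyChunk(observations=[0, 1], actions=[5], rewards=[1.0],
                  hanging_action_end=3, hanging_reward_end=0.1)
chunk2 = ToyChunk(observations=[2], actions=[], rewards=[],
                  hanging_action_begin=3, hanging_reward_begin=0.2)
full = toy_concat(chunk1, chunk2)
print(full.actions)  # -> [5, 3]
print(full.rewards)  # -> [1.0, 0.30000000000000004]
```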
NOTE: While this enhancement does help with replay buffers, in which we keep concatenating new chunks (with possible begin-caches) to already stored chunks (with possible end-caches), it still loses single-agent timesteps in the case where we have a) on-policy learning (no replay buffers) AND b) we cut() the MAEpisode (in the EnvRunner) at exactly a timestep at which one or more agents are not receiving observations. These single-agent timesteps are lost and cannot be learned from.

A solution for this problem could be (but will have to be discussed and implemented) to also store the most recent observation as a hanging one in the before-cache.
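As a rough illustration of that remaining gap (hypothetical field names, not RLlib's actual attributes):

```python
# Toy illustration only: after cut(), the predecessor chunk is discarded in the
# on-policy case, so the observation that preceded the hanging action is gone.
predecessor = {
    "last_obs": 1,              # obs the agent saw before taking its last action
    "hanging_action_end": 3,    # action taken; no next obs received before cut()
    "hanging_reward_end": 0.1,
}
successor = {
    # What cut() copies into the begin-cache today:
    "hanging_action_begin": 3,
    "hanging_reward_begin": 0.1,
    # Possible extension (to be discussed): also carry the most recent obs, e.g.
    # "hanging_obs_begin": 1,
    # so the transition (obs=1, action=3, reward=..., next_obs=...) could still
    # be completed once the agent receives its next observation in this chunk.
}
# Without such a hanging obs, training on `successor` alone has to drop this
# single-agent timestep entirely.
```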
Why are these changes needed?
Related issue number
Checks
- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.