[RLlib] Add APPO/IMPALA multi-agent StatelessCartPole learning tests to CI (+ fix some bugs related to this). #47245
Conversation
Signed-off-by: sven1977 <[email protected]>
@@ -228,7 +228,7 @@ def __call__(
        # Also, let module-to-env pipeline know that we had added a single timestep
        # time rank to the data (to remove it again).
        if not self._as_learner_connector:
-           for column, column_data in data.copy().items():
+           for column in data.keys():
simplify
    item_list, T=self.max_seq_len
)
# Multi-agent case AND RLModule is not stateful -> Do not zero-pad
# for this model.
bug fix: For multi-agent with some RLModules NOT stateful, we should NOT zero-pad anything.
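A minimal sketch of that rule, with made-up names and a simplified padding scheme (this is not the actual connector code, just an illustration of "only stateful modules get zero-padded"):

```python
import numpy as np

def maybe_zero_pad(item_list, module_is_stateful, max_seq_len):
    """Illustrative only: zero-pad a module's data only if that module is stateful."""
    if not module_is_stateful:
        # Non-stateful RLModule in a multi-agent setup: keep the flat batch as-is.
        return np.stack(item_list)
    # Stateful RLModule: chunk into sequences of length max_seq_len and zero-pad
    # the last chunk so all rows have equal length (simplified).
    n = len(item_list)
    num_chunks = -(-n // max_seq_len)  # ceiling division
    padded = np.zeros((num_chunks * max_seq_len,) + np.shape(item_list[0]))
    padded[:n] = np.stack(item_list)
    return padded.reshape((num_chunks, max_seq_len) + np.shape(item_list[0]))
```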
Does this actually work already when using it on full-length episodes coming from OfflineData?
normalized_sa_obs = self._filters[sa_episode.agent_id](
    sa_obs, update=self._update_stats
)
try:
Improve the error that shows up when the multi_agent=True c'tor arg is forgotten.
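A sketch of the kind of error improvement being asked for here (message text and surrounding names are illustrative, not the merged code):

```python
def get_agent_filter(filters: dict, agent_id):
    """Look up the per-agent filter, raising a more helpful error on a miss."""
    try:
        return filters[agent_id]
    except KeyError as e:
        raise KeyError(
            f"No observation filter found for agent_id={agent_id}! If you are "
            "running a multi-agent setup, you probably forgot to pass "
            "`multi_agent=True` to the mean-std filter connector's constructor."
        ) from e
```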
Signed-off-by: sven1977 <[email protected]>
…appo_multi_agent_stateless_cartpole_tests
Signed-off-by: sven1977 <[email protected]>
    lambda p, s: s if Columns.STATE_OUT in p else np.squeeze(s, axis=0),
    data,
)
def _remove_single_ts(item, eps_id, aid, mid):
bug fix: For mixed-MultiRLModules where some RLModules are NOT stateful, the old code would crash.
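For illustration, a hedged sketch of the path-aware mapping idea shown in the diff above (dm-tree based; names simplified, not the actual RLlib code):

```python
import numpy as np
import tree  # dm_tree

def remove_single_ts(module_data, state_out_key="state_out"):
    # Squeeze away the artificial time rank of 1 from every leaf, EXCEPT for
    # state outputs, whose leading axis is the batch axis rather than a time
    # axis. A non-stateful RLModule simply has no state-out key in its data,
    # so nothing special (and no crash) happens for it.
    return tree.map_structure_with_path(
        lambda path, leaf: leaf if state_out_key in path else np.squeeze(leaf, axis=0),
        module_data,
    )
```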
    clear_on_reduce=True,
)
# Log all timesteps (env, agent, modules) based on given episodes/batch.
self._log_steps_trained_metrics(batch)
simplify
@@ -1581,49 +1566,26 @@ def _set_optimizer_lr(optimizer: Optimizer, lr: float) -> None:
def _get_clip_function() -> Callable:
    """Returns the gradient clipping function to use, given the framework."""

-    def _log_steps_trained_metrics(self, episodes, batch, shared_data):
+    def _log_steps_trained_metrics(self, batch: MultiAgentBatch):
simplify
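A rough sketch of what logging trained timesteps from the MultiAgentBatch alone could look like (illustrative only; metric names and the exact logger calls are assumptions, not the merged implementation):

```python
def log_steps_trained(metrics, batch):
    """Log env and per-module trained steps from a MultiAgentBatch."""
    # Env steps: one per environment transition contained in the batch.
    metrics.log_value("num_env_steps_trained", batch.env_steps(), reduce="sum")
    # Module steps: one per row in each module's sub-batch.
    for module_id, module_batch in batch.policy_batches.items():
        metrics.log_value(
            ("num_module_steps_trained", module_id),
            len(module_batch),
            reduce="sum",
        )
```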
LGTM. We're completing the stack more and more :)
py_test(
    name = "learning_tests_stateless_cartpole_appo",
    main = "tuned_examples/appo/stateless_cartpole_appo.py",
    tags = ["team:rllib", "exclusive", "learning_tests", "torch_only", "learning_tests_discrete", "learning_tests_pytorch_use_all_core"],
Is a "gpu" tag missing here while num_gpus=1, or do we simply want to test a remote Learner here?
Correct, we test here the simple case of: 1 (remote) Learner on 1 CPU.
    tags = ["team:rllib", "exclusive", "learning_tests", "torch_only", "learning_tests_discrete", "learning_tests_pytorch_use_all_core"],
    size = "large",
    srcs = ["tuned_examples/appo/multi_agent_stateless_cartpole_appo.py"],
    args = ["--as-test", "--enable-new-api-stack", "--num-gpus=1"]
Same here.
    tags = ["team:rllib", "exclusive", "learning_tests", "torch_only", "learning_tests_discrete", "learning_tests_pytorch_use_all_core"],
    size = "large",
    srcs = ["tuned_examples/impala/stateless_cartpole_impala.py"],
    args = ["--as-test", "--enable-new-api-stack", "--num-gpus=1"]
And here. I guess this brings us num_learners=1, doesn't it?
This actually tries to put the 1 (remote) Learner on 1 GPU.
Sorry, you are right in that these command line options are very confusing:

On a CPU machine:
- --num-gpus=1 -> 1 (remote) Learner (on CPU!)
- --num-gpus=2 -> 2 (remote) Learners (on CPUs!)

On a GPU machine:
- --num-gpus=1 -> 1 (remote) Learner (on GPU)
- --num-gpus=2 -> 2 (remote) Learners (on GPUs)

We should probably rename these args.
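To make that mapping concrete, here is a hypothetical sketch of how such a flag could be resolved. The function name and its placement are assumptions for illustration; only the described behavior comes from the discussion above.

```python
import torch

def resolve_learner_resources(num_gpus_arg: int):
    """Hypothetical resolution of the test scripts' --num-gpus flag."""
    # The flag really controls the NUMBER of remote Learners; each Learner
    # only actually gets a GPU if the machine has one available.
    num_learners = num_gpus_arg
    num_gpus_per_learner = 1 if torch.cuda.is_available() else 0
    return num_learners, num_gpus_per_learner

# On a CPU machine: resolve_learner_resources(2) -> (2, 0)  # 2 Learners on CPUs
# On a GPU machine: resolve_learner_resources(2) -> (2, 1)  # 2 Learners, 1 GPU each
```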
@@ -101,10 +102,23 @@ def __call__(
        # batch: - - - - - - - T B0- - - - - R Bx- - - - R Bx
        # mask : t t t t t t t t f t t t t t t f t t t t t f

        # TODO (sven): Same situation as in TODO below, but for multi-agent episode.
        # Maybe add a dedicated connector piece for this task?
        # We extend the MultiAgentEpisode's ID by a running number here to make sure
Ah tricky. This kind of trick needs to also go into the connector docs. This can solve problems, but we need to know how.
Yup, it's getting to the point where the default pipelines do become quite complex. We should spend some time soon to maybe simplify these again or to make the ConnectorV2 helper methods even better, e.g. self.foreach_batch_item_change_in_place.
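Purely as an illustration of the episode-ID trick being referenced above (not the actual connector implementation, and the naming is made up):

```python
def make_chunk_id(ma_episode_id: str, running_idx: int) -> str:
    # Append a running number so two chunks of the same MultiAgentEpisode end
    # up with distinct IDs and therefore distinct slots during batching.
    return f"{ma_episode_id}_{running_idx}"

# e.g. "ma_ep_42" -> "ma_ep_42_0", "ma_ep_42_1", ...
```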
@@ -1294,24 +1295,8 @@ def _update_from_batch_or_episodes(
        if not self.should_module_be_updated(module_id, batch):
            del batch.policy_batches[module_id]

        # Log all timesteps (env, agent, modules) based on given episodes.
Finally, this goes away haha.
We probably need to remove this also from learn_from_iterator.
Great catch. Will check ...
done
"multi_agent_cartpole", | ||
lambda _: MultiAgentCartPole({"num_agents": args.num_agents}), | ||
) | ||
register_env("multi_agent_cartpole", lambda cfg: MultiAgentCartPole(config=cfg)) |
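To show why the new lambda takes `cfg`: the env now receives whatever is passed as `env_config`, instead of closing over the script's `args`. A hedged usage sketch follows; the algorithm choice, the number of agents, and the import path for MultiAgentCartPole are assumptions for illustration only.

```python
from ray.tune.registry import register_env
from ray.rllib.algorithms.appo import APPOConfig
# Import path may differ across Ray versions.
from ray.rllib.examples.envs.classes.multi_agent import MultiAgentCartPole

# The registered creator simply forwards the env_config dict to the env.
register_env("multi_agent_cartpole", lambda cfg: MultiAgentCartPole(config=cfg))

config = (
    APPOConfig()
    .environment("multi_agent_cartpole", env_config={"num_agents": 2})
)
```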
For DQN and SAC we don't have stateful modules enabled yet. What do we need for that? The buffers need to collect time sequences, correct?
Yes, this is the huge advantage of the "episodes-until-the-last-second" design :) Everything now behaves the same, and we can simply pass a list of episodes (from offline data) into any Learner, and its Learner connector pipeline behaves exactly the same.
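A minimal sketch of that point, treated as illustrative: the method name follows the new-stack Learner API referenced in this PR, but the exact signature and how the episodes were obtained are assumptions.

```python
def train_on_episodes(learner, episodes):
    # `episodes` may come from EnvRunners (online sampling) or from OfflineData
    # (recorded episodes) -- the Learner connector pipeline treats both the same.
    return learner.update_from_episodes(episodes=episodes)
```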
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
…appo_multi_agent_stateless_cartpole_tests
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Add APPO/IMPALA multi-agent StatelessCartPole learning tests to CI (+ fix some bugs related to this).
Why are these changes needed?
Related issue number
Checks

- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I've added a new method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.