[RLlib] Agent collector time complexity reduction #31693

ArturNiederfahrenhorst · 2023-01-16T20:44:51Z

Why are these changes needed?

In light of the recent QMix regression with connectors, we found that the regression affects QMix because episodes in the two-step game are very short. This leads to very frequent calls to AgentCollector.build_for_training().
This PR optimizes build_for_training() (and a few other things I found along the way) for time complexity.
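
As a rough illustration of why short episodes matter here, the sketch below counts how often a per-episode build step would run for a fixed number of environment steps. It assumes one build_for_training() call per completed episode per agent; the helper name `builds_per_steps` and the step budget of 1000 are made up for this example and do not come from the PR.

```python
def builds_per_steps(total_steps: int, episode_length: int) -> int:
    """Number of per-episode build calls for a fixed step budget.

    Assumption (not taken from RLlib): one build call per completed episode.
    """
    return total_steps // episode_length


# Episode lengths loosely matching the cases discussed in this PR:
# the two-step game, a ten-step episode, and a mean length of about 20.
calls = {length: builds_per_steps(1000, length) for length in (2, 10, 20)}

for length, n_calls in calls.items():
    print(f"episode length {length:>2}: {n_calls} build calls per 1000 steps")
```

Under this assumption, any fixed overhead inside the build call is paid ten times as often in the two-step game as with episodes of length 20.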

The changes in this PR speed up AgentCollector.build_for_training(), as roughly indicated by the following metrics (averaged over 500 samples):
two-step: Single-agent episodes of length two, no recurrency.
ten-step: Single-agent episodes of length ten, no recurrency.
sixtyone-step: Single-agent episodes of length sixty-one, with recurrency (and padding).

mean_raw_obs_processing is where we spend much of our time in env_runner_v2, and it is the source of the regression in question.

For the two-step game, this pans out as follows for the mean_obs_preprocessing time:
blue w/o connectors
red w/ connectors
orange w/ connectors and optimizations

... and as follows for the overall throughput:

For the r2d2 compilation test (mean episode length ~ 20), this pans out as follows:
blue is w/o connectors
orange is w/ connectors
light-blue w/ connectors and optimizations

Checks

I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

ArturNiederfahrenhorst · 2023-01-17T21:32:35Z

rllib/evaluation/episode_v2.py

@@ -295,7 +296,7 @@ def postprocess_episode(

            if (
                not pre_batch.is_single_trajectory()
-                or len(set(pre_batch[SampleBatch.EPS_ID])) > 1
+                or len(np.unique(pre_batch[SampleBatch.EPS_ID])) > 1


Because of the above changes, EPS_IDs turn out to be np arrays as well, so set does not work here anymore.
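
A minimal sketch of the difference, using plain NumPy. The array shapes are assumptions for illustration; the thread does not say what shape the EPS_ID column has. np.unique flattens its input, while set() iterates over the first axis and needs hashable elements.

```python
import numpy as np

# Hypothetical episode-ID columns; the shapes are assumptions for illustration.
eps_ids_flat = np.array([7, 7, 7, 9])
eps_ids_2d = np.array([[7], [7], [9]])

# np.unique flattens its input, so both layouts yield the distinct IDs.
n_flat = len(np.unique(eps_ids_flat))
n_2d = len(np.unique(eps_ids_2d))

# set() iterates over the first axis. The rows of a 2-D array are ndarrays,
# which are unhashable, so building the set raises TypeError.
try:
    set(eps_ids_2d)
    set_failed = False
except TypeError:
    set_failed = True

print(n_flat, n_2d, set_failed)
```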

kouroshHakha

LGTM if the tests pass.

kouroshHakha · 2023-01-17T17:03:41Z

rllib/connectors/action/pipeline.py

+            timer = self.timers[str(c)]
+            with timer:
+                ac_data = c(ac_data)
+            timer.push_units_processed(1)


can you remind me what timer.push_units_processed(1) does?

I went back to look at the implementation and found that it is only needed for throughput measurements, since the mean time is calculated over the number of timings, not over the units processed. Thanks!
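
To make the distinction concrete, here is a minimal sketch of a context-manager timer. `SketchTimer` is a hypothetical stand-in written for this example, not RLlib's `_Timer`; it only mirrors the behavior described in this thread, where the mean is taken over the recorded timings and the processed units feed the throughput figure only.

```python
import time


class SketchTimer:
    """Hypothetical stand-in for a context-manager timer (not RLlib's _Timer)."""

    def __init__(self):
        self._durations = []  # one entry per completed `with` block
        self._units_processed = 0

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Leaving the `with` block is what records a timing.
        self._durations.append(time.perf_counter() - self._start)

    def push_units_processed(self, n):
        self._units_processed += n

    def mean(self):
        # Average over the number of timings, not over units processed.
        if not self._durations:
            return 0.0
        return sum(self._durations) / len(self._durations)

    def mean_throughput(self):
        # Only this figure depends on the pushed units.
        total = sum(self._durations)
        return self._units_processed / total if total > 0 else 0.0


timer = SketchTimer()
for _ in range(3):
    with timer:
        sum(range(1000))

mean_before = timer.mean()
timer.push_units_processed(3)
mean_after = timer.mean()
```

In this sketch, pushing units changes `mean_throughput()` but leaves `mean()` untouched, which is why the extra call is unnecessary when only mean times are reported.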

kouroshHakha · 2023-01-17T17:06:19Z

rllib/connectors/action/pipeline.py

@@ -19,10 +21,17 @@
 class ActionConnectorPipeline(ConnectorPipeline, ActionConnector):
    def __init__(self, ctx: ConnectorContext, connectors: List[Connector]):
        super().__init__(ctx, connectors)
+        self.timers = defaultdict(_Timer)
+
+    def reset(self, env_id: str):


What are your thoughts on implementing timer capabilities in the baseclass connectors vs. here? (not saying we should, just want to hear your argument).

Pro implementing in baseclass:

It's generally a good idea to pull functionality down to a lower level if that does not increase complexity.

Against implementing in baseclass:

It enlarges the interface between the pipeline and the connectors if we call the timers from the pipeline; alternatively, we'd have to assume that whoever subclasses Connectors handles the timers correctly in a transform method.

I think the approach taken in this PR introduces less complexity than putting this in Connectors.
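
A small sketch of the pipeline-level approach argued for above. The class and function names (`SketchPipeline`, `clip`, `scale`) are hypothetical and the connectors are plain callables. The PR keys its timers by `str(c)`, while this sketch uses the function name for readability.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class SketchPipeline:
    """Hypothetical pipeline; connectors know nothing about timing."""

    def __init__(self, connectors):
        self.connectors = connectors
        self.timings = defaultdict(list)  # connector name -> list of durations

    @contextmanager
    def _timed(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.timings[name].append(time.perf_counter() - start)

    def __call__(self, data):
        # Timing lives entirely in the pipeline, so the connector
        # interface stays a single call.
        for c in self.connectors:
            with self._timed(c.__name__):
                data = c(data)
        return data


def clip(x):
    return max(-1.0, min(1.0, x))


def scale(x):
    return 2.0 * x


pipeline = SketchPipeline([clip, scale])
result = pipeline(3.0)
print(result, dict(pipeline.timings))
```

With this layout, someone writing a new connector cannot forget to handle timing, because the connector never sees the timers.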

gjoliver

2 quick questions

gjoliver · 2023-01-17T22:25:24Z

rllib/evaluation/collectors/agent_collector.py

@@ -278,8 +277,7 @@ def add_action_reward_next_obs(self, input_values: Dict[str, TensorType]) -> Non
            AgentCollector._next_unroll_id += 1

        # Next obs -> obs.
-        # TODO @kourosh: remove the in-place operations and get rid of this deepcopy.
-        values = deepcopy(input_values)
+        values = {k: v for k, v in input_values.items()}


just use copy.copy()?
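
For reference, a short sketch of how the three options behave on a dict of arrays. The keys and values are made up for illustration. Both the dict comprehension and copy.copy() create a new dict that shares its values with the original, whereas deepcopy also duplicates the values.

```python
import copy
from copy import deepcopy

import numpy as np

# Hypothetical input; keys and values are placeholders.
input_values = {"new_obs": np.zeros(4), "rewards": 1.0}

shallow_a = {k: v for k, v in input_values.items()}  # what the diff uses
shallow_b = copy.copy(input_values)                  # the suggested alternative
deep = deepcopy(input_values)                        # what the diff removes

# Renaming a key in a shallow copy leaves the caller's dict untouched.
shallow_b["obs"] = shallow_b.pop("new_obs")

print("new_obs" in input_values)                        # key still present
print(shallow_a["new_obs"] is input_values["new_obs"])  # values are shared
print(deep["new_obs"] is input_values["new_obs"])       # values were copied
```

The shallow variants are only safe as long as the values themselves are not modified in place, since such a modification would also be visible through the original dict.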

gjoliver · 2023-01-18T23:10:14Z

rllib/connectors/action/pipeline.py

@@ -28,7 +28,6 @@ def __call__(self, ac_data: ActionConnectorDataType) -> ActionConnectorDataType:
            timer = self.timers[str(c)]
            with timer:
                ac_data = c(ac_data)
-            timer.push_units_processed(1)


wait, we shouldn't get rid of these? same below.

I realized we actually don't need these after kourosh asked me about them -> #31693 (comment)

I just executed the code from this PR and took the following screenshot to make sure that the timer actually works as expected.
When calling .mean(), the timer does not take the processed units into account, so we don't need the call.

oh that's right, it's part of the with statements.

gjoliver · 2023-01-18T23:41:51Z

rllib/evaluation/collectors/agent_collector.py

+                        # length. This branch takes more time than simply picking
+                        # slices we try to avoid it.
+                        element_at_t = []
+                        for index in inds:


maybe extract this for loop into a small inline function, so the code looks a little better.
the level of nestedness is a bit nuts :)

Done! Thanks :)

gjoliver · 2023-01-19T05:40:04Z

rllib/connectors/action/pipeline.py

@@ -28,7 +28,6 @@ def __call__(self, ac_data: ActionConnectorDataType) -> ActionConnectorDataType:
            timer = self.timers[str(c)]
            with timer:
                ac_data = c(ac_data)
-            timer.push_units_processed(1)


oh that's right, it's part of the with statements.

ArturNiederfahrenhorst added the do-not-merge Do not merge this PR! label Jan 16, 2023

ArturNiederfahrenhorst assigned gjoliver and kouroshHakha Jan 16, 2023

ArturNiederfahrenhorst marked this pull request as ready for review January 17, 2023 00:13

ArturNiederfahrenhorst requested review from sven1977, gjoliver, avnishn, smorad, maxpumperla, kouroshHakha and krfricke as code owners January 17, 2023 00:13

ArturNiederfahrenhorst commented Jan 17, 2023

ArturNiederfahrenhorst mentioned this pull request Jan 17, 2023

[RLlib] More workers for Q-Mix's two-step-game regression #31707

Merged

7 tasks

kouroshHakha approved these changes Jan 18, 2023

ArturNiederfahrenhorst mentioned this pull request Jan 18, 2023

[RLlib] Revert "Revert "[RLlib] Enable connectors. (#30388)" (#31495)" #31733

Merged

ArturNiederfahrenhorst added tests-ok The tagger certifies test failures are unrelated and assumes personal liability. and removed do-not-merge Do not merge this PR! labels Jan 18, 2023

ArturNiederfahrenhorst added 13 commits January 18, 2023 13:32

initial

5bf1ec1

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

lint

6b7397b

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

fixes

cad2a50

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

ms instead of throughput

017c949

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

initial

2ff4309

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

minor fixes

1367a7e

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

initial

3b6dfac

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

remove deepcopy and squeezes

78be16d

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

flip back connectors

6efeb3d

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

revert sample_batch_test changes

138c317

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

jun's comments

87e226d

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

jun's comment

bb296a5

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

jun's comment

f63be95

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

remove push units calls

60614a1

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

ArturNiederfahrenhorst force-pushed the agent_collector_improvement branch from b85c5da to 60614a1 Compare January 18, 2023 21:35

ArturNiederfahrenhorst removed the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Jan 18, 2023

remove linter error artifact from rebase

9a08d5d

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

gjoliver reviewed Jan 18, 2023

jun's comments

28bed76

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

gjoliver approved these changes Jan 19, 2023

gjoliver merged commit fb3c2b5 into ray-project:master Jan 19, 2023

andreapiso pushed a commit to andreapiso/ray that referenced this pull request Jan 22, 2023

[RLlib] Agent collector time complexity reduction (ray-project#31693)

3bd4f1e

Signed-off-by: Artur Niederfahrenhorst <[email protected]> Signed-off-by: Andrea Pisoni <[email protected]>
