[RLlib] Add filters to connector pipeline #27864

ArturNiederfahrenhorst · 2022-08-15T11:35:21Z

Why are these changes needed?

In our effort to move our various experience-processing steps into the connector pipeline, this PR adds filters.
As long as connectors can be switched off, we have to support both the old and the new location for filters. This PR therefore tries to find a path that enables both: filters are still indexed at RolloutWorker().policy_map[<policy_id>].filter and updated there, while they are instantiated in the AgentConnectors of a policy.

Checks

I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

gjoliver

thanks a lot for tackling this biggest work item on my connector list.
a bunch of high level comments first. happy to discuss a bit. we should probably spend a little extra effort to get these stateful connectors done right.
can you point me to where we sync these stateful filter connectors?

gjoliver · 2022-08-29T19:03:57Z

rllib/connectors/agent/mean_std_filter.py

+
+    def to_state_dict(self):
+        return MeanStdObservationFilterAgentConnector.__name__, {
+            "filter": self.filter,


can we actually record the state of self.filter so it can be re-constructed without pickle?

gjoliver · 2022-08-29T19:10:36Z

rllib/connectors/agent/mean_std_filter.py

+
+    @staticmethod
+    def from_state_dict(ctx: ConnectorContext, params: List[Any]):
+        connector = MeanStdObservationFilterAgentConnector(ctx)


ideally, MeanStdObservationFilterAgentConnector should take demean, destd, clip, and a state dict as input. if state dict is not there, it initializes from fresh state, otherwise, resume from the existing state.
does this make sense?
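To make the two suggestions above concrete, here is a minimal sketch of a filter whose state can be recorded as plain JSON and passed back into the constructor. It is not the RLlib implementation: the class name `RunningMeanStd`, the `size` parameter, and the layout of the state dict are assumptions made for illustration only.

```python
import json
import math
from typing import Any, Dict, List, Optional


class RunningMeanStd:
    """Hypothetical running mean/std filter (Welford's algorithm), stdlib only."""

    def __init__(self, size: int, demean: bool = True, destd: bool = True,
                 clip: Optional[float] = None,
                 state: Optional[Dict[str, Any]] = None):
        self.demean, self.destd, self.clip = demean, destd, clip
        if state is None:
            # No state given: start fresh.
            self.count = 0
            self.mean = [0.0] * size
            self.m2 = [0.0] * size  # running sum of squared deviations
        else:
            # State given: resume from the recorded statistics.
            self.count = int(state["count"])
            self.mean = [float(v) for v in state["mean"]]
            self.m2 = [float(v) for v in state["m2"]]

    def update(self, obs: List[float]) -> None:
        self.count += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def __call__(self, obs: List[float]) -> List[float]:
        out = []
        for i, x in enumerate(obs):
            if self.demean:
                x = x - self.mean[i]
            if self.destd:
                var = self.m2[i] / (self.count - 1) if self.count > 1 else 1.0
                x = x / (math.sqrt(var) + 1e-8)
            if self.clip is not None:
                x = max(-self.clip, min(self.clip, x))
            out.append(x)
        return out

    def to_state_dict(self) -> Dict[str, Any]:
        # Only plain Python types, so json.dumps works and no pickle is needed.
        return {
            "size": len(self.mean),
            "demean": self.demean,
            "destd": self.destd,
            "clip": self.clip,
            "state": {"count": self.count, "mean": self.mean, "m2": self.m2},
        }

    @staticmethod
    def from_state_dict(params: Dict[str, Any]) -> "RunningMeanStd":
        return RunningMeanStd(params["size"], params["demean"], params["destd"],
                              params["clip"], state=params["state"])


f = RunningMeanStd(size=2)
f.update([1.0, 2.0])
f.update([3.0, 4.0])
blob = json.dumps(f.to_state_dict())          # serialize without pickle
g = RunningMeanStd.from_state_dict(json.loads(blob))
```

Here `g` carries the same count and statistics as `f`, so both normalize an observation identically.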

gjoliver · 2022-08-29T19:17:12Z

rllib/connectors/agent/mean_std_filter.py

+        assert all(ctx.observation_space.shape == connector.filter.shape)
+        return connector
+
+    def reset_state(self) -> None:


when do we usually need this?

Right now the reset_state() method is part of the SyncedFilterAgentConnector interface.
Synchronization is done via the old filter synchronization mechanism, which I would like to leave in place until we switch connectors on by default. After that, I would like to simply inline all the filter code and call the connector's reset_state method directly.

Until we have gotten that far, I can delete this method if you like.

gjoliver · 2022-08-29T19:18:37Z

rllib/connectors/agent/mean_std_filter.py

+        # env_runner
+        if not self._is_training:
+            raise ValueError(
+                "Changes can only be applied to {} when trainin.".format(self.__name__)


wait, why can't we update during inference as well?

I asked @kouroshHakha if you update mean-std-filters during deployment because I was wondering the same thing. He told me that it is best practice to stop after training.
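As an illustration of "stop after training", the sketch below freezes the running statistics once a training flag is off while still applying them to incoming values. The class `FreezableMeanFilter` and its attributes are hypothetical and only track a scalar mean to keep the example short.

```python
class FreezableMeanFilter:
    """Hypothetical scalar filter: updates its mean only while training."""

    def __init__(self, is_training: bool = True):
        self.is_training = is_training
        self.count = 0
        self.mean = 0.0

    def __call__(self, x: float) -> float:
        if self.is_training:
            # Incremental mean update; skipped entirely at inference time.
            self.count += 1
            self.mean += (x - self.mean) / self.count
        # The (possibly frozen) statistics are always applied.
        return x - self.mean


flt = FreezableMeanFilter(is_training=True)
flt(1.0)
flt(3.0)                   # mean is now 2.0
flt.is_training = False    # deployment: statistics stay fixed from here on
shifted = flt(10.0)        # normalized with the frozen mean
```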

gjoliver · 2022-08-29T19:19:04Z

rllib/connectors/agent/mean_std_filter.py

+        """Copies all state from other filter to self."""
+        # inline this as soon as we deprecate ordinary filter with non-connector
+        # env_runner
+        if not self._is_training:


Let's keep the discussion in your first comment!

gjoliver · 2022-08-29T19:19:59Z

rllib/connectors/agent/mean_std_filter.py

+            raise ValueError(
+                "{} can only be synced when trainin.".format(self.__name__)
+            )
+        return self.filter.sync(other)


other is a connector, is it better to do self.filter.sync(other.filter) here, so filter doesn't need to be aware of connector?

Yes. This method is so far not used, otherwise, this would have thrown an error. Sorry. Just realized that this is not developer API anymore and I should probably keep unused/untested code completely out of it.

oh ok, so the connectors are not synced at this point?

They are synchronized, but through the old mechanism.
I think as long as we have to have both, filters in their old place and in connectors, it's best to keep the same mechanism.
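A minimal sketch of the delegation suggested at the top of this thread, where the connector unwraps `other.filter` so the filter never has to know about connectors. `SimpleFilter` and `FilterConnector` are hypothetical stand-ins, not the classes from this PR.

```python
class SimpleFilter:
    """Hypothetical filter that only knows about other filters."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def push(self, x: float) -> None:
        self.count += 1
        self.total += x

    def sync(self, other: "SimpleFilter") -> None:
        # Copy all state from the other filter into this one.
        self.count = other.count
        self.total = other.total


class FilterConnector:
    """Hypothetical connector wrapping a filter."""

    def __init__(self, is_training: bool = True):
        self.filter = SimpleFilter()
        self._is_training = is_training

    def sync(self, other: "FilterConnector") -> None:
        if not self._is_training:
            # type(self).__name__ is used because instances have no __name__.
            raise ValueError(
                "{} can only be synced when training.".format(type(self).__name__)
            )
        # Pass the wrapped filter, not the connector itself.
        self.filter.sync(other.filter)


local, remote = FilterConnector(), FilterConnector()
remote.filter.push(2.0)
local.sync(remote)
```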

gjoliver · 2022-08-29T19:21:14Z

rllib/connectors/agent/mean_std_filter.py

+    MeanStdObservationFilterAgentConnector.__name__,
+    MeanStdObservationFilterAgentConnector,
+)
+register_connector(


can you add some comments here describing the difference between these 2 filter connectors?
thanks.

gjoliver · 2022-08-29T19:24:10Z

rllib/evaluation/rollout_worker.py

-            self.filters[policy_id] = get_filter(self.observation_filter, filter_shape)
+            if policy_config.get("enable_connectors"):
+                ctx = ConnectorContext.from_policy(policy)
+                connector = get_synced_filter_connector(


do we really need to do this here?
can we just call get_synced_filter_connector() in get_agent_connectors_from_config()?
https://github.com/ray-project/ray/blob/master/rllib/connectors/util.py#L25

any input to this question I had?
just wondering if we can keep things simpler and not bring connectors up to the rollout worker level.

In the end it's definitely cleaner to move this. But right now this code block still depends on self.observation_filter, which is ugly to remove or sneak into get_agent_connectors_from_config(). I think we should leave this here for the moment. When connectors are switched on by default, we can refactor a bit. This whole PR is designed so that the connectors solution and the old solution share lots of code, and I would like to do another PR that not only removes the old solution, but also makes the connectors solution more elegant.

ok get it.
I am ok with accessing policy.agent_connectors here to find the filter agent, and save a reference to its filter object. as long as we add some comments explaining it :)
but can we construct the filter agent in get_agent_connectors_from_config()? there is no reason to create the filter agent here and append it right?
we can also add a get(self, name: str) API to ConnectorPipeline so it's easier to check/find a specific connector.

the reason I am hoping to create all the connectors in a centralized place is that we actually print the complete connector setup in get_agent_connectors_from_config(). if we are gonna add things somewhere else, this message is not correct anymore.

I understand. I've implemented this. Let's see if tests pass, I might have to tinker a little more.

Moved this as you requested!
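The lookup discussed in this thread could look roughly like the sketch below: a pipeline that is built in one place and can be queried by connector name or class. The `Pipeline` class and the two connector classes are hypothetical; this is not the actual ConnectorPipeline code.

```python
from typing import List, Union


class Connector:
    """Hypothetical base class."""


class ClipConnector(Connector):
    pass


class FilterConnector(Connector):
    def __init__(self):
        self.filter = object()  # stand-in for the wrapped filter


class Pipeline:
    """Hypothetical pipeline holding connectors in order."""

    def __init__(self, connectors: List[Connector]):
        self.connectors = list(connectors)

    def __getitem__(self, key: Union[str, type]) -> List[Connector]:
        # Look up by class name or by class; return all matches in order.
        if isinstance(key, str):
            return [c for c in self.connectors if type(c).__name__ == key]
        return [c for c in self.connectors if isinstance(c, key)]

    def __str__(self) -> str:
        # One place to print the complete setup.
        return " -> ".join(type(c).__name__ for c in self.connectors)


pipeline = Pipeline([ClipConnector(), FilterConnector()])
by_class = pipeline[FilterConnector]
by_name = pipeline["FilterConnector"]
```

Because every connector is created in the same place, the string form of the pipeline stays an accurate description of the setup.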

gjoliver

almost there! thanks a lot!

gjoliver · 2022-09-28T23:22:11Z

rllib/evaluation/rollout_worker.py

+            # place, we need to put filters into self.filters so that they get
+            # synchronized
+            filter_connectors = self.policy_map[policy_id].agent_connectors[
+                SyncedFilterAgentConnector


What about the concurrent version of the filter?

I don't think that has practical relevance; it is only used for testing (ConcurrentMeanStdFilter is only used in test_rollout_worker, and I can't think of a reason why it should be used elsewhere).
So my idea was to rewrite that test to use the SyncedFilterAgentConnector instead of the ConcurrentMeanStdFilter after we make the switch, and to eliminate ConcurrentMeanStdFilter altogether.

-> Better to get rid of ConcurrentMeanStdFilter, because it is only used for testing, and instead test RolloutWorker as a ray actor. Does that make sense?

ok, can you add a Note or TODO here, saying that nobody should use ConcurrentMeanStdFilter at this point, since we will remove it?

Created a deprecation warning
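For illustration, a deprecation warning of this kind can be raised with the standard-library `warnings` module, as sketched below. The class `ConcurrentFilterStandIn` and the message text are hypothetical; the warning added in this PR may use a different helper and wording.

```python
import warnings


class ConcurrentFilterStandIn:
    """Hypothetical stand-in for a class that is scheduled for removal."""

    def __init__(self):
        warnings.warn(
            "ConcurrentFilterStandIn is deprecated and will be removed; "
            "do not use it in new code.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )


# Record warnings so the example shows what a caller would observe.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ConcurrentFilterStandIn()
```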

gjoliver · 2022-09-28T23:23:01Z

rllib/evaluation/rollout_worker.py

+                # place, we need to put filters into self.filters so that they get
+                # synchronized
+                filter_connectors = self.policy_map[name].agent_connectors[
+                    SyncedFilterAgentConnector


Replied above.

gjoliver · 2022-09-28T23:24:53Z

rllib/evaluation/rollout_worker.py

@@ -1258,6 +1265,8 @@ def add_policy(
                )
            }

+        connectors_enabled = merged_config.get("enable_connectors", False)


can you move this down to right above where it's first used?

gjoliver · 2022-09-28T23:28:54Z

rllib/evaluation/rollout_worker.py


-        self.filters[policy_id] = get_filter(self.observation_filter, filter_shape)
+        if connectors_enabled and policy_id in self.policy_map:
+            create_connectors_for_policy(self.policy_map[policy_id], self.policy_config)


I feel like we shouldn't call create_connectors_for_policy() here, it's actually called right below, about 20 lines down.
maybe the high level flow should be:

    if connector_enabled:
        create_connectors_for_policy(...)
    ...
    setup_filters_if_necessary()

where setup_filters_if_necessary() will do:

    def setup_filters_if_necessary(self):
        if connectors_enabled:
            self.filters[policy_id] = <try to see if we have a filter connector in the agent list>
        else:
            self.filters[policy_id] = <set up a new filter>

wdyt?

Done. Like we discussed privately, we now try to create connectors for the added policy and fail with an assertion error if this policy already has connectors.
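The flow agreed on here can be sketched as follows. All names (`Policy`, `FilterConnector`, `setup_filter`, the dict used as a filter) are hypothetical stand-ins, and returning `None` when no filter connector exists is an assumption of this sketch, not the behavior of the merged code.

```python
from typing import Dict, List, Optional


class FilterConnector:
    """Hypothetical connector that owns a filter."""

    def __init__(self):
        self.filter = {"count": 0}  # stand-in for a real filter object


class Policy:
    """Hypothetical policy; connectors are unset until created."""

    def __init__(self):
        self.agent_connectors: Optional[List[object]] = None


def create_connectors_for_policy(policy: Policy, use_filter: bool) -> None:
    # Fail loudly if connectors were already created for this policy.
    assert policy.agent_connectors is None, "policy already has connectors"
    policy.agent_connectors = [FilterConnector()] if use_filter else []


def setup_filter(policy: Policy, connectors_enabled: bool):
    if connectors_enabled:
        # Reuse the connector's filter so the old sync mechanism updates it.
        found = [c for c in policy.agent_connectors
                 if isinstance(c, FilterConnector)]
        return found[0].filter if found else None  # None: assumption
    # Connectors disabled: set up a fresh filter in the old place.
    return {"count": 0}


filters: Dict[str, object] = {}
policy = Policy()
create_connectors_for_policy(policy, use_filter=True)
filters["default_policy"] = setup_filter(policy, connectors_enabled=True)
```

The entry in `filters` is the same object the connector holds, so updating one updates the other.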

gjoliver

thanks super close now.

gjoliver · 2022-10-03T20:20:51Z

rllib/evaluation/rollout_worker.py

+            # place, we need to put filters into self.filters so that they get
+            # synchronized
+            filter_connectors = self.policy_map[policy_id].agent_connectors[
+                SyncedFilterAgentConnector


ok, can you add a Note or TODO here, saying that nobody should use ConcurrentMeanStdFilter at this point, since we will remove it?

gjoliver · 2022-10-03T20:25:12Z

rllib/evaluation/rollout_worker.py

+                # As long as the historic filter synchronization mechanism is in
+                # place, we need to put filters into self.filters so that they get
+                # synchronized
+                filter_connectors = self.policy_map[name].agent_connectors[


will it be worth it to create a small local util function for this logic, so we don't duplicate it twice here and above?
also it will make the high level code look nicer:

    if connectors_enabled:
        policy = self.policy_map[name]
        create_connectors_for_policy(policy)
        maybe_get_filters_for_syncing(policy)

hopefully the long-ish if-else block above will be simplified too.

gjoliver

thanks a ton for the updates.
tests also look good.
merging.

* initial
* initial
* lint and comments
* wip
* format
* implements and uses filters, working for ppo cartpole with meanstd
* get rid of synced custom filter abstraction
* add meanstd filter connetor test and minor fixes
* jun's comment
* move observation space struct logic because information is already in connector context
* fix docstrings
* minor fixes
* initial
* fix for config=None in add_policy and connector=None
* fix config name
* filter connector state is now json serializable
* jun's comments
* initial
* create connectors only in create_connectors_for_policy
* initial
* get filter connector by __get__item
* remove observation filters
* minor fixes
* initial
* revert spelling error
* initial
* Revert "Merge branch 'make_add_policy_config_explicit' into filterstoconnectors" This reverts commit 06beebc, reversing changes made to e637f4f.
* accomodate case in which config={} in add_policy
* fix connectors enabled not no SyncedFilterAgentConnector case
* initial
* merge configs in add_policy
* format
* revert random cloudpickle linter error
* small change to trigger CI
* remove all random changes outside rllib that made it into this PR
* remove random rst
* fix deprecated is_training call
* correct in_eval call
* nit
* jun's comment
* use merged config to create connectors
* Add meaningful assertion error and switch order of if/else block to intuitive order
* better warning
* jun's comments
* shorter function signature for helper fn

Signed-off-by: Artur Niederfahrenhorst <[email protected]>
Signed-off-by: Weichen Xu <[email protected]>

ArturNiederfahrenhorst added 7 commits August 11, 2022 21:50

initial

5680d70

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

initial

4123ac5

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

lint and comments

c62faf2

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

Merge branch 'connectorsstates' into filterstoconnectors

12d849f

wip

f1d4eec

format

1505c28

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

implements and uses filters, working for ppo cartpole with meanstd

af3f6de

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

ArturNiederfahrenhorst requested review from sven1977, gjoliver, avnishn, smorad, maxpumperla, kouroshHakha and krfricke as code owners August 15, 2022 11:35

ArturNiederfahrenhorst added 14 commits August 26, 2022 08:49

merge master

4ddcdeb

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

get rid of synced custom filter abstraction

6c32319

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

add meanstd filter connetor test and minor fixes

26dbe68

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

jun's comment

481fcdf

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

move observation space struct logic because information is already in…

6119961

… connector context Signed-off-by: Artur Niederfahrenhorst <[email protected]>

Merge branch 'connectorsstates' into filterstoconnectors

ab0bba9

fix docstrings

0c3460d

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

Merge branch 'connectorsstates' into filterstoconnectors

ec262a2

minor fixes

ad0cc88

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

initial

9da99cb

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

merge minor add_policy_fix

0adc85a

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

fix for config=None in add_policy and connector=None

71f4fe2

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

fix config name

6050961

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

merge master

6eb34d6

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

gjoliver reviewed Aug 29, 2022

filter connector state is now json serializable

4ae9758

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

ArturNiederfahrenhorst added 14 commits September 23, 2022 18:00

initial

45b576e

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

merge configs in add_policy

a5eb07d

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

format

26dbf8b

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

merge connectors from own config fix

de68a44

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

merge master

9cb9ebd

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

merge master

555b48d

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

merge from 28739

cf00366

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

revert random cloudpickle linter error

25db0b1

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

Merge branch 'create_connectors_from_own_config_in_add_policy' into f…

3665588

…ilterstoconnectors

small change to trigger CI

8d5fc3f

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

remove all random changes outside rllib that made it into this PR

070cd91

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

remove random rst

a9ecae8

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

fix deprecated is_training call

9200021

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

correct in_eval call

aa9c59a

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

gjoliver reviewed Sep 28, 2022

ArturNiederfahrenhorst added 7 commits September 30, 2022 19:11

nit

a644bb4

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

merge master

c06a63a

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

jun's comment

40c2264

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

use merged config to create connectors

60ca25d

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

merge master

d166ac4

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

Add meaningful assertion error and switch order of if/else block to i…

ead68d3

…ntuitive order Signed-off-by: Artur Niederfahrenhorst <[email protected]>

better warning

6492024

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

gjoliver reviewed Oct 3, 2022

ArturNiederfahrenhorst added 2 commits October 3, 2022 22:51

jun's comments

71038d3

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

shorter function signature for helper fn

8e4b8dd

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

gjoliver approved these changes Oct 3, 2022

gjoliver merged commit e339298 into ray-project:master Oct 3, 2022

ArturNiederfahrenhorst mentioned this pull request Dec 1, 2022

[RLlib] Add backward compatibility to MeanStdFilter to restore from older checkpoints. #30439

Merged

7 tasks

ArturNiederfahrenhorst deleted the filterstoconnectors branch January 5, 2023 15:35
