[RLlib] Properly serialize and restore StateBufferConnector states for policy stashing #31372

gjoliver · 2022-12-30T20:58:41Z

Why are these changes needed?

Related issue number

Checks

I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

kouroshHakha · 2023-01-03T23:07:14Z

rllib/connectors/agent/state_buffer.py

@@ -70,11 +89,18 @@ def transform(self, ac_data: AgentConnectorDataType) -> AgentConnectorDataType:
        return ac_data

    def to_state(self):
-        return StateBufferConnector.__name__, None
+        # Note(jungong) : it is ok to use cloudpickle here for stats because:


The reason to use cloudpickle over pickle is simply that you are pickling a data structure that contains a lambda function, right?

I actually think we should always use cloudpickle.
It's not officially guaranteed, but there is less chance of failing to restore states saved with a higher version of Python if we use cloudpickle.
cloudpickle kinda does it automatically for you, using the right pickle library if the version is low (pickle5 etc.).
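To make the question above concrete, here is a minimal sketch of why the standard pickle module struggles with a structure whose default factory is a lambda, while cloudpickle can serialize it by value. The nested `defaultdict` layout and the helper names (`make_buffer`, `stdlib_pickle_works`, `cloudpickle_roundtrip`) are assumptions for illustration, not the connector's actual code. The cloudpickle import is guarded because it is a third-party package.

```python
import pickle
from collections import defaultdict

try:
    import cloudpickle  # third-party; optional for this sketch
except ImportError:
    cloudpickle = None


def make_buffer():
    # Hypothetical stand-in for a per-env, per-agent state store whose
    # default factories are lambdas.
    return defaultdict(lambda: defaultdict(lambda: (None, None, None)))


def stdlib_pickle_works(obj) -> bool:
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # pickle stores functions by qualified name, which a lambda lacks.
        return False


def cloudpickle_roundtrip(obj):
    """Serialize and restore obj with cloudpickle, or None if unavailable."""
    if cloudpickle is None:
        return None
    return cloudpickle.loads(cloudpickle.dumps(obj))


buffer = make_buffer()
buffer["env_0"]["agent_0"] = ("action", "state_out", "logits")
```

With this setup, `stdlib_pickle_works(buffer)` reports a failure, whereas `cloudpickle_roundtrip(buffer)` returns a copy that still produces default entries for unseen keys, provided cloudpickle is installed.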

kouroshHakha · 2023-01-03T23:08:49Z

rllib/connectors/agent/state_buffer.py

+                # like stashing then restoring a policy during the rollout of
+                # a single episode.
+                # It is ok to ignore the error for most of the cases here.
+                logger.info(


I don't get why you would not error out all the time here. When would you ever pass in a states object for which it is OK if it cannot be unpickled?

When you recover a policy for serving, we wouldn't need the state buffer's state.
Actually, this comment reminded me: I am now letting StateBufferConnector clear its state whenever someone switches it into eval mode.
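The behavior discussed in this thread can be sketched with a toy class: serialize the buffer in `to_state`, tolerate an unreadable payload on restore by logging at info level, and clear everything on a switch to eval mode. `ToyStateBuffer` and its methods are hypothetical and only mirror the shape of the discussion; the sketch uses the standard pickle module with a plain dict to stay dependency-free, whereas the PR itself uses cloudpickle.

```python
import logging
import pickle

logger = logging.getLogger(__name__)


class ToyStateBuffer:
    """Hypothetical stand-in for a stateful connector; not RLlib's class."""

    def __init__(self, states=None):
        self._states = {}
        if states:
            try:
                self._states = pickle.loads(states)
            except Exception:
                # Losing the buffer only matters when a policy was stashed
                # mid-episode, so log and start empty instead of raising.
                logger.info("Could not restore buffered states; starting empty.")

    def record(self, env_id, agent_id, value):
        self._states[(env_id, agent_id)] = value

    def in_eval(self):
        # Serving does not need training-time buffers, so drop them.
        self._states.clear()

    def to_state(self):
        # Return the class name plus the serialized buffer contents.
        return type(self).__name__, pickle.dumps(self._states)

    @classmethod
    def from_state(cls, params):
        return cls(states=params)
```

A round trip through `to_state()` and `from_state()` preserves recorded entries, while a corrupt payload yields an empty buffer rather than an exception.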

In that case we should not pass in the states at all?

Maybe we can discuss this a bit.
I feel like the only case where we need to restore this is when policies are stashed and recovered in the middle of an episode.
We don't need this state even for training, as long as we keep the policy in the cache throughout an episode.
So maybe a better fix is to pin any "in-use" policies and prevent them from being stashed.
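As a rough illustration of the pinning idea, the sketch below shows an LRU cache that skips pinned keys when it evicts. This is an assumption about how such a fix could look, not what the PR implements; `PinnedLRUCache` and all of its methods are hypothetical names.

```python
from collections import OrderedDict


class PinnedLRUCache:
    """Hypothetical LRU cache that never evicts (stashes) pinned keys."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # least recently used first
        self._pinned = set()
        self.stashed = []  # keys evicted so far, in eviction order

    def __contains__(self, key):
        return key in self._items

    def pin(self, key):
        self._pinned.add(key)

    def unpin(self, key):
        self._pinned.discard(key)

    def get(self, key):
        value = self._items[key]
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        self._evict()

    def _evict(self):
        while len(self._items) > self.capacity:
            # Pick the least recently used key that is not pinned.
            victim = next((k for k in self._items if k not in self._pinned), None)
            if victim is None:
                break  # everything is pinned; allow a temporary overflow
            del self._items[victim]
            self.stashed.append(victim)
```

In this sketch a caller would pin a policy when an episode starts using it and unpin it when the episode ends, so the buffered state never has to survive a stash and restore cycle.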

kouroshHakha

The PR seems to solve the problem with the 100-policy test at least. I wonder if it still works with the old numbers. @gjoliver Can you confirm that?

gjoliver · 2023-01-04T16:32:02Z

The PR seems to solve the problem with the 100-policy test at least. I wonder if it still works with the old numbers. @gjoliver Can you confirm that?

Yes, it works with the original numbers. It is just very slow right now because we are doing a lot of unnecessary policy stashing.
I will file a separate issue for that.

gjoliver · 2023-01-05T03:50:08Z

tests look good finally with and without the flag flip.
gonna merge now.
thanks.

gjoliver requested review from sven1977, avnishn, ArturNiederfahrenhorst, smorad, maxpumperla, kouroshHakha and krfricke as code owners December 30, 2022 20:58

gjoliver force-pushed the debug_100_policies branch 4 times, most recently from 787ec1c to 3b48398 Compare December 31, 2022 07:48

gjoliver assigned ArturNiederfahrenhorst and kouroshHakha Dec 31, 2022

kouroshHakha reviewed Jan 3, 2023

kouroshHakha approved these changes Jan 4, 2023

gjoliver force-pushed the debug_100_policies branch from 53ecfa3 to 6a1ac2e Compare January 4, 2023 17:48

ArturNiederfahrenhorst and others added 12 commits January 4, 2023 15:38

initial

24a2414

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

initial

ed930cb

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

disable connectors

3abe2e9

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

Fix EnvRunnerV2's handling of soft_horizon episodes.

eae9710

Signed-off-by: Jun Gong <[email protected]>

Enable connectors to see broken tests

2dce34e

Signed-off-by: Avnish <[email protected]>

disable connectors again

6395b4c

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

initial

857538f

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

only flip switch

daa55a6

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

flip

063a462

Signed-off-by: Artur Niederfahrenhorst <[email protected]>

Serialize StateBufferConnector states in case a policy is stashed and…

fd4ad78

… restored during the rollout of an single episode. Signed-off-by: Jun Gong <[email protected]>

revert some debuggin things.

b4b5b11

reduce 100 policy size since we are doing a lot of excessive policy resotring at this point. Signed-off-by: Jun Gong <[email protected]>

check() should handle byte strings

be92e9a

Signed-off-by: Jun Gong <[email protected]>

make sure states are cleared when in eval mode. also _PolicyCollector…

69c67d9

…Group does not need to initiate a collector for every single policies up-front. Signed-off-by: Jun Gong <[email protected]>

gjoliver force-pushed the debug_100_policies branch from 6a1ac2e to 4f35696 Compare January 4, 2023 23:38

fix test_algorithm

40adca3

Signed-off-by: Jun Gong <[email protected]>

gjoliver force-pushed the debug_100_policies branch from 4f35696 to 40adca3 Compare January 4, 2023 23:39

wip

e44b21e

Signed-off-by: Jun Gong <[email protected]>

gjoliver merged commit fba15f6 into ray-project:master Jan 5, 2023

AmeerHajAli pushed a commit that referenced this pull request Jan 12, 2023

[RLlib] Properly serialize and restore StateBufferConnector states fo…

c869ea0

…r policy stashing (#31372) Signed-off-by: Artur Niederfahrenhorst <[email protected]>
