
[RLlib] Properly serialize and restore StateBufferConnector states for policy stashing #31372

Merged
merged 15 commits into ray-project:master on Jan 5, 2023

Conversation

gjoliver
Member

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@@ -70,11 +89,18 @@ def transform(self, ac_data: AgentConnectorDataType) -> AgentConnectorDataType:
        return ac_data

    def to_state(self):
        return StateBufferConnector.__name__, None
        # Note(jungong) : it is ok to use cloudpickle here for stats because:
Contributor

The reason to use cloudpickle over pickle is simply that you are pickling a data structure that contains lambda functions, right?

Member Author

I actually think we should always use cloudpickle.
It's not officially guaranteed, but if we use cloudpickle there is less chance of failing to restore states that were saved with a higher Python version.
cloudpickle kind of does it automatically for you, picking the right pickle library (pickle5, etc.) if the Python version is low.
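The distinction under discussion can be shown with a minimal sketch (a hypothetical illustration, not RLlib code; it assumes the third-party `cloudpickle` package is installed): plain `pickle` serializes functions by reference and therefore refuses a lambda, while `cloudpickle` serializes the function body by value, so lambda-bearing state round-trips.

```python
import pickle

import cloudpickle

# A toy "connector state" containing a lambda, standing in for the kind
# of structure discussed above (hypothetical, not RLlib's actual state).
state = {"buffer": [1, 2, 3], "transform": lambda x: x * 2}

# Plain pickle tries to serialize the lambda by reference and fails.
try:
    pickle.dumps(state)
    pickle_ok = True
except Exception:
    pickle_ok = False
print("plain pickle succeeded:", pickle_ok)  # False

# cloudpickle serializes the function by value, so it round-trips.
restored = cloudpickle.loads(cloudpickle.dumps(state))
print(restored["transform"](21))  # 42
```

Note that cloudpickle's by-value serialization is what makes the state self-contained; nothing here addresses cross-version compatibility, which is the separate point made above.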

        # like stashing then restoring a policy during the rollout of
        # a single episode.
        # It is ok to ignore the error for most of the cases here.
        logger.info(
Contributor

I don't get why you shouldn't error out all the time here. When would you ever pass in a states object where it's ok that it can't be unpickled?

Member Author

@gjoliver gjoliver Jan 4, 2023

When you recover a policy for serving, we wouldn't need the state buffer states.
Actually, this comment reminded me: I am letting StateBufferConnector clear its states whenever someone switches it into eval mode.

Contributor

In that case we should not pass in the states at all?

Member Author

Maybe we can discuss a bit.
I feel like the only case where we need to restore this is when policies are stashed and recovered in the middle of an episode.
We don't need this state even for training, as long as we keep the policy in cache throughout an episode.
So maybe a better fix is to pin any "in-use" policies and prevent them from being stashed.
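The lenient-restore behavior being debated can be sketched as follows (a hypothetical stand-in class, not RLlib's actual StateBufferConnector; plain pickle stands in for cloudpickle): a failed restore is logged and the buffered states are dropped rather than raising, since only a mid-episode stash/restore actually needs them.

```python
import logging
import pickle

logger = logging.getLogger(__name__)


class StateBufferConnectorSketch:
    """Hypothetical sketch of the log-instead-of-raise restore idea."""

    def __init__(self):
        self._states = None  # buffered per-agent states

    def to_state(self):
        # Serialize the buffered states alongside the connector name.
        return "StateBufferConnector", pickle.dumps(self._states)

    def from_state(self, data):
        try:
            self._states = pickle.loads(data)
        except Exception as e:
            # Only mid-episode stash/restore needs these states; a policy
            # recovered for serving can safely drop them with a log line.
            logger.info("Cannot restore connector states: %s", e)
            self._states = None


c = StateBufferConnectorSketch()
c._states = {"agent_0": [0.1, 0.2]}
_, blob = c.to_state()

c2 = StateBufferConnectorSketch()
c2.from_state(blob)
print(c2._states)  # {'agent_0': [0.1, 0.2]}

c2.from_state(b"not a pickle")  # logs and clears instead of raising
print(c2._states)  # None
```

The alternative raised above, pinning in-use policies so they are never stashed mid-episode, would make the lenient path unnecessary.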

Contributor

@kouroshHakha kouroshHakha left a comment

The PR seems to solve the problem with 100-policy test at least. I wonder if it still works with the old numbers. @gjoliver Can you confirm that?

@gjoliver
Member Author

gjoliver commented Jan 4, 2023

The PR seems to solve the problem with 100-policy test at least. I wonder if it still works with the old numbers. @gjoliver Can you confirm that?

Yes, it works with the original numbers. It's just very slow right now because we are doing a lot of unnecessary policy stashing.
I will file a separate issue for that.

Signed-off-by: Artur Niederfahrenhorst <[email protected]>
… restored during the rollout of a single episode.

Signed-off-by: Jun Gong <[email protected]>
reduce 100-policy test size since we are doing a lot of excessive policy restoring at this point.

Signed-off-by: Jun Gong <[email protected]>
…Group does not need to initiate a collector for every single policy up-front.

Signed-off-by: Jun Gong <[email protected]>
Signed-off-by: Jun Gong <[email protected]>
wip
Signed-off-by: Jun Gong <[email protected]>
@gjoliver
Member Author

gjoliver commented Jan 5, 2023

Tests finally look good, with and without the flag flip.
Gonna merge now.
Thanks.

@gjoliver gjoliver merged commit fba15f6 into ray-project:master Jan 5, 2023
AmeerHajAli pushed a commit that referenced this pull request Jan 12, 2023
…r policy stashing (#31372)

Signed-off-by: Artur Niederfahrenhorst <[email protected]>
tamohannes pushed a commit to ju2ez/ray that referenced this pull request Jan 25, 2023
…r policy stashing (ray-project#31372)

Signed-off-by: Artur Niederfahrenhorst <[email protected]>
Signed-off-by: tmynn <[email protected]>