
[RLlib] ReplayBuffer API Simple Q #22842

Merged

Conversation

ArturNiederfahrenhorst (Contributor)

Why are these changes needed?

We need to gradually move the new ReplayBuffer API into the critical path. Starting with Simple Q learning, this PR moves the MultiAgentPrioritizedReplayBuffer into the critical path and explores what changes are needed without impeding other algorithms.
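For context, a minimal sketch of how the new ReplayBuffer API is used standalone. This is illustrative only; the import path and keyword names follow the Ray master branch around the time of this PR and may differ between versions:

```python
from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.utils.replay_buffers import MultiAgentPrioritizedReplayBuffer

# Create a prioritized multi-agent buffer and store one tiny transition batch.
buffer = MultiAgentPrioritizedReplayBuffer(capacity=1000)
buffer.add(
    SampleBatch({
        "obs": [[0.0]],
        "actions": [0],
        "rewards": [1.0],
        "new_obs": [[1.0]],
        "dones": [False],
    })
)

# Draw a (prioritized) training batch; sampling is with replacement.
train_batch = buffer.sample(num_items=32)
```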

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@sven1977 (Contributor) left a comment


Looks great! Thanks for the PR @ArturNiederfahrenhorst. Could you address the three questions we had on the deprecation decorator, PER beta annealing, and the while loop (train batch size)?

@sven1977 sven1977 marked this pull request as ready for review March 21, 2022 16:03
@sven1977 sven1977 changed the title from "ReplayBuffer API Simple Q" to "[RLlib] ReplayBuffer API Simple Q" Mar 21, 2022
@gjoliver (Member) left a comment


2 nits

rllib/agents/dqn/dqn.py (review thread):
"store_buffer_in_checkpoints": False,
# The number of contiguous environment steps to replay at once. This may
# be set to greater than 1 to support recurrent models.
"replay_sequence_length": 1,
Member:

how does this work? I thought data in the replay buffer has already been post-processed, so these samples should all have the necessary state inputs for recurrent models?

Contributor (Author):

I think state inputs don't live in SampleBatches when they are stored in replay buffers. Recurrent state is passed through the forward() method of the ModelV2 API and is also initialized by the ModelV2 object via get_initial_state().
This should be taken into consideration in the connector design, right? @gjoliver
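To illustrate the point, a signature sketch of how state flows through the ModelV2 API (this is not code from the PR; the state size and model body are illustrative):

```python
import numpy as np
from ray.rllib.models.modelv2 import ModelV2


class SketchRecurrentModel(ModelV2):
    """Signature sketch: RLlib threads recurrent state in and out of
    forward() and seeds it from get_initial_state(); the state is not a
    column the sampler writes into stored SampleBatches by default."""

    def get_initial_state(self):
        # One zero-initialized tensor per recurrent state slot
        # (the size 16 is illustrative).
        return [np.zeros(16, dtype=np.float32)]

    def forward(self, input_dict, state, seq_lens):
        # `state` is the list of state tensors from the previous call;
        # a real model would run its RNN cell here and return the
        # updated state list alongside the outputs.
        new_state = state
        return input_dict["obs"], new_state
```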

Member:

ok, I need to double-check the code. It seems like the API for adding a SampleBatch assumes that the batch contains a full episode; it slices the batch up according to replay_sequence_length and stores multiple smaller batches as a result.
Am I reading it right?

Contributor (Author):

Yes!
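For readers following along, a minimal sketch of that slicing behavior, assuming the timeslice_along_seq_lens_with_overlap() helper (mentioned below) lives in ray.rllib.policy.rnn_sequencing with this signature on the master branch of that time; the field values and explicit seq_lens are made up:

```python
from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.policy.rnn_sequencing import timeslice_along_seq_lens_with_overlap

# A toy "episode" batch of 6 timesteps (values are illustrative).
episode = SampleBatch({
    "obs": [0, 1, 2, 3, 4, 5],
    "rewards": [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
})

# Slice into sequences of length 3, as the buffer does internally when
# replay_sequence_length > 1 (explicit seq_lens passed for clarity).
slices = timeslice_along_seq_lens_with_overlap(
    sample_batch=episode, seq_lens=[3, 3]
)
print([len(s) for s in slices])  # -> [3, 3]
```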

Member:

ok, I read through everything. Our codebase is really a mess.
I believe SampleBatch does carry all the state_in/state_out columns. If you look at timeslice_along_seq_lens_with_overlap(), it handles the recurrent states correctly.

All that complicated state-building logic in Sampler and SimpleListCollector is actually just for rollouts. I feel like we should be able to clean up tons of CPU-heavy stuff that doesn't do anything today.

btw, if the ReplayBuffer is handling the batching of RNN states, how does RNN training work for agents like PG that don't use a ReplayBuffer?

Member:

Tested it out: PG simply takes the raw batch with all the state_in and state_out columns, so it still runs fine. 👌
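A small illustration of that observation (column names follow RLlib's state_in_&lt;i&gt;/state_out_&lt;i&gt; convention; the values are made up):

```python
from ray.rllib.policy.sample_batch import SampleBatch

# Recurrent state travels as ordinary columns of the batch, so an
# on-policy algorithm can consume a rollout batch directly, with no
# replay buffer in between.
batch = SampleBatch({
    "obs": [[0.1], [0.2], [0.3]],
    "actions": [0, 1, 0],
    "state_in_0": [[0.0], [0.5], [0.7]],
    "state_out_0": [[0.5], [0.7], [0.9]],
})
print("state_in_0" in batch)  # -> True: just a regular column
```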

@sven1977 (Contributor):

Pulled latest master to fix LINT error. Waiting for tests to finish, then merge ...

@sven1977 sven1977 merged commit 9a64bd4 into ray-project:master Mar 29, 2022
@ArturNiederfahrenhorst ArturNiederfahrenhorst deleted the ReplayBufferAPI_Simple_Q branch April 24, 2022 16:13