[RLlib] ReplayBuffer API Simple Q #22842
Conversation
…methods, docstrings
minors in other buffer classes
…ayBufferAPI_tests
Looks great! Thanks for the PR @ArturNiederfahrenhorst, could you address the 3 questions we had on the deprecation decorator, PR beta annealing, and the while loop (train batch size)?
2 nits
"store_buffer_in_checkpoints": False, | ||
# The number of contiguous environment steps to replay at once. This may | ||
# be set to greater than 1 to support recurrent models. | ||
"replay_sequence_length": 1, |
How does this work? I thought data in the replay buffer has already been post-processed, so these samples should all have the necessary state inputs for recurrent models?
I think state inputs don't live in SampleBatches when they are stored in replay buffers. Recurrent state is passed through the forward() method of the ModelV2 API and is also initialized by the ModelV2 object via get_initial_state().
This should be taken into consideration in the connector design, right? @gjoliver
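For context, a minimal toy sketch of how recurrent state is threaded through a ModelV2-style interface during rollout. Only the method names (get_initial_state() / forward()) mirror RLlib's ModelV2 API; everything else is an illustrative stand-in, not RLlib code:

```python
# Illustrative toy model only: method names mirror RLlib's ModelV2 API
# (get_initial_state() / forward()), but this is not the real implementation.
class ToyRecurrentModel:
    def get_initial_state(self):
        # One zero-filled hidden-state vector per recurrent layer.
        return [[0.0, 0.0, 0.0, 0.0]]

    def forward(self, input_dict, state, seq_lens):
        # A real model would run the RNN cell here; we just bump the state.
        new_state = [[h + 1.0 for h in layer] for layer in state]
        return input_dict["obs"], new_state


model = ToyRecurrentModel()
state = model.get_initial_state()  # state is initialized by the model itself ...
for t in range(3):
    # ... and threaded through forward() step by step during rollout.
    out, state = model.forward({"obs": [float(t)]}, state, seq_lens=None)
```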
Ok, I need to double-check the code. It seems like the API for adding a SampleBatch assumes that the batch contains a full episode; it will slice it up according to replay_sequence_length and store multiple smaller batches as a result.
Am I reading this right?
Yes!
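To make the confirmed behavior concrete, here is a toy sketch of the slicing idea, assuming a plain list of timesteps. The actual implementation (including state handling, overlap, and zero-padding) is SampleBatch.timeslice_along_seq_lens_with_overlap(), not this helper:

```python
# Toy sketch: an episode-length batch is cut into replay_sequence_length-sized
# chunks before storage. The real logic lives in RLlib's
# SampleBatch.timeslice_along_seq_lens_with_overlap().
def slice_episode(episode, replay_sequence_length):
    return [
        episode[i:i + replay_sequence_length]
        for i in range(0, len(episode), replay_sequence_length)
    ]

episode = list(range(10))         # 10 timesteps of one finished episode
print(slice_episode(episode, 4))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```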
Ok, I read through everything. Our codebase is really a mess.
I believe SampleBatch does carry all the state_in/out columns. If you look at timeslice_along_seq_lens_with_overlap(), it handles the recurrent states correctly.
All the complicated state-building logic in Sampler and SimpleListCollector is actually just for rollout. I feel like we should be able to clean up tons of CPU-heavy stuff that doesn't do anything today.
Btw, if the ReplayBuffer is handling the batching of RNN states, how does RNN work for agents like PG that don't use a ReplayBuffer?
Tested it out. It simply takes the raw batch with all the state_in and state_out columns, so it still runs fine. 👌
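For illustration, a rough sketch of what such a batch carries. Only the column names follow RLlib's state_in_*/state_out_* convention; the shapes are made up:

```python
import numpy as np

# Rough shape of a post-processed rollout batch that already carries the
# recurrent state next to the usual columns (shapes invented for illustration).
batch = {
    "obs":         np.zeros((5, 4)),   # 5 timesteps, observation dim 4
    "actions":     np.zeros((5,), dtype=int),
    "rewards":     np.zeros((5,)),
    "state_in_0":  np.zeros((5, 8)),   # RNN state fed into each timestep
    "state_out_0": np.zeros((5, 8)),   # RNN state produced at each timestep
    "seq_lens":    np.array([5]),
}
```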
…ayBufferAPI_Simple_Q
…playBufferAPI_Simple_Q
Pulled latest master to fix LINT error. Waiting for tests to finish, then merge ...
Why are these changes needed?
We need to slowly move the new ReplayBuffer API into the critical path. Starting with Simple Q Learning, this PR moves the MultiAgentPrioritizedReplayBuffer into the critical path and explores what measures to take without impeding other algorithms.
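As a rough illustration of the direction, a hedged sketch of what selecting the new buffer via the config could look like. The keys and values below are assumptions based on the replay_buffer_config pattern this PR works towards, not necessarily the exact defaults that landed:

```python
# Sketch only: selecting the new buffer through the trainer config. Field
# names and values are assumptions, not the merged Simple Q defaults.
config = {
    "env": "CartPole-v0",
    "replay_buffer_config": {
        "type": "MultiAgentPrioritizedReplayBuffer",
        "capacity": 50000,
        "replay_sequence_length": 1,
    },
    # This dict would then be passed to the Simple Q trainer as its config.
}
```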
Checks
I've run scripts/format.sh to lint the changes in this PR.