[RLlib] ReplayBuffer API Simple Q #22842
How does this work? I thought data in the replay buffer (RB) had already been postprocessed, so shouldn't these samples already have the necessary state inputs for recurrent models?
I think state inputs don't live in SampleBatches when they are stored in replay buffers. Recurrent state is passed through the forward() method of the ModelV2 API and is initialized by the ModelV2 object via get_initial_state().
This should be taken into consideration in the connector design, right? @gjoliver
OK, I need to double-check the code. It seems like the API for adding a SampleBatch assumes that the batch contains a full episode; it slices the episode according to replay_sequence_length and stores the resulting smaller batches.
Am I reading it right?
Yes!
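A minimal sketch of the slicing behavior described above, assuming a batch is a plain dict of per-timestep columns. The names `slice_episode`, `state_in_0`/`state_out_0`, and `initial_state` are illustrative only; this is not RLlib's replay buffer code or `timeslice_along_seq_lens_with_overlap()`, and it ignores any overlap between sequences. It shows why stored chunks can still feed a recurrent model: if the state columns are sliced along with everything else, each chunk carries the state_in recorded at its first step.

```python
from typing import Dict, List

# A toy "batch": column name -> per-timestep values (hypothetical layout,
# not RLlib's SampleBatch class).
Batch = Dict[str, List]


def slice_episode(episode: Batch, seq_len: int) -> List[Batch]:
    """Split one full episode into consecutive chunks of at most seq_len steps.

    Every column (including recurrent state columns) is sliced the same way,
    so each chunk carries the state_in value recorded at its first timestep.
    No overlap or zero-padding is applied between chunks.
    """
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    num_steps = len(next(iter(episode.values())))
    chunks = []
    for start in range(0, num_steps, seq_len):
        end = min(start + seq_len, num_steps)
        chunk = {col: values[start:end] for col, values in episode.items()}
        # The RNN state to feed when replaying this chunk is the state_in
        # recorded at the chunk's first step during rollout.
        chunk["initial_state"] = episode["state_in_0"][start]
        chunks.append(chunk)
    return chunks


episode = {
    "obs": [0, 1, 2, 3, 4],
    "state_in_0": ["h0", "h1", "h2", "h3", "h4"],
    "state_out_0": ["h1", "h2", "h3", "h4", "h5"],
}
for chunk in slice_episode(episode, seq_len=2):
    print(chunk["obs"], chunk["initial_state"])
```

With `seq_len=2`, the five-step episode becomes chunks `[0, 1]`, `[2, 3]`, and `[4]`, starting from states `h0`, `h2`, and `h4`.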
OK, I read through everything. Our codebase is really a mess.
I believe SampleBatch does carry all the state_in/state_out columns. If you look at timeslice_along_seq_lens_with_overlap(), it handles the recurrent states correctly.
All the complicated state-building logic in Sampler and SimpleListCollector is actually only needed for rollouts. I feel like we should be able to clean up a lot of CPU-heavy code that doesn't do anything today.
BTW, if ReplayBuffer handles the batching of RNN states, how do RNNs work for agents like PG that don't use a ReplayBuffer???
Tested it out: it simply takes the raw batch with all the state_in, state_out, etc. columns, so it still runs fine. 👌