[RLlib] Added functionality to add infos and extra_model_outputs to the sample output of PrioritizedEpisodeReplayBuffer. #43496
Conversation
rllib/utils/replay_buffers/tests/test_prioritized_episode_replay_buffer.py
if include_extra_model_outputs:
    ret.update(
        {
            "extra_model_outputs": np.array(extra_model_outputs),
Not sure this is a good idea, just np'ing stuff like this. This often leads to unwieldy object arrays that have unpredictable behavior (the same is true for np'ing the infos above; we should just keep them as a list of info dicts in the returned batch).
We usually separate these sub-columns of extra_model_outputs in our batches. Can we do that here, too?
ret.update(
    {
        k: batch(v)
        for k, v in extra_model_outputs.items()
    }
)
The final batch (returned from sample) should have columns at the top level, e.g. OBS or ACTION_DIST_INPUTS.
Under each of these columns should be a (possibly nested) struct of numpy array leafs (or simply a numpy array if there is no complex space/struct). All leafs should have the shape (B, T?, ...), where T might be 0 or 1.
Let me know if I'm making a thinking mistake here. :)
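For illustration, a minimal sketch of the two approaches discussed in this thread (plain NumPy, field names assumed, not RLlib code):

import numpy as np

# Hypothetical per-row extra_model_outputs as collected from sampled episodes.
extra_model_outputs = [
    {"action_dist_inputs": np.array([0.1, 0.9]), "vf_preds": 0.5},
    {"action_dist_inputs": np.array([0.3, 0.7]), "vf_preds": 0.2},
]

# np.array over a list of dicts yields an unstructured dtype=object array.
as_object_array = np.array(extra_model_outputs)
print(as_object_array.dtype)  # object

# Batching per key instead yields one numpy column per sub-output,
# each with a leading batch dimension.
per_key = {
    k: np.stack([row[k] for row in extra_model_outputs])
    for k in extra_model_outputs[0]
}
print(per_key["action_dist_inputs"].shape)  # (2, 2)
print(per_key["vf_preds"].shape)            # (2,)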
Hey @sven1977, thanks for the review! Yes, it was still somewhat ambiguous how to deal with the extra model outputs. I can batch the items from this field such that each of the keys in extra_model_outputs defines a new column in the batch.
@sven1977 following your logic above, it might also make sense to keep the other "batch" columns here as lists such that they can be batched in a standard way in the connectors?
… {(eps_id,): [1.3, 4.23 ...], ...}, ...}. Furthermore, implemented a tracker for the maximum tree index to sum weights faster during sampling. Implemented testing for 'sample_with_keys'. Naming was chosen such that we can deprecate the old 'sample' as soon as the initial review is done. Signed-off-by: Simon Zehnder <[email protected]>
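For context, a minimal sketch of the "track the maximum written index" idea from this commit message (plain NumPy, flat priority array assumed; an illustration of the idea, not the buffer's actual segment-tree implementation):

import numpy as np

class PrioritySketch:
    def __init__(self, capacity: int):
        self._priorities = np.zeros(capacity, dtype=np.float64)
        self._max_idx = -1  # highest index written so far

    def update(self, idx: int, priority: float) -> None:
        self._priorities[idx] = priority
        self._max_idx = max(self._max_idx, idx)

    def sample(self, rng: np.random.Generator) -> int:
        # Summing/normalizing only over the occupied prefix [0, max_idx]
        # avoids touching the empty tail of a large, not-yet-full buffer.
        occupied = self._priorities[: self._max_idx + 1]
        probs = occupied / occupied.sum()
        return int(rng.choice(len(occupied), p=probs))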
LGTM! Let's merge this, then time it to see whether the time saved by creating the batch in the buffer is eaten up by the additional batching step required in the Learner connector (I don't think that would be the case).
Awesome PR @simonsays1980! :)
Why are these changes needed?
So far, PrioritizedEpisodeReplayBuffer had functionality to add infos to the sample of this buffer, but not to also add extra_model_outputs. This PR adds the functionality together with a corresponding test case.
Note, the extra_model_outputs are extracted as a dict and will be added to the batch in this form per row (similar to infos). Later, in post-processing, the variables from this dictionary can be extracted in a corresponding learner connector. Furthermore, while infos are extracted at the end of n_step, the extra_model_outputs usually refer to the corresponding action, which comes from the first timestep in the n_step tuple. Hence, we take the extra_model_outputs from that same timestep.
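To illustrate the timestep convention described above, a toy sketch (all episode data and field names assumed, for illustration only):

import numpy as np

T = 10
observations = [np.full(2, t, dtype=np.float32) for t in range(T + 1)]
actions = list(range(T))
infos = [{"step": t} for t in range(T + 1)]
extra_model_outputs = [{"action_logp": -0.1 * t} for t in range(T)]

n_step, t = 3, 5  # sample the n-step transition starting at timestep t

transition = {
    "obs": observations[t],
    "actions": actions[t],
    "new_obs": observations[t + n_step],
    # infos describe the state reached at the END of the n-step tuple ...
    "infos": infos[t + n_step],
    # ... while extra_model_outputs belong to the action taken at the FIRST
    # timestep of the tuple, so they are taken from timestep t.
    "extra_model_outputs": extra_model_outputs[t],
}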
Related issue number
Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I've added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.