Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RLlib] Added functionality to add infos and extra_model_outputs to the sample output of PrioritizedEpisodeReplayBuffer. #43496

Conversation

simonsays1980
Copy link
Collaborator

@simonsays1980 simonsays1980 commented Feb 28, 2024

Why are these changes needed?

So far PrioritizedEpisodeReplayBuffer had a functionality to add infos to the sample of this buffer, but not one to add also extra_model_outputs. This PR adds the functionality together with a corresponding test case.

Note, the extra_model_outputs are extracted as a dict and will be added to the batch in this form per row (similar to infos). Later in post-processing the variables from this dicitonary can be extracted in a corresponding learner connector. Furthermore, while infos are extracted at the end of n_step, the extra_model_outputs usually refer to a corresponding action which comes from the first timestep in the n_step tuple. Henceforth, we take the extra_model_outputs from the same timestep.

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@simonsays1980 simonsays1980 changed the title Added funcitonality to add infos and'extra_model_outputs to the sample output of PrioritizedEpisodeReplayBuffer. Added funcitonality to add infos andextra_model_outputs to the sample output of PrioritizedEpisodeReplayBuffer. Mar 1, 2024
@simonsays1980 simonsays1980 changed the title Added funcitonality to add infos andextra_model_outputs to the sample output of PrioritizedEpisodeReplayBuffer. Added funcitonality to add infos and extra_model_outputs to the sample output of PrioritizedEpisodeReplayBuffer. Mar 1, 2024
@sven1977 sven1977 changed the title Added funcitonality to add infos and extra_model_outputs to the sample output of PrioritizedEpisodeReplayBuffer. [RLlib] Added funcitonality to add infos and extra_model_outputs to the sample output of PrioritizedEpisodeReplayBuffer. Mar 6, 2024
@sven1977 sven1977 marked this pull request as ready for review March 6, 2024 12:42
if include_extra_model_outputs:
ret.update(
{
"extra_model_outputs": np.array(extra_model_outputs),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this is a good idea just np'ing stuff like this. This often leads to these unwieldy object arrays that have unpredictable behavior (the same is true for np'ing the infos above, we should just keep them as a list of infos-dicts in the returned batch).

We usually separate these sub-columns in extra_model_outputs in our batches. Can we do that here, too?

ret.update(
{
    k: batch(v)
    for k, v in extra_model_outputs.items()
}
)

The final batch (returned from sample) should have columns at the top level, e.g. OBS or ACTION_DIST_INPUTS.
Under each of these columns should be a (possibly nested) struct of numpy array leafs (or simply a numpy array if no complex space/struct). All leafs should have the shape (B, T?, ...), where T might be 0 or 1.

Let me know, if I'm making a thinking-mistake here. :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @sven1977 thanks for the review! Yes this was somehow still ambiguous how to deal with the extra model outputs. I can batch the items from this field such that each of the keys in extra_model_outputs defines a new column in the batch.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sven1977 following your logic above it might also make sense to keep the other "batch" columns here as lists such that they can be batched in a standard way in the connectors?

sven1977 and others added 3 commits March 6, 2024 13:51
… {(eps_id,): [1.3, 4.23 ...], ...}, ...}. Furthermore, implemented a tracker for the maximum tree index to sum weights during sampling faster. Implemented testing for 'sample_with_keys'. Naming was chosen such that we can deprecate the old 'sample' as soon as initial review is done.

Signed-off-by: Simon Zehnder <[email protected]>
…er' of github.com:simonsays1980/ray into extra-model-outputs-for-prioritized-episode-replay-buffer

Signed-off-by: Simon Zehnder <[email protected]>
Copy link
Contributor

@sven1977 sven1977 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Let's merge this, then time-it, whether the saved time to create the batch in the buffer is eaten up by the additional batching step required in the Learner Connector (I don't think that would be the case).

Awesome PR @simonsays1980 ! :)

@sven1977 sven1977 self-assigned this Mar 11, 2024
@sven1977 sven1977 added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Mar 11, 2024
@sven1977 sven1977 merged commit ec68337 into ray-project:master Mar 11, 2024
9 checks passed
@sven1977 sven1977 changed the title [RLlib] Added funcitonality to add infos and extra_model_outputs to the sample output of PrioritizedEpisodeReplayBuffer. [RLlib] Added functionality to add infos and extra_model_outputs to the sample output of PrioritizedEpisodeReplayBuffer. Mar 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tests-ok The tagger certifies test failures are unrelated and assumes personal liability.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants