[RLlib; Offline RL] Add support to directly read from episodes. #46865
Conversation
Commits:
…matted commit merely for securing the work.
…' and 'MARWILTorchPolicy', fixed imports and tested MARWIL on non-recurrent policies.
…unction.
… to 'OfflineData'. Set return to reach higher for tuned example.
… in linting and building.
…nectors request finalized episodes.
…g as this was giving an error when 'MARWILOfflinePreLearner' tried to call a value function unneeded by BC. Deprecated hybrid stack.
…tting 'beta=0.0'.
…. BC depends now fully on MARWIL.
# TODO (simon): episodes are only needed for logging here.
return {"batch": [batch]}

def _compute_gae_from_episodes(
Any chance we can share this code with PPO's?
Create a utility function in the PPO folder (e.g. algorithms/ppo/utils.py), then import and use this same function for MARWIL.
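A minimal sketch of what such a shared utility could look like, assuming the module path algorithms/ppo/utils.py from the suggestion above; the function name, signature, the episode accessors, and the assumption that each value-prediction array includes a bootstrap value are illustrative only, not RLlib's actual implementation:

# Hypothetical contents of ray/rllib/algorithms/ppo/utils.py: one shared
# GAE helper that both the PPO and MARWIL learners could import instead of
# each keeping its own copy of the logic.
import numpy as np


def compute_gae_from_episodes(episodes, value_predictions, gamma=0.99, lambda_=0.95):
    """Computes per-episode GAE advantages and value targets.

    Args:
        episodes: List of (assumed finalized) SingleAgentEpisode objects.
        value_predictions: One np.ndarray per episode of length
            len(episode) + 1, i.e. including a bootstrap value for the final
            observation (0.0 if the episode terminated).
        gamma: Discount factor.
        lambda_: GAE lambda.

    Returns:
        Two lists of np.ndarrays (advantages, value targets), one per episode.
    """
    all_advantages, all_value_targets = [], []
    for episode, vf_preds in zip(episodes, value_predictions):
        # `get_rewards()` is assumed to return the per-timestep rewards.
        rewards = np.asarray(episode.get_rewards(), dtype=np.float32)
        # One-step TD errors.
        deltas = rewards + gamma * vf_preds[1:] - vf_preds[:-1]
        # Standard backward GAE recursion.
        advantages = np.zeros_like(deltas)
        last = 0.0
        for t in reversed(range(len(deltas))):
            last = deltas[t] + gamma * lambda_ * last
            advantages[t] = last
        all_advantages.append(advantages)
        all_value_targets.append(advantages + vf_preds[:-1])
    return all_advantages, all_value_targets

Both the PPO and MARWIL learners could then import this one helper rather than keeping two copies of the GAE code.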
That's the question I have: Do we really need the complete code? Or can we get rid of either the batch or the episodes in there when considering MARWIL and the offline learning pipeline?
Commits:
…epcrecated. Moved to old stack as it uses policies.
…he learner from MARWIL.
LGTM now! Thanks for this PR, @simonsays1980!
Why are these changes needed?
The Offline RL API in the new stack will soon support recording capabilities (see #46818) that allow directly storing RLlib's episodes (i.e., SingleAgentEpisodes). In regard to this upcoming change, this PR proposes a direct episode-reading option for the OfflinePreLearner that can skip the _map_to_episodes step in the map_batches data pipeline and therefore increase training speed.
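A minimal sketch of the idea, assuming rows were recorded as pickled SingleAgentEpisodes under an "episode" column and a boolean switch on the pre-learner mapping function; the flag name, column name, and serialization format are hypothetical stand-ins, not this PR's actual implementation:

# If the offline data already holds recorded episodes, the mapping function
# can deserialize them directly and skip the `_map_to_episodes` conversion.
import pickle


def rows_to_episodes(batch, read_episodes_directly=False):
    """Turns one Ray Data batch (a dict of columns) into a list of episodes."""
    if read_episodes_directly:
        # Data was recorded episode-wise: just deserialize each row and skip
        # the `_map_to_episodes` conversion entirely.
        return [pickle.loads(serialized) for serialized in batch["episode"]]
    # Otherwise keep the existing behavior: build episodes from column data
    # (observations, actions, rewards, ...), as `_map_to_episodes` does today.
    raise NotImplementedError("Column-to-episode conversion not sketched here.")

Skipping that conversion step is where the training-speed gain described above would come from.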
Related issue number
Relates to #46818
Checks
I've signed off every commit (git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
If I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.