Example containing a proposal for computing an adapted (time-dependent) GAE used by the PPO algorithm (via callback on_postprocess_trajectory) #20850
@kk-55 Great work! Thank you for contributing! I see in the checks that there is some trailing whitespace. If you run […], I guess all CI tests should then pass.
@kk-55 I guess the problem in the CI checks is trailing whitespace in your comments:
rllib/examples/compute_adapted_gae_on_postprocess_trajectory.py:2:75: W291 trailing whitespace
rllib/examples/compute_adapted_gae_on_postprocess_trajectory.py:3:77: W291 trailing whitespace
rllib/examples/compute_adapted_gae_on_postprocess_trajectory.py:5:76: W291 trailing whitespace
rllib/examples/compute_adapted_gae_on_postprocess_trajectory.py:8:78: W291 trailing whitespace
rllib/examples/compute_adapted_gae_on_postprocess_trajectory.py:9:69: W291 trailing whitespace
rllib/examples/compute_adapted_gae_on_postprocess_trajectory.py:91:68: W291 trailing whitespace
rllib/examples/compute_adapted_gae_on_postprocess_trajectory.py:98:69: W291 trailing whitespace
rllib/examples/compute_adapted_gae_on_postprocess_trajectory.py:103:76: W291 trailing whitespace
rllib/examples/compute_adapted_gae_on_postprocess_trajectory.py:108:76: W291 trailing whitespace
rllib/examples/compute_adapted_gae_on_postprocess_trajectory.py:109:79: W291 trailing whitespace
🚨 Error: The command exited with status 1
My suggestion is to go to those lines explicitly and erase the trailing whitespace manually. It usually comes from typing a space after a word and then breaking the line before the next word is written.
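If editing by hand gets tedious, here is a minimal sketch of doing the same cleanup with a small script. The helper name strip_trailing_whitespace is made up for illustration and is not part of Ray's tooling; it only removes spaces and tabs at the end of each line and leaves everything else untouched.

```python
def strip_trailing_whitespace(text: str) -> str:
    """Remove trailing spaces and tabs from every line, keeping line endings.

    Hypothetical helper for illustration only, not part of Ray's tooling.
    """
    cleaned = []
    for line in text.splitlines(keepends=True):
        # Separate the line ending (if any) from the rest of the line.
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        # Drop only trailing spaces/tabs; leading indentation is preserved.
        cleaned.append(body.rstrip(" \t") + ending)
    return "".join(cleaned)


if __name__ == "__main__":
    sample = "# a comment with trailing spaces   \nx = 1\t\n"
    print(repr(strip_trailing_whitespace(sample)))
```

One would read the file's content, pass it through the helper, and write the result back; the W291 warnings listed above should then no longer apply to those lines.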
Keep up the good work!
Hey @kk-55, thanks for this PR! Looks great. Some questions:
Hey @sven1977, thanks for your response!
Yes, that's absolutely correct! Calculation of advantages is being done twice, first time in
I've actually started to use this time-dependent GAE, but I guess my example/env is too complex and much too large. Perhaps, one could construct a fictive example from the CartPole env, but I'm not sure if this makes sense.
Good idea! I agree if you would like to do it this way.
Also a good idea! ;-) I guess you could mention this somewhere in the callbacks section as an example or directly in the examples chapter, but it could also find its place somewhere in the algorithms/PPO section. For me, this time-dependent GAE is an opportunity to adapt PPO to problems that are modeled as semi-MDPs rather than MDPs. It's just my experimental work and not verified by any theory or proofs, but John Schulman said that it looks right to him ;-)
Hey @kk-55, this is great and thanks for your detailed answers!
I've made a proposal for an adapted, time-dependent computation of advantages (GAE, PPO algo).
I've packed my proposal into an example where the advantages are computed in the callback function on_postprocess_trajectory, but the computation could also be included directly in a postprocess_fn.
Why are these changes useful and interesting?
See this short document, a review of my proposal is much appreciated (potentially there are shortcomings/issues I've not seen).
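For readers without the document at hand, here is a rough sketch of what a time-dependent GAE could look like. It is not the formula from the proposal, only one plausible variant under a loudly stated assumption: each step t has a duration durations[t] (the time elapsed until the next step), and the per-step discount is gamma ** durations[t] (likewise for lambda). With all durations equal to 1 it reduces to the standard GAE recursion. The function name and the durations argument are hypothetical.

```python
from typing import List, Sequence


def time_dependent_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    last_value: float,
    durations: Sequence[float],
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """Sketch of GAE with per-step, duration-dependent discounting.

    ASSUMPTION (not taken from the PR): a step of duration dt is discounted
    by gamma ** dt and lam ** dt instead of plain gamma and lam.
    """
    n = len(rewards)
    advantages = [0.0] * n
    next_value = last_value  # value estimate of the state after the last step
    next_advantage = 0.0
    # Walk the trajectory backwards, as in the standard GAE recursion.
    for t in reversed(range(n)):
        step_gamma = gamma ** durations[t]
        step_lam = lam ** durations[t]
        # TD residual with the duration-dependent discount.
        delta = rewards[t] + step_gamma * next_value - values[t]
        advantages[t] = delta + step_gamma * step_lam * next_advantage
        next_value = values[t]
        next_advantage = advantages[t]
    return advantages


if __name__ == "__main__":
    print(time_dependent_gae([1.0, 1.0], [0.0, 0.0], 0.0, [2.0, 1.0],
                             gamma=0.5, lam=1.0))
```

Inside the callback, one would compute such advantages from the trajectory's rewards and value predictions and store them in place of the ones the default postprocessing produced; where the step durations come from depends on the environment.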
Checks
I've run scripts/format.sh to lint the changes in this PR.