-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RLlib] Optionally don't drop last ts in v-trace calculations (APPO and IMPALA). #19601
[RLlib] Optionally don't drop last ts in v-trace calculations (APPO and IMPALA). #19601
Conversation
Many tests in rllib depended on pendulum v0, however in gym 0.21, pendulum v0 was deprecated in favor of pendulum v1. This may change reward thresholds, so will have to potentially rerun all of the pendulum v1 benchmarks, or use another environment in favor. The same applies to frozen lake v0 and frozen lake v1 Lastly, all of the RLlib tests and Tune tests have been moved to python 3.7
…ce_optional_drop_last
…ce_optional_drop_last
…ce_optional_drop_last
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a lot of the diffs shou go away after rebase.
values=values_time_major[:-1], # drop-last=True | ||
rewards=make_time_major(rewards, drop_last=drop_last), | ||
values=values_time_major[:-1] | ||
if drop_last else values_time_major, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indentation, 4 spaces in front?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, not sure. LINTer says it's ok. You mean the if
stuff, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, since it's a continuation of last line. no idea, minor comment.
discounts=tf.cast( | ||
~make_time_major(tf.cast(dones, tf.bool), drop_last=True), | ||
~make_time_major( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's ~make_time_major mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
~
means NOT.
make_time_major
transforms a tensor of shape [B, T, ...]
into [T, B, ...]
.
…ce_optional_drop_last # Conflicts: # .buildkite/pipeline.yml # rllib/BUILD # rllib/agents/ddpg/tests/test_ddpg.py # rllib/agents/ddpg/tests/test_td3.py # rllib/agents/dqn/tests/test_dqn.py
Hey @gjoliver , answered your questions and removed some unrelated functionaliy/changes. Could you take another look? Thx! :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there seems to be a bunch of unrelated diffs, maybe rebase.
but this looks good now. thanks.
values=values_time_major[:-1], # drop-last=True | ||
rewards=make_time_major(rewards, drop_last=drop_last), | ||
values=values_time_major[:-1] | ||
if drop_last else values_time_major, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, since it's a continuation of last line. no idea, minor comment.
…ce_optional_drop_last
Add an option to APPO/IMPALA config to not drop the last ts in vtrace calculations.
vtrace_drop_last_ts
(default True).First experiments with sparse reward and reward-at-end environments indicate that not dropping the last ts adds stability to the learning behavior.
Why are these changes needed?
Related issue number
Checks
scripts/format.sh
to lint the changes in this PR.