[RLlib] Optionally don't drop last ts in v-trace calculations (APPO and IMPALA). #19601

sven1977 · 2021-10-21T19:46:33Z

Add an option to the APPO/IMPALA config to not drop the last timestep (ts) in v-trace calculations:

vtrace_drop_last_ts (default: True).

First experiments with sparse-reward and reward-at-end environments indicate that not dropping the last ts makes learning more stable.
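
To illustrate what the flag changes, here is a minimal NumPy sketch (not the actual RLlib code). The helper name select_vtrace_inputs and the toy numbers are hypothetical, and using the last value estimate as the bootstrap value in both cases is an assumption. It shows only the slicing: with drop_last=True, a reward that arrives on the final timestep of a time-major batch never reaches the v-trace inputs.

```python
import numpy as np


def select_vtrace_inputs(rewards_tm, values_tm, drop_last=True):
    """Hypothetical helper: pick the timesteps that feed the v-trace inputs.

    Both arrays are time-major, i.e. shaped [T, B].
    """
    if drop_last:
        # Old behavior: the final timestep is cut off.
        rewards, values = rewards_tm[:-1], values_tm[:-1]
    else:
        # New option: keep all T timesteps.
        rewards, values = rewards_tm, values_tm
    # Assumption: the last value estimate is the bootstrap value either way.
    bootstrap_value = values_tm[-1]
    return rewards, values, bootstrap_value


# Toy batch with T=4, B=1 and a reward only on the final timestep.
rewards_tm = np.array([[0.0], [0.0], [0.0], [1.0]])
values_tm = np.array([[0.1], [0.2], [0.4], [0.8]])

r_drop, v_drop, _ = select_vtrace_inputs(rewards_tm, values_tm, drop_last=True)
r_keep, v_keep, _ = select_vtrace_inputs(rewards_tm, values_tm, drop_last=False)

print("drop_last=True  reward sum:", r_drop.sum())
print("drop_last=False reward sum:", r_keep.sum())
```

In this toy batch the terminal reward is only visible when the last ts is kept, which may be related to the stability difference noted above, though the sketch does not demonstrate that.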

Why are these changes needed?

Related issue number

Checks

I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

Many tests in RLlib depended on Pendulum-v0; however, in gym 0.21, Pendulum-v0 was deprecated in favor of Pendulum-v1. This may change reward thresholds, so we may have to rerun all of the Pendulum-v1 benchmarks or use another environment instead. The same applies to FrozenLake-v0 and FrozenLake-v1. Lastly, all of the RLlib tests and Tune tests have been moved to Python 3.7.


gjoliver

a lot of the diffs should go away after rebase.

gjoliver · 2021-10-25T21:49:26Z

rllib/agents/ppo/appo_tf_policy.py

-                values=values_time_major[:-1],  # drop-last=True
+                rewards=make_time_major(rewards, drop_last=drop_last),
+                values=values_time_major[:-1]
+                if drop_last else values_time_major,


indentation, 4 spaces in front?

Yeah, not sure. LINTer says it's ok. You mean the if stuff, right?

yeah, since it's a continuation of last line. no idea, minor comment.

gjoliver · 2021-10-25T21:50:17Z

rllib/agents/ppo/appo_tf_policy.py

                discounts=tf.cast(
-                    ~make_time_major(tf.cast(dones, tf.bool), drop_last=True),
+                    ~make_time_major(


what's ~make_time_major mean?

~ means NOT.
make_time_major transforms a tensor of shape [B, T, ...] into [T, B, ...].
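
A small NumPy sketch of the two points above, for readers unfamiliar with the idiom. to_time_major is a hypothetical stand-in for make_time_major (it only swaps the first two axes), and multiplying by gamma afterwards is an assumption about how the discounts are built, not a quote from the diff.

```python
import numpy as np


def to_time_major(x, drop_last=False):
    """Hypothetical stand-in: [B, T, ...] -> [T, B, ...], optionally minus the last ts."""
    x_tm = np.swapaxes(x, 0, 1)
    return x_tm[:-1] if drop_last else x_tm


gamma = 0.99  # illustrative discount factor

# B=2 trajectories with T=3 timesteps each; the first one terminates at t=2.
dones = np.array([[False, False, True],
                  [False, False, False]])

# `~` on a boolean array is element-wise NOT: True where the episode is not done.
not_done_tm = ~to_time_major(dones)

# Cast to float so that terminal steps get a discount of 0.
discounts = not_done_tm.astype(np.float32) * gamma
print(discounts)
```

The result is shaped [T, B], and the entry for the terminal step of the first trajectory is 0, so nothing is bootstrapped past the end of that episode.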


sven1977 · 2021-11-02T19:09:38Z

Hey @gjoliver, answered your questions and removed some unrelated functionality/changes. Could you take another look? Thx! :)

gjoliver

there seems to be a bunch of unrelated diffs, maybe rebase.
but this looks good now. thanks.



sven1977 and others added 17 commits October 21, 2021 21:01

wip.

b3b82c4

wip.

dc4efee

wip.

6ab2be7

wip.

d8eec93

wip.

9ef09af

tst change to trigger re-test of FrozenLake env, which seems broken on gym's side.

7d3dba0

wip.

89251a6

Merge branch 'master' of https://github.com/ray-project/ray into vtrace_optional_drop_last

0f771e4

Merge branch 'master' of https://github.com/ray-project/ray into vtrace_optional_drop_last

1f0b012

Merge branch 'master' of https://github.com/ray-project/ray into vtrace_optional_drop_last

cbb1e91

merge

6d0d412

wip

e49839f

wip

662e92a

Merge branch 'master' of https://github.com/ray-project/ray into upgrade_gym

684e69a

wip

d97b5b4

fix.

8a75c74

sven1977 requested a review from gjoliver October 25, 2021 16:52

sven1977 assigned gjoliver Oct 25, 2021

Merge remote-tracking branch 'avnish/upgrade_gym' into vtrace_optional_drop_last

9300105

gjoliver reviewed Oct 25, 2021


sven1977 added 2 commits November 2, 2021 19:56

Merge branch 'master' of https://github.com/ray-project/ray into vtrace_optional_drop_last # Conflicts: # .buildkite/pipeline.yml # rllib/BUILD # rllib/agents/ddpg/tests/test_ddpg.py # rllib/agents/ddpg/tests/test_td3.py # rllib/agents/dqn/tests/test_dqn.py

fb7a5ff

wip.

fa605fd

gjoliver approved these changes Nov 3, 2021


sven1977 added 3 commits November 3, 2021 07:50

Merge branch 'master' of https://github.com/ray-project/ray into vtrace_optional_drop_last

df7ab2f

wip.

607af55

wip.

053c7e0

sven1977 merged commit e6ae08f into ray-project:master Nov 3, 2021

sven1977 deleted the vtrace_optional_drop_last branch June 2, 2023 20:16
