[wingman -> rllib] IMPALA MultiDiscrete changes #3967

bjg2 · 2019-02-06T16:24:27Z

NOTE: This is an early version of the pull request, opened so we can align. No unit tests have been run, and the changes have not been tested beyond what we need them for.

What do these changes do?

Enables the MultiDiscrete action space in IMPALA.
Adds the MultiCategorical action distribution: we use IMPALA with that distribution, and our model outputs categorical logits for each sub-action. Possibly not all action spaces work with all action distributions; what has been tested is that Discrete works with DiagGaussian (the default) and that MultiDiscrete works with Categorical (MultiCategorical). Note that the model actually outputs a single 1-dimensional tensor, which is split into logits of the sizes given by the provided MultiDiscrete action space.
V-trace is replaced with an implementation that also works for MultiDiscrete action spaces.
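To make the MultiCategorical idea concrete, here is a minimal NumPy sketch. It assumes the model emits one flat logit vector per sample of length sum(nvec) for a MultiDiscrete(nvec) space, splits it into one categorical per sub-action, and treats the sub-actions as independent, so the joint log-probability and entropy are sums over components. The class name comes from the PR; the constructor arguments and method names are illustrative assumptions, not RLlib's actual TF implementation.

```python
import numpy as np


def log_softmax(x, axis=-1):
    # Numerically stable log-softmax: subtract the max before exponentiating.
    x = x - np.max(x, axis=axis, keepdims=True)
    return x - np.log(np.sum(np.exp(x), axis=axis, keepdims=True))


class MultiCategorical:
    """One independent categorical per MultiDiscrete component (sketch)."""

    def __init__(self, flat_logits, input_lens):
        # flat_logits: [batch, sum(input_lens)]. Split it into one
        # [batch, n_i] block of logits per sub-action.
        split_points = np.cumsum(input_lens)[:-1]
        self.cats = np.split(flat_logits, split_points, axis=-1)

    def logp(self, actions):
        # actions: int array [batch, num_components]. Components are
        # independent, so the joint log-prob is the sum over components.
        total = 0.0
        for i, logits in enumerate(self.cats):
            lp = log_softmax(logits)
            total = total + np.take_along_axis(lp, actions[:, i:i + 1], axis=-1)[:, 0]
        return total

    def entropy(self):
        # Entropy of independent components is the sum of their entropies.
        ent = 0.0
        for logits in self.cats:
            lp = log_softmax(logits)
            ent = ent - np.sum(np.exp(lp) * lp, axis=-1)
        return ent

    def sample(self, rng):
        # Gumbel-max trick: argmax(logits + Gumbel noise) is a sample
        # from Categorical(softmax(logits)).
        samples = [
            np.argmax(logits + rng.gumbel(size=logits.shape), axis=-1)
            for logits in self.cats
        ]
        return np.stack(samples, axis=-1)
```

With all-zero logits and input_lens=[3, 2], every joint action has log-probability -log 6 and the entropy is log 6. Sampling uses the Gumbel-max trick, which is one standard way to draw from a categorical given logits.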

AmplabJenkins · 2019-02-06T16:47:59Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/11614/
Test FAILed.

ericl · 2019-02-06T21:29:31Z

Do you know what's going on with the formatting changes? It makes it kind of hard to review.

AmplabJenkins · 2019-02-07T10:08:37Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/11638/
Test FAILed.

bjg2 · 2019-02-07T11:30:42Z

Do you know what's going on with the formatting changes? It makes it kind of hard to review.

Aligned the formatting with your impala.py; the pull request should be ready for review now.

AmplabJenkins · 2019-02-07T11:39:33Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/11643/
Test FAILed.

AmplabJenkins · 2019-02-07T11:45:17Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/11644/
Test FAILed.

AmplabJenkins · 2019-02-07T11:53:14Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/11645/
Test FAILed.

ericl

Since we're changing the vtrace code now, I think it's also good to copy over the vtrace_test file from here: https://github.com/deepmind/scalable_agent/blob/master/vtrace_test.py
and at least get it to run in the normal Discrete() case. It would also be good to add a new test for the MultiDiscrete() spaces.

I would also add an end-to-end runnable script that just verifies MultiDiscrete doesn't crash with a real env (you can add the file to rllib/test/, and add an entry in multi_node_tests.sh)
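In the spirit of the MultiDiscrete test suggested above, here is a self-contained NumPy sketch. It does not call RLlib or the TF vtrace code, and the helper multi_discrete_log_probs is hypothetical. It checks two properties any MultiDiscrete log-prob implementation should satisfy: with independent components, the summed per-component log-prob equals the log of the explicitly constructed joint distribution, and a single-component MultiDiscrete space reduces to the ordinary Discrete case.

```python
import itertools

import numpy as np


def log_softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))


def multi_discrete_log_probs(flat_logits, nvec, actions):
    """Hypothetical helper: sum of per-component log-probs.

    flat_logits: [N, sum(nvec)], actions: int array [N, len(nvec)].
    """
    pieces = np.split(flat_logits, np.cumsum(nvec)[:-1], axis=-1)
    return sum(
        np.take_along_axis(log_softmax(p), actions[:, i:i + 1], axis=-1)[:, 0]
        for i, p in enumerate(pieces))


def test_matches_explicit_joint():
    rng = np.random.default_rng(0)
    nvec = [3, 4]
    logits = rng.normal(size=(1, sum(nvec)))
    p0 = np.exp(log_softmax(logits[:, :3]))[0]
    p1 = np.exp(log_softmax(logits[:, 3:]))[0]
    joint = np.outer(p0, p1)  # explicit joint over all 3 * 4 action pairs
    assert np.isclose(joint.sum(), 1.0)
    for a0, a1 in itertools.product(range(3), range(4)):
        lp = multi_discrete_log_probs(logits, nvec, np.array([[a0, a1]]))
        assert np.isclose(lp[0], np.log(joint[a0, a1]))


def test_single_component_reduces_to_discrete():
    rng = np.random.default_rng(1)
    logits = rng.normal(size=(5, 6))
    actions = rng.integers(0, 6, size=(5, 1))
    expected = np.take_along_axis(log_softmax(logits), actions, axis=-1)[:, 0]
    assert np.allclose(multi_discrete_log_probs(logits, [6], actions), expected)


if __name__ == "__main__":
    test_matches_explicit_joint()
    test_single_component_reduces_to_discrete()
    print("ok")
```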

ericl · 2019-02-07T21:42:59Z

python/ray/rllib/agents/impala/vtrace_policy_graph.py

@@ -101,6 +109,11 @@ def __init__(self,
            "Must use `truncate_episodes` batch mode with V-trace."
        self.config = config
        self.sess = tf.get_default_session()
+        self._is_discrete = False


is_multidiscrete would be clearer, I think.

Agree, fixed.

ericl · 2019-02-07T21:50:04Z

python/ray/rllib/agents/impala/vtrace.py

-      log(target_policy(a) / behaviour_policy(a)). V-trace performs operations
-      on rhos in log-space for numerical stability.
+    rhos: A float32 tensor of shape [T, B, NUM_ACTIONS] representing the
+      importance sampling weights, i.e. target_policy(a) / behaviour_policy(a).


Why change it from log space?

We were experimenting with both log-probability and probability spaces; we will switch it back to log space.
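A small illustration of why the log-space formulation matters more once MultiDiscrete actions are involved: the importance weight for a joint action is a product of per-component ratios, and multiplying many small probabilities in float32 can underflow. The numbers below are made up for illustration (50 sub-actions, each with probability 0.01 under both policies).

```python
import numpy as np

# Hypothetical example: 50 independent sub-actions, each chosen with
# probability 0.01 by both policies, in float32 as in the TF graph.
n = 50
target_p = np.full(n, 0.01, dtype=np.float32)
behaviour_p = np.full(n, 0.01, dtype=np.float32)

# Probability space: each joint probability (1e-100) underflows to 0 in
# float32, so the ratio becomes 0 / 0 = nan.
with np.errstate(invalid="ignore", divide="ignore", under="ignore"):
    rho_prob_space = np.prod(target_p) / np.prod(behaviour_p)

# Log space: sum the per-component log-ratios and exponentiate only once.
log_rho = np.sum(np.log(target_p) - np.log(behaviour_p))
rho_log_space = np.exp(log_rho)
```

In probability space both joint probabilities underflow to 0 and the ratio becomes nan; summing per-component log-ratios gives log_rho = 0, i.e. rho = 1, as expected for identical policies.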

ericl · 2019-02-07T21:51:03Z

python/ray/rllib/agents/impala/vtrace.py

-    'vs', 'pg_advantages', 'log_rhos', 'behaviour_action_log_probs',
-    'target_action_log_probs'
+    'vs', 'pg_advantages', 'rhos', 'behaviour_action_policy',
+    'target_action_policy'


Can we keep these in log space?
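For reference, a NumPy sketch of the V-trace computation that produces the vs and pg_advantages fields from log_rhos, following the backward recursion in DeepMind's vtrace.py (from_importance_weights), including its default clipping thresholds of 1.0. For the MultiDiscrete case it assumes, as one plausible choice, that log_rhos has already been reduced to shape [T, B] by summing the per-component log-ratios; this is not necessarily how the merged RLlib code does it.

```python
import numpy as np


def vtrace_from_log_rhos(log_rhos, discounts, rewards, values, bootstrap_value,
                         clip_rho_threshold=1.0, clip_pg_rho_threshold=1.0):
    """V-trace targets and policy-gradient advantages (NumPy sketch).

    log_rhos, discounts, rewards, values: float arrays of shape [T, B].
    bootstrap_value: float array of shape [B], the value estimate at time T.
    """
    rhos = np.exp(log_rhos)
    clipped_rhos = np.minimum(clip_rho_threshold, rhos)
    cs = np.minimum(1.0, rhos)  # truncated trace coefficients c_t

    # V(x_{t+1}) for every t, using the bootstrap value for the last step.
    values_t_plus_1 = np.concatenate([values[1:], bootstrap_value[None]], axis=0)
    deltas = clipped_rhos * (rewards + discounts * values_t_plus_1 - values)

    # Backward recursion:
    # v_s - V(x_s) = delta_s + gamma_s * c_s * (v_{s+1} - V(x_{s+1})).
    acc = np.zeros_like(bootstrap_value)
    vs_minus_v = np.zeros_like(values)
    for t in reversed(range(values.shape[0])):
        acc = deltas[t] + discounts[t] * cs[t] * acc
        vs_minus_v[t] = acc
    vs = vs_minus_v + values

    # Policy-gradient advantages bootstrap from v_{s+1}.
    vs_t_plus_1 = np.concatenate([vs[1:], bootstrap_value[None]], axis=0)
    clipped_pg_rhos = np.minimum(clip_pg_rho_threshold, rhos)
    pg_advantages = clipped_pg_rhos * (rewards + discounts * vs_t_plus_1 - values)
    return vs, pg_advantages
```

When log_rhos is zero (on-policy), the recursion reduces to vs_s = r_s + gamma * vs_{s+1}, i.e. the bootstrapped n-step return, which is a convenient sanity check.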

ericl · 2019-02-07T21:51:12Z

python/ray/rllib/agents/impala/impala.py

@@ -70,6 +70,9 @@
    # max number of workers to broadcast one set of weights to
    "broadcast_interval": 1,

+    # Actions are chosen based on this distribution, if provided
+    "dist_type": None,


This isn't needed, right?

It is needed, because the default is DiagGaussian and we use Categorical (which becomes MultiCategorical for a MultiDiscrete action space). We didn't change the default, which is DiagGaussian with a Discrete action space (that's how IMPALA operates at the moment).

I'm a bit confused, isn't it Categorical for discrete spaces?

elif isinstance(action_space, gym.spaces.Discrete):
    return Categorical, action_space.n

I don't think DiagGaussian ever gets used in IMPALA, does it? (Maybe you meant APPO?)
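The point about Categorical can be illustrated with a simplified dispatch on the action-space type. This sketch uses hypothetical stand-in dataclasses instead of gym.spaces so it runs without gym, returns distribution names as strings, and is not RLlib's exact ModelCatalog code; it only shows that a Discrete space maps to Categorical, a MultiDiscrete space to MultiCategorical, and that DiagGaussian would be chosen for a continuous Box space.

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical stand-ins for gym.spaces.Discrete / MultiDiscrete / Box,
# so the sketch runs without gym installed.
@dataclass
class Discrete:
    n: int


@dataclass
class MultiDiscrete:
    nvec: Tuple[int, ...]


@dataclass
class Box:
    shape: Tuple[int, ...]


def get_action_dist(action_space):
    """Return (distribution name, number of model outputs) for a space.

    Illustrative only: mirrors the dispatch quoted above, not RLlib's exact code.
    """
    if isinstance(action_space, Box):
        # Assumed layout: one mean and one log-std per action dimension.
        return "DiagGaussian", 2 * action_space.shape[0]
    elif isinstance(action_space, Discrete):
        return "Categorical", action_space.n
    elif isinstance(action_space, MultiDiscrete):
        # One flat logit vector, later split into len(nvec) categoricals.
        return "MultiCategorical", sum(action_space.nvec)
    raise NotImplementedError("Unsupported action space: {}".format(action_space))
```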

ericl · 2019-02-07T21:54:11Z

python/ray/rllib/agents/impala/vtrace.py

 ])

 VTraceReturns = collections.namedtuple('VTraceReturns', 'vs pg_advantages')


-def log_probs_from_logits_and_actions(policy_logits, actions):
+def select_policy_values_using_actions(policy_logits, actions):


Should update the doc comment here.
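One way the renamed helper's doc comment could read, shown on a NumPy sketch of a MultiDiscrete-aware log_probs_from_logits_and_actions. It assumes per-component logits are passed as a list and actions as a [T, B, NUM_COMPONENTS] integer array; the signature is illustrative and is not the one in the PR. A plain Discrete space is the single-component special case, matching the original [T, B, NUM_ACTIONS] behaviour.

```python
import numpy as np


def log_probs_from_logits_and_actions(policy_logits, actions):
    """Computes per-component action log-probs (NumPy sketch).

    Args:
      policy_logits: list of float arrays, one per MultiDiscrete component,
        each of shape [T, B, NUM_ACTIONS_i] with un-normalized log-probabilities.
      actions: int array of shape [T, B, NUM_COMPONENTS] with the actions taken.

    Returns:
      A list of float arrays of shape [T, B]; entry i holds
      log pi_i(actions[..., i]). A Discrete space is the single-component case.
    """
    log_probs = []
    for i, logits in enumerate(policy_logits):
        # Stable log-softmax over the last (action) axis.
        shifted = logits - logits.max(axis=-1, keepdims=True)
        log_softmax = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
        # Pick out the log-prob of the action actually taken.
        chosen = np.take_along_axis(log_softmax, actions[..., i:i + 1], axis=-1)
        log_probs.append(chosen[..., 0])
    return log_probs
```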

ericl · 2019-02-07T21:54:19Z

python/ray/rllib/agents/impala/vtrace.py

@@ -93,10 +92,10 @@ def from_logits(behaviour_policy_logits,
  NUM_ACTIONS refers to the number of actions.

  Args:
-    behaviour_policy_logits: A float32 tensor of shape [T, B, NUM_ACTIONS] with
+    behaviour_policy: A float32 tensor of shape [T, B, NUM_ACTIONS] with


Should update the doc comments here.

ericl · 2019-02-07T21:55:48Z

python/ray/rllib/agents/impala/vtrace.py

@@ -33,14 +33,14 @@
 nest = tf.contrib.framework.nest


Should update the file comment to say it was modified to support MultiDiscrete spaces.

AmplabJenkins · 2019-02-08T11:18:45Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/11688/
Test FAILed.

AmplabJenkins · 2019-02-08T11:30:49Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/11690/
Test FAILed.


AmplabJenkins · 2019-02-08T15:59:53Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/11693/
Test FAILed.

AmplabJenkins · 2019-02-19T18:14:15Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/12115/
Test FAILed.

AmplabJenkins · 2019-02-20T10:43:52Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/12152/
Test FAILed.

AmplabJenkins · 2019-02-20T18:33:16Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/12157/
Test FAILed.

AmplabJenkins · 2019-02-20T19:41:38Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/12158/
Test FAILed.

AmplabJenkins · 2019-02-25T11:20:42Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/12293/
Test FAILed.

bjg2 · 2019-02-25T11:21:16Z

Is this pull request waiting on something?

ericl · 2019-02-25T18:48:01Z

Ah, looks like test_supported_spaces is failing on APPO somehow. I can take a look in a bit.

AmplabJenkins · 2019-02-26T21:31:42Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/12325/
Test FAILed.

AmplabJenkins · 2019-02-27T11:41:08Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/12355/
Test FAILed.

AmplabJenkins · 2019-03-01T23:47:46Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/12446/
Test FAILed.

AmplabJenkins · 2019-03-02T02:09:44Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/12447/
Test FAILed.

ericl · 2019-03-02T03:47:09Z

Merged, thanks!


Aleksandar Milovanović added 2 commits February 6, 2019 17:05

impala changes

a6b1b7a

fixed newlines

088985b

bjg2 changed the title ~~[WINGMAN -> rllib] IMPALA MultiDiscrete changes~~ [wingman -> rllib] IMPALA MultiDiscrete changes Feb 6, 2019

bjg2 mentioned this pull request Feb 6, 2019

Wingman IMPALA changes #3945

Closed

Merge branch 'master' into impala

23029d3

Aleksandar Milovanović added 4 commits February 7, 2019 12:13

reformatting impalla.py

3b38ebb

aligned vtrace.py formatting some more

1858404

aligned formatting some more

9840eb6

aligned formatting some more

e48f9ae

ericl reviewed Feb 7, 2019


Aleksandar Milovanović added 2 commits February 8, 2019 11:55

Merge branch 'master' into impala

26eed71

fixed impala stuff

3171c8a

Address vtrace comments (#6)

9d62dd1

* Address vtrace comments
* Update get_log_rhos method's comment

ericl self-assigned this Feb 8, 2019

stefanpantic added 6 commits February 11, 2019 13:52

Made APPO work with VTrace

6597295

Variable is no longer a member

1d31991

Optimized imports

252f6b3

Changed is_discrete to is_multidiscrete, fixed KL distribution

5ef2e30

Fixed KL divergence

cf5c1c5

Removed if statement

54f4f79

fixing issue with new gym version

967db5c

Merge branch 'master' into impala

1da470f

ericl added 2 commits February 20, 2019 09:53

lint

9584a7c

fix multigpu test

8999621

Aleksandar Milovanović added 2 commits February 25, 2019 10:49

merged with master

0cbeb7c

Merge branch 'impala' of https://github.com/wingman-ai/ray into impala

abe797a

Merge branch 'master' into impala

280b21c

Merge branch 'master' into impala

e0e3060

ericl added 2 commits March 1, 2019 14:08

Merge remote-tracking branch 'upstream/master' into impala

afb462f

fix tests

eb18cff

ericl merged commit 962b17f into ray-project:master Mar 2, 2019

ericl mentioned this pull request Mar 11, 2019

[rllib] IMPALA can't converge on cluster with Ray 0.6.4 #4329

Closed

ericl added a commit to ericl/ray that referenced this pull request Mar 12, 2019

Revert "[wingman -> rllib] IMPALA MultiDiscrete changes (ray-project#3967)"

e4cb71f

This reverts commit 962b17f.

ericl added a commit that referenced this pull request Mar 12, 2019

Revert "[wingman -> rllib] IMPALA MultiDiscrete changes (#3967)" (#4332)

3c41cb9

This reverts commit 962b17f.

stefanpantic added a commit to wingman-ai/ray that referenced this pull request Mar 12, 2019

Revert "Revert "[wingman -> rllib] IMPALA MultiDiscrete changes (ray-project#3967)" (ray-project#4332)"

a63e581

This reverts commit 3c41cb9.

ericl pushed a commit that referenced this pull request Mar 13, 2019

Fix multi discrete (#4338)

2202a81

* Revert "Revert "[wingman -> rllib] IMPALA MultiDiscrete changes (#3967)" (#4332)" This reverts commit 3c41cb9.
* Fix a bug with log rhos for vtrace
* Reformat
* lint

nikola-j deleted the impala branch March 27, 2019 09:53
