[wingman -> rllib] IMPALA MultiDiscrete changes #3967

Merged: 48 commits, Mar 2, 2019

@bjg2 bjg2 commented Feb 6, 2019

NOTE: This is an early version of the pull request, opened so that we can align. No unit tests have been run, and the changes have not been tested beyond what we need them for.

What do these changes do?

  • Enables the MultiDiscrete action space in IMPALA.

  • Adds the MultiCategorical action distribution. We use IMPALA with that distribution: our model outputs categorical logits for each action. Possibly not all action spaces work with all action distributions; we have tested that Discrete works with DiagGaussian (the default) and that MultiDiscrete works with categorical (MultiCategorical). Note that the model actually outputs a 1-dimensional tensor, which is reshaped to the logit sizes given by the provided MultiDiscrete action space.

  • VTrace is replaced with an implementation that also works for MultiDiscrete action spaces.
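
The flat-logits layout described in the bullets above can be sketched in NumPy. This is only an illustration, not the code in this PR: `split_logits`, `log_softmax`, and `multi_categorical_logp` are hypothetical names, and the sketch assumes the sub-actions are sampled independently, so the joint log-probability is the sum of the per-component log-probabilities.

```python
import numpy as np


def split_logits(flat_logits, nvec):
    """Split a [batch, sum(nvec)] array into one [batch, n] array per sub-action."""
    boundaries = np.cumsum(nvec)[:-1]
    return np.split(flat_logits, boundaries, axis=-1)


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def multi_categorical_logp(flat_logits, actions, nvec):
    """Joint log-probability of MultiDiscrete actions, shape [batch]."""
    total = np.zeros(flat_logits.shape[0])
    for i, logits in enumerate(split_logits(flat_logits, nvec)):
        logp = log_softmax(logits)
        # Pick the log-probability of the chosen value of sub-action i.
        total += np.take_along_axis(logp, actions[:, i:i + 1], axis=-1)[:, 0]
    return total


# Hypothetical MultiDiscrete([3, 2]) space: the model emits 3 + 2 = 5 logits.
nvec = [3, 2]
flat_logits = np.zeros((4, sum(nvec)))  # uniform logits for a batch of 4
actions = np.array([[0, 1], [2, 0], [1, 1], [0, 0]])
logp = multi_categorical_logp(flat_logits, actions, nvec)
```

With uniform logits every joint action has probability 1/3 * 1/2, so each entry of `logp` equals log(1/6).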

Aleksandar Milovanović added 2 commits February 6, 2019 17:05
@bjg2 bjg2 changed the title [WINGMAN -> rllib] IMPALA MultiDiscrete changes [wingman -> rllib] IMPALA MultiDiscrete changes Feb 6, 2019
@bjg2 bjg2 mentioned this pull request Feb 6, 2019
@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/11614/

@ericl
Contributor

ericl commented Feb 6, 2019

Do you know what's going on with the formatting changes? It makes it kind of hard to review.

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/11638/

@bjg2
Contributor Author

bjg2 commented Feb 7, 2019

Do you know what's going on with the formatting changes? It makes it kind of hard to review.

I aligned the formatting with your impala.py, so the pull request should be ready for review now.

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/11643/

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/11644/

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/11645/

Contributor

@ericl ericl left a comment

Since we're changing the vtrace code now, I think it's also good to copy over the vtrace_test file from here: https://github.com/deepmind/scalable_agent/blob/master/vtrace_test.py
and at least get it to run in the normal Discrete() case. It would also be good to add a new test for the MultiDiscrete() spaces.

I would also add an end-to-end runnable script that just verifies MultiDiscrete doesn't crash with a real env (you can add the file to rllib/test/, and add an entry in multi_node_tests.sh)

@@ -101,6 +109,11 @@ def __init__(self,
"Must use `truncate_episodes` batch mode with V-trace."
self.config = config
self.sess = tf.get_default_session()
self._is_discrete = False
Contributor

is_multidiscrete is more clear I think

Contributor Author

Agree, fixed.

- log(target_policy(a) / behaviour_policy(a)). V-trace performs operations
- on rhos in log-space for numerical stability.
+ rhos: A float32 tensor of shape [T, B, NUM_ACTIONS] representing the
+ importance sampling weights, i.e. target_policy(a) / behaviour_policy(a).
Contributor

Why change it from log space?

We were experimenting with both log-probability and probability space; we will switch it back to log space.
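
A small NumPy sketch of why log space is the safer choice for the importance weights. It is not the V-trace code from this PR: `log_rhos_from_logps` is a hypothetical helper, and summing per-component log-probabilities assumes the MultiDiscrete sub-actions are independent.

```python
import numpy as np


def log_rhos_from_logps(target_logps, behaviour_logps):
    """Log importance weights log(target / behaviour), summed over sub-actions.

    Both arguments are lists with one [T, B] array of log-probabilities per
    sub-action of the (assumed independent) MultiDiscrete action.
    """
    return sum(t - b for t, b in zip(target_logps, behaviour_logps))


# Two sub-actions, each with a very small probability under both policies.
target_logps = [np.full((1, 1), -400.0), np.full((1, 1), -400.0)]
behaviour_logps = [np.full((1, 1), -400.5), np.full((1, 1), -400.5)]

log_rhos = log_rhos_from_logps(target_logps, behaviour_logps)
# Clip the weights the way V-trace does: min(rho_bar, rho), with rho_bar = 1.0.
clipped_rhos = np.minimum(1.0, np.exp(log_rhos))

# Probability space: both joint probabilities underflow to 0.0, so the
# ratio 0.0 / 0.0 is NaN and the importance weight is lost.
with np.errstate(invalid="ignore"):
    naive_rhos = np.exp(sum(target_logps)) / np.exp(sum(behaviour_logps))
```

In log space the weight stays finite (`log_rhos` is 1.0 here), while the ratio computed from raw probabilities is NaN.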

- 'vs', 'pg_advantages', 'log_rhos', 'behaviour_action_log_probs',
- 'target_action_log_probs'
+ 'vs', 'pg_advantages', 'rhos', 'behaviour_action_policy',
+ 'target_action_policy'
Contributor

Can we keep these in log space?

Yup.

@@ -70,6 +70,9 @@
# max number of workers to broadcast one set of weights to
"broadcast_interval": 1,

# Actions are chosen based on this distribution, if provided
"dist_type": None,
Contributor

This isn't needed right?

Contributor Author

@bjg2 bjg2 Feb 8, 2019

It is needed: the default is DiagGaussian, and we use Categorical (which is MultiCategorical for a MultiDiscrete action space). We didn't change the default, which is DiagGaussian with a Discrete action space (that's how IMPALA operates at the moment).

Contributor

I'm a bit confused, isn't it Categorical for discrete spaces?

        elif isinstance(action_space, gym.spaces.Discrete):
            return Categorical, action_space.n

I don't think DiagGaussian ever gets used in IMPALA does it (maybe you meant APPO?)
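
To make the mapping under discussion concrete, here is a self-contained sketch of choosing a distribution from the action space. It is an assumption-laden illustration, not RLlib's catalog code: `Discrete` and `MultiDiscrete` are stand-in dataclasses for the gym spaces, `dist_for_space` is a hypothetical helper, and the distributions are returned by name only.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Discrete:
    """Stand-in for gym.spaces.Discrete (hypothetical, for illustration)."""
    n: int


@dataclass
class MultiDiscrete:
    """Stand-in for gym.spaces.MultiDiscrete (hypothetical, for illustration)."""
    nvec: Tuple[int, ...]


def dist_for_space(action_space):
    """Return (distribution name, number of model outputs) for a space."""
    if isinstance(action_space, Discrete):
        # One logit per discrete action.
        return "Categorical", action_space.n
    if isinstance(action_space, MultiDiscrete):
        # One flat vector holding the logits of every sub-action.
        return "MultiCategorical", sum(action_space.nvec)
    raise NotImplementedError(
        "Unsupported action space: {}".format(action_space))


discrete_choice = dist_for_space(Discrete(4))
multi_choice = dist_for_space(MultiDiscrete((3, 2)))
```

Under these assumptions, a `Discrete(4)` space maps to Categorical with 4 outputs and a `MultiDiscrete((3, 2))` space maps to MultiCategorical with 5 outputs.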

])

VTraceReturns = collections.namedtuple('VTraceReturns', 'vs pg_advantages')


- def log_probs_from_logits_and_actions(policy_logits, actions):
+ def select_policy_values_using_actions(policy_logits, actions):
Contributor

Should update doc comment here.

Contributor Author

done.

@@ -93,10 +92,10 @@ def from_logits(behaviour_policy_logits,
NUM_ACTIONS refers to the number of actions.

Args:
- behaviour_policy_logits: A float32 tensor of shape [T, B, NUM_ACTIONS] with
+ behaviour_policy: A float32 tensor of shape [T, B, NUM_ACTIONS] with
Contributor

Should update doc comments here.

Contributor Author

done.

@@ -33,14 +33,14 @@
nest = tf.contrib.framework.nest
Contributor

Should update the file comment to say modified to support MultiDiscrete spaces.

Contributor Author

Done.

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/11688/

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/11690/

* Address vtrace comments

* Update get_log_rhos method's comment
@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/11693/

@ericl ericl self-assigned this Feb 8, 2019
@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/12115/

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/12152/

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/12157/

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/12158/

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/12293/

@bjg2
Contributor Author

bjg2 commented Feb 25, 2019

Is this pull request waiting on something?

@ericl
Contributor

ericl commented Feb 25, 2019

Ah looks like test_supported_spaces is failing on APPO somehow. I can take a look in a bit.

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/12325/

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/12355/

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/12446/

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/12447/

@ericl ericl merged commit 962b17f into ray-project:master Mar 2, 2019
@ericl
Contributor

ericl commented Mar 2, 2019

Merged, thanks!

ericl added a commit to ericl/ray that referenced this pull request Mar 12, 2019
ericl added a commit that referenced this pull request Mar 12, 2019
stefanpantic added a commit to wingman-ai/ray that referenced this pull request Mar 12, 2019
ericl pushed a commit that referenced this pull request Mar 13, 2019
* Revert "Revert "[wingman -> rllib] IMPALA MultiDiscrete changes (#3967)" (#4332)"

This reverts commit 3c41cb9.

* Fix a bug with log rhos for vtrace

* Reformat

* lint
@nikola-j nikola-j deleted the impala branch March 27, 2019 09:53
5 participants