[RLlib] Created action_dist_v2 for RLModule examples, RLModule PR 2/N #29600
Conversation
rllib/models/action_dist_v2.py (outdated):

    @ExperimentalAPI
    class ActionDistributionV2(abc.ABC):
sven1977: We should really take this opportunity and get rid of the restriction to actions. Can we just call this Distribution? That would then cover models that predict next states, rewards, actions, etc., and could also serve inside variational autoencoders.

kouroshHakha: Yeah, I agree. Fixed.
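(A minimal sketch of what the renamed base class could look like, based purely on the method names discussed in this thread; the signatures here are illustrative, not the merged API.)

```python
import abc


class Distribution(abc.ABC):
    """Base class for distributions over model outputs.

    Not restricted to actions: subclasses may model next states, rewards,
    or latents (e.g. inside a variational autoencoder).
    """

    @abc.abstractmethod
    def sample(self, *, sample_shape=None, return_logp: bool = False):
        """Draws a (non-differentiable) sample from the distribution."""

    @abc.abstractmethod
    def rsample(self, *, sample_shape=None, return_logp: bool = False):
        """Draws a re-parameterized, differentiable sample."""

    @abc.abstractmethod
    def logp(self, value):
        """Returns the log-likelihood of `value` under this distribution."""
```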
rllib/models/action_dist_v2.py (outdated):

        options.

        Args:
            action_space: The action space this distribution will be used for,
sven1977: action_space -> space

kouroshHakha: Fixed.
rllib/models/action_dist_v2.py (outdated):

    def required_model_output_shape(
        action_space: gym.Space, model_config: ModelConfigDict
    ) -> Tuple[int, ...]:
        """Returns the required shape of an input parameter tensor for a
sven1977: Can we give two examples here of how this method will be used (for Categorical and DiagGaussian)?
- Does the shape include the batch dim (time dim, etc.)?
- What are typical cases where the model config would be used, and how would it be used?

kouroshHakha: I am not sure this will actually get used, to be honest. I have it here just as a reminder to the next developer that this was part of the old distribution, which we may or may not keep depending on how the catalog gets written. Added a comment to reflect this.
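(A hedged sketch of the two requested examples, assuming the same convention as the old ActionDistribution: the returned shape excludes batch/time dimensions, and model_config goes unused in these simple cases. The function names are illustrative only.)

```python
import gym
import numpy as np


def categorical_required_shape(action_space: gym.Space) -> tuple:
    # One logit per discrete action: Discrete(4) -> (4,).
    assert isinstance(action_space, gym.spaces.Discrete)
    return (action_space.n,)


def diag_gaussian_required_shape(action_space: gym.Space) -> tuple:
    # Mean and log-std per action dimension: Box(shape=(3,)) -> (6,).
    assert isinstance(action_space, gym.spaces.Box)
    return (2 * int(np.prod(action_space.shape)),)
```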
rllib/models/action_dist_v2.py (outdated):

        return_logp: bool = False,
        **kwargs
    ) -> Union[TensorType, Tuple[TensorType, TensorType]]:
        """Draw a re-parameterized sample from the action distribution.
sven1977: Can we quickly explain the difference to sample? Basically that rsample is backprop'able and sample is not (if I understand correctly)?

kouroshHakha: Yep, added.
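(The distinction, illustrated with a plain torch.distributions.Normal rather than this PR's classes:)

```python
import torch

mu = torch.zeros(3, requires_grad=True)
dist = torch.distributions.Normal(loc=mu, scale=torch.ones(3))

s = dist.sample()    # detached draw: gradients do NOT flow back to mu
r = dist.rsample()   # reparameterized draw (mu + sigma * eps): gradients DO flow
r.sum().backward()   # fine; s.sum().backward() would raise instead
print(mu.grad)       # populated via the reparameterization trick
```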
rllib/models/action_dist_v2.py (outdated):

        self,
        *,
        sample_shape: Tuple[int, ...] = None,
        return_logp: bool = False,
sven1977: Great choice to add this option to the signature!
rllib/models/action_dist_v2.py (outdated):

        """The policy action distribution of an agent.

        Args:
            inputs: input vector to define the distribution over.
sven1977: Is this always a vector? Or could it also be a dict of tensors (multi-distribution) or a >1D tensor (multi-variate diag Gaussian)?

kouroshHakha: Oh, this is a residue of the old stuff; it doesn't mean anything anymore. Removed. :)
rllib/models/action_dist_v2.py (outdated):

        """Draw a sample from the action distribution.

        Args:
            sample_shape: The shape of the sample to draw.
sven1977: Can we be more specific here? Doesn't the input (currently self.inputs) already determine the batch/time dimensions, with the rest being "fixed"? If that is not the case, can we add examples to this docstring?

kouroshHakha: This should be addressed in the docstrings of each individual distribution.
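(For concreteness, this is how plain torch.distributions handles it: batch dimensions come from the parameters, i.e. the inputs, and sample_shape only prepends extra independent draws. Presumably the per-distribution docstrings would document the same convention.)

```python
import torch

# A batch of 4 univariate Gaussians: batch dims come from the parameters.
dist = torch.distributions.Normal(loc=torch.zeros(4), scale=torch.ones(4))

print(dist.sample().shape)                  # torch.Size([4])
print(dist.sample(torch.Size([10])).shape)  # torch.Size([10, 4]): 10 extra draws
```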
sven1977: Awesome PR @kouroshHakha! Just a few questions and nits on the docstrings, and one renaming request. Exciting to see all these things being done over! :)
    def rsample(
        self, *, sample_shape=torch.Size(), return_logp: bool = False
    ) -> Union[TensorType, Tuple[TensorType, TensorType]]:
        sample = self.dist.rsample(sample_shape)
sven1977: Nit: rename the local var sample to rsample for clarity?
        self.dist = self._get_distribution(*args, **kwargs)

    @abc.abstractmethod
    def _get_distribution(self, *args, **kwargs) -> torch.distributions.Distribution:
sven1977: Hmm, the more I think about this: we should rename this to _get_torch_distribution for clarity (vs. _get_tf_distribution), or even _get_underlying_torch_distribution, because self is already a TorchDistribution, albeit an RLlib one. :)

kouroshHakha: I agree. Fixed.
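(A minimal sketch of the wrapper pattern under discussion, combining the snippets above with the agreed rename and the local-variable nit; the merged code may differ in details.)

```python
import abc

import torch


class TorchDistribution(abc.ABC):
    """RLlib-side wrapper around a torch.distributions.Distribution."""

    def __init__(self, *args, **kwargs):
        super().__init__()
        # Subclasses build the underlying torch distribution.
        self.dist = self._get_torch_distribution(*args, **kwargs)

    @abc.abstractmethod
    def _get_torch_distribution(
        self, *args, **kwargs
    ) -> torch.distributions.Distribution:
        """Returns the wrapped torch.distributions object."""

    def sample(self, *, sample_shape=torch.Size(), return_logp: bool = False):
        sample = self.dist.sample(sample_shape)
        if return_logp:
            return sample, self.dist.log_prob(sample)
        return sample

    def rsample(self, *, sample_shape=torch.Size(), return_logp: bool = False):
        # Local var named `rsample`, per the review nit above.
        rsample = self.dist.rsample(sample_shape)
        if return_logp:
            return rsample, self.dist.log_prob(rsample)
        return rsample
```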
        logits: torch.Tensor = None,
        temperature: float = 1.0,
    ) -> None:
        super().__init__(probs=probs, logits=logits, temperature=temperature)
sven1977: Docstring?

kouroshHakha: Added.
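(Building on the wrapper sketched above, the categorical subclass could look roughly like this; the temperature handling shown here, dividing the logits, is a common convention and an assumption, not necessarily the merged behavior.)

```python
import torch


class TorchCategorical(TorchDistribution):  # wrapper class sketched above
    """Categorical distribution over a fixed set of discrete outcomes.

    Exactly one of `probs` or `logits` should be provided; `temperature`
    flattens (>1.0) or sharpens (<1.0) the distribution.
    """

    def _get_torch_distribution(
        self,
        probs: torch.Tensor = None,
        logits: torch.Tensor = None,
        temperature: float = 1.0,
    ) -> torch.distributions.Distribution:
        if logits is not None:
            logits = logits / temperature  # assumed temperature convention
        return torch.distributions.Categorical(probs=probs, logits=logits)
```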
        loc: torch.Tensor,
        scale: Optional[torch.Tensor] = None,
    ):
        super().__init__(loc=loc, scale=scale)
sven1977: Docstring?

kouroshHakha: Added.
""" | ||
|
||
def __init__(self, loc: torch.Tensor) -> None: | ||
super().__init__() |
sven1977: Docstring?

kouroshHakha: Added.
    @DeveloperAPI
    class TorchDeterministic(ActionDistributionV2):
        """Action distribution that returns the input values directly.
sven1977: "Action distribution" -> "Distribution"
kouroshHakha: @sven1977 Please re-review.

sven1977: LGTM!
gjoliver: I like this new base class! I am just a little worried that we are not adding a TF version of the implementation yet. It should be pretty easy, actually? But this is good regardless.

kouroshHakha: The goal is to get to the torch PPO POC on RLModules first, then come back and modify for TF.
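(To gjoliver's point, a hypothetical TF twin via tensorflow_probability would indeed look nearly identical. Everything below, including the class name, is an assumption for illustration, not code from this PR.)

```python
import tensorflow_probability as tfp


class TfCategorical:
    """Hypothetical TF twin of TorchCategorical, via tensorflow_probability."""

    def __init__(self, probs=None, logits=None):
        self.dist = tfp.distributions.Categorical(logits=logits, probs=probs)

    def sample(self, *, sample_shape=(), return_logp: bool = False):
        sample = self.dist.sample(sample_shape)
        if return_logp:
            return sample, self.dist.log_prob(sample)
        return sample
```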
kouroshHakha: @gjoliver This is ready for merge. The failing tests are not relevant. Thanks.
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Why are these changes needed?
Submitting this PR in pieces.
These action distributions have a simpler interface and are more explicit. Previously, users had to pass in a magical inputs argument plus a ModelV2 instance (the latter only for auto-regressive distributions); now the interface is explicit and more familiar, since it looks like PyTorch distributions. An auto-regressive distribution can still be created by subclassing this base class. A rough usage sketch follows.
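(To illustrate the "explicit, torch-like" point; class and method names are as sketched in the review threads above, not necessarily the merged API.)

```python
import torch

# New style: construct the distribution from explicit parameters,
# exactly like torch.distributions - no magical `inputs` + ModelV2 pair.
logits = torch.randn(32, 4)  # e.g. a batch of 32 policy-head outputs
dist = TorchCategorical(logits=logits)  # class as sketched above

action, logp = dist.sample(return_logp=True)
print(action.shape, logp.shape)  # torch.Size([32]) torch.Size([32])
```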
Related issue number

Checks

- I've signed off every commit (by using git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.