[docs][rllib] Documentation for connectors. #27528

Merged
merged 9 commits into from
Aug 19, 2022

Conversation

gjoliver
Copy link
Member

@gjoliver gjoliver commented Aug 4, 2022

Why are these changes needed?

Add documentation for RLlib connectors.

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

This setup is useful for certain multi-agent use cases where individual observations may need to be
modified based on data from other agents.
This can also be useful if users need to construct meta-observation, e.g., build a graph as input
to the policy from individual agent observations.
Contributor

Probably here you should add a paragraph explaining the ActionConnectorDataType data type.

Member Author

yeah, good idea, let me add a section about the common data types.

Contributor

@kouroshHakha kouroshHakha left a comment

Made some minor changes here and there. The high-level feedback: This is a good starting doc for internal dev and stuff, but not very useful for external users. After reading this doc it is still not clear how I would use connectors, or modify the connector pipeline with a custom one? Also the input / output data structures are not documented which would require the user to dig into the code-base to be able to use it.

kouroshHakha and others added 3 commits August 5, 2022 09:30
@gjoliver gjoliver requested a review from a team as a code owner August 6, 2022 23:59
@gjoliver
Member Author

gjoliver commented Aug 6, 2022

Added TODOs, and a section about common data types used by these connectors.
PTAL.

@stephanie-wang stephanie-wang changed the title Documentation for connectors. [docs][rllib] Documentation for connectors. Aug 8, 2022
@gjoliver
Member Author

@maxpumperla and @richardliaw, there are more contents I want to add, but can you guys help edit the initial landing page of the connectors feature?
Thanks.

@richardliaw
Contributor

ah ok

Contributor

@maxpumperla maxpumperla left a comment

I really like this as a first doc on connectors. The only thing I'd really like to see is 1-2 very concrete usage examples. Signatures and types are important, too, but it's not entirely clear how I'd use this in practice.

E.g. where exactly do I set "enable_connectors" to "True" etc., how does my algorithm spec change if I use them. The example(s) can be almost trivial, as long as it's clear how to leverage the API. Before and after comparisons (e.g. diffs) would be a plus.

@gjoliver
Member Author

definitely. I plan to add a somewhat e2e example (probably in notebook format) to demonstrate the usage and important benefits (see the TODO section).
just trying to get the first version in, so I can share this with some of the early testers.
thanks for the thoughtful edits.

==================

Connectors are components that handle transformations on inputs and outputs of a given RL policy, with the goal of improving
the durabilty and maintainability of RLlib's policy checkpoints.
Contributor

durability

Member Author

done


By consolidating these transformations under the framework of connectors, users of RLlib will be able to:

- Restore and deploy individual RLlib policies without having to restore training related logics of RLlib Algorithms.
Contributor

training-related

Member Author

done

- Restore and deploy individual RLlib policies without having to restore training related logics of RLlib Algorithms.
- Ensure policies are more durable than the algorithms they get trained with.
- Allow policies to be adapted to work with different versions of an environment.
- Run inference with RLlib polcies without worrying about the exact trajectory view requriements or state inputs.
Contributor

policies

Contributor

requirements

Member Author

done

- Allow policies to be adapted to work with different versions of an environment.
- Run inference with RLlib polcies without worrying about the exact trajectory view requriements or state inputs.

Connectors can be enabled by setting ``enable_connectors`` parameter to ``True``.
Contributor

... by setting the enable_connectors parameter to True.

Member Author

done
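
For readers wondering where the flag goes: the quoted line only says that ``enable_connectors`` must be set to ``True``, not where. A minimal sketch, assuming the flag is a top-level key of the algorithm config dict alongside the environment name (that placement is an assumption, not something this thread confirms):

```python
# Assumption: "enable_connectors" sits at the top level of the algorithm
# config dict. The placement is a guess; only the parameter name and its
# value come from the quoted documentation line.
config = {
    "env": "CartPole-v0",        # environment name mentioned in the review
    "enable_connectors": True,   # turn the connector framework on
}
```

With connectors left disabled, the same dict would simply omit the key or set it to ``False``.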

~~~~~~~~~~~~~~

``AgentConnectors`` handle the job of transforming environment observation data into a format that is understood by
the policy (e.g., flattening complex nested observations into a flat tensor). The high level APIs are:
Contributor

high-level

Member Author

done

-------------------

Lambda Connectors helps turn simple transformation functions into agent or action
connectors without having users worry about the high level list or non-list APIs.
Contributor

high-level

Member Author

done

Advanced Connectors
-------------------

Lambda Connectors helps turn simple transformation functions into agent or action
Contributor

Not sure about the plural s here. Shouldn't it be "Lambda Connectors help ..." or "The Lambda Connector helps ..."?

Member Author

done


Lambda Connectors helps turn simple transformation functions into agent or action
connectors without having users worry about the high level list or non-list APIs.
Lambda Connectors has separate agent and action versions, for example:
Contributor

Same as above, not sure about singular/plural s here.

Member Author

done

lambda actions, states, fetches: (2 * actions, states, fetches)
)

Mutiple connectors can be composed into a ``ConnectorPipeline``, which handles
Contributor

Multiple

Member Author

done
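
The idea in the quoted lines, wrapping a plain function as an action connector and chaining several connectors into a pipeline, can be sketched without RLlib. Everything below (``ActionConnectorSketch``, ``PipelineSketch``, the ``clip`` step) is a hypothetical stand-in, not the library's classes; only the ``(actions, states, fetches)`` lambda shape comes from the diff.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stand-ins for illustration; these are NOT RLlib classes.
PolicyOutput = Tuple[Any, List[Any], Dict[str, Any]]  # (actions, states, fetches)


class ActionConnectorSketch:
    """Wraps a plain function over (actions, states, fetches)."""

    def __init__(self, fn: Callable[[Any, List[Any], Dict[str, Any]], PolicyOutput]):
        self.fn = fn

    def __call__(self, output: PolicyOutput) -> PolicyOutput:
        actions, states, fetches = output
        return self.fn(actions, states, fetches)


class PipelineSketch:
    """Applies connectors in order, feeding each output into the next."""

    def __init__(self, connectors: List[ActionConnectorSketch]):
        self.connectors = list(connectors)

    def __call__(self, output: PolicyOutput) -> PolicyOutput:
        for connector in self.connectors:
            output = connector(output)
        return output


# Scale the action and pass states/fetches through. The tuple is
# parenthesized so the lambda returns all three values.
double = ActionConnectorSketch(
    lambda actions, states, fetches: (2 * actions, states, fetches)
)
# Second, made-up step: clip the scaled action to [-1, 1].
clip = ActionConnectorSketch(
    lambda actions, states, fetches: (max(-1.0, min(1.0, actions)), states, fetches)
)

pipeline = PipelineSketch([double, clip])
result = pipeline((0.75, [], {}))  # 0.75 is doubled to 1.5, then clipped to 1.0
```

The user only writes the two lambdas; the wrapper and pipeline take care of unpacking and ordering, which is the convenience the Lambda Connector paragraph describes.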


If connectors are enabled, RLlib will try to save policy checkpoints in properly serialized formats instead of
relying on python pickling. Eventually, the goal is to save policy checkpoints in serialized JSON files
to ensure maximum compatiiblity between RLlib and python versions.
Contributor

compatibility

Member Author

done
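
To make the checkpointing goal concrete, here is a minimal sketch of saving connector configurations as JSON and rebuilding them from a name registry instead of pickling the objects. The registry, the ``Scale`` and ``Clip`` connectors, and the JSON layout are all assumptions made for illustration; the thread does not show RLlib's actual format.

```python
import json
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical registry: connector name -> factory that rebuilds it from params.
REGISTRY: Dict[str, Callable[..., Callable[[float], float]]] = {}


def register(name: str):
    """Decorator that records a connector factory under a stable name."""
    def wrap(factory):
        REGISTRY[name] = factory
        return factory
    return wrap


@register("Scale")
def make_scale(factor: float):
    return lambda x: factor * x


@register("Clip")
def make_clip(low: float, high: float):
    return lambda x: max(low, min(high, x))


def to_json(specs: List[Tuple[str, Dict[str, Any]]]) -> str:
    """Serialize only names and parameters, never the callables themselves."""
    return json.dumps([{"name": name, "params": params} for name, params in specs])


def from_json(blob: str) -> List[Callable[[float], float]]:
    """Rebuild each connector by looking its name up in the registry."""
    return [REGISTRY[entry["name"]](**entry["params"]) for entry in json.loads(blob)]


specs = [("Scale", {"factor": 2.0}), ("Clip", {"low": -1.0, "high": 1.0})]
blob = to_json(specs)
restored = from_json(blob)

value = 0.75
for connector in restored:
    value = connector(value)  # 0.75 -> 1.5 -> 1.0
```

Because the file holds only names and parameters, it stays readable across Python versions as long as the registered names remain stable.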

When enabled, the configurations of agent and action connectors will get serialized and saved with checkpointed
policy states.
These connectors, together with the specific transformations they represent,
can be easily recovered (by RLlib provided utils) to simplify deployment and inference use cases.
Contributor

RLlib-provided

Member Author

done

Adapting a Policy for Different Environments
--------------------------------------------

It not uncommon for user environments to go through active development iterations.
Contributor

It is not

Member Author

done

It not uncommon for user environments to go through active development iterations.
Policies trained with an older version of an environment may be rendered useless for updated environments.
While env wrapper helps with this problem in many cases, connectors allow policies trained with
different environments to work together at a same time.
Contributor

... together at the same time. would be correct I think.

Member Author

done

:align: center

We have two classes of connectors. The first is an ``AgentConnector``, which is used to transform observed data from environments to the policy.
The second is an ``ActionConnector``, which is used to transform the action data from the policy to actions.
Contributor

I'd like "... transform the outputs of the policy into actions." better.

Member Author

done

While env wrapper helps with this problem in many cases, connectors allow policies trained with
different environments to work together at a same time.

Here is an example demonstrating adaptation of a policy trained for the standard Cartopole environment
Contributor

Cartpole or CartPole-v0 or something! One o too much here :)

Member Author

done
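
The diff excerpt cuts off before the example itself, so the following is not the PR's code. It is a rough, RLlib-free sketch of the adaptation idea, assuming a made-up newer environment that wraps the original 4-element observation in a dict with an extra ``wind`` field; the policy, the adapter, and the field names are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def old_policy(obs: Sequence[float]) -> int:
    """Stand-in for a policy trained on a 4-element observation."""
    assert len(obs) == 4
    # Toy rule: push in the direction the pole is leaning (index 2 = angle).
    return 1 if obs[2] > 0 else 0


def v2_to_v1(obs: Dict[str, object]) -> List[float]:
    """Hypothetical adapter: keep only the fields the old policy was trained on."""
    return list(obs["state"])


def adapt(policy: Callable, connector: Callable) -> Callable:
    """Run the connector on each observation before it reaches the policy."""
    return lambda obs: policy(connector(obs))


adapted_policy = adapt(old_policy, v2_to_v1)

# Observation from the made-up newer environment version.
new_obs = {"state": [0.0, 0.1, 0.05, -0.2], "wind": 0.3}
action = adapted_policy(new_obs)
```

The adapter travels with the policy rather than wrapping the environment, so policies trained on different environment versions can each carry their own adapter and run side by side.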

Contributor

@ArturNiederfahrenhorst ArturNiederfahrenhorst left a comment

Great stuff. I think this already provides users with a good sense of what they are facing. A notebook to enable them to play with connectors and pipelines would be awesome. Observing input/output per transform would be great here.

Signed-off-by: Jun Gong <[email protected]>
@richardliaw richardliaw merged commit 62b91cb into ray-project:master Aug 19, 2022