[docs][rllib] Documentation for connectors. #27528
This setup is useful for certain multi-agent use cases where individual observations may need to be
modified based on data from other agents.
This can also be useful if users need to construct meta-observation, e.g., build a graph as input
to the policy from individual agent observations.
Probably here you should add a paragraph explaining the `ActionConnectorDataType` data type.
yeah, good idea, let me add a section about the common data types.
Made some minor changes here and there. The high-level feedback: This is a good starting doc for internal dev and stuff, but not very useful for external users. After reading this doc it is still not clear how I would use connectors, or modify the connector pipeline with a custom one? Also the input / output data structures are not documented which would require the user to dig into the code-base to be able to use it.
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Added TODOs, and a section about common data types used by these connectors.
@maxpumperla and @richardliaw, there is more content I want to add, but can you guys help edit the initial landing page of the connectors feature?
ah ok
Signed-off-by: Richard Liaw <[email protected]>
I really like this as a first doc on connectors. The only thing I'd really like to see is 1-2 very concrete usage examples. Signatures and types are important, too, but it's not entirely clear how I'd use this in practice.
E.g. where exactly do I set "enable_connectors" to "True" etc., how does my algorithm spec change if I use them. The example(s) can be almost trivial, as long as it's clear how to leverage the API. Before and after comparisons (e.g. diffs) would be a plus.
definitely. I plan to add a somewhat e2e example (probably in notebook format) to demonstrate the usage and important benefits (see the TODO section).
doc/source/rllib/rllib-connector.rst
Outdated
==================

Connector are components that handle transformations on inputs and outputs of a given RL policy, with the goal of improving
the durabilty and maintainability of RLlib's policy checkpoints.
durability
done
doc/source/rllib/rllib-connector.rst
Outdated
By consolidating these transformations under the framework of connectors, users of RLlib will be able to:

- Restore and deploy individual RLlib policies without having to restore training related logics of RLlib Algorithms.
training-related
done
doc/source/rllib/rllib-connector.rst
Outdated
- Restore and deploy individual RLlib policies without having to restore training related logics of RLlib Algorithms.
- Ensure policies are more durable than the algorithms they get trained with.
- Allow policies to be adapted to work with different versions of an environment.
- Run inference with RLlib polcies without worrying about the exact trajectory view requriements or state inputs.
policies
requirements
done
doc/source/rllib/rllib-connector.rst
Outdated
- Allow policies to be adapted to work with different versions of an environment.
- Run inference with RLlib polcies without worrying about the exact trajectory view requriements or state inputs.

Connectors can be enabled by setting ``enable_connectors`` parameter to ``True``.
... by setting the `enable_connectors` parameter to `True`.
done
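For readers following this thread, here is a minimal sketch of what toggling the flag might look like, assuming a dict-style algorithm config. Only the `enable_connectors` key comes from the doc under review; the helper `with_connectors` and the other config entries are hypothetical, and how the resulting config is handed to an algorithm depends on the RLlib version in use.

```python
from typing import Any, Dict


def with_connectors(config: Dict[str, Any], enabled: bool = True) -> Dict[str, Any]:
    """Return a copy of a dict-style config with the connector flag set.

    Hypothetical helper for illustration only; it is not part of RLlib.
    """
    new_config = dict(config)  # shallow copy so the caller's dict is untouched
    new_config["enable_connectors"] = enabled
    return new_config


# Illustrative base config; the entries here are assumptions, not recommendations.
base_config = {"framework": "torch", "num_workers": 0}
connector_config = with_connectors(base_config)
print(connector_config)
```

Returning a copy rather than mutating in place keeps a before/after pair around, which is handy for the kind of comparison requested earlier in this review.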
doc/source/rllib/rllib-connector.rst
Outdated
~~~~~~~~~~~~~~

``AgentConnectors`` handle the job of transforming environment observation data into a format that is understood by
the policy (e.g., flattening complex nested observations into a flat tensor). The high level APIs are:
high-level
done
doc/source/rllib/rllib-connector.rst
Outdated
-------------------

Lambda Connectors helps turn simple transformation functions into agent or action
connectors without having users worry about the high level list or non-list APIs.
high-level
done
doc/source/rllib/rllib-connector.rst
Outdated
Advanced Connectors
-------------------

Lambda Connectors helps turn simple transformation functions into agent or action
Not sure about the plural `s` here. Shouldn't it be "Lambda Connectors help ..." or "The Lambda Connector helps ..."?
done
doc/source/rllib/rllib-connector.rst
Outdated
Lambda Connectors helps turn simple transformation functions into agent or action
connectors without having users worry about the high level list or non-list APIs.
Lambda Connectors has separate agent and action versions, for example:
Same as above, not sure about singular/plural `s` here.
done
doc/source/rllib/rllib-connector.rst
Outdated
lambda actions, states, fetches: 2 * actions, states, fetches
)

Mutiple connectors can be composed into a ``ConnectorPipeline``, which handles
Multiple
done
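To make the composition idea in the snippet above concrete, here is a plain-Python sketch of chaining per-step transforms on `(actions, states, fetches)`. It deliberately does not use RLlib: `SketchPipeline`, `double`, and `clip` are hypothetical names, and the real ``ConnectorPipeline`` may differ in interface and behavior.

```python
from typing import Any, Callable, List, Tuple

# (actions, states, fetches), mirroring the lambda signature quoted in the diff.
PolicyOutput = Tuple[Any, Any, Any]
Transform = Callable[[Any, Any, Any], PolicyOutput]


class SketchPipeline:
    """Hypothetical stand-in for a connector pipeline: applies transforms in order."""

    def __init__(self, transforms: List[Transform]) -> None:
        self.transforms = list(transforms)

    def __call__(self, actions: Any, states: Any, fetches: Any) -> PolicyOutput:
        # Feed the output of each transform into the next one.
        for transform in self.transforms:
            actions, states, fetches = transform(actions, states, fetches)
        return actions, states, fetches


def double(actions: float, states: Any, fetches: Any) -> PolicyOutput:
    """Same idea as the lambda in the doc: scale the action, pass the rest through."""
    return 2 * actions, states, fetches


def clip(actions: float, states: Any, fetches: Any) -> PolicyOutput:
    """Purely illustrative second step: clip the action to [-1, 1]."""
    return max(-1.0, min(1.0, actions)), states, fetches


pipeline = SketchPipeline([double, clip])
print(pipeline(0.25, [], {}))
print(pipeline(0.75, [], {}))
```

Order matters here: doubling before clipping gives a different result from clipping before doubling, which is one reason an ordered pipeline is a natural container for these transforms.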
doc/source/rllib/rllib-connector.rst
Outdated
If connectors are enabled, RLlib will try to save policy checkpoints in properly serialized formats instead of
relying on python pickling. Eventually, the goal is to save policy checkpoints in serialized JSON files
to ensure maximum compatiiblity between RLlib and python versions.
compatibility
done
doc/source/rllib/rllib-connector.rst
Outdated
When enabled, the configurations of agent and action connectors will get serialized and saved with checkpointed
policy states.
These connectors, together with the specific transformations they represent,
can be easily recovered (by RLlib provided utils) to simplify deployment and inference use cases.
RLlib-provided
done
doc/source/rllib/rllib-connector.rst
Outdated
Adapting a Policy for Different Environments
--------------------------------------------

It not uncommon for user environments to go through active development iterations.
It is not
done
doc/source/rllib/rllib-connector.rst
Outdated
It not uncommon for user environments to go through active development iterations.
Policies trained with an older version of an environment may be rendered useless for updated environments.
While env wrapper helps with this problem in many cases, connectors allow policies trained with
different environments to work together at a same time.
"... together at the same time." would be correct, I think.
done
doc/source/rllib/rllib-connector.rst
Outdated
:align: center

We have two classes of connectors. The first is an ``AgentConnector``, which is used to transform observed data from environments to the policy.
The second is an ``ActionConnector``, which is used to transform the action data from the policy to actions.
I'd like "... transform the outputs of the policy into actions." better.
done
doc/source/rllib/rllib-connector.rst
Outdated
While env wrapper helps with this problem in many cases, connectors allow policies trained with
different environments to work together at a same time.

Here is an example demonstrating adaptation of a policy trained for the standard Cartopole environment
Cartpole or CartPole-v0 or something! One `o` too much here :)
done
Great stuff. I think this already provides users with a good sense of what they are facing. A notebook to enable them to play with connectors and pipelines would be awesome. Observing input/output per transform would be great here.
Signed-off-by: Jun Gong <[email protected]>
Why are these changes needed?
Add documentation for RLlib connectors.
Related issue number
Checks
- I've signed off every commit (`git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.