[RLlib; docs] New API stack migration guide. #47779
Conversation
…_redo_new_api_stack_migration_guide
Signed-off-by: sven1977 <[email protected]>
LSTM. Some suggestions here and there.
.. note::

    Even though the new API stack still rudimentarily supports `TensorFlow <https://tensorflow.org>`__ and
    has been written in a framework-agnostic fashion, RLlib will soon move to `PyTorch <https://pytorch.org>`__
Finally :)
# Switch both the new API stack flags to True (both False by default).
# This enables the use of
# a) RLModule (replaces ModelV2) and Learner (replaces Policy)
# b) the correct EnvRunner (single-agent vs multi-agent) and ConnectorV2 pipelines.
Maybe mention here what the `ConnectorV2` pipeline replaces?
done
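For reference, a minimal sketch of what switching both flags on could look like. This assumes the `AlgorithmConfig.api_stack()` helper and the two flag names implied by the snippet above; treat the exact names as assumptions.

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    # Both flags default to False (old API stack).
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
)
```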
# The following setting is equivalent to the old stack's `config.resources(num_gpus=2)`.
config.learners(
    num_learners=2,
    num_gpus_per_learner=1,
Maybe add here a note that fractional GPUs are only possible in single-learner mode. Multi-learner setups need 1 GPU each, don't they?
That's correct, doesn't make sense for multi-GPU learning. Will mention this!
done
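To make the GPU note concrete, a hedged sketch of the two setups. Argument names follow the snippet above; the fractional-GPU variant assumes that `num_learners=0` runs a single, local Learner.

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = PPOConfig()

# Multi-learner setup: equivalent to the old stack's `config.resources(num_gpus=2)`.
# Each of the two remote Learner workers gets one full GPU.
config.learners(num_learners=2, num_gpus_per_learner=1)

# Fractional GPUs only make sense in single-learner mode
# (a local Learner that can share a GPU with other components):
# config.learners(num_learners=0, num_gpus_per_learner=0.5)
```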
`entropy_coeff` setting in PPO), provide scheduling information directly in the respective setting.
There is no specific, separate setting anymore for scheduling behavior.

When defining a schedule, provide a list of 2-tuples, where the first item is the global timestep
Maybe mention here that for PyTorch, `_torch_lr_schedule_classes` could be used?
done. And linked to example script.
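A small, hedged sketch of the inline-schedule format described above: the value is a list of `[timestep, value]` pairs (exact interpolation behavior between the anchor points aside).

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .training(
        # Anneal the learning rate from 3e-4 to 5e-5 over the first 2M env steps.
        lr=[[0, 3e-4], [2_000_000, 5e-5]],
        # Anneal PPO's entropy coefficient from 0.01 to 0.0 over the first 1M env steps.
        entropy_coeff=[[0, 0.01], [1_000_000, 0.0]],
    )
)
```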
.. testcode::

    # RolloutWorkers have been re-written to EnvRunners:
Better "replaced by"? This stresses that `EnvRunner`s work far more efficiently and cleanly than `RolloutWorker`s and that the code isn't nearly identical.
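Whichever wording lands in the guide, the config-level translation is what users need; a minimal sketch, assuming the `env_runners()` method replaces the old `rollouts()` method:

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = PPOConfig()

# Old API stack (RolloutWorkers):
# config.rollouts(num_rollout_workers=2)

# New API stack (EnvRunners):
config.env_runners(num_env_runners=2)
```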
)


In case you were using the `observation_filter` setting, perform the following translations:
Link to the `ConnectorV2` pages.
Doesn't exist yet :(
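Until those pages exist, a hedged sketch of the translation. It assumes a `MeanStdFilter` ConnectorV2 piece is importable from `ray.rllib.connectors.env_to_module`; the import path is an assumption.

```python
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.connectors.env_to_module import MeanStdFilter

config = (
    PPOConfig()
    # Old API stack:
    # config.rollouts(observation_filter="MeanStdFilter")
    # New API stack: plug the filter into the env-to-module connector pipeline.
    .env_runners(
        env_to_module_connector=lambda env: MeanStdFilter(),
    )
)
```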
to the new API stack's :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`:

1) You lift your ModelV2 code and drop it into a new, custom RLModule class (see the :ref:`RLModule documentation <rlmodule-guide>` for details).
1) You use an Algorithm checkpoint or a Policy checkpoint that you have from an old API stack training run and use this with the `new stack RLModule convenience wrapper <https://github.com/ray-project/ray/blob/master/rllib/examples/rl_modules/migrate_modelv2_to_new_api_stack_by_policy_checkpoint.py>`__.
2)?
This is correct. Sphinx will automatically enumerate these.
1) You lift your ModelV2 code and drop it into a new, custom RLModule class (see the :ref:`RLModule documentation <rlmodule-guide>` for details).
1) You use an Algorithm checkpoint or a Policy checkpoint that you have from an old API stack training run and use this with the `new stack RLModule convenience wrapper <https://github.com/ray-project/ray/blob/master/rllib/examples/rl_modules/migrate_modelv2_to_new_api_stack_by_policy_checkpoint.py>`__.
1) You have an :py:class:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig` object from an old API stack training run and use this with the `new stack RLModule convenience wrapper <https://github.com/ray-project/ray/blob/master/rllib/examples/rl_modules/migrate_modelv2_to_new_api_stack_by_config.py>`__.
3)?
same :)
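For option one (lifting ModelV2 code into a custom RLModule), a rough sketch of the target shape. Class, column, and method names reflect my understanding of the RLModule API and are assumptions, not the guide's wording; the hard-coded layer sizes stand in for whatever the old ModelV2 derived from its spaces.

```python
import torch.nn as nn

from ray.rllib.core.columns import Columns
from ray.rllib.core.rl_module.torch import TorchRLModule


class MyPortedModule(TorchRLModule):
    """Hypothetical port of an old ModelV2 into a custom RLModule."""

    def setup(self):
        # In a real port, derive the sizes from the module's observation/action spaces.
        self._net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))

    def _forward_inference(self, batch, **kwargs):
        return {Columns.ACTION_DIST_INPUTS: self._net(batch[Columns.OBS])}

    def _forward_exploration(self, batch, **kwargs):
        return self._forward_inference(batch, **kwargs)

    def _forward_train(self, batch, **kwargs):
        return self._forward_inference(batch, **kwargs)
```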
either stack.
The goal is to reach a state where the new stack can completely replace the old one.
Over the next few months, the RLlib Team will continue to document, test, benchmark, bug-fix, and
further polish these new APIs as well as rollout more and more algorithms (with a focus on offline RL)
Do we? We already have all offline RL algorithms, don't we?
Sure, ok. I thought we'd be more conservative and make sure this all also works for GPU and multi-GPU. Just don't want to announce too much that's not 98ish% stable.
Keep in mind that due to its alpha nature, when using the new stack, you might run into issues and encounter instabilities.
Keep in mind that due to its alpha nature, when using the new stack, you might run into
I think this is a statement - although true - that doesn't help users decide and might even mislead them into thinking the new stack has more bugs than the old stack. We might want to stress that the new stack will stay and will be worked on.
You are right. This is from an older iteration. Will fix this and make it more bullish.
…_redo_new_api_stack_migration_guide
Signed-off-by: sven1977 <[email protected]>
First batch of suggestions - sorry for the quantity.
customizations inside the old stack's Policy class, you need to move this logic into the new API stack's
:py:class:`~ray.rllib.core.learner.learner.Learner` class.

:ref:`See here for more details on how to write a custom Learner <learner-guide>`.
:ref:`See here for more details on how to write a custom Learner <learner-guide>`.
See :ref:`Learner <learner-guide>` for details on how to write a custom Learner.
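Since Policy-level loss customizations move into the Learner, a hedged sketch of what that looks like. The method name and signature reflect my understanding of the Learner API; treat them as assumptions, and the penalty term is a placeholder.

```python
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.algorithms.ppo.torch.ppo_torch_learner import PPOTorchLearner


class MyPPOTorchLearner(PPOTorchLearner):
    """Hypothetical Learner that adds an extra term on top of PPO's loss."""

    def compute_loss_for_module(self, *, module_id, config, batch, fwd_out):
        base_loss = super().compute_loss_for_module(
            module_id=module_id, config=config, batch=batch, fwd_out=fwd_out
        )
        my_extra_penalty = 0.0  # placeholder for your custom logic
        return base_loss + my_extra_penalty


# Plug the custom Learner into the config.
config = PPOConfig().training(learner_class=MyPPOTorchLearner)
```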
Rounding out the feedback for new-api-stack-migration-guide.
Overview
--------

Starting in Ray 2.10, you can opt-in to the alpha version of a "new API stack", a fundamental overhaul from the ground up with respect to architecture,
design principles, code base, and user facing APIs. The following select algorithms and setups are available.
Starting in Ray 2.10, you can opt-in to the alpha version of a "new API stack", a fundamental overhaul from the ground
Starting in Ray 2.10, you can opt-in to the alpha version of a "new API stack", a fundamental overhaul from the ground
Starting in Ray 2.10, you can opt-in to the alpha version of the "new API stack", a fundamental overhaul from the ground
:ref:`See here for more details on how to write a custom Learner <learner-guide>`.

Here are also helpful example scripts on `how to write a simple custom loss function <https://github.com/ray-project/ray/blob/master/rllib/examples/learners/custom_loss_fn_simple.py>`__
Here are also helpful example scripts on `how to write a simple custom loss function <https://github.com/ray-project/ray/blob/master/rllib/examples/learners/custom_loss_fn_simple.py>`__
The following example scripts show how to write:
- `a simple custom loss function <https://github.com/ray-project/ray/blob/master/rllib/examples/learners/custom_loss_fn_simple.py>`__
:ref:`See here for more details on how to write a custom Learner <learner-guide>`.

Here are also helpful example scripts on `how to write a simple custom loss function <https://github.com/ray-project/ray/blob/master/rllib/examples/learners/custom_loss_fn_simple.py>`__
and `how to write a custom Learner with 2 optimizers and different learning rates for each <https://github.com/ray-project/ray/blob/master/rllib/examples/learners/separate_vf_lr_and_optimizer.py>`__.
and `how to write a custom Learner with 2 optimizers and different learning rates for each <https://github.com/ray-project/ray/blob/master/rllib/examples/learners/separate_vf_lr_and_optimizer.py>`__.
- `a custom Learner with 2 optimizers and different learning rates for each <https://github.com/ray-project/ray/blob/master/rllib/examples/learners/separate_vf_lr_and_optimizer.py>`__.
Here are also helpful example scripts on `how to write a simple custom loss function <https://github.com/ray-project/ray/blob/master/rllib/examples/learners/custom_loss_fn_simple.py>`__
and `how to write a custom Learner with 2 optimizers and different learning rates for each <https://github.com/ray-project/ray/blob/master/rllib/examples/learners/separate_vf_lr_and_optimizer.py>`__.

Note that the Policy class is no longer supported in the new API stack. This class used to hold a
Note that the Policy class is no longer supported in the new API stack. This class used to hold a
Note that the new API stack doesn't support the Policy class. In the old stack, this class holds a
and `how to write a custom Learner with 2 optimizers and different learning rates for each <https://github.com/ray-project/ray/blob/master/rllib/examples/learners/separate_vf_lr_and_optimizer.py>`__.

Note that the Policy class is no longer supported in the new API stack. This class used to hold a
neural network (now moved into :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`),
neural network (now moved into :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`),
neural network, which is the :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` in the new API stack,
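To make the replacement tangible, a small sketch of how you now get at the neural network. The method names (`get_policy` on the old stack, `get_module` on the new one) reflect my understanding of the Algorithm API and should be treated as assumptions.

```python
from ray.rllib.algorithms.ppo import PPOConfig

algo = (
    PPOConfig()
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
    .environment("CartPole-v1")
    .build()
)

# Old API stack: policy = algo.get_policy(); policy.model held the ModelV2.
# New API stack: the RLModule holds the network.
rl_module = algo.get_module()
```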
The :py:class:`~ray.rllib.connectors.connector_v2.ConnectorV2` documentation is work in progress and linked from here shortly.

In the meantime, take a look at some examples on how to write ConnectorV2 pieces for the
In the meantime, take a look at some examples on how to write ConnectorV2 pieces for the
The following are some examples on how to write ConnectorV2 pieces for the
In the meantime, take a look at some examples on how to write ConnectorV2 pieces for the
different pipelines:

1) `Example on how to perform observation frame-stacking <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/frame_stacking.py>`__.
1) `Example on how to perform observation frame-stacking <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/frame_stacking.py>`__.
1) `Observation frame-stacking <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/frame_stacking.py>`__.
different pipelines:

1) `Example on how to perform observation frame-stacking <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/frame_stacking.py>`__.
1) `Example on how to add the most recent action and reward to the RLModule's input <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/prev_actions_prev_rewards.py>`__.
1) `Example on how to add the most recent action and reward to the RLModule's input <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/prev_actions_prev_rewards.py>`__.
1) `Add the most recent action and reward to the RL Module's input <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/prev_actions_prev_rewards.py>`__.
1) `Example on how to perform observation frame-stacking <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/frame_stacking.py>`__.
1) `Example on how to add the most recent action and reward to the RLModule's input <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/prev_actions_prev_rewards.py>`__.
1) `Example on how to do mean-std filtering on all observations <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/mean_std_filtering.py>`__.
1) `Example on how to do mean-std filtering on all observations <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/mean_std_filtering.py>`__.
1) `Mean-std filtering on all observations <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/mean_std_filtering.py>`__.
1) `Example on how to perform observation frame-stacking <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/frame_stacking.py>`__.
1) `Example on how to add the most recent action and reward to the RLModule's input <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/prev_actions_prev_rewards.py>`__.
1) `Example on how to do mean-std filtering on all observations <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/mean_std_filtering.py>`__.
1) `Example on how to flatten any complex observation space to a 1D space <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/flatten_observations_dict_space.py>`__.
1) `Example on how to flatten any complex observation space to a 1D space <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/flatten_observations_dict_space.py>`__.
1) `Flatten any complex observation space to a 1D space <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/flatten_observations_dict_space.py>`__.
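The pieces in those example scripts all plug into the pipelines in the same way; a hedged sketch for the flattening case, assuming a built-in `FlattenObservations` piece under `ray.rllib.connectors.env_to_module` (import path and class name are assumptions).

```python
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.connectors.env_to_module import FlattenObservations

config = (
    PPOConfig()
    # Prepend a ConnectorV2 piece to the env-to-module pipeline that flattens
    # complex (e.g. Dict) observation spaces into a 1D space for the RLModule.
    .env_runners(
        env_to_module_connector=lambda env: FlattenObservations(),
    )
)
```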
Sorry for the delay.
@@ -7,58 +7,79 @@
RLlib's New API Stack
RLlib's New API Stack
RLlib's new API stack
@@ -7,58 +7,79 @@
RLlib's New API Stack
=====================

.. hint::

    This section describes in detail what the new API stack is and why you should migrate to it
This section describes in detail what the new API stack is and why you should migrate to it
This section describes the new API stack and why you should migrate to it
.. hint::

    This section describes in detail what the new API stack is and why you should migrate to it
    (in case you have old API stack custom code).
(in case you have old API stack custom code).
if you have old API stack custom code.
The goal is to reach a state where the new stack can completely replace the old one.
Over the next few months, the RLlib Team continues to document, test, benchmark, bug-fix, and
further polish these new APIs as well as rollout more algorithms
that you can run in the new stack (with a focus on offline RL).
that you can run in the new stack (with a focus on offline RL).
that you can run in the new stack, with a focus on offline RL.
Keep in mind that due to its alpha nature, when using the new stack, you might run into issues and encounter instabilities.
Also, rest assured that you are able to continue using your custom classes and setups
on the old API stack for the foreseeable future (beyond Ray 3.0).
Also know that you are able to continue using your custom classes and setups
Also know that you are able to continue using your custom classes and setups
You can continue using custom classes and setups
rllib/algorithms/algorithm_config.py
Outdated
large sample batches, where there is the risk that the object store may
fill up, causing spilling of objects to disk. This can cause any
asynchronous requests to become very slow, making your experiment run
slow as well. You can inspect the object store during your experiment
slow as well. You can inspect the object store during your experiment
slowly as well. You can inspect the object store during your experiment
rllib/algorithms/algorithm_config.py
Outdated
fill up, causing spilling of objects to disk. This can cause any
asynchronous requests to become very slow, making your experiment run
slow as well. You can inspect the object store during your experiment
via a call to ray memory on your headnode, and by using the ray
via a call to ray memory on your headnode, and by using the ray
via a call to Ray memory on your head node, and by using the Ray
rllib/algorithms/algorithm_config.py
Outdated
@@ -3317,8 +3330,9 @@
classes or a dictionary mapping module IDs to such a list of respective
scheduler classes. Multiple scheduler classes can be applied in sequence
and will be stepped in the same sequence as defined here. Note, most
learning rate schedulers need arguments to be configured, i.e. you need
to partially initialize the schedulers in the list(s).
learning rate schedulers need arguments to be configured, i.e. you might
learning rate schedulers need arguments to be configured, i.e. you might
learning rate schedulers need arguments to be configured, that is, you might
rllib/algorithms/algorithm_config.py
Outdated
to partially initialize the schedulers in the list(s).
learning rate schedulers need arguments to be configured, i.e. you might
have to partially initialize the schedulers in the list(s) using
`functools.partial`.
_tf_policy_handles_more_than_one_loss: Experimental flag.
If True, TFPolicy will handle more than one loss/optimizer.
If True, TFPolicy will handle more than one loss/optimizer.
If True, TFPolicy handles more than one loss or optimizer.
- how to partially initialize multiple learning rate schedulers in PyTorch.
- how to chain these schedulers together and pass the chain into RLlib's
  configuration.
- how to configure multiple learning rate schedulers (as a chained pipeline) in
- how to configure multiple learning rate schedulers (as a chained pipeline) in
- how to configure multiple learning rate schedulers, as a chained pipeline, in
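A hedged sketch of the chained-scheduler setup being discussed. The `_torch_lr_scheduler_classes` setting name is taken from the docstring thread above and may differ; schedulers that need constructor arguments are partially initialized with `functools.partial`.

```python
import functools

from torch.optim.lr_scheduler import ConstantLR, LinearLR

from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .experimental(
        _torch_lr_scheduler_classes=[
            # Linearly decay the lr factor from 1.0 to 0.1 over the first 100 steps.
            functools.partial(
                LinearLR, start_factor=1.0, end_factor=0.1, total_iters=100
            ),
            # Apply an additional constant 0.1 factor (practically for the whole run).
            functools.partial(ConstantLR, factor=0.1, total_iters=10_000_000),
        ],
    )
)
```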
Co-authored-by: angelinalg <[email protected]> Signed-off-by: Sven Mika <[email protected]>
Signed-off-by: sven1977 <[email protected]>
…_redo_new_api_stack_migration_guide
…n_guide' into docs_redo_new_api_stack_migration_guide # Conflicts: # doc/source/rllib/rllib-new-api-stack.rst
Signed-off-by: ujjawal-khare <[email protected]>
Step-by-step new API stack migration guide.
Why are these changes needed?
Related issue number
Checks
- I've signed off every commit (`git commit -s`) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I added a new method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.