
[RLlib; docs] New API stack migration guide. (ray-project#47779)
Signed-off-by: ujjawal-khare <[email protected]>
sven1977 authored and ujjawal-khare committed Oct 15, 2024
1 parent 025ac39 commit 33f1829
Showing 3 changed files with 45 additions and 43 deletions.
30 changes: 11 additions & 19 deletions doc/source/rllib/new-api-stack-migration-guide.rst
@@ -213,16 +213,10 @@ This method isn't used on the old API stack because the old stack doesn't use Le

 It allows you to specify:

-#. the number of `Learner` workers through `.learners(num_learners=...)`.
-#. the resources per learner; use `.learners(num_gpus_per_learner=1)` for GPU training
-   and `.learners(num_gpus_per_learner=0)` for CPU training.
-#. the custom Learner class you want to use (`example on how to do this here <https://github.com/ray-project/ray/blob/master/rllib/examples/learners/custom_loss_fn_simple.py>`__)
-#. a config dict you would like to set for your custom learner:
-   `.learners(learner_config_dict={...})`. Note that every `Learner` has access to the
-   entire `AlgorithmConfig` object through `self.config`, but setting the
-   `learner_config_dict` is a convenient way to avoid having to create an entirely new
-   `AlgorithmConfig` subclass only to support a few extra settings for your custom
-   `Learner` class.
+1) the number of `Learner` workers through `.learners(num_learners=...)`.
+1) the resources per learner; use `.learners(num_gpus_per_learner=1)` for GPU training and `.learners(num_gpus_per_learner=0)` for CPU training.
+1) the custom Learner class you want to use (`example on how to do this here <https://github.com/ray-project/ray/blob/master/rllib/examples/learners/custom_loss_fn_simple.py>`__)
+1) a config dict you would like to set for your custom learner: `.learners(learner_config_dict={...})`. Note that every `Learner` has access to the entire `AlgorithmConfig` object through `self.config`, but setting the `learner_config_dict` is a convenient way to avoid having to create an entirely new `AlgorithmConfig` subclass only to support a few extra settings for your custom `Learner` class.


 AlgorithmConfig.env_runners()
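
Taken together, the four settings above might look as follows. This is a minimal sketch assuming the new API stack's `PPOConfig`; all values are illustrative only. Note that the guide writes `.learners(learner_config_dict=...)`, while the `algorithm_config.py` change in this very commit shows `learner_config_dict` as a `training()` argument, so adjust to your Ray version:

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    # 1) Two remote Learner workers (num_learners=0 would mean a single,
    #    local Learner in the main process).
    # 2) One GPU per Learner; use num_gpus_per_learner=0 for CPU training.
    .learners(
        num_learners=2,
        num_gpus_per_learner=1,
    )
    # 3) A custom Learner class would additionally be plugged in here,
    #    for example `learner_class=MyPPOTorchLearner` (hypothetical).
    # 4) Extra settings for your (custom) Learner, readable inside the
    #    Learner as `self.config.learner_config_dict`.
    .training(
        learner_config_dict={"my_extra_setting": 0.01},
    )
)
```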
@@ -386,11 +380,9 @@ and `how to write a custom LSTM-containing RL Module <https://github.com/ray-pro
 There are various options for translating an existing, custom :py:class:`~ray.rllib.models.modelv2.ModelV2` from the old API stack,
 to the new API stack's :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`:

-#. Move your ModelV2 code to a new, custom `RLModule` class. See :ref:`RL Modules <rlmodule-guide>` for details).
-#. Use an Algorithm checkpoint or a Policy checkpoint that you have from an old API stack
-   training run and use this checkpoint with the `new stack RL Module convenience wrapper <https://github.com/ray-project/ray/blob/master/rllib/examples/rl_modules/migrate_modelv2_to_new_api_stack_by_policy_checkpoint.py>`__.
-#. Use an existing :py:class:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig`
-   object from an old API stack training run, with the `new stack RL Module convenience wrapper <https://github.com/ray-project/ray/blob/master/rllib/examples/rl_modules/migrate_modelv2_to_new_api_stack_by_config.py>`__.
+1) Move your ModelV2 code to a new, custom `RLModule` class. See :ref:`RL Modules <rlmodule-guide>` for details.
+1) Use an Algorithm checkpoint or a Policy checkpoint that you have from an old API stack training run and use this checkpoint with the `new stack RL Module convenience wrapper <https://github.com/ray-project/ray/blob/master/rllib/examples/rl_modules/migrate_modelv2_to_new_api_stack_by_policy_checkpoint.py>`__.
+1) Use an existing :py:class:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig` object from an old API stack training run, with the `new stack RL Module convenience wrapper <https://github.com/ray-project/ray/blob/master/rllib/examples/rl_modules/migrate_modelv2_to_new_api_stack_by_config.py>`__.


 Custom loss functions and policies
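
For option 1, here is a minimal sketch of moving a simple ModelV2's forward logic into a custom `TorchRLModule`. The class and layer sizes are hypothetical, and depending on your Ray version you may have to implement `_forward_inference`, `_forward_exploration`, and `_forward_train` separately instead of the single `_forward`:

```python
import torch

from ray.rllib.core.columns import Columns
from ray.rllib.core.rl_module.torch import TorchRLModule


class MyMigratedRLModule(TorchRLModule):
    """Hypothetical port of a simple ModelV2 to the new API stack."""

    def setup(self):
        # Build the same layers your ModelV2's __init__() used to build.
        # (In some Ray versions, the spaces live on `self.config` instead.)
        in_size = self.observation_space.shape[0]
        out_size = self.action_space.n
        self._net = torch.nn.Sequential(
            torch.nn.Linear(in_size, 128),
            torch.nn.ReLU(),
            torch.nn.Linear(128, out_size),
        )

    def _forward(self, batch, **kwargs):
        # The equivalent of ModelV2.forward(): map observations to
        # action-distribution inputs (logits here).
        return {Columns.ACTION_DIST_INPUTS: self._net(batch[Columns.OBS])}
```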
@@ -431,7 +423,7 @@ The :py:class:`~ray.rllib.connectors.connector_v2.ConnectorV2` documentation is
 The following are some examples on how to write ConnectorV2 pieces for the
 different pipelines:

-#. `Observation frame-stacking <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/frame_stacking.py>`__.
-#. `Add the most recent action and reward to the RL Module's input <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/prev_actions_prev_rewards.py>`__.
-#. `Mean-std filtering on all observations <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/mean_std_filtering.py>`__.
-#. `Flatten any complex observation space to a 1D space <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/flatten_observations_dict_space.py>`__.
+1) `Observation frame-stacking <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/frame_stacking.py>`__.
+1) `Add the most recent action and reward to the RL Module's input <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/prev_actions_prev_rewards.py>`__.
+1) `Mean-std filtering on all observations <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/mean_std_filtering.py>`__.
+1) `Flatten any complex observation space to a 1D space <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/flatten_observations_dict_space.py>`__.
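
To illustrate the general shape of such a piece, here is a minimal, hypothetical env-to-module connector that clips all observations. The exact `__call__` signature and the way the forward batch is keyed have shifted slightly across Ray versions, so treat this as a sketch:

```python
import numpy as np

from ray.rllib.connectors.connector_v2 import ConnectorV2


class ClipObservations(ConnectorV2):
    """Hypothetical env-to-module piece clipping all observations to +/-5."""

    def __call__(
        self, *, rl_module, batch, episodes, explore=None, shared_data=None, **kwargs
    ):
        # Mutate (or add to) the forward batch, then hand it to the next piece.
        if "obs" in batch:
            batch["obs"] = np.clip(batch["obs"], -5.0, 5.0)
        return batch
```

You could then prepend it to the env-to-module pipeline with, for example, `config.env_runners(env_to_module_connector=lambda env: ClipObservations())`.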
21 changes: 18 additions & 3 deletions rllib/algorithms/algorithm_config.py
@@ -2184,6 +2184,7 @@ def training(
         learner_config_dict: Optional[Dict[str, Any]] = NotProvided,
         # Deprecated args.
         num_sgd_iter=DEPRECATED_VALUE,
+        max_requests_in_flight_per_sampler_worker=DEPRECATED_VALUE,
     ) -> "AlgorithmConfig":
         """Sets the training related configuration.
@@ -2283,6 +2284,19 @@ def training(
                 error=False,
             )
             num_epochs = num_sgd_iter
+        if max_requests_in_flight_per_sampler_worker != DEPRECATED_VALUE:
+            deprecation_warning(
+                old="AlgorithmConfig.training("
+                "max_requests_in_flight_per_sampler_worker=...)",
+                new="AlgorithmConfig.env_runners("
+                "max_requests_in_flight_per_env_runner=...)",
+                error=False,
+            )
+            self.env_runners(
+                max_requests_in_flight_per_env_runner=(
+                    max_requests_in_flight_per_sampler_worker
+                ),
+            )

         if gamma is not NotProvided:
             self.gamma = gamma
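
The net effect of the forwarding block above: the deprecated `training()` argument now transparently sets its new-stack counterpart. Assuming the setting is exposed as a same-named config attribute, as other `env_runners()` settings are, the two spellings below (with an arbitrary value of 4) end up equivalent, apart from the warning:

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Deprecated spelling: warns, then forwards the value under the hood ...
config_old = PPOConfig().training(max_requests_in_flight_per_sampler_worker=4)

# ... making it equivalent to the new-stack spelling:
config_new = PPOConfig().env_runners(max_requests_in_flight_per_env_runner=4)

assert (
    config_old.max_requests_in_flight_per_env_runner
    == config_new.max_requests_in_flight_per_env_runner
)
```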
@@ -3401,7 +3415,7 @@ def experimental(
         *,
         _torch_grad_scaler_class: Optional[Type] = NotProvided,
         _torch_lr_scheduler_classes: Optional[
-            Union[List[Type], Dict[ModuleID, Type]]
+            Union[List[Type], Dict[ModuleID, List[Type]]]
         ] = NotProvided,
         _tf_policy_handles_more_than_one_loss: Optional[bool] = NotProvided,
         _disable_preprocessor_api: Optional[bool] = NotProvided,
@@ -3430,8 +3444,9 @@ def experimental(
                 classes or a dictionary mapping module IDs to such a list of respective
                 scheduler classes. Multiple scheduler classes can be applied in sequence
                 and will be stepped in the same sequence as defined here. Note, most
-                learning rate schedulers need arguments to be configured, i.e. you need
-                to partially initialize the schedulers in the list(s).
+                learning rate schedulers need arguments to be configured, that is, you
+                might have to partially initialize the schedulers in the list(s) using
+                `functools.partial`.
             _tf_policy_handles_more_than_one_loss: Experimental flag.
                 If True, TFPolicy handles more than one loss or optimizer.
                 Set this to True, if you would like to return more than
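
As the amended docstring notes, most Torch schedulers need constructor arguments besides the optimizer, so you pass partially initialized scheduler classes. A minimal sketch using `functools.partial` with PyTorch's `ConstantLR` and `ExponentialLR`; the values mirror the flags of the example script below, and per the widened type annotation above, a `Dict[ModuleID, List[Type]]` can be passed instead to give each module its own chain:

```python
import functools

import torch
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    .experimental(
        # Stepped in order: cut the lr to 10% for the first 10 steps, then
        # decay it exponentially with gamma=0.3.
        _torch_lr_scheduler_classes=[
            functools.partial(
                torch.optim.lr_scheduler.ConstantLR, factor=0.1, total_iters=10
            ),
            functools.partial(torch.optim.lr_scheduler.ExponentialLR, gamma=0.3),
        ]
    )
)
```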
37 changes: 16 additions & 21 deletions rllib/examples/learners/ppo_with_torch_lr_schedulers.py
@@ -5,9 +5,9 @@
 optimizer. In this way even more complex learning rate schedules can be assembled.

 This example shows:
 - how to partially initialize multiple learning rate schedulers in PyTorch.
-- how to chain these schedulers together and pass the chain into RLlib's
-  configuration.
+- how to configure multiple learning rate schedulers, as a chained pipeline, in
+  PyTorch using partial initialization with `functools.partial`.

 How to run this script
 ----------------------
@@ -29,29 +29,24 @@
 `--wandb-key=[your WandB API key] --wandb-project=[some project name]
 --wandb-run-name=[optional: WandB run name (within the defined project)]`

 Results to expect
 -----------------
 You should expect to observe decent learning behavior from your console output:

 With `--lr-const-factor=0.1`, `--lr-const-iters=10`, and `--lr-exp_decay=0.3`.
-+-----------------------------+------------+----------------------+--------+
-| Trial name                  | status     | loc                  |   iter |
-|                             |            |                      |        |
-|-----------------------------+------------+----------------------+--------+
-| PPO_CartPole-v1_7fc44_00000 | TERMINATED | 192.168.1.178:225070 |     50 |
-+-----------------------------+------------+----------------------+--------+
-+------------------+------------------------+------------------------+
-|   total time (s) |   num_env_steps_sample |   num_episodes_lifetim |
-|                  |             d_lifetime |                      e |
-+------------------+------------------------+------------------------+
-|          59.6542 |                 200000 |                   9952 |
-+------------------+------------------------+------------------------+
-+------------------------+
-|   num_env_steps_traine |
-|             d_lifetime |
-+------------------------|
-|                 210047 |
-+------------------------+
++-----------------------------+------------+--------+------------------+
+| Trial name                  | status     |   iter |   total time (s) |
+|                             |            |        |                  |
+|-----------------------------+------------+--------+------------------+
+| PPO_CartPole-v1_7fc44_00000 | TERMINATED |     50 |          59.6542 |
++-----------------------------+------------+--------+------------------+
++------------------------+------------------------+------------------------+
+|    episode_return_mean |  num_episodes_lifetime |   num_env_steps_traine |
+|                        |                        |             d_lifetime |
++------------------------+------------------------+------------------------|
+|                  451.2 |                   9952 |                 210047 |
++------------------------+------------------------+------------------------+
 """
 import functools
