
[RLlib; docs] New API stack migration guide. (ray-project#47779)
Signed-off-by: ujjawal-khare <[email protected]>
sven1977 authored and ujjawal-khare committed Oct 15, 2024
1 parent 025ac39 commit 33f1829
Showing 3 changed files with 45 additions and 43 deletions.
30 changes: 11 additions & 19 deletions doc/source/rllib/new-api-stack-migration-guide.rst
@@ -213,16 +213,10 @@ This method isn't used on the old API stack because the old stack doesn't use Le

 It allows you to specify:

-#. the number of `Learner` workers through `.learners(num_learners=...)`.
-#. the resources per learner; use `.learners(num_gpus_per_learner=1)` for GPU training
-   and `.learners(num_gpus_per_learner=0)` for CPU training.
-#. the custom Learner class you want to use (`example on how to do this here <https://github.com/ray-project/ray/blob/master/rllib/examples/learners/custom_loss_fn_simple.py>`__)
-#. a config dict you would like to set for your custom learner:
-   `.learners(learner_config_dict={...})`. Note that every `Learner` has access to the
-   entire `AlgorithmConfig` object through `self.config`, but setting the
-   `learner_config_dict` is a convenient way to avoid having to create an entirely new
-   `AlgorithmConfig` subclass only to support a few extra settings for your custom
-   `Learner` class.
+1) the number of `Learner` workers through `.learners(num_learners=...)`.
+1) the resources per learner; use `.learners(num_gpus_per_learner=1)` for GPU training and `.learners(num_gpus_per_learner=0)` for CPU training.
+1) the custom Learner class you want to use (`example on how to do this here <https://github.com/ray-project/ray/blob/master/rllib/examples/learners/custom_loss_fn_simple.py>`__)
+1) a config dict you would like to set for your custom learner: `.learners(learner_config_dict={...})`. Note that every `Learner` has access to the entire `AlgorithmConfig` object through `self.config`, but setting the `learner_config_dict` is a convenient way to avoid having to create an entirely new `AlgorithmConfig` subclass only to support a few extra settings for your custom `Learner` class.


 AlgorithmConfig.env_runners()
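
Taken together, the four settings above might look as follows. This is a minimal sketch assuming the new API stack's `PPOConfig`; all values are illustrative only. Note that the guide writes `.learners(learner_config_dict=...)`, while the `algorithm_config.py` change in this very commit shows `learner_config_dict` as a `training()` argument, so adjust to your Ray version:

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    # 1) Two remote Learner workers (num_learners=0 would mean a single,
    #    local Learner in the main process).
    # 2) One GPU per Learner; use num_gpus_per_learner=0 for CPU training.
    .learners(
        num_learners=2,
        num_gpus_per_learner=1,
    )
    # 3) A custom Learner class would additionally be plugged in here,
    #    for example `learner_class=MyPPOTorchLearner` (hypothetical).
    # 4) Extra settings for your (custom) Learner, readable inside the
    #    Learner as `self.config.learner_config_dict`.
    .training(
        learner_config_dict={"my_extra_setting": 0.01},
    )
)
```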
@@ -386,11 +380,9 @@ and `how to write a custom LSTM-containing RL Module <https://github.com/ray-pro
 There are various options for translating an existing, custom :py:class:`~ray.rllib.models.modelv2.ModelV2` from the old API stack,
 to the new API stack's :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`:

-#. Move your ModelV2 code to a new, custom `RLModule` class. See :ref:`RL Modules <rlmodule-guide>` for details).
-#. Use an Algorithm checkpoint or a Policy checkpoint that you have from an old API stack
-   training run and use this checkpoint with the `new stack RL Module convenience wrapper <https://github.com/ray-project/ray/blob/master/rllib/examples/rl_modules/migrate_modelv2_to_new_api_stack_by_policy_checkpoint.py>`__.
-#. Use an existing :py:class:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig`
-   object from an old API stack training run, with the `new stack RL Module convenience wrapper <https://github.com/ray-project/ray/blob/master/rllib/examples/rl_modules/migrate_modelv2_to_new_api_stack_by_config.py>`__.
+1) Move your ModelV2 code to a new, custom `RLModule` class. See :ref:`RL Modules <rlmodule-guide>` for details.
+1) Use an Algorithm checkpoint or a Policy checkpoint that you have from an old API stack training run and use this checkpoint with the `new stack RL Module convenience wrapper <https://github.com/ray-project/ray/blob/master/rllib/examples/rl_modules/migrate_modelv2_to_new_api_stack_by_policy_checkpoint.py>`__.
+1) Use an existing :py:class:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig` object from an old API stack training run, with the `new stack RL Module convenience wrapper <https://github.com/ray-project/ray/blob/master/rllib/examples/rl_modules/migrate_modelv2_to_new_api_stack_by_config.py>`__.


 Custom loss functions and policies
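
For option 1, here is a minimal sketch of moving a simple ModelV2's forward logic into a custom `TorchRLModule`. The class and layer sizes are hypothetical, and depending on your Ray version you may have to implement `_forward_inference`, `_forward_exploration`, and `_forward_train` separately instead of the single `_forward`:

```python
import torch

from ray.rllib.core.columns import Columns
from ray.rllib.core.rl_module.torch import TorchRLModule


class MyMigratedRLModule(TorchRLModule):
    """Hypothetical port of a simple ModelV2 to the new API stack."""

    def setup(self):
        # Build the same layers your ModelV2's __init__() used to build.
        # (In some Ray versions, the spaces live on `self.config` instead.)
        in_size = self.observation_space.shape[0]
        out_size = self.action_space.n
        self._net = torch.nn.Sequential(
            torch.nn.Linear(in_size, 128),
            torch.nn.ReLU(),
            torch.nn.Linear(128, out_size),
        )

    def _forward(self, batch, **kwargs):
        # The equivalent of ModelV2.forward(): map observations to
        # action-distribution inputs (logits here).
        return {Columns.ACTION_DIST_INPUTS: self._net(batch[Columns.OBS])}
```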
@@ -431,7 +423,7 @@ The :py:class:`~ray.rllib.connectors.connector_v2.ConnectorV2` documentation is
 The following are some examples on how to write ConnectorV2 pieces for the
 different pipelines:

-#. `Observation frame-stacking <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/frame_stacking.py>`__.
-#. `Add the most recent action and reward to the RL Module's input <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/prev_actions_prev_rewards.py>`__.
-#. `Mean-std filtering on all observations <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/mean_std_filtering.py>`__.
-#. `Flatten any complex observation space to a 1D space <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/flatten_observations_dict_space.py>`__.
+1) `Observation frame-stacking <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/frame_stacking.py>`__.
+1) `Add the most recent action and reward to the RL Module's input <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/prev_actions_prev_rewards.py>`__.
+1) `Mean-std filtering on all observations <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/mean_std_filtering.py>`__.
+1) `Flatten any complex observation space to a 1D space <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/flatten_observations_dict_space.py>`__.
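
To illustrate the general shape of such a piece, here is a minimal, hypothetical env-to-module connector that clips all observations. The exact `__call__` signature and the way the forward batch is keyed have shifted slightly across Ray versions, so treat this as a sketch:

```python
import numpy as np

from ray.rllib.connectors.connector_v2 import ConnectorV2


class ClipObservations(ConnectorV2):
    """Hypothetical env-to-module piece clipping all observations to +/-5."""

    def __call__(
        self, *, rl_module, batch, episodes, explore=None, shared_data=None, **kwargs
    ):
        # Mutate (or add to) the forward batch, then hand it to the next piece.
        if "obs" in batch:
            batch["obs"] = np.clip(batch["obs"], -5.0, 5.0)
        return batch
```

You could then prepend it to the env-to-module pipeline with, for example, `config.env_runners(env_to_module_connector=lambda env: ClipObservations())`.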
21 changes: 18 additions & 3 deletions rllib/algorithms/algorithm_config.py
@@ -2184,6 +2184,7 @@ def training(
         learner_config_dict: Optional[Dict[str, Any]] = NotProvided,
         # Deprecated args.
         num_sgd_iter=DEPRECATED_VALUE,
+        max_requests_in_flight_per_sampler_worker=DEPRECATED_VALUE,
     ) -> "AlgorithmConfig":
         """Sets the training related configuration.
@@ -2283,6 +2284,19 @@ def training(
                 error=False,
             )
             num_epochs = num_sgd_iter
+        if max_requests_in_flight_per_sampler_worker != DEPRECATED_VALUE:
+            deprecation_warning(
+                old="AlgorithmConfig.training("
+                "max_requests_in_flight_per_sampler_worker=...)",
+                new="AlgorithmConfig.env_runners("
+                "max_requests_in_flight_per_env_runner=...)",
+                error=False,
+            )
+            self.env_runners(
+                max_requests_in_flight_per_env_runner=(
+                    max_requests_in_flight_per_sampler_worker
+                ),
+            )

         if gamma is not NotProvided:
             self.gamma = gamma
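
The net effect of the forwarding block above: the deprecated `training()` argument now transparently sets its new-stack counterpart. Assuming the setting is exposed as a same-named config attribute, as other `env_runners()` settings are, the two spellings below (with an arbitrary value of 4) end up equivalent, apart from the warning:

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Deprecated spelling: warns, then forwards the value under the hood ...
config_old = PPOConfig().training(max_requests_in_flight_per_sampler_worker=4)

# ... making it equivalent to the new-stack spelling:
config_new = PPOConfig().env_runners(max_requests_in_flight_per_env_runner=4)

assert (
    config_old.max_requests_in_flight_per_env_runner
    == config_new.max_requests_in_flight_per_env_runner
)
```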
@@ -3401,7 +3415,7 @@ def experimental(
         *,
         _torch_grad_scaler_class: Optional[Type] = NotProvided,
         _torch_lr_scheduler_classes: Optional[
-            Union[List[Type], Dict[ModuleID, Type]]
+            Union[List[Type], Dict[ModuleID, List[Type]]]
         ] = NotProvided,
         _tf_policy_handles_more_than_one_loss: Optional[bool] = NotProvided,
         _disable_preprocessor_api: Optional[bool] = NotProvided,
@@ -3430,8 +3444,9 @@ def experimental(
                 classes or a dictionary mapping module IDs to such a list of respective
                 scheduler classes. Multiple scheduler classes can be applied in sequence
                 and will be stepped in the same sequence as defined here. Note, most
-                learning rate schedulers need arguments to be configured, i.e. you need
-                to partially initialize the schedulers in the list(s).
+                learning rate schedulers need arguments to be configured, that is, you
+                might have to partially initialize the schedulers in the list(s) using
+                `functools.partial`.
             _tf_policy_handles_more_than_one_loss: Experimental flag.
                 If True, TFPolicy handles more than one loss or optimizer.
                 Set this to True, if you would like to return more than
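
As the amended docstring notes, most Torch schedulers need constructor arguments besides the optimizer, so you pass partially initialized scheduler classes. A minimal sketch using `functools.partial` with PyTorch's `ConstantLR` and `ExponentialLR`; the values mirror the flags of the example script below, and per the widened type annotation above, a `Dict[ModuleID, List[Type]]` can be passed instead to give each module its own chain:

```python
import functools

import torch
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    .experimental(
        # Stepped in order: cut the lr to 10% for the first 10 steps, then
        # decay it exponentially with gamma=0.3.
        _torch_lr_scheduler_classes=[
            functools.partial(
                torch.optim.lr_scheduler.ConstantLR, factor=0.1, total_iters=10
            ),
            functools.partial(torch.optim.lr_scheduler.ExponentialLR, gamma=0.3),
        ]
    )
)
```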
37 changes: 16 additions & 21 deletions rllib/examples/learners/ppo_with_torch_lr_schedulers.py
@@ -5,9 +5,9 @@
 optimizer. In this way even more complex learning rate schedules can be assembled.

 This example shows:
 - how to partially initialize multiple learning rate schedulers in PyTorch.
-- how to chain these schedulers together and pass the chain into RLlib's
-  configuration.
+- how to configure multiple learning rate schedulers, as a chained pipeline, in
+  PyTorch using partial initialization with `functools.partial`.

 How to run this script
 ----------------------
@@ -29,29 +29,24 @@
 `--wandb-key=[your WandB API key] --wandb-project=[some project name]
 --wandb-run-name=[optional: WandB run name (within the defined project)]`

 Results to expect
 -----------------
 You should expect to observe decent learning behavior from your console output:

 With `--lr-const-factor=0.1`, `--lr-const-iters=10`, and `--lr-exp_decay=0.3`.
-+-----------------------------+------------+----------------------+--------+
-| Trial name                  | status     | loc                  |   iter |
-|                             |            |                      |        |
-|-----------------------------+------------+----------------------+--------+
-| PPO_CartPole-v1_7fc44_00000 | TERMINATED | 192.168.1.178:225070 |     50 |
-+-----------------------------+------------+----------------------+--------+
-+------------------+------------------------+------------------------+
-|   total time (s) |   num_env_steps_sample |   num_episodes_lifetim |
-|                  |             d_lifetime |                      e |
-+------------------+------------------------+------------------------+
-|          59.6542 |                 200000 |                   9952 |
-+------------------+------------------------+------------------------+
-+------------------------+
-|   num_env_steps_traine |
-|             d_lifetime |
-+------------------------|
-|                 210047 |
-+------------------------+
++-----------------------------+------------+--------+------------------+
+| Trial name                  | status     |   iter |   total time (s) |
+|                             |            |        |                  |
+|-----------------------------+------------+--------+------------------+
+| PPO_CartPole-v1_7fc44_00000 | TERMINATED |     50 |          59.6542 |
++-----------------------------+------------+--------+------------------+
++------------------------+------------------------+------------------------+
+|    episode_return_mean |  num_episodes_lifetime |   num_env_steps_traine |
+|                        |                        |             d_lifetime |
++------------------------+------------------------+------------------------|
+|                  451.2 |                   9952 |                 210047 |
++------------------------+------------------------+------------------------+
 """
 import functools
