[RLlib; docs] New API stack migration guide. #47779
Conversation
…_redo_new_api_stack_migration_guide
Signed-off-by: sven1977 <[email protected]>
LSTM. Some suggestions here and there.
.. note::

    Even though the new API stack still rudimentarily supports `TensorFlow <https://tensorflow.org>`__ and
    has been written in a framework-agnostic fashion, RLlib will soon move to `PyTorch <https://pytorch.org>`__
Finally :)
# Switch both the new API stack flags to True (both False by default).
# This enables the use of
# a) RLModule (replaces ModelV2) and Learner (replaces Policy)
# b) the correct EnvRunner (single-agent vs multi-agent) and ConnectorV2 pipelines.
Maybe mention here what the `ConnectorV2` pipeline replaces?
done
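For reference, a minimal sketch of what switching both flags on could look like. This assumes the `AlgorithmConfig.api_stack()` helper and the two flag names implied by the snippet above; treat the exact names as assumptions.

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    # Both flags default to False (old API stack).
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
)
```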
# The following setting is equivalent to the old stack's `config.resources(num_gpus=2)`.
config.learners(
    num_learners=2,
    num_gpus_per_learner=1,
Maybe add here a note that fractional GPUs are only possible in single-learner mode. Multi-learner setups need 1 GPU each, don't they?
That's correct, doesn't make sense for multi-GPU learning. Will mention this!
done
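To make the GPU note concrete, a hedged sketch of the two setups. Argument names follow the snippet above; the fractional-GPU variant assumes that `num_learners=0` runs a single, local Learner.

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = PPOConfig()

# Multi-learner setup: equivalent to the old stack's `config.resources(num_gpus=2)`.
# Each of the two remote Learner workers gets one full GPU.
config.learners(num_learners=2, num_gpus_per_learner=1)

# Fractional GPUs only make sense in single-learner mode
# (a local Learner that can share a GPU with other components):
# config.learners(num_learners=0, num_gpus_per_learner=0.5)
```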
`entropy_coeff` setting in PPO), provide scheduling information directly in the respective setting.
There is no specific, separate setting anymore for scheduling behavior.

When defining a schedule, provide a list of 2-tuples, where the first item is the global timestep
Maybe mention here that for PyTorch, `_torch_lr_schedule_classes` could be used?
done. And linked to example script.
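A small, hedged sketch of the inline-schedule format described above: the value is a list of `[timestep, value]` pairs (exact interpolation behavior between the anchor points aside).

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .training(
        # Anneal the learning rate from 3e-4 to 5e-5 over the first 2M env steps.
        lr=[[0, 3e-4], [2_000_000, 5e-5]],
        # Anneal PPO's entropy coefficient from 0.01 to 0.0 over the first 1M env steps.
        entropy_coeff=[[0, 0.01], [1_000_000, 0.0]],
    )
)
```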
.. testcode::

    # RolloutWorkers have been re-written to EnvRunners:
Better "replaced by"? This stresses that `EnvRunner`s work far more efficiently and cleanly than `RolloutWorker`s and that the code isn't nearly identical.
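Whichever wording lands in the guide, the config-level translation is what users need; a minimal sketch, assuming the `env_runners()` method replaces the old `rollouts()` method:

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = PPOConfig()

# Old API stack (RolloutWorkers):
# config.rollouts(num_rollout_workers=2)

# New API stack (EnvRunners):
config.env_runners(num_env_runners=2)
```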
)


In case you were using the `observation_filter` setting, perform the following translations:
Link to the `ConnectorV2` pages.
Doesn't exist yet :(
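Until those pages exist, a hedged sketch of the translation. It assumes a `MeanStdFilter` ConnectorV2 piece is importable from `ray.rllib.connectors.env_to_module`; the import path is an assumption.

```python
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.connectors.env_to_module import MeanStdFilter

config = (
    PPOConfig()
    # Old API stack:
    # config.rollouts(observation_filter="MeanStdFilter")
    # New API stack: plug the filter into the env-to-module connector pipeline.
    .env_runners(
        env_to_module_connector=lambda env: MeanStdFilter(),
    )
)
```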
to the new API stack's :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`:

1) You lift your ModelV2 code and drop it into a new, custom RLModule class (see the :ref:`RLModule documentation <rlmodule-guide>` for details).
1) You use an Algorithm checkpoint or a Policy checkpoint that you have from an old API stack training run and use this with the `new stack RLModule convenience wrapper <https://github.com/ray-project/ray/blob/master/rllib/examples/rl_modules/migrate_modelv2_to_new_api_stack_by_policy_checkpoint.py>`__.
2)?
This is correct. Sphinx will automatically enumerate these.
1) You lift your ModelV2 code and drop it into a new, custom RLModule class (see the :ref:`RLModule documentation <rlmodule-guide>` for details).
1) You use an Algorithm checkpoint or a Policy checkpoint that you have from an old API stack training run and use this with the `new stack RLModule convenience wrapper <https://github.com/ray-project/ray/blob/master/rllib/examples/rl_modules/migrate_modelv2_to_new_api_stack_by_policy_checkpoint.py>`__.
1) You have an :py:class:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig` object from an old API stack training run and use this with the `new stack RLModule convenience wrapper <https://github.com/ray-project/ray/blob/master/rllib/examples/rl_modules/migrate_modelv2_to_new_api_stack_by_config.py>`__.
3)?
same :)
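For option one (lifting ModelV2 code into a custom RLModule), a rough sketch of the target shape. Class, column, and method names reflect my understanding of the RLModule API and are assumptions, not the guide's wording; the hard-coded layer sizes stand in for whatever the old ModelV2 derived from its spaces.

```python
import torch.nn as nn

from ray.rllib.core.columns import Columns
from ray.rllib.core.rl_module.torch import TorchRLModule


class MyPortedModule(TorchRLModule):
    """Hypothetical port of an old ModelV2 into a custom RLModule."""

    def setup(self):
        # In a real port, derive the sizes from the module's observation/action spaces.
        self._net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))

    def _forward_inference(self, batch, **kwargs):
        return {Columns.ACTION_DIST_INPUTS: self._net(batch[Columns.OBS])}

    def _forward_exploration(self, batch, **kwargs):
        return self._forward_inference(batch, **kwargs)

    def _forward_train(self, batch, **kwargs):
        return self._forward_inference(batch, **kwargs)
```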
either stack.
The goal is to reach a state where the new stack can completely replace the old one.
Over the next few months, the RLlib Team will continue to document, test, benchmark, bug-fix, and
further polish these new APIs as well as rollout more and more algorithms (with a focus on offline RL)
Do we? We already have all offline RL algorithms, don't we?
Sure, ok. I thought we'd be more conservative and make sure this all also works for GPU and multi-GPU. Just don't want to announce too much that's not 98ish% stable.
Keep in mind that due to its alpha nature, when using the new stack, you might run into issues and encounter instabilities.
Keep in mind that due to its alpha nature, when using the new stack, you might run into
I think this is a statement - although true - that doesn't help users decide and might even mislead them into thinking the new stack has more bugs than the old stack. We might want to stress that the new stack will stay and will be worked on.
You are right. This is from an older iteration. Will fix this and make it more bullish.
…_redo_new_api_stack_migration_guide
Signed-off-by: sven1977 <[email protected]>
First batch of suggestions - sorry for the quantity.
customizations inside the old stack's Policy class, you need to move this logic into the new API stack's
:py:class:`~ray.rllib.core.learner.learner.Learner` class.

:ref:`See here for more details on how to write a custom Learner <learner-guide>`.
:ref:`See here for more details on how to write a custom Learner <learner-guide>`.
See :ref:`Learner <learner-guide>` for details on how to write a custom Learner.
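Since Policy-level loss customizations move into the Learner, a hedged sketch of what that looks like. The method name and signature reflect my understanding of the Learner API; treat them as assumptions, and the penalty term is a placeholder.

```python
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.algorithms.ppo.torch.ppo_torch_learner import PPOTorchLearner


class MyPPOTorchLearner(PPOTorchLearner):
    """Hypothetical Learner that adds an extra term on top of PPO's loss."""

    def compute_loss_for_module(self, *, module_id, config, batch, fwd_out):
        base_loss = super().compute_loss_for_module(
            module_id=module_id, config=config, batch=batch, fwd_out=fwd_out
        )
        my_extra_penalty = 0.0  # placeholder for your custom logic
        return base_loss + my_extra_penalty


# Plug the custom Learner into the config.
config = PPOConfig().training(learner_class=MyPPOTorchLearner)
```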
Rounding out the feedback for new-api-stack-migration-guide.
Overview
--------

Starting in Ray 2.10, you can opt-in to the alpha version of a "new API stack", a fundamental overhaul from the ground up with respect to architecture,
design principles, code base, and user facing APIs. The following select algorithms and setups are available.
Starting in Ray 2.10, you can opt-in to the alpha version of a "new API stack", a fundamental overhaul from the ground
Starting in Ray 2.10, you can opt-in to the alpha version of a "new API stack", a fundamental overhaul from the ground
Starting in Ray 2.10, you can opt-in to the alpha version of the "new API stack", a fundamental overhaul from the ground
:ref:`See here for more details on how to write a custom Learner <learner-guide>`.

Here are also helpful example scripts on `how to write a simple custom loss function <https://github.com/ray-project/ray/blob/master/rllib/examples/learners/custom_loss_fn_simple.py>`__
Here are also helpful example scripts on `how to write a simple custom loss function <https://github.com/ray-project/ray/blob/master/rllib/examples/learners/custom_loss_fn_simple.py>`__
The following example scripts show how to write:
- `a simple custom loss function <https://github.com/ray-project/ray/blob/master/rllib/examples/learners/custom_loss_fn_simple.py>`__
:ref:`See here for more details on how to write a custom Learner <learner-guide>`.

Here are also helpful example scripts on `how to write a simple custom loss function <https://github.com/ray-project/ray/blob/master/rllib/examples/learners/custom_loss_fn_simple.py>`__
and `how to write a custom Learner with 2 optimizers and different learning rates for each <https://github.com/ray-project/ray/blob/master/rllib/examples/learners/separate_vf_lr_and_optimizer.py>`__.
and `how to write a custom Learner with 2 optimizers and different learning rates for each <https://github.com/ray-project/ray/blob/master/rllib/examples/learners/separate_vf_lr_and_optimizer.py>`__.
- `a custom Learner with 2 optimizers and different learning rates for each <https://github.com/ray-project/ray/blob/master/rllib/examples/learners/separate_vf_lr_and_optimizer.py>`__.
Here are also helpful example scripts on `how to write a simple custom loss function <https://github.com/ray-project/ray/blob/master/rllib/examples/learners/custom_loss_fn_simple.py>`__
and `how to write a custom Learner with 2 optimizers and different learning rates for each <https://github.com/ray-project/ray/blob/master/rllib/examples/learners/separate_vf_lr_and_optimizer.py>`__.

Note that the Policy class is no longer supported in the new API stack. This class used to hold a
Note that the Policy class is no longer supported in the new API stack. This class used to hold a
Note that the new API stack doesn't support the Policy class. In the old stack, this class holds a
and `how to write a custom Learner with 2 optimizers and different learning rates for each <https://github.com/ray-project/ray/blob/master/rllib/examples/learners/separate_vf_lr_and_optimizer.py>`__.

Note that the Policy class is no longer supported in the new API stack. This class used to hold a
neural network (now moved into :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`),
neural network (now moved into :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`),
neural network, which is the :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` in the new API stack,
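To make the replacement tangible, a small sketch of how you now get at the neural network. The method names (`get_policy` on the old stack, `get_module` on the new one) reflect my understanding of the Algorithm API and should be treated as assumptions.

```python
from ray.rllib.algorithms.ppo import PPOConfig

algo = (
    PPOConfig()
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
    .environment("CartPole-v1")
    .build()
)

# Old API stack: policy = algo.get_policy(); policy.model held the ModelV2.
# New API stack: the RLModule holds the network.
rl_module = algo.get_module()
```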
The :py:class:`~ray.rllib.connectors.connector_v2.ConnectorV2` documentation is work in progress and linked from here shortly.

In the meantime, take a look at some examples on how to write ConnectorV2 pieces for the
In the meantime, take a look at some examples on how to write ConnectorV2 pieces for the
The following are some examples on how to write ConnectorV2 pieces for the
In the meantime, take a look at some examples on how to write ConnectorV2 pieces for the
different pipelines:

1) `Example on how to perform observation frame-stacking <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/frame_stacking.py>`__.
1) `Example on how to perform observation frame-stacking <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/frame_stacking.py>`__.
1) `Observation frame-stacking <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/frame_stacking.py>`__.
different pipelines:

1) `Example on how to perform observation frame-stacking <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/frame_stacking.py>`__.
1) `Example on how to add the most recent action and reward to the RLModule's input <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/prev_actions_prev_rewards.py>`__.
1) `Example on how to add the most recent action and reward to the RLModule's input <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/prev_actions_prev_rewards.py>`__.
1) `Add the most recent action and reward to the RL Module's input <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/prev_actions_prev_rewards.py>`__.
1) `Example on how to perform observation frame-stacking <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/frame_stacking.py>`__.
1) `Example on how to add the most recent action and reward to the RLModule's input <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/prev_actions_prev_rewards.py>`__.
1) `Example on how to do mean-std filtering on all observations <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/mean_std_filtering.py>`__.
1) `Example on how to do mean-std filtering on all observations <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/mean_std_filtering.py>`__.
1) `Mean-std filtering on all observations <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/mean_std_filtering.py>`__.
1) `Example on how to perform observation frame-stacking <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/frame_stacking.py>`__.
1) `Example on how to add the most recent action and reward to the RLModule's input <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/prev_actions_prev_rewards.py>`__.
1) `Example on how to do mean-std filtering on all observations <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/mean_std_filtering.py>`__.
1) `Example on how to flatten any complex observation space to a 1D space <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/flatten_observations_dict_space.py>`__.
1) `Example on how to flatten any complex observation space to a 1D space <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/flatten_observations_dict_space.py>`__.
1) `Flatten any complex observation space to a 1D space <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/flatten_observations_dict_space.py>`__.
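The pieces in those example scripts all plug into the pipelines in the same way; a hedged sketch for the flattening case, assuming a built-in `FlattenObservations` piece under `ray.rllib.connectors.env_to_module` (import path and class name are assumptions).

```python
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.connectors.env_to_module import FlattenObservations

config = (
    PPOConfig()
    # Prepend a ConnectorV2 piece to the env-to-module pipeline that flattens
    # complex (e.g. Dict) observation spaces into a 1D space for the RLModule.
    .env_runners(
        env_to_module_connector=lambda env: FlattenObservations(),
    )
)
```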
Sorry for the delay.
@@ -7,58 +7,79 @@
RLlib's New API Stack
RLlib's New API Stack
RLlib's new API stack
@@ -7,58 +7,79 @@
RLlib's New API Stack
=====================

.. hint::

    This section describes in detail what the new API stack is and why you should migrate to it
This section describes in detail what the new API stack is and why you should migrate to it
This section describes the new API stack and why you should migrate to it
.. hint::

    This section describes in detail what the new API stack is and why you should migrate to it
    (in case you have old API stack custom code).
(in case you have old API stack custom code).
if you have old API stack custom code.
The goal is to reach a state where the new stack can completely replace the old one.
Over the next few months, the RLlib Team continues to document, test, benchmark, bug-fix, and
further polish these new APIs as well as rollout more algorithms
that you can run in the new stack (with a focus on offline RL).
that you can run in the new stack (with a focus on offline RL).
that you can run in the new stack, with a focus on offline RL.
Keep in mind that due to its alpha nature, when using the new stack, you might run into issues and encounter instabilities.
Also, rest assured that you are able to continue using your custom classes and setups
on the old API stack for the foreseeable future (beyond Ray 3.0).
Also know that you are able to continue using your custom classes and setups
Also know that you are able to continue using your custom classes and setups
You can continue using custom classes and setups
rllib/algorithms/algorithm_config.py
Outdated
large sample batches, where there is the risk that the object store may
fill up, causing spilling of objects to disk. This can cause any
asynchronous requests to become very slow, making your experiment run
slow as well. You can inspect the object store during your experiment
slow as well. You can inspect the object store during your experiment
slowly as well. You can inspect the object store during your experiment
rllib/algorithms/algorithm_config.py
Outdated
fill up, causing spilling of objects to disk. This can cause any
asynchronous requests to become very slow, making your experiment run
slow as well. You can inspect the object store during your experiment
via a call to ray memory on your headnode, and by using the ray
via a call to ray memory on your headnode, and by using the ray
via a call to Ray memory on your head node, and by using the Ray
rllib/algorithms/algorithm_config.py
Outdated
@@ -3317,8 +3330,9 @@
classes or a dictionary mapping module IDs to such a list of respective
scheduler classes. Multiple scheduler classes can be applied in sequence
and will be stepped in the same sequence as defined here. Note, most
learning rate schedulers need arguments to be configured, i.e. you need
to partially initialize the schedulers in the list(s).
learning rate schedulers need arguments to be configured, i.e. you might
learning rate schedulers need arguments to be configured, i.e. you might
learning rate schedulers need arguments to be configured, that is, you might
rllib/algorithms/algorithm_config.py
Outdated
to partially initialize the schedulers in the list(s).
learning rate schedulers need arguments to be configured, i.e. you might
have to partially initialize the schedulers in the list(s) using
`functools.partial`.
_tf_policy_handles_more_than_one_loss: Experimental flag.
If True, TFPolicy will handle more than one loss/optimizer.
If True, TFPolicy will handle more than one loss/optimizer.
If True, TFPolicy handles more than one loss or optimizer.
- how to partially initialize multiple learning rate schedulers in PyTorch.
- how to chain these schedulers together and pass the chain into RLlib's
  configuration.
- how to configure multiple learning rate schedulers (as a chained pipeline) in
- how to configure multiple learning rate schedulers (as a chained pipeline) in
- how to configure multiple learning rate schedulers, as a chained pipeline, in
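A hedged sketch of the chained-scheduler setup being discussed. The `_torch_lr_scheduler_classes` setting name is taken from the docstring thread above and may differ; schedulers that need constructor arguments are partially initialized with `functools.partial`.

```python
import functools

from torch.optim.lr_scheduler import ConstantLR, LinearLR

from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .experimental(
        _torch_lr_scheduler_classes=[
            # Linearly decay the lr factor from 1.0 to 0.1 over the first 100 steps.
            functools.partial(
                LinearLR, start_factor=1.0, end_factor=0.1, total_iters=100
            ),
            # Apply an additional constant 0.1 factor (practically for the whole run).
            functools.partial(ConstantLR, factor=0.1, total_iters=10_000_000),
        ],
    )
)
```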
Co-authored-by: angelinalg <[email protected]> Signed-off-by: Sven Mika <[email protected]>
Signed-off-by: sven1977 <[email protected]>
…_redo_new_api_stack_migration_guide
…n_guide' into docs_redo_new_api_stack_migration_guide # Conflicts: # doc/source/rllib/rllib-new-api-stack.rst
Signed-off-by: ujjawal-khare <[email protected]>
Step-by-step new API stack migration guide.
Why are these changes needed?
Related issue number
Checks
- I've signed off every commit (`git commit -s`) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I added a new method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.