[RLlib] Cleanup examples folder #13. Fix main examples docs page for RLlib. #45382

Merged
34 commits
cb937c6
wip
sven1977 May 16, 2024
c931bed
wip
sven1977 May 16, 2024
02d3d04
wip
sven1977 May 16, 2024
3ada50a
wip
sven1977 May 16, 2024
bccdde6
Merge branch 'master' of https://github.com/ray-project/ray into algo…
sven1977 May 16, 2024
496c5ee
wip
sven1977 May 16, 2024
edd0a91
Merge branch 'master' of https://github.com/ray-project/ray into clea…
sven1977 May 16, 2024
4cdebb3
Merge branch 'algorithm_config_dissolve_resources_method' into cleanu…
sven1977 May 16, 2024
68aa7ba
wip
sven1977 May 16, 2024
794b960
wip
sven1977 May 16, 2024
3a6d05e
Merge branch 'algorithm_config_dissolve_resources_method' into cleanu…
sven1977 May 16, 2024
03be7f5
wip
sven1977 May 16, 2024
e6a8a2e
wip
sven1977 May 16, 2024
d65082f
wip
sven1977 May 16, 2024
b797141
Merge branch 'master' of https://github.com/ray-project/ray into clea…
sven1977 Jun 3, 2024
5d1fa21
wip
sven1977 Jun 4, 2024
5322241
Merge branch 'master' of https://github.com/ray-project/ray into clea…
sven1977 Jun 10, 2024
d0995ae
wip
sven1977 Jun 10, 2024
0916ce7
wip
sven1977 Jun 10, 2024
f36f7dc
Merge branch 'master' of https://github.com/ray-project/ray into clea…
sven1977 Jun 10, 2024
23fcf8d
Apply suggestions from code review
sven1977 Jun 10, 2024
8e2afcc
Apply suggestions from code review
sven1977 Jun 10, 2024
8966d52
Apply suggestions from code review
sven1977 Jun 10, 2024
cbc7f5b
fix
sven1977 Jun 10, 2024
9d509c3
Merge remote-tracking branch 'origin/cleanup_examples_folder_13_folde…
sven1977 Jun 10, 2024
11b207e
Merge branch 'master' of https://github.com/ray-project/ray into clea…
sven1977 Jun 11, 2024
ed21e41
fix
sven1977 Jun 11, 2024
3ea64bf
fix
sven1977 Jun 11, 2024
f192bc3
fix
sven1977 Jun 11, 2024
2dbe142
fix
sven1977 Jun 11, 2024
73820dc
Merge branch 'master' of https://github.com/ray-project/ray into clea…
sven1977 Jun 11, 2024
5172242
fix
sven1977 Jun 11, 2024
80566dd
Merge branch 'master' of https://github.com/ray-project/ray into clea…
sven1977 Jun 12, 2024
41102e5
fix
sven1977 Jun 12, 2024
3 changes: 3 additions & 0 deletions .vale/styles/config/vocabularies/RLlib/accept.txt
@@ -9,9 +9,12 @@ config
(IMPALA|impala)
hyperparameters?
MARLModule
MLAgents
multiagent
postprocessing
(PPO|ppo)
[Pp]y[Tt]orch
pragmas?
(RL|rl)lib
RLModule
rollout
1 change: 1 addition & 0 deletions doc/source/rllib/images/sigils/new-api-stack.svg
1 change: 1 addition & 0 deletions doc/source/rllib/images/sigils/old-api-stack.svg
2 changes: 1 addition & 1 deletion doc/source/rllib/index.rst
@@ -167,7 +167,7 @@ Feature Overview

**RLlib Algorithms**
^^^
Check out the many available RL algorithms of RLlib for model-free and model-based
See the many available RL algorithms of RLlib for model-free and model-based
RL, on-policy and off-policy training, multi-agent RL, and more.
+++
.. button-ref:: rllib-algorithms-doc
2 changes: 1 addition & 1 deletion doc/source/rllib/key-concepts.rst
@@ -114,7 +114,7 @@ The following figure shows *synchronous sampling*, the simplest of `these patterns

RLlib uses `Ray actors <actors.html>`__ to scale training from a single core to many thousands of cores in a cluster.
You can `configure the parallelism <rllib-training.html#specifying-resources>`__ used for training by changing the ``num_env_runners`` parameter.
Check out our `scaling guide <rllib-training.html#scaling-guide>`__ for more details here.
See this `scaling guide <rllib-training.html#scaling-guide>`__ for more details here.
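The ``num_env_runners`` setting mentioned above can be sketched as follows (a minimal, hypothetical example, not part of this PR's diff; the ``PPOConfig`` builder calls and the ``CartPole-v1`` environment are placeholders, and parameter names can differ between Ray versions):

.. code-block:: python

    from ray.rllib.algorithms.ppo import PPOConfig

    # Scale sampling by raising `num_env_runners` (remote EnvRunner actors).
    config = (
        PPOConfig()
        .environment("CartPole-v1")  # placeholder environment
        .env_runners(num_env_runners=2)
    )
    algo = config.build()
    # Run one training iteration; the result-dict access mirrors the
    # result["env_runners"]["episode_return_mean"] pattern used elsewhere on this page.
    print(algo.train()["env_runners"]["episode_return_mean"])
    algo.stop()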
Review comment (Collaborator): The scaling guide also needs to be overhauled.
RL Modules
2 changes: 1 addition & 1 deletion doc/source/rllib/package_ref/evaluation.rst
@@ -23,7 +23,7 @@ which sit inside a :py:class:`~ray.rllib.env.env_runner_group.EnvRunnerGroup`

**A typical RLlib EnvRunnerGroup setup inside an RLlib Algorithm:** Each :py:class:`~ray.rllib.env.env_runner_group.EnvRunnerGroup` contains
exactly one local :py:class:`~ray.rllib.env.env_runner.EnvRunner` object and N ray remote
:py:class:`~ray.rllib.env.env_runner.EnvRunner` (ray actors).
:py:class:`~ray.rllib.env.env_runner.EnvRunner` (Ray actors).
The workers contain a policy map (with one or more policies), and - in case a simulator
(env) is available - a vectorized :py:class:`~ray.rllib.env.base_env.BaseEnv`
(containing M sub-environments) and a :py:class:`~ray.rllib.evaluation.sampler.SamplerInput` (either synchronous or asynchronous) which controls
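The one-local-plus-N-remote ``EnvRunner`` layout and the M sub-environments described in this hunk map onto two config settings. A hedged sketch, not part of this PR; ``num_env_runners`` and ``num_envs_per_env_runner`` are the names used in recent Ray releases and may be spelled differently in older versions:

.. code-block:: python

    from ray.rllib.algorithms.ppo import PPOConfig

    config = (
        PPOConfig()
        .environment("CartPole-v1")  # placeholder environment
        .env_runners(
            num_env_runners=4,          # N remote EnvRunner actors (Ray actors)
            num_envs_per_env_runner=2,  # M vectorized sub-environments per EnvRunner
        )
    )
    # Building the Algorithm also creates the single local EnvRunner.
    algo = config.build()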
102 changes: 23 additions & 79 deletions doc/source/rllib/rllib-advanced-api.rst
@@ -19,87 +19,31 @@ implement `custom training workflows (example) <https://github.com/ray-project/r
Curriculum Learning
~~~~~~~~~~~~~~~~~~~

In curriculum learning, the environment can be set to different difficulties
(or "tasks") to allow for learning to progress through controlled phases (from easy to
more difficult). RLlib comes with a basic curriculum learning API utilizing the
`TaskSettableEnv <https://github.com/ray-project/ray/blob/master/rllib/env/apis/task_settable_env.py>`__ environment API.
Your environment only needs to implement the `set_task` and `get_task` methods
for this to work. You can then define an `env_task_fn` in your config,
which receives the last training results and returns a new task for the env to be set to:

.. TODO move to doc_code and make it use algo configs.
.. code-block:: python

    from ray.rllib.env.apis.task_settable_env import TaskSettableEnv

    class MyEnv(TaskSettableEnv):
        def get_task(self):
            return self.current_difficulty

        def set_task(self, task):
            self.current_difficulty = task

    def curriculum_fn(train_results, task_settable_env, env_ctx):
        # Very simple curriculum function.
        current_task = task_settable_env.get_task()
        new_task = current_task + 1
        return new_task

    # Setup your Algorithm's config like so:
    config = {
        "env": MyEnv,
        "env_task_fn": curriculum_fn,
    }
    # Train using `Tuner.fit()` or `Algorithm.train()` and the above config stub.
    # ...

There are two more ways to use RLlib's other APIs to implement
`curriculum learning <https://bair.berkeley.edu/blog/2017/12/20/reverse-curriculum/>`__.

Use the Algorithm API and update the environment between calls to ``train()``.
This example shows the algorithm being run inside a Tune function.
This is basically the same as what the built-in `env_task_fn` API described above
already does under the hood, but it allows you to customize your training loop
even further.

.. TODO move to doc_code and make it use algo configs.
.. code-block:: python

    import ray
    from ray import train, tune
    from ray.rllib.algorithms.ppo import PPO

    def train_fn(config):
        algo = PPO(config=config, env=YourEnv)
        while True:
            result = algo.train()
            train.report(result)
            if result["env_runners"]["episode_return_mean"] > 200:
                task = 2
            elif result["env_runners"]["episode_return_mean"] > 100:
                task = 1
            else:
                task = 0
            algo.workers.foreach_worker(
                lambda ev: ev.foreach_env(
                    lambda env: env.set_task(task)))

    num_gpus = 0
    num_env_runners = 2
In curriculum learning, you can set the environment to different difficulties
throughout the training process. This lets the algorithm learn to solve the actual,
final problem incrementally, by interacting with and exploring increasingly
difficult phases.
Normally, such a curriculum starts by setting the environment to an easy level and
then, as training progresses, transitions toward harder-to-solve difficulties.
See the `Reverse Curriculum Generation for Reinforcement Learning Agents <https://bair.berkeley.edu/blog/2017/12/20/reverse-curriculum/>`_ blog post
for another example of how you can do curriculum learning.

RLlib's Algorithm and custom callbacks APIs allow you to implement arbitrary
curricula. This `example script <https://github.com/ray-project/ray/blob/master/rllib/examples/curriculum/curriculum_learning.py>`__ introduces
the basic concepts you need to understand.

First, define some env options. This example uses the `FrozenLake-v1` environment,
a grid world, whose map is fully customizable. Three tasks of different env difficulties
are represented by slightly different maps that the agent has to navigate.

.. literalinclude:: ../../../rllib/examples/curriculum/curriculum_learning.py
    :language: python
    :start-after: __curriculum_learning_example_env_options__
    :end-before: __END_curriculum_learning_example_env_options__

    ray.init()
    tune.Tuner(
        tune.with_resources(
            train_fn,
            resources=tune.PlacementGroupFactory(
                [{"CPU": 1}, {"GPU": num_gpus}] + [{"CPU": 1}] * num_env_runners
            ),
        ),
        param_space={
            "num_gpus": num_gpus,
            "num_env_runners": num_env_runners,
        },
    ).fit()
Then, define the central piece controlling the curriculum: a custom callbacks class
overriding the :py:meth:`~ray.rllib.algorithms.callbacks.Callbacks.on_train_result` method.
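A rough sketch of what such a callback could look like (not the code this PR adds; the return thresholds and the env's ``set_task()`` helper are assumptions for illustration, and it reuses the ``workers.foreach_worker`` / ``foreach_env`` access pattern shown in the removed example above):

.. code-block:: python

    from ray.rllib.algorithms.callbacks import DefaultCallbacks

    class CurriculumCallback(DefaultCallbacks):
        def on_train_result(self, *, algorithm, result, **kwargs):
            # Pick a task based on the latest mean episode return
            # (thresholds are placeholders).
            mean_return = result["env_runners"]["episode_return_mean"]
            if mean_return > 200:
                task = 2
            elif mean_return > 100:
                task = 1
            else:
                task = 0
            # Push the new task to every sub-environment on every EnvRunner.
            # Assumes the env exposes a `set_task()` method as in the example above.
            algorithm.workers.foreach_worker(
                lambda worker: worker.foreach_env(lambda env: env.set_task(task))
            )

    # Hypothetical usage: register the callback on the config.
    # config = PPOConfig().callbacks(CurriculumCallback)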

You could also use RLlib's callbacks API to update the environment on new training
results:

.. TODO move to doc_code and make it use algo configs.
.. code-block:: python
2 changes: 1 addition & 1 deletion doc/source/rllib/rllib-algorithms.rst
@@ -9,7 +9,7 @@ Algorithms

.. tip::

Check out the `environments <rllib-env.html>`__ page to learn more about different environment types.
See the `environments <rllib-env.html>`__ page to learn more about different environment types.

Available Algorithms - Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 changes: 1 addition & 1 deletion doc/source/rllib/rllib-env.rst
@@ -11,7 +11,7 @@ RLlib works with several different types of environments, including `Farama-Foun

.. tip::

Not all environments work with all algorithms. Check out the `algorithm overview <rllib-algorithms.html#available-algorithms-overview>`__ for more information.
Not all environments work with all algorithms. See the `algorithm overview <rllib-algorithms.html#available-algorithms-overview>`__ for more information.

.. image:: images/rllib-envs.svg
