[RLlib] Cleanup examples folder #13. Fix main examples docs page for RLlib. #45382

Merged
34 commits
cb937c6
wip
sven1977 May 16, 2024
c931bed
wip
sven1977 May 16, 2024
02d3d04
wip
sven1977 May 16, 2024
3ada50a
wip
sven1977 May 16, 2024
bccdde6
Merge branch 'master' of https://github.com/ray-project/ray into algo…
sven1977 May 16, 2024
496c5ee
wip
sven1977 May 16, 2024
edd0a91
Merge branch 'master' of https://github.com/ray-project/ray into clea…
sven1977 May 16, 2024
4cdebb3
Merge branch 'algorithm_config_dissolve_resources_method' into cleanu…
sven1977 May 16, 2024
68aa7ba
wip
sven1977 May 16, 2024
794b960
wip
sven1977 May 16, 2024
3a6d05e
Merge branch 'algorithm_config_dissolve_resources_method' into cleanu…
sven1977 May 16, 2024
03be7f5
wip
sven1977 May 16, 2024
e6a8a2e
wip
sven1977 May 16, 2024
d65082f
wip
sven1977 May 16, 2024
b797141
Merge branch 'master' of https://github.com/ray-project/ray into clea…
sven1977 Jun 3, 2024
5d1fa21
wip
sven1977 Jun 4, 2024
5322241
Merge branch 'master' of https://github.com/ray-project/ray into clea…
sven1977 Jun 10, 2024
d0995ae
wip
sven1977 Jun 10, 2024
0916ce7
wip
sven1977 Jun 10, 2024
f36f7dc
Merge branch 'master' of https://github.com/ray-project/ray into clea…
sven1977 Jun 10, 2024
23fcf8d
Apply suggestions from code review
sven1977 Jun 10, 2024
8e2afcc
Apply suggestions from code review
sven1977 Jun 10, 2024
8966d52
Apply suggestions from code review
sven1977 Jun 10, 2024
cbc7f5b
fix
sven1977 Jun 10, 2024
9d509c3
Merge remote-tracking branch 'origin/cleanup_examples_folder_13_folde…
sven1977 Jun 10, 2024
11b207e
Merge branch 'master' of https://github.com/ray-project/ray into clea…
sven1977 Jun 11, 2024
ed21e41
fix
sven1977 Jun 11, 2024
3ea64bf
fix
sven1977 Jun 11, 2024
f192bc3
fix
sven1977 Jun 11, 2024
2dbe142
fix
sven1977 Jun 11, 2024
73820dc
Merge branch 'master' of https://github.com/ray-project/ray into clea…
sven1977 Jun 11, 2024
5172242
fix
sven1977 Jun 11, 2024
80566dd
Merge branch 'master' of https://github.com/ray-project/ray into clea…
sven1977 Jun 12, 2024
41102e5
fix
sven1977 Jun 12, 2024
3 changes: 3 additions & 0 deletions .vale/styles/config/vocabularies/RLlib/accept.txt
@@ -9,9 +9,12 @@ config
(IMPALA|impala)
hyperparameters?
MARLModule
MLAgents
multiagent
postprocessing
(PPO|ppo)
[Pp]y[Tt]orch
pragmas?
(RL|rl)lib
RLModule
rollout
1 change: 1 addition & 0 deletions doc/source/rllib/images/sigils/new-api-stack.svg
1 change: 1 addition & 0 deletions doc/source/rllib/images/sigils/old-api-stack.svg
2 changes: 1 addition & 1 deletion doc/source/rllib/index.rst
@@ -167,7 +167,7 @@ Feature Overview

**RLlib Algorithms**
^^^
Check out the many available RL algorithms of RLlib for model-free and model-based
See the many available RL algorithms of RLlib for model-free and model-based
RL, on-policy and off-policy training, multi-agent RL, and more.
+++
.. button-ref:: rllib-algorithms-doc
2 changes: 1 addition & 1 deletion doc/source/rllib/key-concepts.rst
@@ -114,7 +114,7 @@ The following figure shows *synchronous sampling*, the simplest of `these patterns

RLlib uses `Ray actors <actors.html>`__ to scale training from a single core to many thousands of cores in a cluster.
You can `configure the parallelism <rllib-training.html#specifying-resources>`__ used for training by changing the ``num_env_runners`` parameter.
Check out our `scaling guide <rllib-training.html#scaling-guide>`__ for more details here.
See this `scaling guide <rllib-training.html#scaling-guide>`__ for more details here.
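The ``num_env_runners`` setting mentioned above can be sketched as follows (a minimal, hypothetical example, not part of this PR's diff; the ``PPOConfig`` builder calls and the ``CartPole-v1`` environment are placeholders, and parameter names can differ between Ray versions):

.. code-block:: python

    from ray.rllib.algorithms.ppo import PPOConfig

    # Scale sampling by raising `num_env_runners` (remote EnvRunner actors).
    config = (
        PPOConfig()
        .environment("CartPole-v1")  # placeholder environment
        .env_runners(num_env_runners=2)
    )
    algo = config.build()
    # Run one training iteration; the result-dict access mirrors the
    # result["env_runners"]["episode_return_mean"] pattern used elsewhere on this page.
    print(algo.train()["env_runners"]["episode_return_mean"])
    algo.stop()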
Review comment (Collaborator): The scaling guide also needs to be overhauled.
RL Modules
2 changes: 1 addition & 1 deletion doc/source/rllib/package_ref/evaluation.rst
@@ -23,7 +23,7 @@ which sit inside a :py:class:`~ray.rllib.env.env_runner_group.EnvRunnerGroup`

**A typical RLlib EnvRunnerGroup setup inside an RLlib Algorithm:** Each :py:class:`~ray.rllib.env.env_runner_group.EnvRunnerGroup` contains
exactly one local :py:class:`~ray.rllib.env.env_runner.EnvRunner` object and N ray remote
:py:class:`~ray.rllib.env.env_runner.EnvRunner` (ray actors).
:py:class:`~ray.rllib.env.env_runner.EnvRunner` (Ray actors).
The workers contain a policy map (with one or more policies), and - in case a simulator
(env) is available - a vectorized :py:class:`~ray.rllib.env.base_env.BaseEnv`
(containing M sub-environments) and a :py:class:`~ray.rllib.evaluation.sampler.SamplerInput` (either synchronous or asynchronous) which controls
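The one-local-plus-N-remote ``EnvRunner`` layout and the M sub-environments described in this hunk map onto two config settings. A hedged sketch, not part of this PR; ``num_env_runners`` and ``num_envs_per_env_runner`` are the names used in recent Ray releases and may be spelled differently in older versions:

.. code-block:: python

    from ray.rllib.algorithms.ppo import PPOConfig

    config = (
        PPOConfig()
        .environment("CartPole-v1")  # placeholder environment
        .env_runners(
            num_env_runners=4,          # N remote EnvRunner actors (Ray actors)
            num_envs_per_env_runner=2,  # M vectorized sub-environments per EnvRunner
        )
    )
    # Building the Algorithm also creates the single local EnvRunner.
    algo = config.build()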
102 changes: 23 additions & 79 deletions doc/source/rllib/rllib-advanced-api.rst
@@ -19,87 +19,31 @@ implement `custom training workflows (example) <https://github.com/ray-project/r
Curriculum Learning
~~~~~~~~~~~~~~~~~~~

In curriculum learning, the environment can be set to different difficulties
(or "tasks") to allow for learning to progress through controlled phases (from easy to
more difficult). RLlib comes with a basic curriculum learning API utilizing the
`TaskSettableEnv <https://github.com/ray-project/ray/blob/master/rllib/env/apis/task_settable_env.py>`__ environment API.
Your environment only needs to implement the `set_task` and `get_task` methods
for this to work. You can then define an `env_task_fn` in your config,
which receives the last training results and returns a new task for the env to be set to:

.. TODO move to doc_code and make it use algo configs.
.. code-block:: python

    from ray.rllib.env.apis.task_settable_env import TaskSettableEnv

    class MyEnv(TaskSettableEnv):
        def get_task(self):
            return self.current_difficulty

        def set_task(self, task):
            self.current_difficulty = task

    def curriculum_fn(train_results, task_settable_env, env_ctx):
        # Very simple curriculum function.
        current_task = task_settable_env.get_task()
        new_task = current_task + 1
        return new_task

    # Setup your Algorithm's config like so:
    config = {
        "env": MyEnv,
        "env_task_fn": curriculum_fn,
    }
    # Train using `Tuner.fit()` or `Algorithm.train()` and the above config stub.
    # ...

There are two more ways to use RLlib's other APIs to implement
`curriculum learning <https://bair.berkeley.edu/blog/2017/12/20/reverse-curriculum/>`__.

Use the Algorithm API and update the environment between calls to ``train()``.
This example shows the algorithm being run inside a Tune function.
This is basically the same as what the built-in `env_task_fn` API described above
already does under the hood, but it allows you to customize your training loop
even further.

.. TODO move to doc_code and make it use algo configs.
.. code-block:: python

    import ray
    from ray import train, tune
    from ray.rllib.algorithms.ppo import PPO

    def train_fn(config):
        algo = PPO(config=config, env=YourEnv)
        while True:
            result = algo.train()
            train.report(result)
            if result["env_runners"]["episode_return_mean"] > 200:
                task = 2
            elif result["env_runners"]["episode_return_mean"] > 100:
                task = 1
            else:
                task = 0
            algo.workers.foreach_worker(
                lambda ev: ev.foreach_env(
                    lambda env: env.set_task(task)))

    num_gpus = 0
    num_env_runners = 2
In curriculum learning, you can set the environment to different difficulties
throughout the training process. This lets the algorithm learn to solve the actual,
final problem incrementally, by interacting with and exploring increasingly
difficult phases.
Normally, such a curriculum starts by setting the environment to an easy level and
then, as training progresses, transitions toward harder-to-solve difficulties.
See the `Reverse Curriculum Generation for Reinforcement Learning Agents <https://bair.berkeley.edu/blog/2017/12/20/reverse-curriculum/>`_ blog post
for another example of how you can do curriculum learning.

RLlib's Algorithm and custom callbacks APIs allow you to implement arbitrary
curricula. This `example script <https://github.com/ray-project/ray/blob/master/rllib/examples/curriculum/curriculum_learning.py>`__ introduces
the basic concepts you need to understand.

First, define some env options. This example uses the `FrozenLake-v1` environment,
a grid world, whose map is fully customizable. Three tasks of different env difficulties
are represented by slightly different maps that the agent has to navigate.

.. literalinclude:: ../../../rllib/examples/curriculum/curriculum_learning.py
    :language: python
    :start-after: __curriculum_learning_example_env_options__
    :end-before: __END_curriculum_learning_example_env_options__

    ray.init()
    tune.Tuner(
        tune.with_resources(
            train_fn,
            resources=tune.PlacementGroupFactory(
                [{"CPU": 1}, {"GPU": num_gpus}] + [{"CPU": 1}] * num_env_runners
            ),
        ),
        param_space={
            "num_gpus": num_gpus,
            "num_env_runners": num_env_runners,
        },
    ).fit()
Then, define the central piece controlling the curriculum: a custom callbacks class
overriding the :py:meth:`~ray.rllib.algorithms.callbacks.Callbacks.on_train_result` method.
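A rough sketch of what such a callback could look like (not the code this PR adds; the return thresholds and the env's ``set_task()`` helper are assumptions for illustration, and it reuses the ``workers.foreach_worker`` / ``foreach_env`` access pattern shown in the removed example above):

.. code-block:: python

    from ray.rllib.algorithms.callbacks import DefaultCallbacks

    class CurriculumCallback(DefaultCallbacks):
        def on_train_result(self, *, algorithm, result, **kwargs):
            # Pick a task based on the latest mean episode return
            # (thresholds are placeholders).
            mean_return = result["env_runners"]["episode_return_mean"]
            if mean_return > 200:
                task = 2
            elif mean_return > 100:
                task = 1
            else:
                task = 0
            # Push the new task to every sub-environment on every EnvRunner.
            # Assumes the env exposes a `set_task()` method as in the example above.
            algorithm.workers.foreach_worker(
                lambda worker: worker.foreach_env(lambda env: env.set_task(task))
            )

    # Hypothetical usage: register the callback on the config.
    # config = PPOConfig().callbacks(CurriculumCallback)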

You could also use RLlib's callbacks API to update the environment on new training
results:

.. TODO move to doc_code and make it use algo configs.
.. code-block:: python
2 changes: 1 addition & 1 deletion doc/source/rllib/rllib-algorithms.rst
@@ -9,7 +9,7 @@ Algorithms

.. tip::

Check out the `environments <rllib-env.html>`__ page to learn more about different environment types.
See the `environments <rllib-env.html>`__ page to learn more about different environment types.

Available Algorithms - Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 changes: 1 addition & 1 deletion doc/source/rllib/rllib-env.rst
@@ -11,7 +11,7 @@ RLlib works with several different types of environments, including `Farama-Foun

.. tip::

Not all environments work with all algorithms. Check out the `algorithm overview <rllib-algorithms.html#available-algorithms-overview>`__ for more information.
Not all environments work with all algorithms. See the `algorithm overview <rllib-algorithms.html#available-algorithms-overview>`__ for more information.

.. image:: images/rllib-envs.svg
