[RLlib] Deprecate (delete) contrib folder. (ray-project#30992)
Signed-off-by: tmynn <[email protected]>
sven1977 authored and tamohannes committed Jan 25, 2023
1 parent 8f1568f commit bd3877d
Showing 18 changed files with 188 additions and 199 deletions.
8 changes: 4 additions & 4 deletions doc/source/rllib/index.rst
@@ -126,7 +126,7 @@ click on the dropdowns below:

* Model-based / Meta-learning / Offline

- |pytorch| :ref:`Single-Player AlphaZero (contrib/AlphaZero) <alphazero>`
- |pytorch| :ref:`Single-Player AlphaZero (AlphaZero) <alphazero>`

- |pytorch| |tensorflow| :ref:`Model-Agnostic Meta-Learning (MAML) <maml>`

@@ -139,16 +139,16 @@ click on the dropdowns below:
* Multi-agent

- |pytorch| :ref:`QMIX Monotonic Value Factorisation (QMIX, VDN, IQN) <qmix>`
- |tensorflow| :ref:`Multi-Agent Deep Deterministic Policy Gradient (contrib/MADDPG) <maddpg>`
- |tensorflow| :ref:`Multi-Agent Deep Deterministic Policy Gradient (MADDPG) <maddpg>`

* Offline

- |pytorch| |tensorflow| :ref:`Advantage Re-Weighted Imitation Learning (MARWIL) <marwil>`

* Contextual bandits

- |pytorch| :ref:`Linear Upper Confidence Bound (contrib/LinUCB) <lin-ucb>`
- |pytorch| :ref:`Linear Thompson Sampling (contrib/LinTS) <lints>`
- |pytorch| :ref:`Linear Upper Confidence Bound (LinUCB) <lin-ucb>`
- |pytorch| :ref:`Linear Thompson Sampling (LinTS) <lints>`

* Exploration-based plug-ins (can be combined with any algo)

28 changes: 14 additions & 14 deletions doc/source/rllib/rllib-dev.rst
@@ -52,54 +52,54 @@ A number of training run results are available in the `rl-experiments repo <http
Contributing Algorithms
-----------------------

These are the guidelines for merging new algorithms into RLlib:
These are the guidelines for merging new algorithms (`rllib/algorithms <https://github.com/ray-project/ray/tree/master/rllib/algorithms>`__) into RLlib:

* Contributed algorithms (`rllib/contrib <https://github.com/ray-project/ray/tree/master/rllib/contrib>`__):
* Contributed algorithms:
- must subclass Algorithm and implement the ``step()`` method
- must include a lightweight test (`example <https://github.com/ray-project/ray/blob/6bb110393008c9800177490688c6ed38b2da52a9/test/jenkins_tests/run_multi_node_tests.sh#L45>`__) to ensure the algorithm runs
- should include tuned hyperparameter examples and documentation
- should offer functionality not present in existing algorithms

* Fully integrated algorithms (`rllib/agents <https://github.com/ray-project/ray/tree/master/rllib/agents>`__) have the following additional requirements:
* Fully integrated algorithms have the following additional requirements:
- must fully implement the Algorithm API
- must offer substantial new functionality not possible to add to other algorithms
- should support custom models and preprocessors
- should use RLlib abstractions and support distributed execution

Both integrated and contributed algorithms ship with the ``ray`` PyPI package, and are tested as part of Ray's automated tests. The main difference between contributed and fully integrated algorithms is that the latter will be maintained by the Ray team to a much greater extent with respect to bugs and integration with RLlib features.

How to add an algorithm to ``contrib``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It takes just two changes to add an algorithm to `contrib <https://github.com/ray-project/ray/tree/master/rllib/contrib>`__. A minimal example can be found `here <https://github.com/ray-project/ray/tree/master/rllib/contrib/random_agent/random_agent.py>`__. First, subclass `Algorithm <https://github.com/ray-project/ray/commits/master/rllib/algorithms/algorithm.py>`__ and implement the ``_init`` and ``step`` methods:
How to add an algorithm to ``rllib/algorithms``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It takes just two changes to add an algorithm to `algorithms <https://github.com/ray-project/ray/tree/master/rllib/algorithms>`__. A minimal example can be found `here <https://github.com/ray-project/ray/tree/master/rllib/contrib/random_agent/random_agent.py>`__. First, subclass `Algorithm <https://github.com/ray-project/ray/commits/master/rllib/algorithms/algorithm.py>`__ and implement the ``_init`` and ``step`` methods:

.. literalinclude:: ../../../rllib/contrib/random_agent/random_agent.py
.. literalinclude:: ../../../rllib/algorithms/random_agent/random_agent.py
:language: python
:start-after: __sphinx_doc_begin__
:end-before: __sphinx_doc_end__
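
For orientation, here is a condensed sketch of what that included snippet defines (the complete ``random_agent.py`` added by this commit appears further down in this diff; the ``RandomAgentConfig`` class is omitted here for brevity):

.. code-block:: python

    from ray.rllib.algorithms.algorithm import Algorithm
    from ray.rllib.algorithms.algorithm_config import AlgorithmConfig
    from ray.rllib.utils.annotations import override

    class RandomAgent(Algorithm):
        """Produces random actions and never learns."""

        @classmethod
        @override(Algorithm)
        def get_default_config(cls) -> AlgorithmConfig:
            config = AlgorithmConfig()
            config.rollouts_per_iteration = 10
            return config

        @override(Algorithm)
        def _init(self, config, env_creator):
            # Create the environment that random actions will be sampled for.
            self.env = env_creator(config["env_config"])

        @override(Algorithm)
        def step(self):
            # One training iteration: run `rollouts_per_iteration` random-action
            # episodes and report their mean reward and total timesteps.
            rewards, steps = [], 0
            for _ in range(self.config.rollouts_per_iteration):
                self.env.reset()
                done, episode_reward = False, 0.0
                while not done:
                    _, r, done, _ = self.env.step(self.env.action_space.sample())
                    episode_reward += r
                    steps += 1
                rewards.append(episode_reward)
            return {
                "episode_reward_mean": sum(rewards) / len(rewards),
                "timesteps_this_iter": steps,
            }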

Second, register the algorithm with a name in `contrib/registry.py <https://github.com/ray-project/ray/blob/master/rllib/contrib/registry.py>`__.
Second, register the algorithm with a name in `rllib/algorithms/registry.py <https://github.com/ray-project/ray/blob/master/rllib/algorithms/registry.py>`__.

.. code-block:: python
def _import_random_agent():
from ray.rllib.contrib.random_agent.random_agent import RandomAgent
from ray.rllib.algorithms.random_agent.random_agent import RandomAgent
return RandomAgent
def _import_random_agent_2():
from ray.rllib.contrib.random_agent_2.random_agent_2 import RandomAgent2
from ray.rllib.algorithms.random_agent_2.random_agent_2 import RandomAgent2
return RandomAgent2
CONTRIBUTED_ALGORITHMS = {
"contrib/RandomAgent": _import_random_trainer,
"contrib/RandomAgent2": _import_random_trainer_2,
ALGORITHMS = {
"RandomAgent": _import_random_agent,
"RandomAgent2": _import_random_agent_2,
# ...
}
After registration, you can run and visualize training progress using ``rllib train``:

.. code-block:: bash
rllib train --run=contrib/RandomAgent --env=CartPole-v1
rllib train --run=RandomAgent --env=CartPole-v1
tensorboard --logdir=~/ray_results
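
The same training run can also be launched from Python through Ray Tune, using the registered string name. This is an illustrative sketch only (not part of this commit), assuming the Ray 2.x ``Tuner``/``RunConfig`` API and that ``RandomAgent`` is present in ``ALGORITHMS`` as shown above:

.. code-block:: python

    from ray import air, tune
    from ray.rllib import _register_all

    _register_all()  # registers every ALGORITHMS entry under its plain name

    tuner = tune.Tuner(
        "RandomAgent",
        param_space={"env": "CartPole-v1"},
        run_config=air.RunConfig(stop={"training_iteration": 1}),
    )
    results = tuner.fit()
    best = results.get_best_result(metric="episode_reward_mean", mode="max")
    print(best.metrics["episode_reward_mean"])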
Debugging your Algorithms
21 changes: 9 additions & 12 deletions rllib/BUILD
@@ -1119,6 +1119,15 @@ py_test(
srcs = ["algorithms/r2d2/tests/test_r2d2.py"]
)

# RandomAgent
py_test(
name = "test_random_agent",
main = "algorithms/random_agent/random_agent.py",
tags = ["team:rllib", "algorithms_dir"],
size = "small",
srcs = ["algorithms/random_agent/random_agent.py"]
)

# RNNSAC
py_test(
name = "test_rnnsac",
@@ -1167,18 +1176,6 @@ py_test(
srcs = ["algorithms/td3/tests/test_td3.py"]
)

# --------------------------------------------------------------------
# contrib Algorithms
# --------------------------------------------------------------------

py_test(
name = "random_agent",
tags = ["team:rllib", "algorithms_dir"],
main = "contrib/random_agent/random_agent.py",
size = "small",
srcs = ["contrib/random_agent/random_agent.py"]
)


# --------------------------------------------------------------------
# Memory leak tests
8 changes: 4 additions & 4 deletions rllib/README.rst
@@ -74,7 +74,7 @@ Model-free On-policy RL:
- `Importance Weighted Actor-Learner Architecture (IMPALA) <https://docs.ray.io/en/master/rllib/rllib-algorithms.html#impala>`__
- `Advantage Actor-Critic (A2C, A3C) <https://docs.ray.io/en/master/rllib/rllib-algorithms.html#a3c>`__
- `Vanilla Policy Gradient (PG) <https://docs.ray.io/en/master/rllib/rllib-algorithms.html#pg>`__
- `Model-agnostic Meta-Learning (contrib/MAML) <https://docs.ray.io/en/master/rllib/rllib-algorithms.html#maml>`__
- `Model-agnostic Meta-Learning (MAML) <https://docs.ray.io/en/master/rllib/rllib-algorithms.html#maml>`__

Model-free Off-policy RL:

@@ -86,7 +86,7 @@ Model-free Off-policy RL:

Model-based RL:

- `Image-only Dreamer (contrib/Dreamer) <https://docs.ray.io/en/master/rllib/rllib-algorithms.html#dreamer>`__
- `Image-only Dreamer (Dreamer) <https://docs.ray.io/en/master/rllib/rllib-algorithms.html#dreamer>`__
- `Model-Based Meta-Policy-Optimization (MB-MPO) <https://docs.ray.io/en/master/rllib/rllib-algorithms.html#mbmpo>`__

Derivative-free algorithms:
@@ -113,8 +113,8 @@ Multi-agent:
Others:

- `Single-Player Alpha Zero (AlphaZero) <https://docs.ray.io/en/master/rllib/rllib-algorithms.html#alphazero>`__
- `Curiosity (ICM: Intrinsic Curiosity Module) <https://docs.ray.io/en/master/rllib/rllib-algorithms.html#curiosity>`__
- `Random encoders (contrib/RE3) <https://docs.ray.io/en/master/rllib/rllib-algorithms.html#re3>`__
- `Curiosity (ICM: Intrinsic Curiosity Module) <https://docs.ray.io/en/master/rllib/rllib-algorithms.html#curiosity>`__
- `Random encoders (RE3) <https://docs.ray.io/en/master/rllib/rllib-algorithms.html#re3>`__
- `Fully Independent Learning <https://docs.ray.io/en/master/rllib/rllib-algorithms.html#fil>`__

A list of all the algorithms can be found `here <https://docs.ray.io/en/master/rllib/rllib-algorithms.html>`__ .
22 changes: 1 addition & 21 deletions rllib/__init__.py
@@ -29,34 +29,14 @@ def _setup_logger():


def _register_all():
from ray.rllib.algorithms.algorithm import Algorithm
from ray.rllib.algorithms.registry import ALGORITHMS, _get_algorithm_class
from ray.rllib.contrib.registry import CONTRIBUTED_ALGORITHMS

for key, get_trainable_class_and_config in list(ALGORITHMS.items()) + list(
CONTRIBUTED_ALGORITHMS.items()
):
for key, get_trainable_class_and_config in ALGORITHMS.items():
register_trainable(key, get_trainable_class_and_config()[0])

for key in ["__fake", "__sigmoid_fake_data", "__parameter_tuning"]:
register_trainable(key, _get_algorithm_class(key))

def _see_contrib(name):
"""Returns dummy agent class warning algo is in contrib/."""

class _SeeContrib(Algorithm):
def setup(self, config):
raise NameError("Please run `contrib/{}` instead.".format(name))

return _SeeContrib

# Also register the aliases minus contrib/ to give a good error message.
for key in list(CONTRIBUTED_ALGORITHMS.keys()):
assert key.startswith("contrib/")
alias = key.split("/", 1)[1]
if alias not in ALGORITHMS:
register_trainable(alias, _see_contrib(alias))


_setup_logger()
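
As a hedged illustration of the effect of the simplified loop above (not part of this commit): every ``ALGORITHMS`` entry is now registered under its plain name and can be resolved through Tune's registry, assuming ``RandomAgent`` is one of those entries:

.. code-block:: python

    from ray.rllib import _register_all
    from ray.tune.registry import get_trainable_cls

    _register_all()  # register each ALGORITHMS entry (plus the "__fake" test trainables)

    algo_cls = get_trainable_cls("RandomAgent")
    algo = algo_cls(config=algo_cls.get_default_config().environment("CartPole-v1"))
    print(algo.train()["episode_reward_mean"])
    algo.stop()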

9 changes: 9 additions & 0 deletions rllib/algorithms/random_agent/__init__.py
@@ -0,0 +1,9 @@
from ray.rllib.algorithms.random_agent.random_agent import (
RandomAgent,
RandomAgentConfig,
)

__all__ = [
"RandomAgent",
"RandomAgentConfig",
]
100 changes: 100 additions & 0 deletions rllib/algorithms/random_agent/random_agent.py
@@ -0,0 +1,100 @@
import numpy as np
from typing import Optional

from ray.rllib.algorithms.algorithm import Algorithm
from ray.rllib.algorithms.algorithm_config import AlgorithmConfig, NotProvided
from ray.rllib.utils.annotations import override


class RandomAgentConfig(AlgorithmConfig):
"""Defines a configuration class from which a RandomAgent Algorithm can be built.
Example:
>>> from ray.rllib.algorithms.random_agent import RandomAgentConfig
>>> config = RandomAgentConfig().rollouts(rollouts_per_iteration=20)
>>> print(config.to_dict()) # doctest: +SKIP
>>> # Build an Algorithm object from the config and run 1 training iteration.
>>> algo = config.build(env="CartPole-v1")
>>> algo.train() # doctest: +SKIP
"""

def __init__(self, algo_class=None):
"""Initializes a RandomAgentConfig instance."""
super().__init__(algo_class=algo_class or RandomAgent)

self.rollouts_per_iteration = 10

def rollouts(
self,
*,
rollouts_per_iteration: Optional[int] = NotProvided,
**kwargs,
) -> "RandomAgentConfig":
"""Sets the rollout configuration.
Args:
rollouts_per_iteration: How many episodes to run per training iteration.
Returns:
This updated AlgorithmConfig object.
"""
super().rollouts(**kwargs)

if rollouts_per_iteration is not NotProvided:
self.rollouts_per_iteration = rollouts_per_iteration

return self


# fmt: off
# __sphinx_doc_begin__
class RandomAgent(Algorithm):
"""Algo that produces random actions and never learns."""

@classmethod
@override(Algorithm)
def get_default_config(cls) -> AlgorithmConfig:
config = AlgorithmConfig()
config.rollouts_per_iteration = 10
return config

@override(Algorithm)
def _init(self, config, env_creator):
self.env = env_creator(config["env_config"])

@override(Algorithm)
def step(self):
rewards = []
steps = 0
for _ in range(self.config.rollouts_per_iteration):
self.env.reset()
done = False
reward = 0.0
while not done:
action = self.env.action_space.sample()
_, r, done, _ = self.env.step(action)
reward += r
steps += 1
rewards.append(reward)
return {
"episode_reward_mean": np.mean(rewards),
"timesteps_this_iter": steps,
}
# __sphinx_doc_end__


if __name__ == "__main__":
# Define a config object.
config = (
RandomAgentConfig()
.environment("CartPole-v1")
.rollouts(rollouts_per_iteration=10)
)
# Build the agent.
algo = config.build()
# "Train" one iteration.
result = algo.train()
assert result["episode_reward_mean"] > 10, result
algo.stop()

print("Test: OK")