[RLlib] Deprecate (delete) contrib folder. (ray-project#30992)
Signed-off-by: tmynn <[email protected]>
sven1977 authored and tamohannes committed Jan 25, 2023
1 parent 8f1568f commit bd3877d
Showing 18 changed files with 188 additions and 199 deletions.
8 changes: 4 additions & 4 deletions doc/source/rllib/index.rst
@@ -126,7 +126,7 @@ click on the dropdowns below:

* Model-based / Meta-learning / Offline

- |pytorch| :ref:`Single-Player AlphaZero (contrib/AlphaZero) <alphazero>`
- |pytorch| :ref:`Single-Player AlphaZero (AlphaZero) <alphazero>`

- |pytorch| |tensorflow| :ref:`Model-Agnostic Meta-Learning (MAML) <maml>`

@@ -139,16 +139,16 @@ click on the dropdowns below:
* Multi-agent

- |pytorch| :ref:`QMIX Monotonic Value Factorisation (QMIX, VDN, IQN) <qmix>`
- |tensorflow| :ref:`Multi-Agent Deep Deterministic Policy Gradient (contrib/MADDPG) <maddpg>`
- |tensorflow| :ref:`Multi-Agent Deep Deterministic Policy Gradient (MADDPG) <maddpg>`

* Offline

- |pytorch| |tensorflow| :ref:`Advantage Re-Weighted Imitation Learning (MARWIL) <marwil>`

* Contextual bandits

- |pytorch| :ref:`Linear Upper Confidence Bound (contrib/LinUCB) <lin-ucb>`
- |pytorch| :ref:`Linear Thompson Sampling (contrib/LinTS) <lints>`
- |pytorch| :ref:`Linear Upper Confidence Bound (LinUCB) <lin-ucb>`
- |pytorch| :ref:`Linear Thompson Sampling (LinTS) <lints>`

* Exploration-based plug-ins (can be combined with any algo)

28 changes: 14 additions & 14 deletions doc/source/rllib/rllib-dev.rst
@@ -52,54 +52,54 @@ A number of training run results are available in the `rl-experiments repo <http
Contributing Algorithms
-----------------------

These are the guidelines for merging new algorithms into RLlib:
These are the guidelines for merging new algorithms (`rllib/algorithms <https://github.com/ray-project/ray/tree/master/rllib/algorithms>`__) into RLlib:

* Contributed algorithms (`rllib/contrib <https://github.com/ray-project/ray/tree/master/rllib/contrib>`__):
* Contributed algorithms:
- must subclass Algorithm and implement the ``step()`` method
- must include a lightweight test (`example <https://github.com/ray-project/ray/blob/6bb110393008c9800177490688c6ed38b2da52a9/test/jenkins_tests/run_multi_node_tests.sh#L45>`__) to ensure the algorithm runs
- should include tuned hyperparameter examples and documentation
- should offer functionality not present in existing algorithms

* Fully integrated algorithms (`rllib/agents <https://github.com/ray-project/ray/tree/master/rllib/agents>`__) have the following additional requirements:
* Fully integrated algorithms have the following additional requirements:
- must fully implement the Algorithm API
- must offer substantial new functionality not possible to add to other algorithms
- should support custom models and preprocessors
- should use RLlib abstractions and support distributed execution

Both integrated and contributed algorithms ship with the ``ray`` PyPI package, and are tested as part of Ray's automated tests. The main difference between contributed and fully integrated algorithms is that the latter will be maintained by the Ray team to a much greater extent with respect to bugs and integration with RLlib features.

How to add an algorithm to ``contrib``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It takes just two changes to add an algorithm to `contrib <https://github.com/ray-project/ray/tree/master/rllib/contrib>`__. A minimal example can be found `here <https://github.com/ray-project/ray/tree/master/rllib/contrib/random_agent/random_agent.py>`__. First, subclass `Algorithm <https://github.com/ray-project/ray/commits/master/rllib/algorithms/algorithm.py>`__ and implement the ``_init`` and ``step`` methods:
How to add an algorithm to ``rllib/algorithms``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It takes just two changes to add an algorithm to `algorithms <https://github.com/ray-project/ray/tree/master/rllib/algorithms>`__. A minimal example can be found `here <https://github.com/ray-project/ray/tree/master/rllib/contrib/random_agent/random_agent.py>`__. First, subclass `Algorithm <https://github.com/ray-project/ray/commits/master/rllib/algorithms/algorithm.py>`__ and implement the ``_init`` and ``step`` methods:

.. literalinclude:: ../../../rllib/contrib/random_agent/random_agent.py
.. literalinclude:: ../../../rllib/algorithms/random_agent/random_agent.py
:language: python
:start-after: __sphinx_doc_begin__
:end-before: __sphinx_doc_end__
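
For orientation, here is a condensed sketch of what that included snippet defines (the complete ``random_agent.py`` added by this commit appears further down in this diff; the ``RandomAgentConfig`` class is omitted here for brevity):

.. code-block:: python

    from ray.rllib.algorithms.algorithm import Algorithm
    from ray.rllib.algorithms.algorithm_config import AlgorithmConfig
    from ray.rllib.utils.annotations import override

    class RandomAgent(Algorithm):
        """Produces random actions and never learns."""

        @classmethod
        @override(Algorithm)
        def get_default_config(cls) -> AlgorithmConfig:
            config = AlgorithmConfig()
            config.rollouts_per_iteration = 10
            return config

        @override(Algorithm)
        def _init(self, config, env_creator):
            # Create the environment that random actions will be sampled for.
            self.env = env_creator(config["env_config"])

        @override(Algorithm)
        def step(self):
            # One training iteration: run `rollouts_per_iteration` random-action
            # episodes and report their mean reward and total timesteps.
            rewards, steps = [], 0
            for _ in range(self.config.rollouts_per_iteration):
                self.env.reset()
                done, episode_reward = False, 0.0
                while not done:
                    _, r, done, _ = self.env.step(self.env.action_space.sample())
                    episode_reward += r
                    steps += 1
                rewards.append(episode_reward)
            return {
                "episode_reward_mean": sum(rewards) / len(rewards),
                "timesteps_this_iter": steps,
            }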

Second, register the algorithm with a name in `contrib/registry.py <https://github.com/ray-project/ray/blob/master/rllib/contrib/registry.py>`__.
Second, register the algorithm with a name in `rllib/algorithms/registry.py <https://github.com/ray-project/ray/blob/master/rllib/algorithms/registry.py>`__.

.. code-block:: python
def _import_random_agent():
from ray.rllib.contrib.random_agent.random_agent import RandomAgent
from ray.rllib.algorithms.random_agent.random_agent import RandomAgent
return RandomAgent
def _import_random_agent_2():
from ray.rllib.contrib.random_agent_2.random_agent_2 import RandomAgent2
from ray.rllib.algorithms.random_agent_2.random_agent_2 import RandomAgent2
return RandomAgent2
CONTRIBUTED_ALGORITHMS = {
"contrib/RandomAgent": _import_random_trainer,
"contrib/RandomAgent2": _import_random_trainer_2,
ALGORITHMS = {
"RandomAgent": _import_random_agent,
"RandomAgent2": _import_random_agent_2,
# ...
}
After registration, you can run and visualize training progress using ``rllib train``:

.. code-block:: bash
rllib train --run=contrib/RandomAgent --env=CartPole-v1
rllib train --run=RandomAgent --env=CartPole-v1
tensorboard --logdir=~/ray_results
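
The same training run can also be launched from Python through Ray Tune, using the registered string name. This is an illustrative sketch only (not part of this commit), assuming the Ray 2.x ``Tuner``/``RunConfig`` API and that ``RandomAgent`` is present in ``ALGORITHMS`` as shown above:

.. code-block:: python

    from ray import air, tune
    from ray.rllib import _register_all

    _register_all()  # registers every ALGORITHMS entry under its plain name

    tuner = tune.Tuner(
        "RandomAgent",
        param_space={"env": "CartPole-v1"},
        run_config=air.RunConfig(stop={"training_iteration": 1}),
    )
    results = tuner.fit()
    best = results.get_best_result(metric="episode_reward_mean", mode="max")
    print(best.metrics["episode_reward_mean"])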
Debugging your Algorithms
21 changes: 9 additions & 12 deletions rllib/BUILD
@@ -1119,6 +1119,15 @@ py_test(
srcs = ["algorithms/r2d2/tests/test_r2d2.py"]
)

# RandomAgent
py_test(
name = "test_random_agent",
main = "algorithms/random_agent/random_agent.py",
tags = ["team:rllib", "algorithms_dir"],
size = "small",
srcs = ["algorithms/random_agent/random_agent.py"]
)

# RNNSAC
py_test(
name = "test_rnnsac",
@@ -1167,18 +1176,6 @@ py_test(
srcs = ["algorithms/td3/tests/test_td3.py"]
)

# --------------------------------------------------------------------
# contrib Algorithms
# --------------------------------------------------------------------

py_test(
name = "random_agent",
tags = ["team:rllib", "algorithms_dir"],
main = "contrib/random_agent/random_agent.py",
size = "small",
srcs = ["contrib/random_agent/random_agent.py"]
)


# --------------------------------------------------------------------
# Memory leak tests
8 changes: 4 additions & 4 deletions rllib/README.rst
@@ -74,7 +74,7 @@ Model-free On-policy RL:
- `Importance Weighted Actor-Learner Architecture (IMPALA) <https://docs.ray.io/en/master/rllib/rllib-algorithms.html#impala>`__
- `Advantage Actor-Critic (A2C, A3C) <https://docs.ray.io/en/master/rllib/rllib-algorithms.html#a3c>`__
- `Vanilla Policy Gradient (PG) <https://docs.ray.io/en/master/rllib/rllib-algorithms.html#pg>`__
- `Model-agnostic Meta-Learning (contrib/MAML) <https://docs.ray.io/en/master/rllib/rllib-algorithms.html#maml>`__
- `Model-agnostic Meta-Learning (MAML) <https://docs.ray.io/en/master/rllib/rllib-algorithms.html#maml>`__

Model-free Off-policy RL:

@@ -86,7 +86,7 @@ Model-free Off-policy RL:

Model-based RL:

- `Image-only Dreamer (contrib/Dreamer) <https://docs.ray.io/en/master/rllib/rllib-algorithms.html#dreamer>`__
- `Image-only Dreamer (Dreamer) <https://docs.ray.io/en/master/rllib/rllib-algorithms.html#dreamer>`__
- `Model-Based Meta-Policy-Optimization (MB-MPO) <https://docs.ray.io/en/master/rllib/rllib-algorithms.html#mbmpo>`__

Derivative-free algorithms:
@@ -113,8 +113,8 @@ Multi-agent:
Others:

- `Single-Player Alpha Zero (AlphaZero) <https://docs.ray.io/en/master/rllib/rllib-algorithms.html#alphazero>`__
- `Curiosity (ICM: Intrinsic Curiosity Module) <https://docs.ray.io/en/master/rllib/rllib-algorithms.html#curiosity>`__
- `Random encoders (contrib/RE3) <https://docs.ray.io/en/master/rllib/rllib-algorithms.html#re3>`__
- `Curiosity (ICM: Intrinsic Curiosity Module) <https://docs.ray.io/en/master/rllib/rllib-algorithms.html#curiosity>`__
- `Random encoders (RE3) <https://docs.ray.io/en/master/rllib/rllib-algorithms.html#re3>`__
- `Fully Independent Learning <https://docs.ray.io/en/master/rllib/rllib-algorithms.html#fil>`__

A list of all the algorithms can be found `here <https://docs.ray.io/en/master/rllib/rllib-algorithms.html>`__ .
22 changes: 1 addition & 21 deletions rllib/__init__.py
@@ -29,34 +29,14 @@ def _setup_logger():


def _register_all():
from ray.rllib.algorithms.algorithm import Algorithm
from ray.rllib.algorithms.registry import ALGORITHMS, _get_algorithm_class
from ray.rllib.contrib.registry import CONTRIBUTED_ALGORITHMS

for key, get_trainable_class_and_config in list(ALGORITHMS.items()) + list(
CONTRIBUTED_ALGORITHMS.items()
):
for key, get_trainable_class_and_config in ALGORITHMS.items():
register_trainable(key, get_trainable_class_and_config()[0])

for key in ["__fake", "__sigmoid_fake_data", "__parameter_tuning"]:
register_trainable(key, _get_algorithm_class(key))

def _see_contrib(name):
"""Returns dummy agent class warning algo is in contrib/."""

class _SeeContrib(Algorithm):
def setup(self, config):
raise NameError("Please run `contrib/{}` instead.".format(name))

return _SeeContrib

# Also register the aliases minus contrib/ to give a good error message.
for key in list(CONTRIBUTED_ALGORITHMS.keys()):
assert key.startswith("contrib/")
alias = key.split("/", 1)[1]
if alias not in ALGORITHMS:
register_trainable(alias, _see_contrib(alias))


_setup_logger()
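
As a hedged illustration of the effect of the simplified loop above (not part of this commit): every ``ALGORITHMS`` entry is now registered under its plain name and can be resolved through Tune's registry, assuming ``RandomAgent`` is one of those entries:

.. code-block:: python

    from ray.rllib import _register_all
    from ray.tune.registry import get_trainable_cls

    _register_all()  # register each ALGORITHMS entry (plus the "__fake" test trainables)

    algo_cls = get_trainable_cls("RandomAgent")
    algo = algo_cls(config=algo_cls.get_default_config().environment("CartPole-v1"))
    print(algo.train()["episode_reward_mean"])
    algo.stop()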

9 changes: 9 additions & 0 deletions rllib/algorithms/random_agent/__init__.py
@@ -0,0 +1,9 @@
from ray.rllib.algorithms.random_agent.random_agent import (
RandomAgent,
RandomAgentConfig,
)

__all__ = [
"RandomAgent",
"RandomAgentConfig",
]
100 changes: 100 additions & 0 deletions rllib/algorithms/random_agent/random_agent.py
@@ -0,0 +1,100 @@
import numpy as np
from typing import Optional

from ray.rllib.algorithms.algorithm import Algorithm
from ray.rllib.algorithms.algorithm_config import AlgorithmConfig, NotProvided
from ray.rllib.utils.annotations import override


class RandomAgentConfig(AlgorithmConfig):
"""Defines a configuration class from which a RandomAgent Algorithm can be built.
Example:
>>> from ray.rllib.algorithms.random_agent import RandomAgentConfig
>>> config = RandomAgentConfig().rollouts(rollouts_per_iteration=20)
>>> print(config.to_dict()) # doctest: +SKIP
>>> # Build an Algorithm object from the config and run 1 training iteration.
>>> algo = config.build(env="CartPole-v1")
>>> algo.train() # doctest: +SKIP
"""

def __init__(self, algo_class=None):
"""Initializes a RandomAgentConfig instance."""
super().__init__(algo_class=algo_class or RandomAgent)

self.rollouts_per_iteration = 10

def rollouts(
self,
*,
rollouts_per_iteration: Optional[int] = NotProvided,
**kwargs,
) -> "RandomAgentConfig":
"""Sets the rollout configuration.
Args:
rollouts_per_iteration: How many episodes to run per training iteration.
Returns:
This updated AlgorithmConfig object.
"""
super().rollouts(**kwargs)

if rollouts_per_iteration is not NotProvided:
self.rollouts_per_iteration = rollouts_per_iteration

return self


# fmt: off
# __sphinx_doc_begin__
class RandomAgent(Algorithm):
"""Algo that produces random actions and never learns."""

@classmethod
@override(Algorithm)
def get_default_config(cls) -> AlgorithmConfig:
config = AlgorithmConfig()
config.rollouts_per_iteration = 10
return config

@override(Algorithm)
def _init(self, config, env_creator):
self.env = env_creator(config["env_config"])

@override(Algorithm)
def step(self):
rewards = []
steps = 0
for _ in range(self.config.rollouts_per_iteration):
self.env.reset()
done = False
reward = 0.0
while not done:
action = self.env.action_space.sample()
_, r, done, _ = self.env.step(action)
reward += r
steps += 1
rewards.append(reward)
return {
"episode_reward_mean": np.mean(rewards),
"timesteps_this_iter": steps,
}
# __sphinx_doc_end__


if __name__ == "__main__":
# Define a config object.
config = (
RandomAgentConfig()
.environment("CartPole-v1")
.rollouts(rollouts_per_iteration=10)
)
# Build the agent.
algo = config.build()
# "Train" one iteration.
result = algo.train()
assert result["episode_reward_mean"] > 10, result
algo.stop()

print("Test: OK")