diff --git a/doc/examples/doc_code/runtime_env_example.py b/doc/examples/doc_code/runtime_env_example.py index 0a85398bec5d..c761ae2783a7 100644 --- a/doc/examples/doc_code/runtime_env_example.py +++ b/doc/examples/doc_code/runtime_env_example.py @@ -23,7 +23,7 @@ # __runtime_env_conda_def_end__ # __ray_init_start__ -# Running on the Ray cluster +# Starting a single-node local Ray cluster ray.init(runtime_env=runtime_env) # __ray_init_end__ @@ -49,10 +49,27 @@ def f(): pass @ray.remote -class Actor: +class SomeClass: pass # __per_task_per_actor_start__ +# Invoke a remote task that will run in a specified runtime environment. f.options(runtime_env=runtime_env).remote() -Actor.options(runtime_env=runtime_env).remote() + +# Instantiate an actor that will run in a specified runtime environment. +actor = SomeClass.options(runtime_env=runtime_env).remote() + +# Specify a runtime environment in the task definition. Future invocations via +# `g.remote()` will use this runtime environment unless overridden by using +# `.options()` as above. +@ray.remote(runtime_env=runtime_env) +def g(): + pass + +# Specify a runtime environment in the actor definition. Future instantiations +# via `MyClass.remote()` will use this runtime environment unless overridden by +# using `.options()` as above. +@ray.remote(runtime_env=runtime_env) +class MyClass: + pass # __per_task_per_actor_end__ diff --git a/doc/source/actors.rst b/doc/source/actors.rst index 2b24822f8d98..e0b14b3061a5 100644 --- a/doc/source/actors.rst +++ b/doc/source/actors.rst @@ -313,7 +313,7 @@ sufficient CPU resources and the relevant custom resources. .. tip:: Besides compute resources, you can also specify an environment for an actor to run in, - which can include Python packages, local files, environment variables, and more--see :ref:`Runtime Environments ` for details. + which can include Python packages, local files, environment variables, and more---see :ref:`Runtime Environments ` for details. Terminating Actors diff --git a/doc/source/cluster/commands.rst b/doc/source/cluster/commands.rst index 1373402f0838..10dc75b86294 100644 --- a/doc/source/cluster/commands.rst +++ b/doc/source/cluster/commands.rst @@ -155,6 +155,7 @@ run ``ray attach --help``. # Attach to tmux session on cluster (creates a new one if none available) $ ray attach cluster.yaml --tmux +.. _ray-rsync: Synchronizing files from the cluster (``ray rsync-up/down``) ------------------------------------------------------------ diff --git a/doc/source/cluster/guide.rst b/doc/source/cluster/guide.rst index 1f7428d8632f..c9fa8ef03663 100644 --- a/doc/source/cluster/guide.rst +++ b/doc/source/cluster/guide.rst @@ -26,7 +26,7 @@ any cloud. It will: * provision a new instance/machine using the cloud provider's SDK. * execute shell commands to set up Ray with the provided options. -* (optionally) run any custom, user defined setup commands. (To dynamically set up environments after the cluster has been deployed, you can use :ref:`Runtime Environments`.) +* (optionally) run any custom, user defined setup commands. This can be useful for setting environment variables and installing packages. (To dynamically set up environments after the cluster has been deployed, you can use :ref:`Runtime Environments`.) * Initialize the Ray cluster. * Deploy an autoscaler process. diff --git a/doc/source/dependency-management.rst b/doc/source/handling-dependencies.rst similarity index 51% rename from doc/source/dependency-management.rst rename to doc/source/handling-dependencies.rst index aa2b121641e8..5da0bb216b27 100644 --- a/doc/source/dependency-management.rst +++ b/doc/source/handling-dependencies.rst @@ -1,18 +1,51 @@ -.. _dependency_management: +.. _handling_dependencies: -Dependency Management +Handling Dependencies ===================== -Your Ray project may depend on environment variables, local files, and Python packages. -Ray makes managing these dependencies easy, even when working with a remote cluster. +This page might be useful for you if you're trying to: -You can specify dependencies dynamically at runtime using :ref:`Runtime Environments`. This is useful for quickly iterating on a project with changing dependencies and local code files, or for running jobs, tasks and actors with different environments all on the same Ray cluster. +* Run a distributed Ray library or application. +* Run a distributed Ray script which imports some local files. +* Quickly iterate on a project with changing dependencies and files while running on a Ray cluster. + + +What problem does this page solve? +---------------------------------- + +Your Ray application may have dependencies that exist outside of your Ray script. For example: + +* Your Ray script may import/depend on some Python packages. +* Your Ray script may be looking for some specific environment variables to be available. +* Your Ray script may import some files outside of the script. + + +One frequent problem when running on a cluster is that Ray expects these "dependencies" to exist on each Ray node. If these are not present, you may run into issues such as ``ModuleNotFoundError``, ``FileNotFoundError`` and so on. + + + +To address this problem, you can use Ray's **runtime environments**. + + +Concepts +-------- + +- **Ray Application**. A program including a Ray script that calls ``ray.init()`` and uses Ray tasks or actors. + +- **Dependencies**, or **Environment**. Anything outside of the Ray script that your application needs to run, including files, packages, and environment variables. + +- **Files**: Code files, data files or other files that your Ray application needs to run. + +- **Packages**: External libraries or executables required by your Ray application, often installed via ``pip`` or ``conda``. + +- **Local machine** and **Cluster**. The recommended way to connect to a remote Ray cluster is to use :ref:`Ray Client`, and we will call the machine running Ray Client your *local machine*. + +- **Job**. A period of execution between connecting to a cluster with ``ray.init()`` and disconnecting by calling ``ray.shutdown()`` or exiting the Ray script. + + +.. Alternatively, you can prepare your Ray cluster's environment when your cluster nodes start up, and modify it later from the command line. +.. Packages can be installed using ``setup_commands`` in the Ray Cluster configuration file (:ref:`docs`) and files can be pushed to the cluster using ``ray rsync_up`` (:ref:`docs`). -Alternatively, you can prepare your Ray cluster's environment once, when your cluster nodes start up. This can be -accomplished using ``setup_commands`` in the Ray Cluster launcher; see the :ref:`documentation` for details. -You can still use -runtime environments on top of this, but they will not inherit anything from the base -cluster environment. .. _runtime-environments: @@ -21,16 +54,48 @@ Runtime Environments .. note:: - This API is in beta and may change before becoming stable. + This feature requires a full installation of Ray using ``pip install "ray[default]"``. This feature is available starting with Ray 1.4.0 and is currently only supported on macOS and Linux. -.. note:: +A **runtime environment** describes the dependencies your Ray application needs to run, including :ref:`files, packages, environment variables, and more `. It is installed dynamically on the cluster at runtime. + +Runtime environments let you transition your Ray application from running on your local machine to running on a remote cluster, without any manual environment setup. + +.. + TODO(architkulkarni): run working_dir doc example in CI + +.. code-block:: python + + import ray + import requests + + runtime_env = {"working_dir": "/data/my_files", "pip": ["requests", "pendulum==2.1.2"]} + + # To transition from a local single-node cluster to a remote cluster, + # simply change to ray.init("ray://123.456.7.8:10001", runtime_env=...) + ray.init(runtime_env=runtime_env) - This feature requires a full installation of Ray using ``pip install "ray[default]"``. + @ray.remote() + def f(): + open("my_datafile.txt").read() + return requests.get("https://www.ray.io") -On Mac OS and Linux, Ray 1.4+ supports dynamically setting the runtime environment of tasks, actors, and jobs so that they can depend on different Python libraries (e.g., conda environments, pip dependencies) while all running on the same Ray cluster. +.. literalinclude:: ../examples/doc_code/runtime_env_example.py + :language: python + :start-after: __runtime_env_conda_def_start__ + :end-before: __runtime_env_conda_def_end__ + +Jump to the :ref:`API Reference`. + + +There are two primary scopes for which you can specify a runtime environment: + +* :ref:`Per-Job `, and +* :ref:`Per-Task/Actor, within a job `. -The ``runtime_env`` is a (JSON-serializable) dictionary that can be passed as an option to tasks and actors, and can also be passed to ``ray.init()``. -The runtime environment defines the dependencies required for your workload. +.. _rte-per-job: + +Specifying a Runtime Environment Per-Job +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ You can specify a runtime environment for your whole job, whether running a script directly on the cluster or using :ref:`Ray Client`: @@ -44,57 +109,200 @@ You can specify a runtime environment for your whole job, whether running a scri .. code-block:: python - # Running on a local machine, connecting to remote cluster using Ray Client + # Connecting to remote cluster using Ray Client ray.init("ray://123.456.7.89:10001", runtime_env=runtime_env) -Or specify per-actor or per-task in the ``@ray.remote()`` decorator or by using ``.options()``: +This will install the dependencies to the remote cluster. Any tasks and actors used in the job will use this runtime environment unless otherwise specified. + +.. note:: + + There are two options for when to install the runtime environment: + + 1. As soon as the job starts (i.e., as soon as ``ray.init()`` is called), the dependencies are eagerly downloaded and installed. + 2. The dependencies are installed only when a task is invoked or an actor is created. + + The default is option 1. To change the behavior to option 2, add ``"eager_install": False`` to the ``runtime_env``. + +.. _rte-per-task-actor: + +Specifying a Runtime Environment Per-Task or Per-Actor +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +You can specify different runtime environments per-actor or per-task using ``.options()`` or the ``@ray.remote()`` decorator: .. literalinclude:: ../examples/doc_code/runtime_env_example.py :language: python :start-after: __per_task_per_actor_start__ :end-before: __per_task_per_actor_end__ -The ``runtime_env`` is a Python dictionary including one or more of the following arguments: +This allows you to have actors and tasks running in their own environments, independent of the surrounding environment. (The surrounding environment could be the job's runtime environment, or the system environment of the cluster.) + +.. warning:: + + Ray does not guarantee compatibility between tasks and actors with conflicting runtime environments. + For example, if an actor whose runtime environment contains a ``pip`` package tries to communicate with an actor with a different version of that package, it can lead to unexpected behavior such as unpickling errors. + +Common Workflows +^^^^^^^^^^^^^^^^ + +This section describes some common use cases for runtime environments. These use cases are not mutually exclusive; all of the options described below can be combined in a single runtime environment. + +.. _workflow-local-files: + +Using Local Files +""""""""""""""""" + +Your Ray application might depend on source files or data files. +For a development workflow, these might live on your local machine, but when it comes time to run things at scale, you will need to get them to your remote cluster. + +The following simple example explains how to get your local files on the cluster. + +.. code-block:: python + + # /path/to/files is a directory on the local machine. + # /path/to/files/hello.txt contains the string "Hello World!" -- ``working_dir`` (str): Specifies the working directory for your job. This can be the path of an existing local directory with a total size of at most 100 MiB. - Alternatively, it can be a URI to a remotely-stored zip file containing the working directory for your job. See the "Remote URIs" section below for more info. - The directory will be cached on the cluster, so the next time you connect with Ray Client you will be able to skip uploading the directory contents. - All Ray workers for your job will be started in their node's local copy of this working directory. + import ray + + # Specify a runtime environment for the entire Ray job + ray.init(runtime_env={"working_dir": "/path/to/files"}) + + # Create a Ray task, which inherits the above runtime env. + @ray.remote + def f(): + # The function will have its working directory changed to its node's + # local copy of /path/to/files. + return open("hello.txt").read() + + print(ray.get(f.remote())) # Hello World! + +.. note:: + The example above is written to run on a local machine, but as for all of these examples, it also works when specifying a Ray cluster to connect to + (e.g., using ``ray.init("ray://123.456.7.89:10001", runtime_env=...)`` or ``ray.init(address="auto", runtime_env=...)``). + +The specified local directory will automatically be pushed to the cluster nodes when ``ray.init()`` is called. + +You can also specify files via a remote cloud storage URI; see :ref:`remote-uris` for details. + +Using ``conda`` or ``pip`` packages +""""""""""""""""""""""""""""""""""" + +Your Ray application might depend on Python packages (for example, ``pendulum`` or ``requests``) via ``import`` statements. + +Ray ordinarily expects all imported packages to be preinstalled on every node of the cluster; in particular, these packages are not automatically shipped from your local machine to the cluster or downloaded from any repository. + +However, using runtime environments you can dynamically specify packages to be automatically downloaded and installed in an isolated virtual environment for your Ray job, or for specific Ray tasks or actors. + +.. code-block:: python + + import ray + import requests + + # This example runs on a local machine, but you can also do + # ray.init(address=..., runtime_env=...) to connect to a cluster. + ray.init(runtime_env={"pip": ["requests"]}) + + @ray.remote + def reqs(): + return requests.get("https://www.ray.io/") + + print(ray.get(reqs.remote())) # + + +You may also specify your ``pip`` dependencies either via a Python list or a ``requirements.txt`` file. +Alternatively, you can specify a ``conda`` environment, either as a Python dictionary or via a ``environment.yml`` file. This conda environment can include ``pip`` packages. +For details, head to the :ref:`API Reference`. + +.. note:: + + The ``ray[default]`` package itself will automatically be installed in the isolated environment. However, if you are using any Ray libraries (for example, Ray Serve), then you will need to specify the library in the runtime environment (e.g. ``runtime_env = {"pip": ["requests", "ray[serve]"}]}``.) + +.. warning:: + + Since the packages in the ``runtime_env`` are installed at runtime, be cautious when specifying ``conda`` or ``pip`` packages whose installations involve building from source, as this can be slow. + +Library Development +""""""""""""""""""" + +Suppose you are developing a library ``my_module`` on Ray. + +A typical iteration cycle will involve + +1. Making some changes to the source code of ``my_module`` +2. Running a Ray script to test the changes, perhaps on a distributed cluster. + +To ensure your local changes show up across all Ray workers and can be imported properly, use the ``py_modules`` field. + +.. code-block:: python + + import ray + import my_module + + ray.init("ray://123.456.7.89:10001", runtime_env={"py_modules": [my_module]}) + + @ray.remote + def test_my_module(): + # No need to import my_module inside this function. + my_module.test() + + ray.get(f.remote()) + +.. _runtime-environments-api-ref: + +API Reference +^^^^^^^^^^^^^ + +The ``runtime_env`` is a Python dictionary including one or more of the following fields: + +- ``working_dir`` (str): Specifies the working directory for the Ray workers. This must either be an existing directory on the local machine with total size at most 100 MiB, or a URI to a remotely-stored zip file containing the working directory for your job. See :ref:`remote-uris` for details. + The specified directory will be downloaded to each node on the cluster, and Ray workers will be started in their node's copy of this directory. - Examples - ``"." # cwd`` - - ``"/code/my_project"`` + - ``"/src/my_project"`` + + - ``"s3://path/to/my_dir.zip"`` + + Note: Setting a local directory per-task or per-actor is currently unsupported; it can only be set per-job (i.e., in ``ray.init()``). + + Note: If your local directory contains a ``.gitignore`` file, the files and paths specified therein will not be uploaded to the cluster. - - ``"gs://bucket/file.zip" # Google Cloud Storage URI`` +- ``py_modules`` (List[str|module]): Specifies Python modules to be available for import in the Ray workers. (For more ways to specify packages, see also the ``pip`` and ``conda`` fields below.) + Each entry must be either (1) a path to a local directory, (2) a URI to a remote zip file (see :ref:`remote-uris` for details), or (3) a Python module object. - Note: Setting this option per-task or per-actor is currently unsupported. + - Examples of entries in the list: - Note: If your working directory contains a `.gitignore` file, the files and paths specified therein will not be uploaded to the cluster. + - ``"."`` -- ``py_modules`` (List[str]): Specifies a list of dependencies for your job. - The list must contain paths to local directories, remote URIs to zip files, or a mix of both. - See the "Remote URIs" section below for more info about using remote zip files. - ``py_modules`` and ``working_dir`` can both be specified in the same ``runtime_env`` Python dictionary. + - ``"/local_dependency/my_module"`` - - Example: ``[".", "/local_dependency/code", "s3://bucket/file.zip"]`` + - ``"s3://bucket/my_module.zip"`` -- ``excludes`` (List[str]): When used with ``working_dir``, specifies a list of files or paths to exclude from being uploaded to the cluster. + - ``my_module # Assumes my_module has already been imported, e.g. via 'import my_module'`` + + The modules will be downloaded to each node on the cluster. + + Note: Setting options (1) and (3) per-task or per-actor is currently unsupported, it can only be set per-job (i.e., in ``ray.init()``). + + Note: For option (1), if your local directory contains a ``.gitignore`` file, the files and paths specified therein will not be uploaded to the cluster. + +- ``excludes`` (List[str]): When used with ``working_dir`` or ``py_modules``, specifies a list of files or paths to exclude from being uploaded to the cluster. This field also supports the pattern-matching syntax used by ``.gitignore`` files: see ``_ for details. - Example: ``["my_file.txt", "path/to/dir", "*.log"]`` -- ``pip`` (List[str] | str): Either a list of pip packages, or a string containing the path to a pip - `“requirements.txt” `_ file. The path may be an absolute path or a relative path. - This will be dynamically installed in the ``runtime_env``. +- ``pip`` (List[str] | str): Either a list of pip `requirements specifiers `_, or a string containing the path to a pip + `“requirements.txt” `_ file. + This will be installed in the Ray workers at runtime. To use a library like Ray Serve or Ray Tune, you will need to include ``"ray[serve]"`` or ``"ray[tune]"`` here. - - Example: ``["requests==1.0.0", "aiohttp"]`` + - Example: ``["requests==1.0.0", "aiohttp", "ray[serve]"]`` - Example: ``"./requirements.txt"`` -- ``conda`` (dict | str): Either (1) a dict representing the conda environment YAML, (2) a string containing the absolute or relative path to a +- ``conda`` (dict | str): Either (1) a dict representing the conda environment YAML, (2) a string containing the path to a `conda “environment.yml” `_ file, or (3) the name of a local conda environment already installed on each node in your cluster (e.g., ``"pytorch_p36"``). In the first two cases, the Ray and Python dependencies will be automatically injected into the environment to ensure compatibility, so there is no need to manually include them. @@ -111,30 +319,40 @@ The ``runtime_env`` is a Python dictionary including one or more of the followin - Example: ``{"OMP_NUM_THREADS": "32", "TF_WARNINGS": "none"}`` -- ``eager_install`` (bool): A boolean indicates whether to install runtime env eagerly before the workers are leased. This flag is set to True by default and only job level is supported now. +- ``eager_install`` (bool): Indicates whether to install the runtime environment on the cluster at ``ray.init()`` time, before the workers are leased. This flag is set to ``True`` by default. + If set to ``False``, the runtime environment will be only installed when the first task is invoked or when the first actor is created. + Currently, specifying this option per-actor or per-task is not supported. - Example: ``{"eager_install": False}`` -The runtime environment is inheritable, so it will apply to all tasks/actors within a job and all child tasks/actors of a task or actor, once set. -If a child actor or task specifies a new ``runtime_env``, it will be merged with the parent’s ``runtime_env`` via a simple dict update. -For example, if ``runtime_env["pip"]`` is specified, it will override the ``runtime_env["pip"]`` field of the parent. -The one exception is the field ``runtime_env["env_vars"]``. This field will be `merged` with the ``runtime_env["env_vars"]`` dict of the parent. -This allows for environment variables set in the parent's runtime environment to be automatically propagated to the child, even if new environment variables are set in the child's runtime environment. +Inheritance +""""""""""" -Here are some examples of runtime environments combining multiple options: +The runtime environment is inheritable, so it will apply to all tasks/actors within a job and all child tasks/actors of a task or actor once set, unless it is overridden. -.. - TODO(architkulkarni): run working_dir doc example in CI +If an actor or task specifies a new ``runtime_env``, it will override the parent’s ``runtime_env`` (i.e., the parent actor/task's ``runtime_env``, or the job's ``runtime_env`` if there is no parent actor or task) as follows: + +* The ``runtime_env["env_vars"]`` field will be merged with the ``runtime_env["env_vars"]`` field of the parent. + This allows for environment variables set in the parent's runtime environment to be automatically propagated to the child, even if new environment variables are set in the child's runtime environment. +* Every other field in the ``runtime_env`` will be *overridden* by the child, not merged. For example, if ``runtime_env["py_modules"]`` is specified, it will replace the ``runtime_env["py_modules"]`` field of the parent. + +Example: .. code-block:: python - runtime_env = {"working_dir": "/files/my_project", "pip": ["pendulum=2.1.2"]} + # Parent's `runtime_env` + {"pip": ["requests", "chess"], + "env_vars": {"A": "a", "B": "b"}} + + # Child's specified `runtime_env` + {"pip": ["torch", "ray[serve]"], + "env_vars": {"B": "new", "C", "c"}} + + # Child's actual `runtime_env` (merged with parent's) + {"pip": ["torch", "ray[serve]"], + "env_vars": {"A": "a", "B": "new", "C", "c"}} -.. literalinclude:: ../examples/doc_code/runtime_env_example.py - :language: python - :start-after: __runtime_env_conda_def_start__ - :end-before: __runtime_env_conda_def_end__ .. _remote-uris: @@ -320,4 +538,4 @@ Here is a list of different use cases and corresponding URLs: Once you have specified the URL in your ``runtime_env`` dictionary, you can pass the dictionary into a ``ray.init()`` or ``.options()`` call. Congratulations! You have now hosted a ``runtime_env`` dependency -remotely on GitHub! \ No newline at end of file +remotely on GitHub! diff --git a/doc/source/serve/core-apis.rst b/doc/source/serve/core-apis.rst index b3b853728c89..5c9e3d28562d 100644 --- a/doc/source/serve/core-apis.rst +++ b/doc/source/serve/core-apis.rst @@ -288,7 +288,7 @@ is set. In particular, it's also called when new replicas are created in the future if scale up your deployment later. The `reconfigure` method is also called each time `user_config` is updated. -Dependency Management +Handling Dependencies ===================== Ray Serve supports serving deployments with different (possibly conflicting) diff --git a/doc/source/using-ray.rst b/doc/source/using-ray.rst index 05adb849b212..ec2a0909ae4d 100644 --- a/doc/source/using-ray.rst +++ b/doc/source/using-ray.rst @@ -13,7 +13,7 @@ Finally, we've also included some content on using core Ray APIs with `Tensorflo starting-ray.rst actors.rst namespaces.rst - dependency-management.rst + handling-dependencies.rst async_api.rst concurrency_group_api.rst using-ray-with-gpus.rst diff --git a/doc/source/walkthrough.rst b/doc/source/walkthrough.rst index 24fe605a308e..89f499938704 100644 --- a/doc/source/walkthrough.rst +++ b/doc/source/walkthrough.rst @@ -344,7 +344,7 @@ Below are more examples of resource specifications: .. tip:: Besides compute resources, you can also specify an environment for a task to run in, - which can include Python packages, local files, environment variables, and more--see :ref:`Runtime Environments ` for details. + which can include Python packages, local files, environment variables, and more---see :ref:`Runtime Environments ` for details. Multiple returns ~~~~~~~~~~~~~~~~ diff --git a/doc/source/workflows/management.rst b/doc/source/workflows/management.rst index 5666fa3c9818..774c47365ec7 100644 --- a/doc/source/workflows/management.rst +++ b/doc/source/workflows/management.rst @@ -103,7 +103,7 @@ Besides ``workflow.init()``, the storage URI can also be set via environment var If left unspecified, ``/tmp/ray/workflow_data`` will be used for temporary storage. This default setting *will only work for single-node Ray clusters*. -Dependency Management +Handling Dependencies --------------------- **Note: This feature is not yet implemented.**