[runtime env] [Doc] Add concepts and basic workflows #20222

Merged: 23 commits into ray-project:master, Nov 19, 2021

@architkulkarni architkulkarni commented Nov 10, 2021

Why are these changes needed?

Renaming the file made the diff hard to check. It might be easier to review by scanning through the Buildkite docs build (or you can just check commit 26676de).

TODO: Move new code samples to files that are tested in CI

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@richardliaw
Contributor

Hey @architkulkarni can you add a PR description?

@richardliaw
Contributor

also can I push directly?

@architkulkarni
Contributor Author

Adding one now, and yeah feel free to push

- for running jobs, tasks and actors with different dependencies, all on the same Ray cluster.

**Option 2.** Alternatively, you can prepare your Ray cluster's environment when your cluster nodes start up, and modify it later from the command line.
Packages can be installed using ``setup_commands`` in the Ray Cluster configuration file (:ref:`docs<cluster-configuration-setup-commands>`) and files can be pushed to the cluster using ``ray rsync_up`` (:ref:`docs<ray-rsync>`).
Contributor Author

@rkooo567 I know we still need more here but I'm not quite sure what to put, do you have any ideas?

Contributor

I think we should ask the autoscaler team to fill it in.

Contributor

I think common problems are

Manual:

  • Link to autoscaler section that describes how to set up deps
  • env variables (setup commands)
  • System deps (setup commands)
  • Files (rsync up or manually copy and paste. Make sure they are all synced)
  • Python packages (setup commands)

Container

  • Same things (link to container deployment)
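
For illustration, the manual route listed above could be sketched as follows. This is a minimal sketch, not the cluster launcher's own code: `build_setup_commands` and `rsync_up_argv` are hypothetical helper names, the package and variable values are placeholders, and persisting variables through `~/.bashrc` assumes the nodes use bash.

```python
import shlex


def build_setup_commands(pip_packages, env_vars):
    """Return shell lines for the ``setup_commands`` list of a cluster config."""
    commands = []
    for name, value in env_vars.items():
        export_line = f"export {name}={shlex.quote(value)}"
        # Assumption: nodes use bash, so appending to ~/.bashrc persists the variable.
        commands.append(f"echo {shlex.quote(export_line)} >> ~/.bashrc")
    if pip_packages:
        # One pip invocation that installs every requested package.
        commands.append("pip install " + " ".join(shlex.quote(p) for p in pip_packages))
    return commands


def rsync_up_argv(cluster_yaml, local_path, remote_path):
    """Return the argv for ``ray rsync_up`` (cluster config, source, target)."""
    return ["ray", "rsync_up", cluster_yaml, local_path, remote_path]
```

The returned lines would be pasted under `setup_commands` in the cluster configuration file, and the argv list could be handed to `subprocess.run` to push files after the cluster is up.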

=====================

Your Ray application may depend on environment variables, files, and Python packages.
Ray provides two features to specify these dependencies when working with a remote cluster: Runtime environments, and the Ray cluster launcher commands
Contributor

@richardliaw richardliaw Nov 13, 2021

Suggested change
- Ray provides two features to specify these dependencies when working with a remote cluster: Runtime environments, and the Ray cluster launcher commands
+ Ray provides two features to specify these dependencies when working with a Ray cluster: :ref:`runtime Environments<runtime-environments>`, and the :ref:`Ray cluster launcher commands <INSERT THE RIGHT LINK>`.


Your Ray application may depend on environment variables, files, and Python packages.
Ray provides two features to specify these dependencies when working with a remote cluster: Runtime environments, and the Ray cluster launcher commands
With these features, you no longer need to manually SSH into your cluster and set up your environment.
Contributor

A little confused by this sentence. We don't need to manually SSH into the cluster with the existing solution, right? (That is handled by the setup commands.)

Contributor Author

Yeah, I meant to include setup commands in "Ray cluster launcher commands", which this doc describes as an existing feature. Let me make this more clear

Your Ray application may depend on environment variables, files, and Python packages.
Ray provides two features to specify these dependencies when working with a remote cluster: Runtime environments, and the Ray cluster launcher commands
With these features, you no longer need to manually SSH into your cluster and set up your environment.

Contributor

I think we should start from the highest-level problem and work down to the low-level options here.

Maybe we can describe it in this way instead?

  1. What's the environment in Ray?
  2. Why does the environment matter in Ray?

And then we can say

There are 2 ways to set up your Ray environment (e.g., files, environment variables, Python package dependencies, system dependencies, etc.)

  1. Set up the same environment across machines. This is the most common way to configure environments in Ray. You can use the autoscaler's setup commands or Docker container deployment. Blah blah... All Ray tasks and actors will use the same environment, since all machines are configured identically. Pro is X, con is Y (e.g., all jobs have to use the same environment).

  2. Set up a per-job/task/actor environment. This is useful when X (e.g., Serve or a multi-tenant cluster). In this case you can use the runtime environment API, blah blah... Pro is X, con is Y.
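
To make the second option concrete, here is a minimal sketch of a job-level runtime environment with a per-task override. The variable name `MY_FLAG`, the conda environment name `my-conda-env`, and the function names are placeholders invented for illustration; `ray` is imported lazily so the dictionaries can be inspected on a machine without Ray installed.

```python
# Job-level environment: requested once, when the driver connects.
JOB_ENV = {
    "working_dir": ".",            # local directory to make available on the cluster
    "env_vars": {"MY_FLAG": "1"},  # hypothetical variable, for illustration only
}

# Per-task override: asks for a named conda environment for one call.
# "my-conda-env" is a placeholder and would have to exist on the cluster nodes.
TASK_ENV = {"conda": "my-conda-env"}


def main():
    import os

    import ray  # lazy import: only needed when actually running against a cluster

    ray.init(runtime_env=JOB_ENV)

    @ray.remote
    def read_flag():
        return os.environ.get("MY_FLAG")

    # Uses the job-level environment.
    print(ray.get(read_flag.remote()))
    # Requests the per-task conda environment for this call only.
    print(ray.get(read_flag.options(runtime_env=TASK_ENV).remote()))
```

Calling `main()` on a machine with Ray installed runs both variants. The first option in the list above needs no code in the driver at all, since every node already shares one environment.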

Contributor

I think we might need a section on how to set up the environment when Ray Client is used. A runtime environment can be a good solution there as well (or you should mention that the local machine and the remote cluster should have the same environment).

Contributor Author

I agree this is useful to have in the docs. Maybe we can put them in a top-level page under "Multi-Node Ray" which then links to this runtime env page.

doc/source/handling-dependencies.rst (resolved)
Concepts
--------

- **Local machine** and **Cluster**. The recommended way to connect to a remote Ray cluster is to use :ref:`Ray Client<ray-client>`, and we will call the machine running Ray Client your *local machine*. Note: you can also start a single-node Ray cluster on your local machine---in this case your Ray cluster is not really “remote”, but any comments in this documentation referring to a “remote cluster” will also apply to this setup.
Contributor

Is it true that Ray Client is the recommended way? AFAIK, Ray Client is currently a lot less stable than directly submitting the driver.

Contributor Author

I got that from here: https://docs.ray.io/en/latest/cluster/guide.html#deploying-an-application "The recommended way of connecting to a Ray cluster is to use the ray.init("ray://<host>:<port>") API and connect via the Ray Client."

I'm not sure which is more stable, but you're right that we should be clear about which one is recommended
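
For reference, connecting through Ray Client might look like the sketch below. The helper names `client_address` and `connect` are hypothetical, and the default port 10001 is an assumption about how the cluster's Ray Client server is configured; check your own cluster before relying on it.

```python
def client_address(host, port=10001):
    """Build a Ray Client address of the form ray://<host>:<port>."""
    # Assumption: 10001 is the port the cluster's Ray Client server listens on.
    return f"ray://{host}:{port}"


def connect(host, port=10001, runtime_env=None):
    """Connect the local machine to a remote cluster through Ray Client."""
    import ray  # lazy import so the address helper works without Ray installed

    return ray.init(client_address(host, port), runtime_env=runtime_env)
```

The alternative discussed in this thread, running the driver script on the cluster itself, would call `ray.init(address="auto")` instead and needs no `ray://` address.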

doc/source/handling-dependencies.rst (outdated, resolved)

- ``my_module # Assumes my_module has already been imported, e.g. via 'import my_module'``

Note: Setting options (1) and (3) per-task or per-actor is currently unsupported.
Contributor

@rkooo567 rkooo567 Nov 14, 2021

Maybe having a separate section to explain what APIs are supported for per job or per actor / tasks? Like;

Supported APIs:

Jobs

  • working dir
  • conda env
  • pymodule...

Per tasks/actors

  • conda env

doc/source/handling-dependencies.rst (outdated, resolved)
doc/source/handling-dependencies.rst (outdated, resolved)
@@ -13,7 +13,7 @@ Finally, we've also included some content on using core Ray APIs with `Tensorflo
 starting-ray.rst
 actors.rst
 namespaces.rst
-dependency-management.rst
+handling-dependencies.rst
Contributor

What's the motivation for the name change here?

Contributor Author

It was suggested here: #19863 (comment). @richardliaw, is it because "Dependency Management" already has a meaning that's too specific?

@rkooo567 added the @author-action-required label (The PR author is responsible for the next step. Remove tag to send back to the reviewer.) Nov 15, 2021
Signed-off-by: Richard Liaw <[email protected]>
@architkulkarni changed the title from "[runtime env] [Doc] Add concepts, py_modules, and two basic workflows" to "[runtime env] [Doc] Add concepts and basic workflows" Nov 19, 2021
@architkulkarni
Contributor Author

I think the only remaining open question is how much to include about the cluster launcher approach (setup_commands, rsync, directly submitting driver script with address=auto, etc.). There seem to be different opinions here and it probably depends on what we want to promote as a best practice.

The current iteration of the PR doesn't mention the cluster launcher at all, but links to the Runtime Environments page from within "Multi-Node Ray > Ray Deployment Guide". I added some words in the cluster launcher section about environment variables and package installation.

@richardliaw
Contributor

richardliaw commented Nov 19, 2021 via email

@ericl ericl merged commit 42085fd into ray-project:master Nov 19, 2021
fishbone pushed a commit that referenced this pull request Nov 20, 2021
Address followup comments from #19863
- Add short "Concepts" section
- Add more section headings to break up the text
- Add "Workflow: Local Files" example
- Add "Workflow: Library development" example
wuisawesome pushed a commit that referenced this pull request Nov 20, 2021
wuisawesome pushed a commit that referenced this pull request Nov 21, 2021
7 participants