[runtime env] [Doc] Add concepts and basic workflows #20222
A little confused by this sentence. We don't need to manually SSH into the cluster with the existing solution now, right? (It is handled by the setup commands.)
Yeah, I meant to include setup commands in "Ray cluster launcher commands", which this doc describes as an existing feature. Let me make this more clear
I think we should start from the highest-level problem and work down to the low-level options here. Maybe we can describe it in this way instead? And then we can say:
There are 2 ways to set up your Ray environment (e.g., files, environment variables, Python package dependencies, system dependencies, etc.):
1. Set up the same environment across machines. This is the most common way to configure environments in Ray. You can use the autoscaler's setup commands or Docker container deployment. Blah blah... All Ray tasks and actors will use the same environment, since every machine is configured identically. Pro is X, con is Y (e.g., all jobs have to use the same environment).
2. Set up a per-job/task/actor environment. This is useful when X (e.g., Serve or a multi-tenant cluster). In this case you can use the runtime environment API, blah blah (a rough sketch is below). Pro is X, con is Y.
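For the second option, here is a minimal sketch of what the per-job runtime environment API looks like, assuming a Ray version where ray.init accepts runtime_env directly (the packages and env var are placeholders I made up):

```python
import ray

# Per-job runtime environment: every task and actor started by this driver
# inherits these files, environment variables, and pip packages.
ray.init(
    runtime_env={
        "working_dir": ".",                      # ship the local project files to the cluster
        "env_vars": {"MY_ENV_VAR": "value"},     # placeholder environment variable
        "pip": ["requests", "pendulum==2.1.2"],  # per-job Python package dependencies
    }
)
```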
I think we might need a section on how to set up the environment when Ray Client is used, and runtime environment can be a good solution here as well (or you should mention that the local machine and the remote cluster should have the same environment).
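For example, a hedged sketch of connecting through Ray Client with a runtime environment (the cluster address is a placeholder; 10001 is the default Ray Client server port):

```python
import ray

# Connect from the local machine via Ray Client. The runtime_env ships the
# local working directory and installs the listed packages on the cluster,
# so the local and remote environments don't have to be matched by hand.
ray.init(
    "ray://head-node-hostname:10001",  # placeholder cluster address
    runtime_env={
        "working_dir": ".",
        "pip": ["requests"],
    },
)
```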
I agree this is useful to have in the docs. Maybe we can put them in a top-level page under "Multi-Node Ray" which then links to this runtime env page.
@rkooo567 I know we still need more here but I'm not quite sure what to put, do you have any ideas?
I think we should ask the autoscaler team to fill it in.
I think common problems are:
- Manual:
- Container
Is it true that Ray Client is the recommended way? AFAIK, it is a lot less stable to use Ray Client right now than to submit the driver directly.
I got that from here: https://docs.ray.io/en/latest/cluster/guide.html#deploying-an-application "The recommended way of connecting to a Ray cluster is to use the ray.init("ray://:") API and connect via the Ray Client."
I'm not sure which is more stable, but you're right that we should be clear about which one is recommended.
Maybe this is due to my lack of understanding of runtime environments. But if you run the driver on the head node, isn't it going to be the same (like specifying the runtime env on the job config)?
You're right, it's the same, but you would need to manually set up the files and dependencies on all the worker nodes. I'll make this more clear
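As a rough sketch of "specifying the runtime env on the job config" with the driver running on the head node (assuming the JobConfig-style API; the file and package entries are placeholders):

```python
import ray
from ray.job_config import JobConfig

# Driver running directly on the head node: the runtime_env on the job config
# applies to the whole job, so workers on other nodes get the same files and
# dependencies without manual per-node setup.
ray.init(
    job_config=JobConfig(
        runtime_env={
            "working_dir": ".",
            "pip": ["requests"],
        }
    )
)
```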
Maybe have a separate section explaining which APIs are supported per job and per actor / task? Something like (a rough sketch of both is below):
Supported APIs:
- Jobs
- Per task / actor
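A rough sketch of the per-task / per-actor side, assuming the usual runtime_env support on @ray.remote and .options() (the packages are placeholders):

```python
import ray

ray.init()

# Per-task runtime environment, declared on the decorator.
@ray.remote(runtime_env={"pip": ["requests"]})
def fetch(url):
    import requests
    return requests.get(url).status_code

# Per-actor runtime environment, supplied at creation time via .options().
@ray.remote
class Worker:
    def ping(self):
        return "pong"

worker = Worker.options(runtime_env={"pip": ["pendulum==2.1.2"]}).remote()
print(ray.get(worker.ping.remote()))
```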