Support scheduling_hint=SPREAD|COLOCATE for tasks and actors #18524
Comments
For COLOCATE, does this API allow users to easily colocate any two or more tasks? I think we have this requirement as well; so far we have just hacked around it by using the node ID as a custom resource. Good to see this proposal!
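For reference, a minimal sketch of the node-ID-as-custom-resource workaround mentioned above, assuming Ray's built-in per-node `node:<ip>` resources (the task and variable names here are illustrative):

```python
# Sketch: pin a task to a specific node via the per-node resource that Ray
# creates automatically (named "node:<ip address>"). This is the kind of
# workaround the proposal aims to replace.
import ray

ray.init()

@ray.remote
def pinned_task():
    return "ran on the chosen node"

# Pick a node from the cluster state and require a tiny slice of its node
# resource, which forces the task onto that node.
node_ip = ray.nodes()[0]["NodeManagerAddress"]
ref = pinned_task.options(resources={f"node:{node_ip}": 0.001}).remote()
print(ray.get(ref))
```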
Under the above proposal, you could force colocation of two tasks if the second task is launched by the first. If the two tasks are launched independently, you can already force colocation using a placement group; hope this helps.
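A minimal sketch of that placement-group workaround for colocating independently launched tasks, using the Ray 1.x-era options API (task names and bundle sizes are illustrative; newer Ray versions pass the placement group via a PlacementGroupSchedulingStrategy instead):

```python
# Colocate two independently launched tasks by scheduling both into the
# same single-node placement group bundle.
import ray
from ray.util.placement_group import placement_group

ray.init()

# One bundle; STRICT_PACK keeps all bundles (here, just one) on one node.
pg = placement_group([{"CPU": 2}], strategy="STRICT_PACK")
ray.get(pg.ready())  # wait for the reservation

@ray.remote(num_cpus=1)
def produce():
    return "data"

@ray.remote(num_cpus=1)
def consume():
    return "ok"

# Both tasks are placed in the same bundle, hence on the same node.
a = produce.options(placement_group=pg).remote()
b = consume.options(placement_group=pg).remote()
print(ray.get([a, b]))
```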
This would also generalize to actor placement, right?
Yes, we could support it for both tasks and actors, though the implementation may differ slightly for actors.
For the Datasets use case, they're all launched together (by the driver). I think we'd want the hint to work well in both scenarios; are there implications per scenario that you're thinking of?
COLOCATE seems to force colocation, whereas "hint" sounds like a best-effort thing.
Bump: another OSS user ran into this with Datasets, where read tasks (and therefore downstream map tasks) pack onto a single node, causing poor performance and cluster instability.
Regarding `func.options(scheduling_hint="SPREAD").remote()`: I'm confused about the semantics of this. Does this mean that ...
The scheduler will do its best to spread them equally across different nodes, similar to SPREAD in placement groups. No guarantees though. They are independent. |
Hmm... it sounds like soft SPREAD in a placement group, but without gang scheduling?
@clay4444 Yeah, it behaves similarly to soft SPREAD in a placement group.
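For comparison, a sketch of the soft SPREAD placement-group behavior referenced here (bundle counts and task names are illustrative; again using the Ray 1.x-era options API). The key difference from the proposed hint is that a placement group gang-reserves its resources up front:

```python
# SPREAD is best-effort: Ray tries to put each bundle on a different node
# but falls back to packing if it cannot (STRICT_SPREAD would fail instead).
import ray
from ray.util.placement_group import placement_group

ray.init()

pg = placement_group([{"CPU": 1}] * 4, strategy="SPREAD")
ray.get(pg.ready())

@ray.remote(num_cpus=1)
def read_block(i):
    return i

refs = [
    read_block.options(
        placement_group=pg,
        placement_group_bundle_index=i,  # one task per bundle
    ).remote(i)
    for i in range(4)
]
print(ray.get(refs))
```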
Several use cases would benefit from finer-grained control over scheduling, and are served well by neither automatic locality-aware scheduling nor placement groups:

- Data reading tasks: These tasks have no input but produce large amounts of output. Ideally Ray would spread them across the cluster, but currently there is no way to do so, which causes data imbalance in ML ingest and Dask-on-Ray workloads. Dask-on-Ray currently recommends a hidden scheduler flag for this: https://docs.ray.io/en/latest/data/dask-on-ray.html#best-practice-for-large-scale-workloads
  This is also a blocker for scalable ML ingest without the "resource prefix" hack, since large datasets cause memory imbalance across the cluster without spreading.
- Helper tasks relying on local resources: Suppose a task downloads a file locally but wants to launch sub-tasks for parallelism. There is currently no way to do this other than relying on hacky node-id resources. Another example is the driver forking a "main" task onto the head node for easy debugging.

Proposal: add a scheduling_hint option for tasks and actors, supporting SPREAD and COLOCATE, as sketched below.
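A sketch of how the proposed API might be used, based only on the issue title and the snippet quoted in the discussion; `scheduling_hint` is not an existing Ray option, and its exact spelling and semantics are what this issue proposes (task, actor, and path names below are illustrative):

```python
import ray

ray.init()

@ray.remote
def read_block(path):
    # No input, large output: ideally spread across the cluster.
    return path

@ray.remote
class Helper:
    def ping(self):
        return "ok"

paths = [f"part-{i}" for i in range(8)]  # placeholder inputs

# Best-effort spreading of data-reading tasks across nodes, similar to
# soft SPREAD in a placement group but with no gang reservation.
blocks = [read_block.options(scheduling_hint="SPREAD").remote(p) for p in paths]

# Colocate a helper actor with the submitting task/driver, e.g. to reuse
# files that are already present on the local node.
helper = Helper.options(scheduling_hint="COLOCATE").remote()

print(ray.get(blocks))
```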
Related issues: #18465, #5722