Question: Force tasks to be scheduled only on local node #5722
Comments
We have had the same critical need for data locality that you describe. We address it by launching an actor on each node; that actor holds a list of work specific to that node (i.e., the data lives on that node). Each worker task checks which node it is on and asks for work from the actor residing on its node. Does that make sense? I can elaborate more if needed.
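The routing pattern described above can be sketched in plain Python. This is a simulation, not actual Ray code: the `NodeWorkActor` class and `run_worker` function are illustrative stand-ins for a Ray actor pinned to one node and a worker task, so only the per-node routing logic is shown.

```python
# Plain-Python sketch of the per-node work-routing pattern. In real Ray code,
# NodeWorkActor would be a Ray actor launched on one specific node; here a
# plain class stands in so the routing logic is visible on its own.

class NodeWorkActor:
    """Stands in for an actor launched on one specific node."""

    def __init__(self, node_ip, work_items):
        self.node_ip = node_ip
        self.work_items = list(work_items)  # data that lives on this node

    def get_work(self):
        """Hand out one node-local work item, or None when drained."""
        return self.work_items.pop() if self.work_items else None


def run_worker(my_node_ip, actors_by_node):
    """A worker task: find the actor on *its own* node and pull work from it."""
    actor = actors_by_node[my_node_ip]
    done = []
    while (item := actor.get_work()) is not None:
        done.append(f"{my_node_ip} processed {item}")
    return done


# One actor per node, each holding work for data stored on that node.
actors = {
    "10.0.0.1": NodeWorkActor("10.0.0.1", ["shard-a", "shard-b"]),
    "10.0.0.2": NodeWorkActor("10.0.0.2", ["shard-c"]),
}

# A worker on 10.0.0.1 only ever touches that node's shards.
print(run_worker("10.0.0.1", actors))
```

The key property is that a worker never reaches into another node's queue, so data never crosses the network.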
RLlib has a helper function called `create_colocated` to launch actors on the same node; you could do the same for your application.
This should be doable with custom resources. See https://ray.readthedocs.io/en/latest/configure.html#cluster-resources. Does this work for your use case?
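To make the custom-resource suggestion concrete, here is a minimal sketch of the placement idea (this is not Ray's actual scheduler; the node names and the `special_hardware` label are made up): a task demanding a resource can only land on nodes that advertise it.

```python
# Sketch of how a custom resource label restricts placement: a task demanding
# {"special_hardware": 1} can only be scheduled on nodes that advertise that
# resource, which is the mechanism the linked docs describe.

def feasible_nodes(task_resources, cluster):
    """Return the node names whose advertised resources cover the demand."""
    return [
        name
        for name, available in cluster.items()
        if all(available.get(res, 0) >= amt for res, amt in task_resources.items())
    ]


cluster = {
    "node-1": {"CPU": 8, "special_hardware": 1},  # started with a custom resource
    "node-2": {"CPU": 8},                         # no custom resource
}

# A task declared with resources={"special_hardware": 1} only fits node-1.
print(feasible_nodes({"special_hardware": 1}, cluster))  # → ['node-1']
print(feasible_nodes({"CPU": 1}, cluster))               # → ['node-1', 'node-2']
```

In Ray itself, the cluster dict above corresponds to starting each node with a `--resources` flag and the task demand to passing `resources={...}` to `ray.remote`.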
Hey all, @virtualluke @robertnishihara
So yes, in principle this would work. I think in the long run it would be cool to have some sort of flag in ray.remote() that allows specifying whether the actor should only be run node-local. Something like
This approach would not require starting each node with a different special resource attached, and it would be easier in terms of auto-scalability. Does this make sense?
I'm also very interested in this question (the most effective way to do this). It seems particularly relevant for use cases with slow Ethernet and large data transfers, where it is hard to get good performance without forcing local execution for parts of the system.
Here's a walkthrough of how one might do this:

```python
import ray

ray.init()

# Define a remote function
@ray.remote
def remote_func():
    print("hi")

# Obtain the local IP address
local_hostname = ray.services.get_node_ip_address()
assert isinstance(local_hostname, str)

# Ray has a predefined "node id" resource for locality placement
node_id = f"node:{local_hostname}"

# Check to make sure the node id resource exists
assert node_id in ray.cluster_resources()

# Create a remote function with the given resource label attached
local_remote_func = remote_func.options(resources={node_id: 0.01})

# Invoke the remote function
ray.get(local_remote_func.remote())  # prints "hi" from the local node
```
Hi, I'm a bot from the Ray team :) To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months. If there is no further activity in the next 14 days, the issue will be closed!
You can always ask for help on our discussion forum or Ray's public slack channel. |
Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message. Please feel free to reopen or open a new issue if you'd still like it to be addressed. Again, you can always ask for help on our discussion forum or Ray's public slack channel. Thanks again for opening the issue! |
Describe the problem
I would like to use Ray for parallelized execution, especially to avoid multiprocessing.
E.g., when I open a list of compressed images, I would like to decompress them concurrently. However, it wouldn't make much sense to delegate this task to workers that are not on the same node.
Is there a way to force scheduling only on the local node?
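For comparison, the purely node-local version of this use case can be sketched with only the standard library (no Ray). Here `zlib` stands in for whatever image codec is actually used, and the blob list is fabricated for the sketch:

```python
# Node-local parallel decompression of a list of "compressed images" using
# only the standard library. zlib is a stand-in for the real image codec;
# it releases the GIL, so threads give real parallelism here.

import zlib
from concurrent.futures import ThreadPoolExecutor

def decompress(blob):
    return zlib.decompress(blob)

# Fabricated "compressed images" for the sketch.
blobs = [zlib.compress(f"image-{i}".encode()) for i in range(4)]

with ThreadPoolExecutor(max_workers=4) as pool:
    images = list(pool.map(decompress, blobs))

print(images[0])  # b'image-0'
```

The Ray walkthrough above achieves the same locality while keeping the option to scale the rest of the pipeline across nodes.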