What's the best way to control which GPUs a worker can use? #303

Closed
robertnishihara opened this issue Feb 21, 2017 · 5 comments
Labels
question Just a question :)

Comments

@robertnishihara
Collaborator

Right now, we allow tasks to specify that they require GPUs by including the requirement in the decorator, e.g.,

@ray.remote(num_gpus=2)
def f():
  ...

#302 introduces the same syntax for actors, e.g.,

@ray.actor(num_gpus=3)
class Foo(object):
  ...

So how does the function f actually know which GPUs to use?

With CPUs, the OS will try to balance things between CPUs, so it's less critical to get it right. That said, the local scheduler (or whoever) could set the affinity of different worker processes for different CPUs to control which CPUs each worker can use. Is there an analogue of all this for GPUs?
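
For reference, the CPU-affinity analogue mentioned here can already be done from plain Python on Linux. A minimal sketch (os.sched_setaffinity is Linux-specific, and the CPU set below is just for illustration):

import os

# Pin the calling process (pid 0 means "this process") to CPUs 0 and 1. Linux-only.
os.sched_setaffinity(0, {0, 1})
print(os.sched_getaffinity(0))  # e.g., {0, 1}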

GPUs seem different from CPUs. The burden of choosing which GPU to use is often placed on the programmer, not on the OS. For example, the standard way to control which GPUs TensorFlow uses is to set the environment variable CUDA_VISIBLE_DEVICES (e.g., CUDA_VISIBLE_DEVICES=0,3,4) before running tf.Session(); once you create a session, TensorFlow reserves a bunch of memory on all visible GPUs. I'm not sure whether selecting GPUs by device ID like this is specific to TensorFlow or a general pattern.

We can expose a method ray.get_gpu_ids() that could be called inside any task or actor and would return the IDs (e.g., [0, 3, 4]) of the GPUs that the process is allowed to use. This assumes that environment variables can be set from within Python. In the case of TensorFlow that works: we can do os.environ["CUDA_VISIBLE_DEVICES"] = ",".join([str(i) for i in ray.get_gpu_ids()]) from Python (e.g., within an actor constructor) as long as we do it before we run tf.Session().

But you could imagine a scenario, or a different library, where the environment variable has to be set BEFORE the worker process is created (or before the library is imported). In that case there are other options (e.g., having the local scheduler set the environment variable), but they all seem awful. (I actually ran into such a situation recently, where I had to set an environment variable like DISPLAY=:99 before importing a library, because otherwise the library crashed when looking for an X server and brought down the worker.)
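
For concreteness, a minimal sketch of that pattern, using the actor syntax from #302 and assuming the proposed ray.get_gpu_ids() call exists (the Trainer class is just illustrative):

import os
import ray
import tensorflow as tf

@ray.actor(num_gpus=2)
class Trainer(object):
  def __init__(self):
    # Restrict TensorFlow to the GPUs assigned to this actor. This has to
    # happen before the first tf.Session() is created, since the session
    # reserves memory on all visible GPUs.
    gpu_ids = ray.get_gpu_ids()  # e.g., [0, 3]
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    self.sess = tf.Session()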

robertnishihara added the question label Feb 21, 2017
@robertnishihara
Collaborator Author

One solution for now is to expose a method like ray.get_gpu_ids() or ray.get_env()["GPU_IDS"] within tasks and within actor methods.

For now, we can assume that users will handle things like setting the environment variable CUDA_VISIBLE_DEVICES themselves.
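
As a sketch of what that could look like from the user's side (again assuming ray.get_gpu_ids() is available inside tasks; the which_gpus function is just for illustration):

import os
import ray

@ray.remote(num_gpus=1)
def which_gpus():
  # The user is responsible for propagating the assigned IDs to their
  # GPU library, e.g., via CUDA_VISIBLE_DEVICES.
  gpu_ids = ray.get_gpu_ids()
  os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
  return gpu_ids

ray.init(num_gpus=4)
print(ray.get(which_gpus.remote()))  # e.g., [2]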

@GoingMyWay

Hey, it seems that this is still a problem. Any suggestions?

@robertnishihara
Collaborator Author

@GoingMyWay can you share more details about the problem you're seeing? A reproducible script would be ideal.

There are more details about using Ray with GPUs here: https://docs.ray.io/en/master/ray-core/tasks/using-ray-with-gpus.html.

@DanielWicz

@robertnishihara
What if you use AMD GPUs?

@amztc34283

amztc34283 commented Oct 30, 2023

I think this is still a problem and it is impacting the performance of Ray Tune.

Like @robertnishihara mentioned above, the GPUs exposed to a remote function can be overridden by setting the CUDA_VISIBLE_DEVICES environment variable. However, in the case of Ray Tune, we specify the number of GPUs exposed per worker in advance, and each worker basically cannot share data across more GPUs than that per-worker number; this limits the benefit of data parallelism within a single node.

For example, I am running 8 workers (each with num_gpus=1) on a single-node machine with 8 GPUs. No worker can utilize all of the GPUs in the machine, because num_gpus=1 isolates each worker to a single GPU. The ideal case would be to allow each worker to use all of the GPUs for the sake of data parallelism.

One possible solution is to inflate the logical GPU count by the factor of parallelism you want to run (logical GPU count = physical GPU count * factor of parallelism), so that each worker can request num_gpus equal to the ideal parallelism (i.e., the number of physical GPUs), and then pin the actual devices with CUDA_VISIBLE_DEVICES.
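
A rough sketch of that workaround (the numbers and the trial function are illustrative; ray.init(num_gpus=...) only declares the logical GPU count, so each worker has to map its logical assignment back onto the physical devices by hand):

import os
import ray

NUM_PHYSICAL_GPUS = 8
PARALLELISM = 8  # number of concurrent workers on this node

# Declare more logical GPUs than physically exist so that all 8 workers
# can each request num_gpus=8 at the same time.
ray.init(num_gpus=NUM_PHYSICAL_GPUS * PARALLELISM)

@ray.remote(num_gpus=NUM_PHYSICAL_GPUS)
def trial():
  # The logical GPU IDs Ray assigns no longer match physical devices,
  # so explicitly expose every physical GPU to this worker.
  os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(
      str(i) for i in range(NUM_PHYSICAL_GPUS))
  ...  # run the data-parallel training step here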
