fractional GPU: a) Which GPU models are supported, b) What is the behavior? #5082

andife · 2024-03-20T19:04:32Z

Hi,
I wonder which NVIDIA GPU models are supported for "fractional GPU" (only a few NVIDIA GPUs are listed at https://docs.flyte.org/en/latest/api/flytekit/extras.accelerators.html#predefined-accelerator-constants). Or what does "fractional GPU" mean in this context? Is the container prevented from consuming more GPU memory than specified in the config? Will, or could, the process be killed if it starts to need more memory while running, so that it does not jeopardize the other processes on the system?

What determines which GPU models can be used? What about the GTX6000 or an RTX 2080 Ti, for example?

Thank you

@andife · 2024-03-20T19:04:46Z

dosubot[bot]
bot Mar 20, 2024

Hey @andife, great to see you back! Hope you're doing well.

Fractional GPU support in Flyte is not determined by the GPU model itself; it depends on how your Kubernetes cluster and the NVIDIA device plugin for Kubernetes are configured. The Flyte codebase refers to the GPU resource by the name "nvidia.com/gpu", which is the standard way to request NVIDIA GPU resources in Kubernetes.

Fractional GPU usage in Kubernetes, and by extension in Flyte when configured to use Kubernetes for task execution, typically relies on the support provided by the NVIDIA device plugin for Kubernetes. This plugin allows Kubernetes to schedule GPU resources, but the support for fractional GPU usage (e.g., requesting a portion of a GPU's capabilities) would depend on specific features of the NVIDIA device plugin and the underlying hardware.
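To make the "nvidia.com/gpu" request concrete, here is a minimal sketch of a pod manifest built as a plain Python dict. The helper names (`gpu_limits`, `pod_manifest`), the pod name, and the image are hypothetical and only for illustration; this is not how Flyte builds pods internally. The one Kubernetes fact it relies on is that extended resources such as "nvidia.com/gpu" can only be requested in whole numbers, so a value like 0.5 cannot be expressed on this resource.

```python
import json

# Resource name referenced in the reply above.
GPU_RESOURCE = "nvidia.com/gpu"


def gpu_limits(count):
    """Build a container `resources` mapping that asks for whole GPUs.

    Kubernetes extended resources must be integers, so fractional
    values (for example 0.5) are rejected up front.
    """
    if isinstance(count, bool) or not isinstance(count, int) or count < 1:
        raise ValueError(
            f"{GPU_RESOURCE} must be a positive integer, got {count!r}"
        )
    return {"limits": {GPU_RESOURCE: count}}


def pod_manifest(name, image, gpus):
    """Hypothetical helper: a single-container pod requesting `gpus` GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {"name": name, "image": image, "resources": gpu_limits(gpus)}
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder pod name and image, not taken from the thread.
    manifest = pod_manifest("gpu-task", "example.com/trainer:latest", 1)
    print(json.dumps(manifest, indent=2))
```

Calling `gpu_limits(0.5)` raises `ValueError`, which mirrors why sharing a card has to come from something other than a fractional count on this resource.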

If a container exceeds the specified GPU memory during a process, the behavior would depend on how the NVIDIA device plugin and the Kubernetes cluster are configured to handle resource limits and requests. Kubernetes does not enforce GPU memory limits in the same way it does for CPU and RAM. If a process tries to use more GPU memory than is available, it could lead to allocation errors from the CUDA runtime or other GPU-related libraries, potentially causing the process to fail. Kubernetes itself does not monitor or limit GPU memory usage directly; such enforcement would need to be handled at the application level or by using specific GPU monitoring tools.

In terms of specific GPU models like the GTX6000 or RTX 2080 Ti, the support would again depend on the NVIDIA device plugin for Kubernetes and the specific features of these GPU models. The Flyte codebase does not provide explicit details on supported GPU models.

I hope this helps clarify your questions. If you have further queries, feel free to ask.

kumare3 · 2024-03-20T21:03:46Z

Maintainer

@andife this is a great question and probably needs clarification. Flyte only supports what the NVIDIA device plugin supports, which is the MIG-based GPU partitioning system. This is memory safe, but it is only available for certain GPU devices, IIRC starting from the A100 family.

Are you asking whether you need to use accelerators for the GTX devices, or do you want to fractionalize them? I do not think the GTX family supports MIG.

On the other hand, if you want to use the accelerator tag to target the GTX devices, this absolutely can be done, but it will require adding new constants and providing the relevant configuration to propeller.
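As a rough illustration of why MIG partitions are memory safe, the sketch below works out how many instances of a single MIG profile fit on one A100 40GB. It assumes the published A100 40GB layout of 7 compute slices and 40 GB of memory, and uses the standard NVIDIA profile names (`1g.5gb`, `2g.10gb`, and so on). The function names are hypothetical, and the calculation ignores placement rules and mixed-profile layouts, so treat it as a simplification rather than what the driver actually does.

```python
import re

# Assumed A100 40GB layout: 7 compute slices and 40 GB of memory.
A100_40GB_COMPUTE_SLICES = 7
A100_40GB_MEMORY_GB = 40


def parse_profile(profile):
    """Split a MIG profile name such as '2g.10gb' into (compute, memory_gb)."""
    match = re.fullmatch(r"(\d+)g\.(\d+)gb", profile)
    if match is None:
        raise ValueError(f"not a MIG profile name: {profile!r}")
    return int(match.group(1)), int(match.group(2))


def max_instances(profile):
    """Upper bound on identical MIG instances of `profile` on one A100 40GB.

    Each instance gets dedicated compute slices and dedicated memory,
    so the count is limited by whichever runs out first.
    """
    compute, memory_gb = parse_profile(profile)
    by_compute = A100_40GB_COMPUTE_SLICES // compute
    by_memory = A100_40GB_MEMORY_GB // memory_gb
    return min(by_compute, by_memory)


if __name__ == "__main__":
    for name in ("1g.5gb", "2g.10gb", "3g.20gb", "4g.20gb", "7g.40gb"):
        print(name, max_instances(name))
```

Because every instance owns a fixed slice of memory, a process in one partition cannot grow into another partition's memory, which is the property missing on cards without MIG.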

3 replies

andife Mar 20, 2024
Author

Hi Ketan,
Thank you for your reply.

Exactly. My question is about properly splitting different processes across a single graphics card (while limiting GPU memory usage per process), i.e. fractionalizing a graphics card. It would be great if a solution could be found for "simpler" cards.

kumare3 Mar 29, 2024
Maintainer

@andife I'm not sure we can do fractionalization. If you have ideas, they would really benefit the community. We'd love to hear them.

andife Mar 30, 2024
Author

I read about https://clear.ml/blog/open-source-fractional-gpus-for-everyone-now-available-from-clearml
and wondered what the situation is with Flyte.
