I want to deploy three models: one large language model occupying one GPU, and one embedding model and one re-ranking model sharing one GPU. How can I do this?
#769
Closed
Flynn-Zh opened this issue on Jun 14, 2024 · 4 comments
There are two GPU devices on the Kubernetes node,
timeSlicing.replicas is two,
nvidia.com/gpu for the large language model is set to two,
nvidia.com/gpu for the other models is set to one,
but the pod of the large language model ends up with two GPU devices.
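The timeSlicing.replicas setting mentioned above most likely refers to a time-slicing configuration for the NVIDIA k8s-device-plugin / GPU Operator (an assumption, since the chart and values file are not shown). A minimal sketch of such a ConfigMap with two replicas per GPU; the ConfigMap name, namespace, and the `any` key are placeholders:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config      # placeholder name
  namespace: gpu-operator        # placeholder namespace
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 2            # each physical GPU is advertised as 2 schedulable devices
```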
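And a minimal sketch of the three-model layout asked about in the title, expressed as the GPU requests described in the report; the image names are placeholders, and real workloads would normally be Deployments rather than bare Pods:

```yaml
# LLM: requests two time-sliced replicas, intended to occupy one physical GPU.
apiVersion: v1
kind: Pod
metadata:
  name: llm
spec:
  containers:
  - name: llm
    image: example/llm-server:latest          # placeholder
    resources:
      limits:
        nvidia.com/gpu: 2    # two time-sliced replicas
---
# Embedding model: one time-sliced replica.
apiVersion: v1
kind: Pod
metadata:
  name: embedding
spec:
  containers:
  - name: embedding
    image: example/embedding-server:latest    # placeholder
    resources:
      limits:
        nvidia.com/gpu: 1
---
# Re-ranking model: one time-sliced replica; can share a physical GPU with the embedding model.
apiVersion: v1
kind: Pod
metadata:
  name: reranker
spec:
  containers:
  - name: reranker
    image: example/reranker-server:latest     # placeholder
    resources:
      limits:
        nvidia.com/gpu: 1
```

As the report notes, the large language model pod still ended up with two GPU devices, so requesting two time-sliced replicas evidently does not pin the pod to a single physical GPU in this setup.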
This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.