I want to deploy three models: one large language model occupying one GPU, and one embedding model and one re-ranking model sharing one GPU. How can I do it? #769

Closed
Flynn-Zh opened this issue Jun 14, 2024 · 4 comments

Comments

@Flynn-Zh

There are two GPU devices on the Kubernetes node.
timeSlicing.replicas is set to 2.
nvidia.com/gpu for the large language model is set to 2.
nvidia.com/gpu for each of the other models is set to 1.
However, the large language model's pod is assigned two GPU devices instead of one.

@klueska
Contributor

klueska commented Jun 14, 2024

You can't do this with the standard device plugin. You will need to wait until DRA is available:
https://docs.google.com/document/d/1BNWqgx_SmZDi-va_V31v3DnuVwYnF2EmN7D-O_fB6Oo/edit#heading=h.bxuci8gx6hna


This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 13, 2024
@Flynn-Zh
Author

Flynn-Zh commented Sep 24, 2024

DRA cannot be used in production environments. Can we modify the GPU allocation strategy code to implement this? Can anyone give me some help?

Example: with time-slicing enabled and replicas set to 2, set the large language model pod's limit to 2 so that it is assigned one whole GPU, and set the limits of the other two pods to 1 each so that both are assigned to the same remaining GPU.
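To make the requested placement concrete, here is a toy Python model of the example above. It is only a sketch and not the device plugin's code: the replica IDs (`gpu0::0` and so on), the function names, and both strategies are hypothetical. `allocate_spread` assumes replicas are handed out evenly across physical GPUs, which would match the behavior reported in this issue; `allocate_packed` is the behavior being asked for, where a request for 2 replicas is served from a single GPU.

```python
def make_replicas(num_gpus, replicas):
    """Free replica IDs per physical GPU (hypothetical naming scheme)."""
    return {g: [f"gpu{g}::{r}" for r in range(replicas)] for g in range(num_gpus)}


def allocate_spread(free, count):
    """Assumed current behavior: take one replica at a time from the GPU
    that has the most free replicas (ties go to the lowest GPU index)."""
    picked = []
    for _ in range(count):
        gpu = max(free, key=lambda g: (len(free[g]), -g))
        if not free[gpu]:
            raise RuntimeError("not enough free replicas")
        picked.append(free[gpu].pop(0))
    return picked


def allocate_packed(free, count):
    """Requested behavior: serve the whole request from a single GPU,
    preferring the fullest GPU that still has enough free replicas."""
    candidates = [g for g in free if len(free[g]) >= count]
    if not candidates:
        raise RuntimeError("no single GPU has enough free replicas")
    gpu = min(candidates, key=lambda g: (len(free[g]), g))
    picked = free[gpu][:count]
    del free[gpu][:count]
    return picked


def physical_gpus(replica_ids):
    """Map replica IDs back to the set of physical GPUs they live on."""
    return {rid.split("::")[0] for rid in replica_ids}


def simulate(allocate):
    """Two GPUs, replicas=2, requests in the order described in the issue."""
    free = make_replicas(num_gpus=2, replicas=2)
    requests = [("llm", 2), ("embedding", 1), ("reranker", 1)]
    return {name: physical_gpus(allocate(free, count)) for name, count in requests}
```

In this model, `simulate(allocate_spread)` places the large language model on both GPUs, while `simulate(allocate_packed)` gives it `gpu0` alone and puts the embedding and re-ranking models together on `gpu1`.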

@github-actions github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 25, 2024