I want to deploy three models: one large language model occupying one GPU, and one embedding model and one re-ranking model sharing one GPU. How can I do this?
#769
Closed
Flynn-Zh opened this issue on Jun 14, 2024 · 4 comments
There are two GPU devices on the Kubernetes node,
timeSlicing.replicas is two,
nvidia.com/gpu for the large language model is set to two,
nvidia.com/gpu for the other models is set to one,
but the pod of the large language model ends up with two GPU devices.
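The timeSlicing.replicas setting mentioned above most likely refers to a time-slicing configuration for the NVIDIA k8s-device-plugin / GPU Operator (an assumption, since the chart and values file are not shown). A minimal sketch of such a ConfigMap with two replicas per GPU; the ConfigMap name, namespace, and the `any` key are placeholders:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config      # placeholder name
  namespace: gpu-operator        # placeholder namespace
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 2            # each physical GPU is advertised as 2 schedulable devices
```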
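And a minimal sketch of the three-model layout asked about in the title, expressed as the GPU requests described in the report; the image names are placeholders, and real workloads would normally be Deployments rather than bare Pods:

```yaml
# LLM: requests two time-sliced replicas, intended to occupy one physical GPU.
apiVersion: v1
kind: Pod
metadata:
  name: llm
spec:
  containers:
  - name: llm
    image: example/llm-server:latest          # placeholder
    resources:
      limits:
        nvidia.com/gpu: 2    # two time-sliced replicas
---
# Embedding model: one time-sliced replica.
apiVersion: v1
kind: Pod
metadata:
  name: embedding
spec:
  containers:
  - name: embedding
    image: example/embedding-server:latest    # placeholder
    resources:
      limits:
        nvidia.com/gpu: 1
---
# Re-ranking model: one time-sliced replica; can share a physical GPU with the embedding model.
apiVersion: v1
kind: Pod
metadata:
  name: reranker
spec:
  containers:
  - name: reranker
    image: example/reranker-server:latest     # placeholder
    resources:
      limits:
        nvidia.com/gpu: 1
```

As the report notes, the large language model pod still ended up with two GPU devices, so requesting two time-sliced replicas evidently does not pin the pod to a single physical GPU in this setup.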
This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.