[Ray Serve] More flexible deployments #34278
Comments
Similar to #31072 -- also note k8s requirements
@sihanwang41 maybe relevant to some of the stuff you're working on with multiplexing?
Hi @brosand, this can be done with a pure Python Serve handle. In your case, the LLM deployment can hold a handle to the stable diffusion deployment, and the LLM and stable diffusion can be deployed separately as two applications.
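A minimal sketch of that pattern, assuming Ray 2.7+ (where serve.get_app_handle and the awaitable DeploymentHandle API exist); the application names, route prefixes, and model stubs are illustrative, not taken from this issue:

```python
from ray import serve

@serve.deployment
class StableDiffusion:
    async def __call__(self, prompt: str) -> bytes:
        # Stand-in for real diffusion inference.
        return b"image-bytes"

@serve.deployment
class LLM:
    def __init__(self):
        # Handle to the separately deployed diffusion application.
        self._sd = serve.get_app_handle("diffusion")

    async def __call__(self, text: str) -> bytes:
        prompt = f"refined: {text}"  # stand-in for real LLM inference
        return await self._sd.remote(prompt)

# Two independent applications: each keeps its own endpoint, and either
# can be redeployed without touching the other.
serve.run(StableDiffusion.bind(), name="diffusion", route_prefix="/diffusion")
serve.run(LLM.bind(), name="llm", route_prefix="/llm")
```

Because the two applications are deployed independently, the diffusion endpoint stays open at its own route prefix while the LLM composes with it over the handle.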
Thanks for the response @sihanwang41 -- I guess I wasn't thinking about the pure Python case. The problem with the pure Python case is that with KubeRay, deployments can't be added post hoc in Python, as they would require the image to be completely rebuilt. Is this characterization fair? I guess my feature request would be specific to KubeRay.
KubeRay support for multiple applications is in our plans, so that you will be able to deploy multiple deployments (images) without affecting each other.
@sihanwang41 could you triage it?
This is resolved in ray-project/kuberay#985 |
Description
In Seldon Core there is a feature, pipelines, which enables users to easily and declaratively combine existing deployments: https://docs.seldon.io/projects/seldon-core/en/v2/contents/pipelines/index.html#model-routing-via-tensors. The key feature here is that a deployment can be built that attaches to existing deployments without any of them being modified, and with the original deployment endpoints remaining open. Is there any way we might see something like this in Ray Serve? I know Serve DAGs enable something similar, but they are limited in two key ways.

The feature request would be to enable multiple separate deployments to be linked together (for our use case this would need to work through KubeRay), with each separate deployment being a separate service so that they are exposed separately through gRPC.

Use case
A key use case is model chaining. Say we are using Ray Serve to serve an LLM and want that endpoint to stay available on its own. In addition, we want to chain that model to a diffusers model, but we don't want to create a new deployment off the original LLM, because we want to manage only one LLM. Currently, we would have to build a new Ray Serve Python file from scratch; it would expose only one endpoint, requests would have to be handled and routed inside it, and any gRPC requests would have to share protobufs.
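To make the limitation concrete, here is a minimal sketch of the single-application workaround described above, again with illustrative names and model stubs (assuming Ray 2.7+ DeploymentHandle semantics). Everything is composed into one graph, so only the ingress endpoint is exposed and the LLM cannot be managed on its own:

```python
from ray import serve
from starlette.requests import Request

@serve.deployment
class LLM:
    async def __call__(self, text: str) -> str:
        return f"refined: {text}"  # stand-in for real LLM inference

@serve.deployment
class Diffusion:
    async def __call__(self, prompt: str) -> bytes:
        return b"image-bytes"  # stand-in for real diffusion inference

@serve.deployment
class Ingress:
    def __init__(self, llm, diffusion):
        # Bound deployments are resolved to handles at runtime.
        self._llm = llm
        self._diffusion = diffusion

    async def __call__(self, request: Request) -> bytes:
        # One endpoint: all routing between models happens in here.
        body = await request.json()
        prompt = await self._llm.remote(body["prompt"])
        return await self._diffusion.remote(prompt)

# A single application: redeploying it touches both models, and the LLM
# has no endpoint of its own.
app = Ingress.bind(LLM.bind(), Diffusion.bind())
```

Under the requested feature, the LLM and diffusion model would instead remain separate services, each with its own gRPC endpoint, and the chaining deployment would merely attach to them.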