
[Ray Serve] More flexible deployments #34278

Closed
brosand opened this issue Apr 11, 2023 · 7 comments
Labels
enhancement (Request for new feature and/or capability) · P2 (Important issue, but not time-critical) · serve (Ray Serve Related Issue)

Comments


brosand commented Apr 11, 2023

Description

In Seldon Core there is a feature called pipelines that lets users easily and declaratively combine existing deployments: https://docs.seldon.io/projects/seldon-core/en/v2/contents/pipelines/index.html#model-routing-via-tensors. The key feature is that a new deployment can be built that attaches to existing deployments without modifying any of them, and with the original deployment endpoints remaining open. Is there any way we might see something like this in Ray Serve? I know Serve DAGs enable something similar, but they are limited in two key ways:

  1. Once a DAG is deployed, it cannot be modified or linked to another DAG.
  2. Ray Serve DAGs copy models when they are bound more than once, so there is no easy way to set up more complex graphs.

The feature request is to enable multiple separate deployments to be linked together (for our use case this would need to work through KubeRay), with each deployment being a separate service so that each is exposed separately over gRPC.

Use case

A key use case is model chaining. Say we are using Ray Serve to serve an LLM and want that endpoint available on its own. In addition, we want to chain that model to a diffusers model, but we don't want to create a new deployment off the original LLM, since we want to manage only one LLM. Currently we would have to build a new Ray Serve Python file from scratch; it would expose only one endpoint, requests would have to be handled and routed internally, and any gRPC requests would have to share protobufs. A rough sketch of that workaround follows.
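For concreteness, a minimal sketch of the single-application workaround described above, assuming the Ray Serve composition API (2.7+ handle semantics); the deployment names and one-line model bodies are illustrative placeholders, not real implementations:

```python
from ray import serve
from starlette.requests import Request


@serve.deployment
class LLM:
    def __call__(self, prompt: str) -> str:
        # Placeholder for real LLM inference.
        return f"caption for: {prompt}"


@serve.deployment
class Diffuser:
    def __call__(self, caption: str) -> str:
        # Placeholder for real diffusion inference.
        return f"<image for: {caption}>"


@serve.deployment
class Chain:
    # Single ingress that owns handles to both models; everything lives
    # in one application behind one route.
    def __init__(self, llm, diffuser):
        self.llm = llm
        self.diffuser = diffuser

    async def __call__(self, request: Request) -> str:
        prompt = (await request.json())["prompt"]
        caption = await self.llm.remote(prompt)
        return await self.diffuser.remote(caption)


# One application, one endpoint: the LLM is not independently reachable.
app = Chain.bind(LLM.bind(), Diffuser.bind())
```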

@brosand added the enhancement (Request for new feature and/or capability) and triage (Needs triage: priority, bug/not-bug, and owning component) labels on Apr 11, 2023

brosand commented Apr 11, 2023

Similar to #31072 -- also note k8s requirements


brosand commented Apr 13, 2023

@sihanwang41 maybe relevant to some of the stuff you're working on with multiplexing?

sihanwang41 (Contributor) commented

Hi @brosand, this can be done by using a pure Serve handle. In your case, the LLM deployment can hold the handle of the Stable Diffusion deployment. The LLM and Stable Diffusion can then be deployed separately as two applications.
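A minimal sketch of this two-application handle pattern, assuming a recent Ray Serve release where serve.get_app_handle and multi-application serve.run are available; the application names, routes, and model bodies are illustrative:

```python
from ray import serve
from starlette.requests import Request


@serve.deployment
class Diffuser:
    # Called via a Serve handle with a plain string; its own HTTP route
    # would additionally need request parsing, elided here.
    def __call__(self, caption: str) -> str:
        return f"<image for: {caption}>"  # placeholder for real diffusion


@serve.deployment
class LLM:
    def __init__(self):
        # Look up the separately deployed "diffuser" application by name;
        # neither application has to be rebuilt when the other changes.
        self.diffuser = serve.get_app_handle("diffuser")

    async def __call__(self, request: Request) -> str:
        prompt = (await request.json())["prompt"]
        caption = f"caption for: {prompt}"  # placeholder for real LLM inference
        return await self.diffuser.remote(caption)


# Two independent applications, each exposed on its own route.
serve.run(Diffuser.bind(), name="diffuser", route_prefix="/diffuser")
serve.run(LLM.bind(), name="llm", route_prefix="/llm")
```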


brosand commented Apr 13, 2023

> Hi @brosand, this can be done by using a pure Serve handle. In your case, the LLM deployment can hold the handle of the Stable Diffusion deployment. The LLM and Stable Diffusion can then be deployed separately as two applications.

Thanks for the response @sihanwang41 -- I guess I wasn't thinking about the pure-Python case. The problem there is that with KubeRay, deployments can't be added post hoc in Python, since that would require the image to be completely rebuilt. Is this characterization fair? I guess my feature request would then be specific to RayServices.

sihanwang41 (Contributor) commented

KubeRay support for multiple applications is in our plans, so that you will be able to deploy multiple Deployments (images) without them affecting each other.

kevin85421 (Member) commented

@sihanwang41 could you triage it?

@kevin85421 removed their assignment on Apr 14, 2023
@hora-anyscale added the P2 (Important issue, but not time-critical) and serve (Ray Serve Related Issue) labels and removed the triage label on Apr 28, 2023
akshay-anyscale (Contributor) commented

This is resolved in ray-project/kuberay#985
