[Ray Serve] More flexible deployments #34278
Comments
Similar to #31072 -- also note k8s requirements
@sihanwang41 maybe relevant to some of the stuff you're working on with multiplexing?
Hi @brosand, this can be done with a pure Python Serve handle. In your case, the LLM deployment can hold a handle to the stable diffusion deployment, and the LLM and stable diffusion can be deployed separately as two applications.
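A minimal sketch of that pattern, assuming Ray 2.7+ (where serve.get_app_handle and the awaitable DeploymentHandle API exist); the application names, route prefixes, and model stubs are illustrative, not taken from this issue:

```python
from ray import serve

@serve.deployment
class StableDiffusion:
    async def __call__(self, prompt: str) -> bytes:
        # Stand-in for real diffusion inference.
        return b"image-bytes"

@serve.deployment
class LLM:
    def __init__(self):
        # Handle to the separately deployed diffusion application.
        self._sd = serve.get_app_handle("diffusion")

    async def __call__(self, text: str) -> bytes:
        prompt = f"refined: {text}"  # stand-in for real LLM inference
        return await self._sd.remote(prompt)

# Two independent applications: each keeps its own endpoint, and either
# can be redeployed without touching the other.
serve.run(StableDiffusion.bind(), name="diffusion", route_prefix="/diffusion")
serve.run(LLM.bind(), name="llm", route_prefix="/llm")
```

Because the two applications are deployed independently, the diffusion endpoint stays open at its own route prefix while the LLM composes with it over the handle.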
Thanks for the response @sihanwang41 -- I guess I wasn't thinking about the pure Python case. The problem with the pure Python case is that with KubeRay, deployments can't be added post hoc in Python, as they would require the image to be completely rebuilt. Is this characterization fair? I guess my feature request would be specific to KubeRay.
KubeRay support for multiple applications is in our plans, so that you will be able to deploy multiple deployments (images) without affecting each other.
@sihanwang41 could you triage it?
This is resolved in ray-project/kuberay#985 |
Description
In Seldon Core there is a feature, pipelines, which enables users to easily and declaratively combine existing deployments: https://docs.seldon.io/projects/seldon-core/en/v2/contents/pipelines/index.html#model-routing-via-tensors. The key feature here is that a deployment can be built that attaches to existing deployments without any of them being modified, and with the original deployment endpoints remaining open. Is there any way we might see something like this in Ray Serve? I know Serve DAGs enable something similar, but they are limited in two key ways.

The feature request would be to enable multiple separate deployments to be linked together (for our use case this would need to work through KubeRay), with each separate deployment being a separate service so that they are exposed separately through gRPC.

Use case
A key use case is model chaining. Say we are using Ray Serve to serve an LLM and want that endpoint to stay available on its own. In addition, we want to chain that model to a diffusers model, but we don't want to create a new deployment off the original LLM, because we want to manage only one LLM. Currently, we would have to build a new Ray Serve Python file from scratch; it would expose only one endpoint, requests would have to be handled and routed inside it, and any gRPC requests would have to share protobufs.
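To make the limitation concrete, here is a minimal sketch of the single-application workaround described above, again with illustrative names and model stubs (assuming Ray 2.7+ DeploymentHandle semantics). Everything is composed into one graph, so only the ingress endpoint is exposed and the LLM cannot be managed on its own:

```python
from ray import serve
from starlette.requests import Request

@serve.deployment
class LLM:
    async def __call__(self, text: str) -> str:
        return f"refined: {text}"  # stand-in for real LLM inference

@serve.deployment
class Diffusion:
    async def __call__(self, prompt: str) -> bytes:
        return b"image-bytes"  # stand-in for real diffusion inference

@serve.deployment
class Ingress:
    def __init__(self, llm, diffusion):
        # Bound deployments are resolved to handles at runtime.
        self._llm = llm
        self._diffusion = diffusion

    async def __call__(self, request: Request) -> bytes:
        # One endpoint: all routing between models happens in here.
        body = await request.json()
        prompt = await self._llm.remote(body["prompt"])
        return await self._diffusion.remote(prompt)

# A single application: redeploying it touches both models, and the LLM
# has no endpoint of its own.
app = Ingress.bind(LLM.bind(), Diffusion.bind())
```

Under the requested feature, the LLM and diffusion model would instead remain separate services, each with its own gRPC endpoint, and the chaining deployment would merely attach to them.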