RFC: Add OPEA deployment design #10

# 24-05-17-OPEA-001-Deployment-Design

## Authors

[ftian1](https://github.com/ftian1), [lvliang-intel](https://github.com/lvliang-intel), [hshen14](https://github.com/hshen14), [mkbhanda](https://github.com/mkbhanda), [irisdingbj](https://github.com/irisdingbj), [KfreeZ](https://github.com/kfreez), [zhlsunshine](https://github.com/zhlsunshine) **Edit Here to add your id**

## Status

Under Review

## Objective

Provide a clear design for users to deploy their own GenAI applications in Docker or Kubernetes environments.


## Motivation

This RFC presents the OPEA deployment-related design for community discussion.

## Design Proposal

Refer to this [OPEA overall architecture design document](24-05-16-OPEA-001-Overall-Design.md).

The proposed OPEA deployment workflow is shown below:

![OPEA deploy workflow](assets/opea_deploy_workflow.png)

We provide two interfaces for deploying GenAI applications:

1. Docker deployment via Python

Here is a Python example of constructing a RAG (Retrieval-Augmented Generation) application:

```python
from comps import MicroService, ServiceOrchestrator

class ChatQnAService:
    def __init__(self, port=8080):
        self.service_builder = ServiceOrchestrator(port=port, endpoint="/v1/chatqna")

    def add_remote_service(self):
        embedding = MicroService(
            name="embedding", port=6000, expose_endpoint="/v1/embeddings", use_remote_service=True
        )
        retriever = MicroService(
            name="retriever", port=7000, expose_endpoint="/v1/retrieval", use_remote_service=True
        )
        rerank = MicroService(
            name="rerank", port=8000, expose_endpoint="/v1/reranking", use_remote_service=True
        )
        llm = MicroService(
            name="llm", port=9000, expose_endpoint="/v1/chat/completions", use_remote_service=True
        )
        self.service_builder.add(embedding).add(retriever).add(rerank).add(llm)
        self.service_builder.flow_to(embedding, retriever)
        self.service_builder.flow_to(retriever, rerank)
        self.service_builder.flow_to(rerank, llm)
```
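To make the orchestration idea concrete, here is a minimal, self-contained sketch of how successive `flow_to`-style calls can define a directed pipeline. The `comps` classes above are the real interface; `ToyOrchestrator` below is purely illustrative and shares nothing with the actual implementation.

```python
# Illustrative only: a toy stand-in for ServiceOrchestrator showing how
# chained add() and flow_to() calls define a directed pipeline of services.
class ToyOrchestrator:
    def __init__(self):
        self.services = {}
        self.flows = []  # (upstream, downstream) edges

    def add(self, name):
        self.services[name] = True
        return self  # allows chained .add().add() calls, as in the RFC example

    def flow_to(self, upstream, downstream):
        self.flows.append((upstream, downstream))

    def pipeline(self):
        # Start from the one service that nothing flows into, then follow edges.
        downstreams = {d for _, d in self.flows}
        start = next(s for s in self.services if s not in downstreams)
        order = [start]
        edges = dict(self.flows)
        while order[-1] in edges:
            order.append(edges[order[-1]])
        return order

orch = ToyOrchestrator()
orch.add("embedding").add("retriever").add("rerank").add("llm")
orch.flow_to("embedding", "retriever")
orch.flow_to("retriever", "rerank")
orch.flow_to("rerank", "llm")
print(orch.pipeline())  # ['embedding', 'retriever', 'rerank', 'llm']
```

The chaining and the edge list mirror the `service_builder` calls in the example above: the application is a DAG of microservices, and the orchestrator routes each request through it in flow order.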

2. Kubernetes deployment using YAML

Here is a YAML example for constructing a RAG (Retrieval-Augmented Generation) application:

```yaml
opea_micro_services:
  embedding:
    endpoint: /v1/embeddings
    port: 6000
  retrieval:
    endpoint: /v1/retrieval
    port: 7000
  reranking:
    endpoint: /v1/reranking
    port: 8000
  llm:
    endpoint: /v1/chat/completions
    port: 9000

opea_mega_service:
  port: 8080
  mega_flow:
    - embedding >> retrieval >> reranking >> llm
```
This YAML acts as a unified language interface for end users to define their GenAI application.

When deploying the GenAI application to a Kubernetes environment, the YAML configuration file should be converted to an appropriate [docker compose](https://docs.docker.com/compose/) file or [GenAI Microservice Connector (GMC)](https://github.com/opea-project/GenAIInfra/tree/main/microservices-connector) custom resource file.

Note: A conversion tool will be provided by OPEA to convert the unified language interface to docker compose or GMC files.
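The conversion step can be pictured with a small sketch. Assuming the `mega_flow` entry keeps the `a >> b >> c` syntax from the example above, a converter could split it into an ordered list of GMC-like steps. The function below is a hypothetical illustration of that idea, not the promised tool, and the `<name>-service` naming convention is an assumption.

```python
# Hypothetical sketch of the planned converter: turn a unified-YAML
# mega_flow expression into an ordered list of GMC-style step definitions.
def mega_flow_to_steps(flow: str, micro_services: dict) -> list:
    steps = []
    for name in (part.strip() for part in flow.split(">>")):
        svc = micro_services[name]
        steps.append({
            "name": name,
            "internalService": {
                "serviceName": f"{name}-service",  # naming convention assumed
                "config": {"endpoint": svc["endpoint"]},
            },
        })
    return steps

micro_services = {
    "embedding": {"endpoint": "/v1/embeddings", "port": 6000},
    "retrieval": {"endpoint": "/v1/retrieval", "port": 7000},
    "reranking": {"endpoint": "/v1/reranking", "port": 8000},
    "llm": {"endpoint": "/v1/chat/completions", "port": 9000},
}
steps = mega_flow_to_steps("embedding >> retrieval >> reranking >> llm", micro_services)
print([s["name"] for s in steps])  # ['embedding', 'retrieval', 'reranking', 'llm']
```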

A sample GMC [Custom Resource (CR)](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) looks like the following:

```yaml
apiVersion: gmc.opea.io/v1alpha3
kind: GMConnector
metadata:
  labels:
    app.kubernetes.io/name: gmconnector
  name: chatqna
  namespace: gmcsample
spec:
  routerConfig:
    name: router
    serviceName: router-service
  nodes:
    root:
      routerType: Sequence
      steps:
        - name: Embedding
          internalService:
            serviceName: embedding-service
            config:
              endpoint: /v1/embeddings
        - name: TeiEmbedding
          internalService:
            serviceName: tei-embedding-service
            config:
              gmcTokenSecret: gmc-tokens
              hostPath: /root/GMC/data/tei
              modelId: BAAI/bge-base-en-v1.5
              endpoint: /embed
            isDownstreamService: true
        - name: Retriever
          data: $response
          internalService:
            serviceName: retriever-redis-server
            config:
              RedisUrl: redis-vector-db
              IndexName: rag-redis
              tei_endpoint: tei-embedding-service
              endpoint: /v1/retrieval
        - name: VectorDB
          internalService:
            serviceName: redis-vector-db
            isDownstreamService: true
        - name: Reranking
          data: $response
          internalService:
            serviceName: reranking-service
            config:
              tei_reranking_endpoint: tei-reranking-service
              gmcTokenSecret: gmc-tokens
              endpoint: /v1/reranking
        - name: TeiReranking
          internalService:
            serviceName: tei-reranking-service
            config:
              gmcTokenSecret: gmc-tokens
              hostPath: /root/GMC/data/rerank
              modelId: BAAI/bge-reranker-large
              endpoint: /rerank
            isDownstreamService: true
        - name: Llm
          data: $response
          internalService:
            serviceName: llm-service
            config:
              tgi_endpoint: tgi-service
              gmcTokenSecret: gmc-tokens
              endpoint: /v1/chat/completions
        - name: Tgi
          internalService:
            serviceName: tgi-service
            config:
              gmcTokenSecret: gmc-tokens
              hostPath: /root/GMC/data/tgi
              modelId: Intel/neural-chat-7b-v3-3
              endpoint: /generate
            isDownstreamService: true
```
There should be an available `gmconnectors.gmc.opea.io` CR named `chatqna` under the namespace `gmcsample`, as shown below:

```bash
$ kubectl get gmconnectors.gmc.opea.io -n gmcsample
NAME      URL                                                      READY     AGE
chatqna   http://router-service.gmcsample.svc.cluster.local:8080   Success   3m
```

The user can then access the application pipeline via the value of the `URL` field shown above.
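As a sketch of client access, the snippet below builds a POST request against the router URL from the `kubectl` output above. The JSON payload shape is an assumption made for illustration, not a documented API; the actual request schema depends on the deployed pipeline.

```python
import json
import urllib.request

def build_pipeline_request(url: str, prompt: str) -> urllib.request.Request:
    # Hypothetical payload shape; the real schema depends on the pipeline.
    body = json.dumps({"messages": prompt}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )

req = build_pipeline_request(
    "http://router-service.gmcsample.svc.cluster.local:8080", "What is OPEA?"
)
# urllib.request.urlopen(req) would send the request from inside the cluster.
print(req.get_method(), req.full_url)
```

Note the URL is cluster-internal (`*.svc.cluster.local`), so the request must originate from inside the Kubernetes cluster or through an exposed ingress.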

The whole deployment process is illustrated in the diagram below.

![Deployment process](assets/opea_deploy_process_v1.png)


## Alternatives Considered

[Kserve](https://github.com/kserve/kserve): provides [InferenceGraph](https://kserve.github.io/website/0.9/modelserving/inference_graph/), but it only supports inference services and lacks deployment support.


## Compatibility

n/a

## Miscs

- TODO List:

  - [ ] one click deployment on AWS, GCP, Azure cloud
  - [ ] static cloud resource allocator vs dynamic cloud resource allocator
  - [ ] k8s GMC with istio
