RFC: Add OPEA deployment design #10
**Author**

[ftian1](https://github.com/ftian1), [lvliang-intel](https://github.com/lvliang-intel), [hshen14](https://github.com/hshen14), [mkbhanda](https://github.com/mkbhanda), [irisdingbj](https://github.com/irisdingbj), [KfreeZ](https://github.com/kfreez), [zhlsunshine](https://github.com/zhlsunshine) **Edit Here to add your id**

**Status**

Under Review

**Objective**

Provide a clear, well-defined design for users to deploy their own GenAI applications in Docker or Kubernetes environments.

**Motivation**

This RFC presents the OPEA deployment-related design for community discussion.

**Design Proposal**

Refer to this [OPEA overall architecture design document](24-05-16-OPEA-001-Overall-Design.md).

The proposed OPEA deployment workflow is:

<a target="_blank" href="opea_deploy_workflow.png">
<img src="opea_deploy_workflow.png" alt="Deployment" width=480 height=310>
</a>

Review comment: Put all the images into the assets folder, and use normal markdown syntax to reference them instead of raw HTML.

We provide two interfaces for deploying GenAI applications:

1. Docker deployment using Python

Here is a Python example for constructing a RAG (Retrieval-Augmented Generation) application:

```python
from comps import MicroService, ServiceOrchestrator


class ChatQnAService:
    def __init__(self, port=8080):
        # The orchestrator serves the assembled pipeline at /v1/chatqna.
        self.service_builder = ServiceOrchestrator(port=port, endpoint="/v1/chatqna")

    def add_remote_service(self):
        # Declare the four microservices of the RAG pipeline as remote services.
        embedding = MicroService(
            name="embedding", port=6000, expose_endpoint="/v1/embeddings", use_remote_service=True
        )
        retriever = MicroService(
            name="retriever", port=7000, expose_endpoint="/v1/retrieval", use_remote_service=True
        )
        rerank = MicroService(
            name="rerank", port=8000, expose_endpoint="/v1/reranking", use_remote_service=True
        )
        llm = MicroService(
            name="llm", port=9000, expose_endpoint="/v1/chat/completions", use_remote_service=True
        )
        # Register the services, then chain them: embedding -> retriever -> rerank -> llm.
        self.service_builder.add(embedding).add(retriever).add(rerank).add(llm)
        self.service_builder.flow_to(embedding, retriever)
        self.service_builder.flow_to(retriever, rerank)
        self.service_builder.flow_to(rerank, llm)
```

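To make the `add` / `flow_to` pattern above more concrete, here is a minimal, self-contained sketch of how such calls could build a directed graph and run its steps in dependency order. `TinyOrchestrator` is a hypothetical stand-in written for illustration only; it is not the `comps` implementation, and the lambda steps are placeholders for real microservice calls. It also assumes a linear pipeline, where each step receives the previous step's output.

```python
from collections import defaultdict, deque


class TinyOrchestrator:
    """Hypothetical stand-in for an orchestrator: a graph of named steps."""

    def __init__(self):
        self.steps = {}                      # step name -> callable
        self.successors = defaultdict(list)  # step name -> downstream step names

    def add(self, name, func):
        self.steps[name] = func
        return self  # returning self allows chained .add(...) calls

    def flow_to(self, src, dst):
        if src not in self.steps or dst not in self.steps:
            raise KeyError("both steps must be added before linking them")
        self.successors[src].append(dst)

    def order(self):
        # Kahn's algorithm: repeatedly emit steps that have no pending inputs.
        indegree = {name: 0 for name in self.steps}
        for dsts in self.successors.values():
            for dst in dsts:
                indegree[dst] += 1
        ready = deque(name for name, degree in indegree.items() if degree == 0)
        ordered = []
        while ready:
            name = ready.popleft()
            ordered.append(name)
            for dst in self.successors[name]:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
        if len(ordered) != len(self.steps):
            raise ValueError("flow contains a cycle")
        return ordered

    def run(self, payload):
        # Linear-pipeline assumption: each step consumes the previous output.
        for name in self.order():
            payload = self.steps[name](payload)
        return payload


# Placeholder steps that only record which stage handled the payload.
pipeline = TinyOrchestrator()
for stage in ("llm", "rerank", "retriever", "embedding"):
    pipeline.add(stage, lambda trace, stage=stage: trace + [stage])
pipeline.flow_to("embedding", "retriever")
pipeline.flow_to("retriever", "rerank")
pipeline.flow_to("rerank", "llm")
print(pipeline.run([]))
```

The steps are registered in reverse on purpose: the execution order comes from the `flow_to` edges, not from the order of the `add` calls.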
2. Kubernetes deployment using YAML

Here is a YAML example for constructing a RAG (Retrieval-Augmented Generation) application:

```yaml
opea_micro_services:
  embedding:
    endpoint: /v1/embeddings
    port: 6000
  retrieval:
    endpoint: /v1/retrieval
    port: 7000
  reranking:
    endpoint: /v1/reranking
    port: 8000
  llm:
    endpoint: /v1/chat/completions
    port: 9000

opea_mega_service:
  port: 8080
  mega_flow:
    - embedding >> retrieval >> reranking >> llm
```
This YAML acts as a unified language interface for end users to define their GenAI application.

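As a rough illustration of how the `mega_flow` notation could be interpreted, the sketch below splits each `a >> b >> c` chain into directed edges and checks that every name is declared under `opea_micro_services`. The function name `parse_mega_flow` is hypothetical, and the dictionary is a hand-written equivalent of the YAML above (a real tool would load it with a YAML parser); the RFC does not specify the actual parsing rules.

```python
def parse_mega_flow(config):
    """Turn each 'a >> b >> c' chain into (source, destination) edges."""
    services = config["opea_micro_services"]
    edges = []
    for chain in config["opea_mega_service"]["mega_flow"]:
        names = [part.strip() for part in chain.split(">>")]
        for name in names:
            if name not in services:
                raise ValueError(f"unknown microservice in mega_flow: {name!r}")
        # Pair each service with the one that follows it in the chain.
        edges.extend(zip(names, names[1:]))
    return edges


# Hand-written dictionary equivalent of the YAML example.
config = {
    "opea_micro_services": {
        "embedding": {"endpoint": "/v1/embeddings", "port": 6000},
        "retrieval": {"endpoint": "/v1/retrieval", "port": 7000},
        "reranking": {"endpoint": "/v1/reranking", "port": 8000},
        "llm": {"endpoint": "/v1/chat/completions", "port": 9000},
    },
    "opea_mega_service": {
        "port": 8080,
        "mega_flow": ["embedding >> retrieval >> reranking >> llm"],
    },
}

edges = parse_mega_flow(config)
print(edges)
```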
When deploying the GenAI application to a Kubernetes environment, define the YAML configuration file and convert it to an appropriate [docker compose](https://docs.docker.com/compose/) file or [GenAI Microservice Connector (GMC)](https://github.com/opea-project/GenAIInfra/tree/main/microservices-connector) custom resource file.

Note: OPEA will provide a conversion tool that converts the unified language interface to docker compose or GMC.

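The conversion tool is not specified in this RFC. Purely as a sketch of the idea, the function below maps a unified configuration onto a dictionary shaped like a GMC custom resource with a `Sequence` router. Everything about the mapping is an assumption made for illustration: the function name `to_gmc_resource`, the `<name>-service` naming scheme, the capitalized step names, and the choice to pass `$response` to every step after the first. Downstream services such as TEI, TGI, or the vector database are not generated.

```python
def to_gmc_resource(config, name, namespace):
    """Build a GMConnector-like dict from the first mega_flow chain (sketch)."""
    services = config["opea_micro_services"]
    chain = config["opea_mega_service"]["mega_flow"][0]
    steps = []
    for index, svc in enumerate(part.strip() for part in chain.split(">>")):
        step = {"name": svc.capitalize()}
        if index > 0:
            # Assumption: later steps consume the previous step's response.
            step["data"] = "$response"
        step["internalService"] = {
            "serviceName": f"{svc}-service",  # placeholder naming scheme
            "config": {"endpoint": services[svc]["endpoint"]},
        }
        steps.append(step)
    return {
        "apiVersion": "gmc.opea.io/v1alpha3",
        "kind": "GMConnector",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "routerConfig": {"name": "router", "serviceName": "router-service"},
            "nodes": {"root": {"routerType": "Sequence", "steps": steps}},
        },
    }


# Hand-written dictionary equivalent of the unified YAML example.
unified = {
    "opea_micro_services": {
        "embedding": {"endpoint": "/v1/embeddings", "port": 6000},
        "retrieval": {"endpoint": "/v1/retrieval", "port": 7000},
        "reranking": {"endpoint": "/v1/reranking", "port": 8000},
        "llm": {"endpoint": "/v1/chat/completions", "port": 9000},
    },
    "opea_mega_service": {
        "port": 8080,
        "mega_flow": ["embedding >> retrieval >> reranking >> llm"],
    },
}

resource = to_gmc_resource(unified, name="chatqna", namespace="gmcsample")
for step in resource["spec"]["nodes"]["root"]["steps"]:
    print(step["name"], step["internalService"]["config"]["endpoint"])
```

Serializing the resulting dictionary with a YAML library would give a file that can be compared against a hand-written custom resource.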
A sample GMC [Custom Resource](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) is shown below:

```yaml
apiVersion: gmc.opea.io/v1alpha3
kind: GMConnector
metadata:
  labels:
    app.kubernetes.io/name: gmconnector
  name: chatqna
  namespace: gmcsample
spec:
  routerConfig:
    name: router
    serviceName: router-service
  nodes:
    root:
      routerType: Sequence
      steps:
      - name: Embedding
        internalService:
          serviceName: embedding-service
          config:
            endpoint: /v1/embeddings
      - name: TeiEmbedding
        internalService:
          serviceName: tei-embedding-service
          config:
            gmcTokenSecret: gmc-tokens
            hostPath: /root/GMC/data/tei
            modelId: BAAI/bge-base-en-v1.5
            endpoint: /embed
          isDownstreamService: true
      - name: Retriever
        data: $response
        internalService:
          serviceName: retriever-redis-server
          config:
            RedisUrl: redis-vector-db
            IndexName: rag-redis
            tei_endpoint: tei-embedding-service
            endpoint: /v1/retrieval
      - name: VectorDB
        internalService:
          serviceName: redis-vector-db
          isDownstreamService: true
      - name: Reranking
        data: $response
        internalService:
          serviceName: reranking-service
          config:
            tei_reranking_endpoint: tei-reranking-service
            gmcTokenSecret: gmc-tokens
            endpoint: /v1/reranking
      - name: TeiReranking
        internalService:
          serviceName: tei-reranking-service
          config:
            gmcTokenSecret: gmc-tokens
            hostPath: /root/GMC/data/rerank
            modelId: BAAI/bge-reranker-large
            endpoint: /rerank
          isDownstreamService: true
      - name: Llm
        data: $response
        internalService:
          serviceName: llm-service
          config:
            tgi_endpoint: tgi-service
            gmcTokenSecret: gmc-tokens
            endpoint: /v1/chat/completions
      - name: Tgi
        internalService:
          serviceName: tgi-service
          config:
            gmcTokenSecret: gmc-tokens
            hostPath: /root/GMC/data/tgi
            modelId: Intel/neural-chat-7b-v3-3
            endpoint: /generate
          isDownstreamService: true
```
A `gmconnectors.gmc.opea.io` CR named `chatqna` should then be available in the namespace `gmcsample`, as shown below:

```bash
$ kubectl get gmconnectors.gmc.opea.io -n gmcsample
NAME      URL                                                      READY     AGE
chatqna   http://router-service.gmcsample.svc.cluster.local:8080   Success   3m
```

Review comment: Add an expansion for CR (Custom Resource) and a link to documentation. @irisdingbj, were we thinking of eventually providing a tool to ease creating this YAML down the road? If yes, we could mention that here and file a feature request issue to address later on.

The user can then access the application pipeline via the value of the `URL` field above.

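For scripted use, the `URL` value can be pulled out of the `kubectl get` table. The sketch below is one possible way to do that; the helper name `find_connector_url` is hypothetical, and it assumes whitespace-separated columns with no empty cells and a `READY` value of `Success` for a usable connector, as in the sample output. In practice the text would come from running `kubectl`; here it is an inline string so the example stays self-contained.

```python
def find_connector_url(kubectl_output, name):
    """Return the URL column for the named connector, if it is ready."""
    lines = [line for line in kubectl_output.splitlines() if line.strip()]
    if not lines:
        raise LookupError("empty kubectl output")
    header = lines[0].split()
    for line in lines[1:]:
        # Map column names to cell values for this row.
        row = dict(zip(header, line.split()))
        if row.get("NAME") == name:
            if row.get("READY") != "Success":
                raise RuntimeError(f"connector {name!r} is not ready")
            return row["URL"]
    raise LookupError(f"connector {name!r} not found")


# Inline copy of the table printed by `kubectl get gmconnectors.gmc.opea.io`.
sample_output = """\
NAME      URL                                                      READY     AGE
chatqna   http://router-service.gmcsample.svc.cluster.local:8080   Success   3m
"""

url = find_connector_url(sample_output, "chatqna")
print(url)
```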
The whole deployment process is illustrated by the diagram below.

<a target="_blank" href="opea_deploy_process.png">
<img src="opea_deploy_process_v1.png" alt="Deployment Process" width=480 height=310>
</a>

Review comment: Put all the images into the assets folder, and use normal markdown syntax to reference them instead of raw HTML. You've also got extra images for v0 and v2 that aren't referenced, so delete them when you can.

**Alternatives Considered**

[KServe](https://github.com/kserve/kserve) provides [InferenceGraph](https://kserve.github.io/website/0.9/modelserving/inference_graph/); however, it only supports inference services and lacks deployment support.

**Compatibility**

n/a

**Miscs**

- TODO List:

  - [ ] One-click deployment on AWS, GCP, and Azure clouds
  - [ ] Static cloud resource allocator vs. dynamic cloud resource allocator
  - [ ] Kubernetes GMC with Istio

Review comment: Please follow the example RFC template in rfc_template.txt (same directory).