Add dataprep megaservice in README (opea-project#158)
Signed-off-by: lvliang-intel <[email protected]>
lvliang-intel authored May 21, 2024
1 parent 8dc89e2 commit 3f28592
Showing 7 changed files with 120 additions and 48 deletions.
71 changes: 42 additions & 29 deletions ChatQnA/microservice/gaudi/README.md
@@ -16,70 +16,60 @@ cd GenAIComps
### 2. Build Embedding Image

```bash
docker build --no-cache -t opea/gen-ai-comps:embedding-tei-server --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/embeddings/langchain/docker/Dockerfile .
```

### 3. Build Retriever Image

```bash
docker build --no-cache -t opea/gen-ai-comps:retriever-redis-server --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/retrievers/langchain/docker/Dockerfile .
```

### 4. Build Rerank Image

```bash
docker build --no-cache -t opea/gen-ai-comps:reranking-tei-server --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/reranks/langchain/docker/Dockerfile .
```

### 5. Build LLM Image

```bash
docker build --no-cache -t opea/gen-ai-comps:llm-tgi-gaudi-server --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/langchain/docker/Dockerfile .
```

### 6. Build Dataprep Image

```bash
docker build --no-cache -t opea/gen-ai-comps:dataprep-redis-server --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/redis/docker/Dockerfile .
```

### 7. Build TEI Gaudi Image

Since a TEI Gaudi Docker image hasn't been published, we'll need to build it from the [tei-gaudi](https://github.com/huggingface/tei-gaudi) repository.

```bash
git clone https://github.com/huggingface/tei-gaudi
cd tei-gaudi/
docker build --no-cache -f Dockerfile-hpu -t opea/tei-gaudi .
```

### 8. Build MegaService Docker Image

To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `chatqna.py` Python script. Build the MegaService Docker image using the command below:

```bash
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/ChatQnA/microservice/gaudi/
docker build --no-cache -t opea/gen-ai-comps:chatqna-megaservice-server --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f docker/Dockerfile .
```
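
To see how the pipeline in `chatqna.py` fits together, here is a minimal sketch, assuming GenAIComps exposes a `ServiceOrchestrator`/`MicroService` style API; the class names, parameters, and endpoints below are assumptions used to illustrate the flow, and the ports follow `docker_compose.yaml`:

```python
# Hedged sketch of the ChatQnA megaservice wiring, not the verbatim script.
# Assumes a ServiceOrchestrator/MicroService-style API from GenAIComps.
from comps import MicroService, ServiceOrchestrator

megaservice = ServiceOrchestrator()

# Each stage is a remote microservice; ports match docker_compose.yaml.
embedding = MicroService(name="embedding", host="0.0.0.0", port=6000,
                         endpoint="/v1/embeddings", use_remote_service=True)
retriever = MicroService(name="retriever", host="0.0.0.0", port=7000,
                         endpoint="/v1/retrieval", use_remote_service=True)
rerank = MicroService(name="rerank", host="0.0.0.0", port=8000,
                      endpoint="/v1/reranking", use_remote_service=True)
llm = MicroService(name="llm", host="0.0.0.0", port=9000,
                   endpoint="/v1/chat/completions", use_remote_service=True)

# RAG flow: embed the query, retrieve context, rerank it, then generate.
megaservice.add(embedding).add(retriever).add(rerank).add(llm)
megaservice.flow_to(embedding, retriever)
megaservice.flow_to(retriever, rerank)
megaservice.flow_to(rerank, llm)
```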

### 9. Build UI Docker Image

Construct the frontend Docker image using the command below:

```bash
cd GenAIExamples/ChatQnA/ui/
docker build --no-cache -t opea/gen-ai-comps:chatqna-ui-server --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
```

Then run the command `docker images`, and you will have the following 8 Docker images:

1. `opea/gen-ai-comps:embedding-tei-server`
2. `opea/gen-ai-comps:retriever-redis-server`
3. `opea/gen-ai-comps:reranking-tei-server`
4. `opea/gen-ai-comps:llm-tgi-gaudi-server`
5. `opea/tei-gaudi`
6. `opea/gen-ai-comps:dataprep-redis-server`
7. `opea/gen-ai-comps:chatqna-megaservice-server`
8. `opea/gen-ai-comps:chatqna-ui-server`

## 🚀 Start MicroServices and MegaService

@@ -209,6 +198,30 @@

9. Dataprep Microservice (Optional)

If you want to update the default knowledge base, you can use the following commands:

Update Knowledge Base via Local File Upload:

```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep" \
-H "Content-Type: multipart/form-data" \
-F "files=@./nke-10k-2023.pdf"
```

This command updates a knowledge base by uploading a local file for processing. Update the file path according to your environment.

Add Knowledge Base via HTTP Links:

```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep" \
-H "Content-Type: multipart/form-data" \
-F 'link_list=["https://opea.dev"]'
```

This command updates a knowledge base by submitting a list of HTTP links for processing.
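
The same endpoint can also be driven from Python. The snippet below is a minimal sketch using the `requests` library; the port and field names come from the curl examples above, while the host IP and file path are placeholders for your environment:

```python
import requests

host_ip = "localhost"  # placeholder: set to your host IP
url = f"http://{host_ip}:6007/v1/dataprep"

# Upload a local file into the knowledge base (multipart field "files",
# matching the curl example above).
with open("./nke-10k-2023.pdf", "rb") as pdf:
    resp = requests.post(url, files={"files": pdf})
resp.raise_for_status()

# Ingest HTTP links instead: "link_list" is sent as a multipart text field.
resp = requests.post(url, files={"link_list": (None, '["https://opea.dev"]')})
resp.raise_for_status()
print(resp.text)
```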

## 🚀 Launch the UI

To access the frontend, open the following URL in your browser: http://{host_ip}:5173. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `docker_compose.yaml` file as shown below:
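
The snippet below sketches that mapping, publishing the UI on host port 80; the service name is an assumption and must match the one defined in your `docker_compose.yaml`:

```yaml
  chatqna-gaudi-ui-server:
    image: opea/gen-ai-comps:chatqna-ui-server
    ports:
      - "80:5173"  # host_port:container_port
```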
32 changes: 23 additions & 9 deletions ChatQnA/microservice/gaudi/docker_compose.yaml
@@ -21,7 +21,19 @@ services:
    ports:
      - "6379:6379"
      - "8001:8001"
  dataprep-redis-service:
    image: opea/gen-ai-comps:dataprep-redis-server
    container_name: dataprep-redis-server
    depends_on:
      - redis-vector-db
    ports:
      - "6007:6007"
    environment:
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      REDIS_URL: ${REDIS_URL}
      INDEX_NAME: ${INDEX_NAME}
  tei-embedding-service:
    image: opea/tei-gaudi
    container_name: tei-embedding-gaudi-server
    ports:
@@ -39,7 +51,7 @@ services:
    image: opea/gen-ai-comps:embedding-tei-server
    container_name: embedding-tei-server
    depends_on:
      - tei-embedding-service
    ports:
      - "6000:6000"
    ipc: host
@@ -51,6 +63,8 @@ services:
  retriever:
    image: opea/gen-ai-comps:retriever-redis-server
    container_name: retriever-redis-server
    depends_on:
      - redis-vector-db
    ports:
      - "7000:7000"
    ipc: host
@@ -60,7 +74,7 @@ services:
      REDIS_URL: ${REDIS_URL}
      INDEX_NAME: ${INDEX_NAME}
    restart: unless-stopped
  tei-xeon-service:
    image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.2
    container_name: tei-xeon-server
    ports:
@@ -76,7 +90,7 @@ services:
    image: opea/gen-ai-comps:reranking-tei-server
    container_name: reranking-tei-gaudi-server
    depends_on:
      - tei-xeon-service
    ports:
      - "8000:8000"
    ipc: host
@@ -86,7 +100,7 @@ services:
      TEI_RERANKING_ENDPOINT: ${TEI_RERANKING_ENDPOINT}
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
    restart: unless-stopped
  tgi-service:
    image: ghcr.io/huggingface/tgi-gaudi:1.2.1
    container_name: tgi-gaudi-server
    ports:
@@ -104,7 +118,7 @@ services:
    image: opea/gen-ai-comps:llm-tgi-gaudi-server
    container_name: llm-tgi-gaudi-server
    depends_on:
      - tgi-service
    ports:
      - "9000:9000"
    ipc: host
@@ -119,12 +133,12 @@ services:
    container_name: chatqna-gaudi-backend-server
    depends_on:
      - redis-vector-db
      - tei-embedding-service
      - embedding
      - retriever
      - tei-xeon-service
      - reranking
      - tgi-service
      - llm
    ports:
      - "8888:8888"
34 changes: 32 additions & 2 deletions ChatQnA/microservice/xeon/README.md
@@ -43,7 +43,13 @@
```bash
docker build -t opea/gen-ai-comps:llm-tgi-server --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/langchain/docker/Dockerfile .
```

### 5. Build Dataprep Image

```bash
docker build --no-cache -t opea/gen-ai-comps:dataprep-redis-server --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/redis/docker/Dockerfile .
```

### 6. Build MegaService Docker Image

To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `chatqna.py` Python script. Build the MegaService Docker image using the command below:

@@ -53,7 +59,7 @@
```bash
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/ChatQnA/microservice/xeon/
docker build -t opea/gen-ai-comps:chatqna-megaservice-server --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f docker/Dockerfile .
```

### 7. Build UI Docker Image

Build the frontend Docker image using the command below:

```bash
cd GenAIExamples/ChatQnA/ui/
docker build -t opea/gen-ai-comps:chatqna-ui-server --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
```

@@ -184,6 +190,30 @@

9. Dataprep Microservice (Optional)

If you want to update the default knowledge base, you can use the following commands:

Update Knowledge Base via Local File Upload:

```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep" \
-H "Content-Type: multipart/form-data" \
-F "files=@./nke-10k-2023.pdf"
```

This command updates a knowledge base by uploading a local file for processing. Update the file path according to your environment.

Add Knowledge Base via HTTP Links:

```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep" \
-H "Content-Type: multipart/form-data" \
-F 'link_list=["https://opea.dev"]'
```

This command updates a knowledge base by submitting a list of HTTP links for processing.

## 🚀 Launch the UI

To access the frontend, open the following URL in your browser: http://{host_ip}:5173. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `docker_compose.yaml` file as shown below:
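
The snippet below sketches that mapping, publishing the UI on host port 80; the service name is an assumption and must match the one defined in your `docker_compose.yaml`:

```yaml
  chatqna-xeon-ui-server:
    image: opea/gen-ai-comps:chatqna-ui-server
    ports:
      - "80:5173"  # host_port:container_port
```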
27 changes: 20 additions & 7 deletions ChatQnA/microservice/xeon/docker_compose.yaml
@@ -21,7 +21,19 @@ services:
    ports:
      - "6379:6379"
      - "8001:8001"
  dataprep-redis-service:
    image: opea/gen-ai-comps:dataprep-redis-server
    container_name: dataprep-redis-server
    depends_on:
      - redis-vector-db
    ports:
      - "6007:6007"
    environment:
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      REDIS_URL: ${REDIS_URL}
      INDEX_NAME: ${INDEX_NAME}
  tei-embedding-service:
    image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.2
    container_name: tei-embedding-server
    ports:
@@ -37,19 +49,20 @@ services:
    image: opea/gen-ai-comps:embedding-tei-server
    container_name: embedding-tei-server
    depends_on:
      - tei-embedding-service
    ports:
      - "6000:6000"
    ipc: host
    environment:
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      TEI_EMBEDDING_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
    restart: unless-stopped
  retriever:
    image: opea/gen-ai-comps:retriever-redis-server
    container_name: retriever-redis-server
    depends_on:
      - redis-vector-db
    ports:
      - "7000:7000"
    ipc: host
@@ -60,7 +73,7 @@ services:
      INDEX_NAME: ${INDEX_NAME}
      TEI_EMBEDDING_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
    restart: unless-stopped
  tei-xeon-service:
    image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.2
    container_name: tei-xeon-server
    ports:
@@ -76,7 +89,7 @@ services:
    image: opea/gen-ai-comps:reranking-tei-xeon-server
    container_name: reranking-tei-xeon-server
    depends_on:
      - tei-xeon-service
    ports:
      - "8000:8000"
    ipc: host
@@ -117,10 +130,10 @@ services:
    container_name: chatqna-xeon-backend-server
    depends_on:
      - redis-vector-db
      - tei-embedding-service
      - embedding
      - retriever
      - tei-xeon-service
      - reranking
      - tgi_service
      - llm
1 change: 1 addition & 0 deletions ChatQnA/tests/test_chatqna_on_gaudi.sh
@@ -17,6 +17,7 @@ function build_docker_images() {
    docker build -t opea/gen-ai-comps:retriever-redis-server -f comps/retrievers/langchain/docker/Dockerfile .
    docker build -t opea/gen-ai-comps:reranking-tei-server -f comps/reranks/langchain/docker/Dockerfile .
    docker build -t opea/gen-ai-comps:llm-tgi-gaudi-server -f comps/llms/langchain/docker/Dockerfile .
    docker build -t opea/gen-ai-comps:dataprep-redis-server -f comps/dataprep/redis/docker/Dockerfile .

    cd ..
    git clone https://github.com/huggingface/tei-gaudi
1 change: 1 addition & 0 deletions ChatQnA/tests/test_chatqna_on_xeon.sh
@@ -18,6 +18,7 @@ function build_docker_images() {
    docker build -t opea/gen-ai-comps:retriever-redis-server -f comps/retrievers/langchain/docker/Dockerfile .
    docker build -t opea/gen-ai-comps:reranking-tei-xeon-server -f comps/reranks/langchain/docker/Dockerfile .
    docker build -t opea/gen-ai-comps:llm-tgi-server -f comps/llms/langchain/docker/Dockerfile .
    docker build -t opea/gen-ai-comps:dataprep-redis-server -f comps/dataprep/redis/docker/Dockerfile .

    cd $WORKPATH/microservice/xeon
    docker build --no-cache -t opea/gen-ai-comps:chatqna-megaservice-server -f docker/Dockerfile .
2 changes: 1 addition & 1 deletion CodeGen/microservice/gaudi/README.md
@@ -40,7 +40,7 @@

Then run the command `docker images`, and you will have the following 3 Docker images:

1. `opea/gen-ai-comps:llm-tgi-gaudi-server`
2. `opea/gen-ai-comps:codegen-megaservice-server`
3. `opea/gen-ai-comps:codegen-ui-server`

