* add AudioQnA example via GMC. Signed-off-by: zhlsunshine <[email protected]>
* add more information for e2e test scripts. Signed-off-by: zhlsunshine <[email protected]>
* fix bug in e2e test scripts. Signed-off-by: zhlsunshine <[email protected]>
Steve Zhang authored Aug 16, 2024 · 1 parent 039014f · commit c86cf85

Showing 5 changed files with 412 additions and 0 deletions.
@@ -0,0 +1,74 @@
# Deploy AudioQnA in Kubernetes Cluster on Xeon and Gaudi

This document outlines the deployment process for an AudioQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline components on Intel Xeon servers and Gaudi machines.

The AudioQnA service leverages a Kubernetes operator called genai-microservices-connector (GMC). GMC connects microservices into pipelines based on the specification in a pipeline yaml file, and additionally allows the user to dynamically control which model is used in a service such as an LLM or embedder. The underlying pipeline language also supports using external services that may be running elsewhere in a public or private cloud.

Install GMC in your Kubernetes cluster, if you have not already done so, by following the steps in the "Getting Started" section of [GMC Install](https://github.com/opea-project/GenAIInfra/tree/main/microservices-connector). We will soon publish the images to Docker Hub, at which point no builds will be required, simplifying installation.
The AudioQnA application is defined as a Custom Resource (CR) file that the GMC operator described above acts upon. It first checks whether the microservices listed in the CR yaml file are running, starts them if not, and then connects them. When the AudioQnA pipeline is ready, the service endpoint details are returned, letting you use the application. If you run `kubectl get pods`, you will see all the component microservices, in particular `asr`, `tts`, and `llm`.
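
For reference, the CR describes the pipeline as a sequence of steps, each mapped to an internal service. The excerpt below is abridged from the audioQnA_xeon.yaml file added in this commit (the Whisper, Llm, Tgi, Tts, and SpeechT5 steps follow the same pattern):

```yaml
apiVersion: gmc.opea.io/v1alpha3
kind: GMConnector
metadata:
  name: audioqa
  namespace: audioqa
spec:
  routerConfig:
    name: router
    serviceName: router-service
  nodes:
    root:
      routerType: Sequence
      steps:
        - name: Asr
          internalService:
            serviceName: asr-svc
            config:
              endpoint: /v1/audio/transcriptions
              ASR_ENDPOINT: whisper-svc
        # ...remaining steps omitted; see audioQnA_xeon.yaml below
```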

## Using prebuilt images

If you choose a Xeon deployment, AudioQnA uses the prebuilt images below:

- tgi-service: ghcr.io/huggingface/text-generation-inference:1.4
- llm: opea/llm-tgi:latest
- asr: opea/asr:latest
- whisper: opea/whisper:latest
- tts: opea/tts:latest
- speecht5: opea/speecht5:latest

Should you desire to use the Gaudi accelerator, alternate images are used for the model-serving backends (TGI, Whisper, and SpeechT5). For Gaudi:

- tgi-service: ghcr.io/huggingface/tgi-gaudi:1.2.1
- whisper-gaudi: opea/whisper-gaudi:latest
- speecht5-gaudi: opea/speecht5-gaudi:latest

> [NOTE]
> Please refer to the [Xeon README](https://github.com/opea-project/GenAIExamples/blob/main/AudioQnA/docker/xeon/README.md) or [Gaudi README](https://github.com/opea-project/GenAIExamples/blob/main/AudioQnA/docker/gaudi/README.md) to build the OPEA images. These too will be available on Docker Hub soon to simplify use.

## Deploy AudioQnA pipeline

This involves deploying the AudioQnA custom resource. You can use audioQnA_xeon.yaml or, if you have a Gaudi cluster, audioQnA_gaudi.yaml.

1. Create namespace and deploy application

```sh
kubectl create ns audioqa
kubectl apply -f $(pwd)/audioQnA_xeon.yaml
```

2. GMC will reconcile the AudioQnA custom resource and get all related components/services ready. Check whether the services are up (an optional wait for pod readiness is sketched below).

```sh
kubectl get service -n audioqa
```
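
Optionally, you can block until every pod in the namespace reports Ready before moving on. This is a generic kubectl readiness wait, not something the example itself requires; the 300s timeout is an arbitrary choice:

```sh
kubectl wait --for=condition=Ready pod --all -n audioqa --timeout=300s
```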

3. Retrieve the application access URL

```sh
kubectl get gmconnectors.gmc.opea.io -n audioqa
NAME      URL                                                    READY     AGE
audioqa   http://router-service.audioqa.svc.cluster.local:8080   6/0/6     5m
```

4. Deploy a client pod to test the application

```sh
kubectl create deployment client-test -n audioqa --image=python:3.8.13 -- sleep infinity
```

5. Access the application using the above URL from the client pod

```sh
export CLIENT_POD=$(kubectl get pod -n audioqa -l app=client-test -o jsonpath={.items..metadata.name})
export accessUrl=$(kubectl get gmc -n audioqa -o jsonpath="{.items[?(@.metadata.name=='audioqa')].status.accessUrl}")
kubectl exec "$CLIENT_POD" -n audioqa -- curl $accessUrl -X POST -d '{"byte_str": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "parameters":{"max_new_tokens":64, "do_sample": true, "streaming":false}}' -H 'Content-Type: application/json'
```
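
The `byte_str` in the request is a base64-encoded WAV sample, and the e2e test script later in this commit reads a `byte_str` field from the JSON response. Assuming the returned audio is likewise base64-encoded, a minimal sketch for saving it locally (requires `jq` and `base64` on the machine running `kubectl`; the output filename is illustrative):

```sh
kubectl exec "$CLIENT_POD" -n audioqa -- curl -s $accessUrl -X POST \
  -d '{"byte_str": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "parameters":{"max_new_tokens":64, "do_sample": true, "streaming":false}}' \
  -H 'Content-Type: application/json' | jq -r .byte_str | base64 -d > reply.wav
```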

> [NOTE]
> You can remove your AudioQnA pipeline by executing standard Kubernetes `kubectl` commands to remove the custom resource. Verify it was removed by executing `kubectl get pods` in the `audioqa` namespace.
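
For example, a minimal sketch of tearing down the pipeline deployed above from audioQnA_xeon.yaml:

```sh
kubectl delete -f $(pwd)/audioQnA_xeon.yaml
kubectl get pods -n audioqa
kubectl delete ns audioqa
```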
@@ -0,0 +1,58 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

apiVersion: gmc.opea.io/v1alpha3
kind: GMConnector
metadata:
  labels:
    app.kubernetes.io/name: gmconnector
    app.kubernetes.io/managed-by: kustomize
    gmc/platform: gaudi
  name: audioqa
  namespace: audioqa
spec:
  routerConfig:
    name: router
    serviceName: router-service
  nodes:
    root:
      routerType: Sequence
      steps:
        - name: Asr
          internalService:
            serviceName: asr-svc
            config:
              endpoint: /v1/audio/transcriptions
              ASR_ENDPOINT: whisper-gaudi-svc
        - name: WhisperGaudi
          internalService:
            serviceName: whisper-gaudi-svc
            config:
              endpoint: /v1/asr
            isDownstreamService: true
        - name: Llm
          data: $response
          internalService:
            serviceName: llm-svc
            config:
              endpoint: /v1/chat/completions
              TGI_LLM_ENDPOINT: tgi-gaudi-svc
        - name: TgiGaudi
          internalService:
            serviceName: tgi-gaudi-svc
            config:
              endpoint: /generate
            isDownstreamService: true
        - name: Tts
          data: $response
          internalService:
            serviceName: tts-svc
            config:
              endpoint: /v1/audio/speech
              TTS_ENDPOINT: speecht5-gaudi-svc
        - name: SpeechT5Gaudi
          internalService:
            serviceName: speecht5-gaudi-svc
            config:
              endpoint: /v1/tts
            isDownstreamService: true
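
If you have a Gaudi cluster, a minimal sketch of deploying this variant in place of the Xeon manifest used in the README above (the same audioqa namespace is assumed):

```sh
kubectl create ns audioqa
kubectl apply -f $(pwd)/audioQnA_gaudi.yaml
```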
@@ -0,0 +1,58 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

apiVersion: gmc.opea.io/v1alpha3
kind: GMConnector
metadata:
  labels:
    app.kubernetes.io/name: gmconnector
    app.kubernetes.io/managed-by: kustomize
    gmc/platform: xeon
  name: audioqa
  namespace: audioqa
spec:
  routerConfig:
    name: router
    serviceName: router-service
  nodes:
    root:
      routerType: Sequence
      steps:
        - name: Asr
          internalService:
            serviceName: asr-svc
            config:
              endpoint: /v1/audio/transcriptions
              ASR_ENDPOINT: whisper-svc
        - name: Whisper
          internalService:
            serviceName: whisper-svc
            config:
              endpoint: /v1/asr
            isDownstreamService: true
        - name: Llm
          data: $response
          internalService:
            serviceName: llm-svc
            config:
              endpoint: /v1/chat/completions
              TGI_LLM_ENDPOINT: tgi-svc
        - name: Tgi
          internalService:
            serviceName: tgi-svc
            config:
              endpoint: /generate
            isDownstreamService: true
        - name: Tts
          data: $response
          internalService:
            serviceName: tts-svc
            config:
              endpoint: /v1/audio/speech
              TTS_ENDPOINT: speecht5-svc
        - name: SpeechT5
          internalService:
            serviceName: speecht5-svc
            config:
              endpoint: /v1/tts
            isDownstreamService: true
@@ -0,0 +1,111 @@
#!/bin/bash
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

set -xe
USER_ID=$(whoami)
LOG_PATH=/home/$(whoami)/logs
MOUNT_DIR=/home/$USER_ID/.cache/huggingface/hub
IMAGE_REPO=${IMAGE_REPO:-}

function install_audioqa() {
    kubectl create ns $APP_NAMESPACE
    sed -i "s|namespace: audioqa|namespace: $APP_NAMESPACE|g" ./audioQnA_gaudi.yaml
    kubectl apply -f ./audioQnA_gaudi.yaml

    # Wait until the router service is ready
    echo "Waiting for the audioqa router service to be ready..."
    wait_until_pod_ready "audioqa router" $APP_NAMESPACE "router-service"
    output=$(kubectl get pods -n $APP_NAMESPACE)
    echo $output
}

function validate_audioqa() {
    # deploy client pod for testing
    kubectl create deployment client-test -n $APP_NAMESPACE --image=python:3.8.13 -- sleep infinity

    # wait for client pod ready
    wait_until_pod_ready "client-test" $APP_NAMESPACE "client-test"
    # give the services time to populate data
    sleep 60

    kubectl get pods -n $APP_NAMESPACE
    # send request to audioqa
    export CLIENT_POD=$(kubectl get pod -n $APP_NAMESPACE -l app=client-test -o jsonpath={.items..metadata.name})
    echo "$CLIENT_POD"
    accessUrl=$(kubectl get gmc -n $APP_NAMESPACE -o jsonpath="{.items[?(@.metadata.name=='audioqa')].status.accessUrl}")
    byte_str=$(kubectl exec "$CLIENT_POD" -n $APP_NAMESPACE -- curl $accessUrl -s -X POST -d '{"byte_str": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "parameters":{"max_new_tokens":64, "do_sample": true, "streaming":false}}' -H 'Content-Type: application/json' | jq .byte_str)
    echo "$byte_str" > $LOG_PATH/curl_audioqa.log
    if [ -z "$byte_str" ]; then
        echo "audioqa failed, please check the logs in ${LOG_PATH}!"
        exit 1
    fi
    echo "Audioqa response check succeeded!"
}

function wait_until_pod_ready() {
    echo "Waiting for the $1 to be ready..."
    max_retries=30
    retry_count=0
    while ! is_pod_ready $2 $3; do
        if [ $retry_count -ge $max_retries ]; then
            echo "$1 is not ready after waiting for a significant amount of time"
            get_gmc_controller_logs
            exit 1
        fi
        echo "$1 is not ready yet. Retrying in 10 seconds..."
        sleep 10
        output=$(kubectl get pods -n $2)
        echo $output
        retry_count=$((retry_count + 1))
    done
}

function is_pod_ready() {
    if [ "$2" == "gmc-controller" ]; then
        pod_status=$(kubectl get pods -n $1 -o jsonpath='{.items[].status.conditions[?(@.type=="Ready")].status}')
    else
        pod_status=$(kubectl get pods -n $1 -l app=$2 -o jsonpath='{.items[].status.conditions[?(@.type=="Ready")].status}')
    fi
    if [ "$pod_status" == "True" ]; then
        return 0
    else
        return 1
    fi
}

function get_gmc_controller_logs() {
    # Fetch the name of the pod with the app-name gmc-controller in the specified namespace
    pod_name=$(kubectl get pods -n $SYSTEM_NAMESPACE -l control-plane=gmc-controller -o jsonpath='{.items[0].metadata.name}')

    # Check if the pod name was found
    if [ -z "$pod_name" ]; then
        echo "No pod found with app-name gmc-controller in namespace $SYSTEM_NAMESPACE"
        return 1
    fi

    # Get the logs of the found pod
    echo "Fetching logs for pod $pod_name in namespace $SYSTEM_NAMESPACE..."
    kubectl logs $pod_name -n $SYSTEM_NAMESPACE
}

if [ $# -eq 0 ]; then
    echo "Usage: $0 <function_name>"
    exit 1
fi

case "$1" in
    install_AudioQnA)
        pushd AudioQnA/kubernetes
        install_audioqa
        popd
        ;;
    validate_AudioQnA)
        pushd AudioQnA/kubernetes
        validate_audioqa
        popd
        ;;
    *)
        echo "Unknown function: $1"
        ;;
esac
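
A minimal sketch of invoking this e2e script by hand from the repository root (the script itself does pushd AudioQnA/kubernetes). The namespace values are illustrative assumptions; in CI they are expected to be exported by the calling workflow, and SYSTEM_NAMESPACE should point at the namespace where the gmc-controller pod runs. The script path is a placeholder:

```sh
export APP_NAMESPACE=audioqa      # assumed test namespace
export SYSTEM_NAMESPACE=system    # assumed namespace of the gmc-controller pod
bash <path-to-this-script> install_AudioQnA
bash <path-to-this-script> validate_AudioQnA
```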