doc: fix headings and indenting (#748)
* doc: fix headings and indenting
* only one H1 header (for title) is allowed
* fix indenting under ordered lists

Signed-off-by: David B. Kinder <[email protected]>
dbkinder authored Sep 6, 2024
1 parent 947936e commit 67394b8
Showing 32 changed files with 1,014 additions and 1,012 deletions.
58 changes: 29 additions & 29 deletions ChatQnA/benchmark/README.md
@@ -4,15 +4,15 @@ This folder contains a collection of Kubernetes manifest files for deploying the

By following this guide, you can run benchmarks on your deployment and share the results with the OPEA community.

## Purpose

We aim to run these benchmarks and share them with the OPEA community for three primary reasons:

- To offer insights on inference throughput in real-world scenarios, helping you choose the best service or deployment for your needs.
- To establish a baseline for validating optimization solutions across different implementations, providing clear guidance on which methods are most effective for your use case.
- To inspire the community to build upon our benchmarks, allowing us to better quantify new solutions in conjunction with current leading LLMs, serving frameworks, etc.

## Metrics

The benchmark reports the following metrics:

@@ -27,9 +27,9 @@ The benchmark will report the below metrics, including:

Results will be displayed in the terminal and saved as a CSV file named `1_stats.csv` for easy export to spreadsheets.
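A quick way to eyeball the CSV from the shell — the file name comes from this README, but the columns below are stand-ins, not the tool's actual schema:

```bash
# Fabricate a stand-in file with the name the benchmark writes;
# real runs produce different columns.
cat > 1_stats.csv <<'EOF'
metric,value
requests,640
EOF
# Print the CSV as an aligned two-column table.
awk -F, '{printf "%-10s %s\n", $1, $2}' 1_stats.csv
```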

## Getting Started

### Prerequisites

- Install Kubernetes by following [this guide](https://github.com/opea-project/docs/blob/main/guide/installation/k8s_install/k8s_install_kubespray.md).

@@ -38,7 +38,7 @@ Results will be displayed in the terminal and saved as CSV file named `1_stats.c
- Install Python 3.8+ on the master node for running the stress tool.
- Ensure all nodes have a local /mnt/models folder, which will be mounted by the pods.

### Kubernetes Cluster Example

```bash
$ kubectl get nodes
@@ -49,7 +49,7 @@ k8s-work2 Ready <none> 35d v1.29.6
k8s-work3 Ready <none> 35d v1.29.6
```

### Manifest preparation

We provide [BKC manifests](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark) for single-node, two-node, and four-node K8s clusters. Before applying them, check out the repository and configure a few values.

@@ -75,7 +75,7 @@ find . -name '*.yaml' -type f -exec sed -i "s#\$(EMBEDDING_MODEL_ID)#${EMBEDDING
find . -name '*.yaml' -type f -exec sed -i "s#\$(RERANK_MODEL_ID)#${RERANK_MODEL_ID}#g" {} \;
```
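The substitution pattern above can be tried on a throwaway manifest before touching the real ones. This is a self-contained sketch: the directory, file, and model value are illustrative, not part of the BKC manifests.

```bash
# Create a stand-in manifest containing one placeholder.
mkdir -p demo_manifests
cat > demo_manifests/example.yaml <<'EOF'
embedding_model: $(EMBEDDING_MODEL_ID)
EOF

export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"   # illustrative value

# Same find/sed pattern as above, scoped to the demo directory.
find demo_manifests -name '*.yaml' -type f \
  -exec sed -i "s#\$(EMBEDDING_MODEL_ID)#${EMBEDDING_MODEL_ID}#g" {} \;

cat demo_manifests/example.yaml
```

If the placeholder survives the `cat`, the sed expression did not match — check the quoting around `\$(...)`.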

### Benchmark tool preparation

The test uses the [benchmark tool](https://github.com/opea-project/GenAIEval/tree/main/evals/benchmark) for performance testing. Set up the benchmark tool on the Kubernetes master node, which is k8s-master.

@@ -88,7 +88,7 @@ source stress_venv/bin/activate
pip install -r requirements.txt
```

### Test Configurations

Workload configuration:

@@ -119,19 +119,19 @@ Number of test requests for different scheduled node number:

More detailed configuration can be found in the configuration file [benchmark.yaml](./benchmark.yaml).
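A minimal sketch of the two keys edited in the test steps below — the key names and the output path come from this README, while the surrounding structure and the request counts are assumptions:

```bash
# Write an illustrative fragment; the real benchmark.yaml has many more fields.
cat > benchmark_sketch.yaml <<'EOF'
test_suite_config:
  user_queries: [4, 8, 16]            # assumed example request counts
  test_output_dir: /home/sdp/benchmark_output/node_1
EOF
grep 'test_output_dir' benchmark_sketch.yaml
```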

### Test Steps

#### Single node test

##### 1. Preparation

We label one Kubernetes node to make sure all pods are scheduled to it:

```bash
kubectl label nodes k8s-worker1 node-type=chatqna-opea
```

##### 2. Install ChatQnA

Go to the [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark/tuned/with_rerank/single_gaudi) and apply it to K8s.

@@ -141,9 +141,9 @@ cd GenAIExamples/ChatQnA/benchmark/tuned/with_rerank/single_gaudi
kubectl apply -f .
```

##### 3. Run tests

###### 3.1 Upload Retrieval File

Before running tests, upload the specified file to make sure the LLM input has a token length of 1k.

@@ -174,7 +174,7 @@ curl -X POST "http://${cluster_ip}:6007/v1/dataprep" \
-F "files=@./upload_file_no_rerank.txt"
```

###### 3.2 Run Benchmark Test

Copy the configuration file [benchmark.yaml](./benchmark.yaml) to `GenAIEval/evals/benchmark/benchmark.yaml` and configure `test_suite_config.user_queries` and `test_suite_config.test_output_dir`.

@@ -191,11 +191,11 @@ cd GenAIEval/evals/benchmark
python benchmark.py
```

##### 4. Data collection

All test results are written to the folder `/home/sdp/benchmark_output/node_1`, configured via the environment variable `TEST_OUTPUT_DIR` in the previous steps.
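After a run, the result CSVs can be gathered with a `find` over the output directory. The sketch below fabricates the directory layout so the commands are self-contained; on a real cluster, point it at the `TEST_OUTPUT_DIR` from the steps above instead.

```bash
# Stand-in for /home/sdp/benchmark_output/node_1 from the steps above.
OUT=./benchmark_output/node_1
mkdir -p "$OUT"
echo "metric,value" > "$OUT/1_stats.csv"

# Collect every stats file produced under the output directory.
find "$OUT" -name '*_stats.csv' -type f
```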

##### 5. Clean up

```bash
# on k8s-master node
@@ -204,17 +204,17 @@ kubectl delete -f .
kubectl label nodes k8s-worker1 node-type-
```

#### Two node test

##### 1. Preparation

We label two Kubernetes nodes to make sure all pods are scheduled to them:

```bash
kubectl label nodes k8s-worker1 k8s-worker2 node-type=chatqna-opea
```

##### 2. Install ChatQnA

Go to the [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark/tuned/with_rerank/two_gaudi) and apply it to K8s.

@@ -224,7 +224,7 @@ cd GenAIExamples/ChatQnA/benchmark/tuned/with_rerank/two_gaudi
kubectl apply -f .
```

##### 3. Run tests

Copy the configuration file [benchmark.yaml](./benchmark.yaml) to `GenAIEval/evals/benchmark/benchmark.yaml` and configure `test_suite_config.user_queries` and `test_suite_config.test_output_dir`.

@@ -241,29 +241,29 @@ cd GenAIEval/evals/benchmark
python benchmark.py
```

##### 4. Data collection

All test results are written to the folder `/home/sdp/benchmark_output/node_2`, configured via the environment variable `TEST_OUTPUT_DIR` in the previous steps.

##### 5. Clean up

```bash
# on k8s-master node
kubectl delete -f .
kubectl label nodes k8s-worker1 k8s-worker2 node-type-
```

#### Four node test

##### 1. Preparation

We label four Kubernetes nodes to make sure all pods are scheduled to them:

```bash
kubectl label nodes k8s-master k8s-worker1 k8s-worker2 k8s-worker3 node-type=chatqna-opea
```

##### 2. Install ChatQnA

Go to the [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark/tuned/with_rerank/four_gaudi) and apply it to K8s.

@@ -273,7 +273,7 @@ cd GenAIExamples/ChatQnA/benchmark/tuned/with_rerank/four_gaudi
kubectl apply -f .
```

##### 3. Run tests

Copy the configuration file [benchmark.yaml](./benchmark.yaml) to `GenAIEval/evals/benchmark/benchmark.yaml` and configure `test_suite_config.user_queries` and `test_suite_config.test_output_dir`.

@@ -290,11 +290,11 @@ cd GenAIEval/evals/benchmark
python benchmark.py
```

##### 4. Data collection

All test results are written to the folder `/home/sdp/benchmark_output/node_4`, configured via the environment variable `TEST_OUTPUT_DIR` in the previous steps.

##### 5. Clean up

```bash
# on k8s-master node
kubectl delete -f .
kubectl label nodes k8s-master k8s-worker1 k8s-worker2 k8s-worker3 node-type-
```
120 changes: 60 additions & 60 deletions ChatQnA/docker/aipc/README.md
@@ -173,97 +173,97 @@ OLLAMA_HOST=${host_ip}:11434 ollama run $OLLAMA_MODEL

1. TEI Embedding Service

```bash
curl ${host_ip}:6006/embed \
  -X POST \
  -d '{"inputs":"What is Deep Learning?"}' \
  -H 'Content-Type: application/json'
```

2. Embedding Microservice

```bash
curl http://${host_ip}:6000/v1/embeddings \
  -X POST \
  -d '{"text":"hello"}' \
  -H 'Content-Type: application/json'
```

3. Retriever Microservice
To validate the retriever microservice, generate a mock embedding vector of length 768 with a Python one-liner:

```bash
export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
curl http://${host_ip}:7000/v1/retrieval \
  -X POST \
  -d '{"text":"What is the revenue of Nike in 2023?","embedding":"'"${your_embedding}"'"}' \
  -H 'Content-Type: application/json'
```

4. TEI Reranking Service

```bash
curl http://${host_ip}:8808/rerank \
  -X POST \
  -d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
  -H 'Content-Type: application/json'
```

5. Reranking Microservice

```bash
curl http://${host_ip}:8000/v1/reranking \
  -X POST \
  -d '{"initial_query":"What is Deep Learning?", "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \
  -H 'Content-Type: application/json'
```

6. Ollama Service

```bash
curl http://${host_ip}:11434/api/generate -d '{"model": "llama3", "prompt":"What is Deep Learning?"}'
```

7. LLM Microservice

```bash
curl http://${host_ip}:9000/v1/chat/completions \
  -X POST \
  -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
  -H 'Content-Type: application/json'
```

8. MegaService

```bash
curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
  "messages": "What is the revenue of Nike in 2023?", "model": "'"${OLLAMA_MODEL}"'"
}'
```

9. Dataprep Microservice (Optional)

If you want to update the default knowledge base, you can use the following commands:

Update Knowledge Base via Local File Upload:

```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep" \
  -H "Content-Type: multipart/form-data" \
  -F "files=@./nke-10k-2023.pdf"
```

This command updates a knowledge base by uploading a local file for processing. Update the file path according to your environment.

Add Knowledge Base via HTTP Links:

```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep" \
  -H "Content-Type: multipart/form-data" \
  -F 'link_list=["https://opea.dev"]'
```

This command updates a knowledge base by submitting a list of HTTP links for processing.

## 🚀 Launch the UI

