Add instructions for evaluating on MT bench using vLLM #2770

Merged 1 commit on Dec 16, 2023

42 changes: 32 additions & 10 deletions fastchat/llm_judge/README.md
@@ -40,26 +40,48 @@ You can use this QA browser to view the answers generated by you later.
### Evaluate a model on MT-bench

#### Step 1. Generate model answers to MT-bench questions

To generate model answers, you can either use [vLLM](https://github.com/vllm-project/vllm) via a FastChat server (recommended) or Hugging Face.

##### Using vLLM (recommended):

1. Launch a vLLM worker
```
python3 -m fastchat.serve.controller
python3 -m fastchat.serve.vllm_worker --model-path [MODEL-PATH]
python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
```
- Arguments:
  - `[MODEL-PATH]` is the path to the weights, which can be a local folder or a Hugging Face repo ID.
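
For example, to serve `lmsys/vicuna-7b-v1.5` (a concrete instance of the template above; run each command in a separate terminal or in the background):
```
python3 -m fastchat.serve.controller
python3 -m fastchat.serve.vllm_worker --model-path lmsys/vicuna-7b-v1.5
python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
```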

2. Generate the answers
```
python gen_api_answer.py --model [MODEL-NAME] --openai-api-base http://localhost:8000/v1 --parallel 50
```
- Arguments:
  - `[MODEL-NAME]` is the name of the model from Step 1.
  - `--parallel` is the number of concurrent API calls to the vLLM worker.
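
For example, assuming the worker above serves `lmsys/vicuna-7b-v1.5` (by default FastChat derives the model name from the last path component, so `vicuna-7b-v1.5`):
```
python gen_api_answer.py --model vicuna-7b-v1.5 --openai-api-base http://localhost:8000/v1 --parallel 50
```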

##### Using Hugging Face:
1. Generate the answers
```
python gen_model_answer.py --model-path [MODEL-PATH] --model-id [MODEL-ID]
```
- Arguments:
- `[MODEL-PATH]` is the path to the weights, which can be a local folder or a Hugging Face repo ID.
- `[MODEL-ID]` is a name you give to the model.

- You can also specify `--num-gpus-per-model` for model parallelism (needed for large 65B models) and `--num-gpus-total` to parallelize answer generation with multiple GPUs.

- e.g. `python gen_model_answer.py --model-path lmsys/vicuna-7b-v1.5 --model-id vicuna-7b-v1.5`
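
- e.g., a multi-GPU sketch for a larger model (the model and GPU counts here are illustrative, not prescriptive): `python gen_model_answer.py --model-path lmsys/vicuna-33b-v1.3 --model-id vicuna-33b-v1.3 --num-gpus-per-model 2 --num-gpus-total 4`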

The answers will be saved to `data/mt_bench/model_answer/[MODEL-ID].jsonl` (Hugging Face) or `data/mt_bench/model_answer/[MODEL-NAME].jsonl` (vLLM).

To make sure FastChat loads the correct prompt template, see the supported models and how to add a new model [here](../../docs/model_support.md#how-to-support-a-new-model).

#### Step 2. Generate GPT-4 judgments
There are several options to use GPT-4 as a judge, such as pairwise winrate and single-answer grading.
In MT-bench, we recommend single-answer grading as the default mode.
This mode asks GPT-4 to grade and score the model's answer directly, without pairwise comparison.
For each turn, GPT-4 gives a score on a scale of 1 to 10; we then compute the average score over all turns.
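
As a sketch, single-answer grading can be run with `gen_judgment.py` (the `--model-list` and `--parallel` flags shown are assumptions based on the script's standard usage; check its `--help` for the full set):
```
export OPENAI_API_KEY=XXXXXX  # grading calls the OpenAI GPT-4 API
python gen_judgment.py --model-list vicuna-7b-v1.5 --parallel 2
```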
