[CI/Build] Add nightly benchmarking for tgi, tensorrt-llm and lmdeploy #5362

KuntaiDu · 2024-06-09T08:09:35Z

Following PR #5073, this PR aims to compare vllm with alternatives (tgi, tensorrt-llm and lmdeploy; feel free to comment if there are other alternatives we should benchmark) ON THE SAME WORKLOAD (the same as in PR #5073) USING THE SAME BENCHMARKING SCRIPT (benchmark_serving.py).

For a fair comparison, we will run vllm and each alternative inside its own official docker image.

This will run as a nightly benchmark, because running all alternatives on all workloads is time-consuming.

TODO list:

- implement an initial version of a one-click runnable benchmarking script for tgi
- implement an initial version of a one-click runnable benchmarking script for tensorrt-llm
- implement an initial version of a one-click runnable benchmarking script for lmdeploy
- adjust these scripts so that they parse the workload from nightly-tests.json
- integrate them into the CI system
- adjust the presentation of the benchmarking results (as a markdown file)
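A minimal sketch of the "parse the workload from nightly-tests.json" step, in Python. The schema used here (a list of test cases, each with a `test_name` and a `client_parameters` dict) is a hypothetical assumption, not the actual layout of nightly-tests.json, and `to_cli_args` / `build_client_commands` are illustrative helpers. It only shows the idea: every backend is driven by the same workload entry, and just the `--backend` value passed to benchmark_serving.py changes.

```python
import json
from typing import Any

# Hypothetical workload file; the real nightly-tests.json may use another schema.
EXAMPLE_TESTS = """
[
  {
    "test_name": "llama8B_tp1",
    "client_parameters": {
      "model": "meta-llama/Meta-Llama-3-8B",
      "dataset_name": "sharegpt",
      "num_prompts": 500
    }
  }
]
"""


def to_cli_args(params: dict[str, Any]) -> list[str]:
    """Turn {"num_prompts": 500} into ["--num-prompts", "500"].

    True becomes a bare flag; False and None values are skipped.
    """
    args: list[str] = []
    for key, value in params.items():
        flag = "--" + key.replace("_", "-")
        if value is True:
            args.append(flag)
        elif value is False or value is None:
            continue
        else:
            args.extend([flag, str(value)])
    return args


def build_client_commands(tests_json: str,
                          backends: list[str]) -> dict[str, list[list[str]]]:
    """Build one benchmark_serving.py command per (backend, test case).

    Only --backend differs between backends; the workload parameters
    come from the same JSON entry.
    """
    tests = json.loads(tests_json)
    commands: dict[str, list[list[str]]] = {}
    for backend in backends:
        commands[backend] = [
            ["python3", "benchmark_serving.py", "--backend", backend]
            + to_cli_args(test["client_parameters"])
            for test in tests
        ]
    return commands
```

Deriving every backend's client command from one shared entry is one way to guarantee identical client-side parameters; the server launch flags would still differ per engine.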

PR Checklist

Thank you for your contribution to vLLM! Before submitting the pull request, please ensure the PR meets the following criteria. This helps vLLM maintain the code quality and improve the efficiency of the review process.

PR Title and Classification

Only specific types of PRs will be reviewed. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:

- [Bugfix] for bug fixes.
- [CI/Build] for build or continuous integration improvements.
- [Doc] for documentation fixes and improvements.
- [Model] for adding a new model or improving an existing model. Model name should appear in the title.
- [Frontend] For changes on the vLLM frontend (e.g., OpenAI API server, LLM class, etc.)
- [Kernel] for changes affecting CUDA kernels or other compute kernels.
- [Core] for changes in the core vLLM logic (e.g., LLMEngine, AsyncLLMEngine, Scheduler, etc.)
- [Hardware][Vendor] for hardware-specific changes. Vendor name should appear in the prefix (e.g., [Hardware][AMD]).
- [Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.

Code Quality

The PR needs to meet the following code quality standards:

- We adhere to the Google Python style guide and the Google C++ style guide.
- Pass all linter checks. Please use format.sh to format your code.
- The code needs to be well-documented so that future contributors can easily understand it.
- Include sufficient tests to ensure the project stays correct and robust. This includes both unit tests and integration tests.
- Please add documentation to docs/source/ if the PR modifies the user-facing behaviors of vLLM. It helps vLLM users understand and utilize the new features or changes.

Notes for Large Changes

Please keep the changes as concise as possible. For major architectural changes (>500 LOC excluding kernel/data/config/test), we would expect a GitHub issue (RFC) discussing the technical design and justification. Otherwise, we will tag it with rfc-required and might not go through the PR.

What to Expect for the Reviews

The goal of the vLLM team is to be a transparent reviewing machine. We would like to make the review process transparent and efficient and make sure no contributor feels confused or frustrated. However, the vLLM team is small, so we need to prioritize some PRs over others. Here is what you can expect from the review process:

- After the PR is submitted, the PR will be assigned to a reviewer. Every reviewer will pick up the PRs based on their expertise and availability.
- After the PR is assigned, the reviewer will provide a status update every 2-3 days. If the PR is not reviewed within 7 days, please feel free to ping the reviewer or the vLLM team.
- After the review, the reviewer will put an action-required label on the PR if there are changes required. The contributor should address the comments and ping the reviewer to re-review the PR.
- Please respond to all comments within a reasonable time frame. If a comment isn't clear or you disagree with a suggestion, feel free to ask for clarification or discuss the suggestion.

Thank You

Finally, thank you for taking the time to read these guidelines and for your interest in contributing to vLLM. Your contributions make vLLM a great tool for everyone!

KuntaiDu · 2024-06-19T08:19:02Z

I have finished an initial implementation for lmdeploy and tgi. Its entry point, run-nightly-suite.sh, parses nightly-tests.json and generates the benchmarking results. The script is one-click runnable inside the official docker images of lmdeploy and tgi. I will continue with trt.
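Each one-click run has to start a serving engine, wait until it accepts requests, run the client, and then stop the server (the commit history below mentions a `wait_for_server` helper and `pkill`). The readiness wait can be sketched in Python as a polling loop. This is an illustration, not the PR's shell implementation; `http_probe` and `wait_for_server` here are hypothetical helpers, and the probe URL is a placeholder because each engine may expose a different readiness endpoint.

```python
import time
import urllib.request
from typing import Callable


def http_probe(url: str) -> Callable[[], bool]:
    """Return a probe that is True once `url` answers with HTTP 200."""
    def probe() -> bool:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                return resp.status == 200
        except OSError:
            # URLError, HTTPError, refused connections and socket timeouts
            # are all OSError subclasses: treat them as "not ready yet".
            return False
    return probe


def wait_for_server(probe: Callable[[], bool], timeout_s: float = 600.0,
                    interval_s: float = 1.0) -> bool:
    """Poll `probe` until it succeeds (True) or `timeout_s` elapses (False)."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval_s)
    return False
```

Passing a probe callable keeps the loop independent of the transport, e.g. `wait_for_server(http_probe("http://localhost:8000/health"))`, assuming the engine under test serves a `/health` route on that port.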


simon-mo

This looks good. One thing: I couldn't find a single place that describes all the workloads (i.e., where are the benchmark_serving parameters that should be identical across all backends?).
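If each backend ends up with its own copy of the workload definition, a small consistency check can catch the drift this comment is concerned about. The sketch below assumes a hypothetical in-memory layout (backend, then test name, then client parameters) and flags any test whose parameters differ between backends or are missing for some backend; `find_workload_mismatches` is an illustrative helper, not part of the PR.

```python
from typing import Any


def find_workload_mismatches(
        per_backend: dict[str, dict[str, dict[str, Any]]]) -> list[str]:
    """Return sorted test names whose client parameters are not identical
    across all backends. A test missing from some backend also counts."""
    all_tests: set[str] = set()
    for tests in per_backend.values():
        all_tests.update(tests)

    mismatched: list[str] = []
    for name in sorted(all_tests):
        # None marks a backend that does not define this test at all.
        params = [tests.get(name) for tests in per_backend.values()]
        if None in params or any(p != params[0] for p in params[1:]):
            mismatched.append(name)
    return mismatched
```

Running such a check in CI would fail fast whenever one engine is benchmarked on a different workload than the others.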

zhyncs · 2024-07-13T05:52:42Z

Hi @KuntaiDu Nice work! May we consider adding benchmarks for glm-4-9b-chat and Qwen2-72B-Instruct? They are currently the SOTA models in CJK native support. If OK and needed, I am happy to help. Thanks. cc @simon-mo


KuntaiDu · 2024-07-25T22:24:25Z

> Hi @KuntaiDu Nice work! May we consider adding benchmarks for glm-4-9b-chat and Qwen2-72B-Instruct? They are currently the SOTA models in CJK native support. If OK and needed, I am happy to help. Thanks. cc @simon-mo

Sorry for the late reply (GitHub somehow did not notify me of this message). Feel free to open a new PR for this!

Kuntai: add tgi and trt benchmarking script (initial version)

f813e2e

KuntaiDu marked this pull request as draft June 9, 2024 19:18

KuntaiDu mentioned this pull request Jun 11, 2024

[CI/Build][Misc] Add CI that benchmarks vllm performance on those PRs with perf-benchmarks label #5073

Merged

KuntaiDu added 18 commits June 13, 2024 23:13

update initial benchmarking script for lmdeploy

d6cba46

Merge branch 'vllm-project:main' into kuntai-benchmark-dev

2cc1023

Add download tokenizer script for lmdeploy

5d8292b

add one-click runnable script for lmdeploy, parse tests from json file

a2dd7c9

add nightly test json file

8416ce6

bug fix on tokenizer directory

df4ba8f

bug fix on getting model

b974495

update test cases

80d1c77

update parameter name

b3f3b0e

typo fix

9483acf

add wait_for_server

d72ae51

update summarization script

0e819f0

use pkill to kill lmdeploy

9181a1d

update script for tgi

6e1936c

add install jq

c6aded9

reduce 7b llama tp to 1

ccbcd18

update lmdeploy tp

38cc38a

bug fix

832891e

KuntaiDu added 8 commits June 19, 2024 21:21

update tensorrt script

5877806

update nightly suite

9972aba

add double quote

6493679

add trt llm version

e62cae6

update trt

9ce3589

update on how to kill the server

f634dee

update vllm nightly test

b47e30b

disable vllm server log

ec8b295

KuntaiDu added 6 commits July 6, 2024 23:42

increase font size, adjust coloring

a3085a1

adjust font size

0a554ae

adjust spacing

c6c9292

increase font size

4788d27

increase cap size

ccc160c

make yapf and ruff happy

b6c5572

simon-mo requested changes Jul 8, 2024


.buildkite/nightly-benchmarks/kickoff-pipeline.sh (outdated, resolved)

.buildkite/nightly-benchmarks/nightly-pipeline.yaml (outdated, resolved)

.buildkite/nightly-benchmarks/scripts/plot-nightly-results.py (outdated, resolved)

allow running performance benchmark & nightly benchmark simultaneously

13d8c04

KuntaiDu added the perf-benchmarks label Jul 9, 2024

KuntaiDu added 13 commits July 9, 2024 16:34

adjust the annotation context for nightly benchmark so that it does not overlap with performance benchmark

4d77e8f

cut redundant lines in nightly-pipeline.yaml using yaml anchor

da41c53

add dpi=400

c108084

adjust pipeline upload order

57e6783

merge two pipelines using yq

b057b4b

adjust merging logic

1053900

put blocking step as the first step

5ef7e8a

this file has been moved to vllm-project/buildkite-ci. Remove it.

bbe115d

add warning message

fb1e392

add a wait at the end, essential when merging multiple yaml files

50ed6b7

adjust pipeline.yaml

9758f94

adjust pipeline.yaml

8608d17

fix pipeline yaml

37c4c11

KuntaiDu requested a review from simon-mo July 11, 2024 17:01

simon-mo approved these changes Jul 11, 2024


simon-mo merged commit a4feba9 into vllm-project:main Jul 11, 2024
71 checks passed

KuntaiDu deleted the kuntai-benchmark-dev branch July 11, 2024 20:39

dtrifiro pushed a commit to opendatahub-io/vllm that referenced this pull request Jul 17, 2024

[CI/Build] Add nightly benchmarking for tgi, tensorrt-llm and lmdeploy (vllm-project#5362)

61e7ec7

xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 24, 2024

[CI/Build] Add nightly benchmarking for tgi, tensorrt-llm and lmdeploy (vllm-project#5362)

a24a3ff
