[Usage]: Profiling Prefill and Decode Phases Separately #4900

Msiavashi · 2024-05-18T20:21:37Z

Your current environment

I'm attempting to independently measure the performance (e.g., latency and throughput) of the prefill and decode phases. Is there a way to achieve this? I have noticed a few benchmarks that measure end-to-end throughput and latency but do not provide separate metrics for each phase.

I would greatly appreciate any guidance on profiling these two phases separately.

How would you like to use vllm

No response

leiwen83 · 2024-05-19T13:13:23Z

Streaming mode reports each token's latency, so the prefill and decode phases can be measured separately.
Since the current benchmark uses sync mode, another workaround to consider is:

Measure latency with input_len=1000, output_len=1; this gives the prefill latency for input_len=1000.
Measure the average latency A with input_len=1, output_len=1, then the average latency B with input_len=1, output_len=1000. (B-A)/999 then gives the per-token decode latency.
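Both approaches above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a vLLM API: `split_stream_latency` and `decode_latency_from_runs` are hypothetical helper names, and `tokens` stands for any iterator that yields one item per generated token (for example, a streaming response you already have). Time to first token is only an approximation of prefill latency, since it also includes queueing and the first decode step.

```python
import time
from typing import Callable, Iterable, List, Tuple


def split_stream_latency(
    tokens: Iterable[object],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float]:
    """Return (time_to_first_token, mean_inter_token_latency) in seconds.

    Hypothetical helper: `tokens` is any iterator yielding one item per
    generated token. The first value approximates prefill latency, the
    second approximates per-token decode latency.
    """
    start = clock()
    stamps: List[float] = []
    for _ in tokens:
        # Record the arrival time of every streamed token.
        stamps.append(clock())
    if not stamps:
        raise ValueError("stream produced no tokens")
    ttft = stamps[0] - start
    if len(stamps) == 1:
        # Only one token arrived, so there is no decode gap to average.
        return ttft, 0.0
    decode = (stamps[-1] - stamps[0]) / (len(stamps) - 1)
    return ttft, decode


def decode_latency_from_runs(
    latency_a: float, latency_b: float, output_len_b: int = 1000
) -> float:
    """Differential estimate from two sync-mode runs with input_len=1.

    latency_a: average latency with output_len=1.
    latency_b: average latency with output_len=output_len_b.
    Returns (B - A) / (output_len_b - 1), the per-token decode latency.
    """
    if output_len_b < 2:
        raise ValueError("output_len_b must be at least 2")
    return (latency_b - latency_a) / (output_len_b - 1)
```

The `clock` parameter is injectable so the timing logic can be checked with a fake clock; in real measurements the default `time.perf_counter` is used and the run should be repeated and averaged.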

Msiavashi · 2024-05-20T09:07:11Z

So, there is still no embedded mechanism for these measurements/profiling, right?

KevinZeng08 · 2024-06-25T08:34:00Z

> stream mode shall get each token's latency, and thus prefill and decode phase could be measured. While current benchmark using sync mode, another workaround may be considered is:
>
> measure latency for input_len=1000, output_len=1, thus get prefill latency for input_len=1000
>
> measure latency for input_len=1, output_len=1, get average latency A, and then input_len=1, output_len=1000, get average latency B. (B-A)/999 to get the decode latency...

Hi, are there now any other ways to profile the prefill and decode phases separately?

kerthcet · 2024-06-28T02:57:05Z

There's an ongoing related PR: #2809

dshm · 2024-09-19T06:35:41Z

Same question. Are there now any other ways to profile the prefill and decode phases separately?

Msiavashi added the usage How to use vllm label May 18, 2024
