[Usage]: Profiling Prefill and Decode Phases Separately #4900
Comments
Streaming mode gives you each token's latency, so the prefill and decode phases can be measured separately.
|
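To make the streaming suggestion concrete, here is a minimal sketch of client-side timing. It is not a vLLM API: `time_stream`, `summarize`, and `fake_stream` are hypothetical helper names, and the token stream is assumed to be any Python iterable that yields one item per generated token (for example, the chunk iterator of a streaming client, whose wiring is not shown). Time to first token (TTFT) is only a proxy for prefill, since it also includes queueing and transport overhead, and the gaps between later tokens are only a proxy for decode.

```python
import time
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def time_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, List[float]]:
    """Consume a token stream; return (TTFT, inter-token gaps) in seconds."""
    start = clock()
    stamps: List[float] = []
    for _ in tokens:
        # Record the arrival time of every streamed item.
        stamps.append(clock())
    if not stamps:
        raise ValueError("stream produced no tokens")
    ttft = stamps[0] - start  # proxy for the prefill phase
    gaps = [b - a for a, b in zip(stamps, stamps[1:])]  # proxy for decode steps
    return ttft, gaps


def summarize(ttft: float, gaps: List[float]) -> Dict[str, float]:
    """Reduce the raw timings to per-phase metrics."""
    decode_time = sum(gaps)
    return {
        "ttft_s": ttft,
        "mean_inter_token_s": decode_time / len(gaps) if gaps else 0.0,
        "decode_tokens_per_s": len(gaps) / decode_time if decode_time > 0 else 0.0,
    }


def fake_stream(prefill_s: float, decode_s: float, n_tokens: int) -> Iterator[str]:
    """Stand-in for a real streaming response (assumption, for illustration only)."""
    time.sleep(prefill_s)  # simulated prefill before the first token
    yield "tok0"
    for i in range(1, n_tokens):
        time.sleep(decode_s)  # simulated per-token decode step
        yield f"tok{i}"


if __name__ == "__main__":
    ttft, gaps = time_stream(fake_stream(0.05, 0.01, 8))
    print(summarize(ttft, gaps))
```

If the server batches several tokens into one streamed chunk, the gaps measure chunks rather than tokens, so the numbers should be read with that in mind.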
So there is still no built-in mechanism for this kind of measurement or profiling, right? |
Hi, is there now any other way to profile the prefill and decode phases separately? |
There's an ongoing related PR: #2809 |
Same question. Is there now any other way to profile the prefill and decode phases separately? |
Your current environment
I'm trying to measure the performance (e.g., latency and throughput) of the prefill and decode phases independently. Is there a way to do this? I have noticed a few benchmarks that measure end-to-end throughput and latency, but they do not provide separate metrics for each phase.
I would greatly appreciate any guidance on profiling these two phases separately.
How would you like to use vllm
No response