Proposal to improve performance
Hi all, I'm running Vicuna-13B on an H100 with FP8, and I find that when the batch size is large (say 64 or 96), GPU utilization is low, around 60%, which is a major cause of the poor performance.
I did some analysis; part of this is caused by the scheduling and post-processing of requests.
Do you have any plans to improve this?
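To make the claim concrete: if per-step CPU work (scheduling, post-processing/detokenization) runs serialized with the GPU forward pass, it puts a hard ceiling on GPU utilization. A toy model of this, with illustrative timings that are assumptions rather than measured vLLM numbers:

```python
# Toy model: each decode step = CPU overhead (scheduling + post-processing)
# followed by the GPU forward pass. The timings below are made up for
# illustration, not measured on vLLM or an H100.
def gpu_utilization(gpu_ms: float, cpu_ms: float) -> float:
    """Fraction of wall-clock time the GPU is busy per step."""
    return gpu_ms / (gpu_ms + cpu_ms)

# If a forward pass takes 30 ms and scheduling + post-processing take 20 ms
# between steps, the GPU sits idle 40% of the time:
print(round(gpu_utilization(30.0, 20.0), 2))  # 0.6
```

This is why larger batches can make the problem more visible: post-processing cost grows with the number of sequences, so the CPU share of each step grows even as the GPU kernels stay well utilized while they run. Overlapping scheduling/post-processing with the next forward pass would raise the ratio.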
Report of performance regression
No response
Misc discussion on performance
No response
Your current environment (if you think it is necessary)
The output of `python collect_env.py`
@sleepwalker2017 By the way, could you please share the command you used when profiling vLLM with Nsight?
The command is nothing special; I think it's just `nsys profile python xxx.py`. You can refer to the nsys manual for details on usage.
Thanks for your quick reply. I see; it seems you were using nsys to profile a single .py script. I thought you were benchmarking the service.