Why is my batch size maxing out at 20 when my estimated max is around 60? #1331
I am running `text-generation-benchmark` with the default batch sizes of [1, 2, 4, 8, 16, 32] but I am getting CUDA OOM errors when I hit the batch size of 32. I narrowed it down and hit a max at batch size 20. This is confusing to me, as I'm running HuggingFaceH4/zephyr-7b-beta on an A100 80GB PCIe, and from my rough calculations it should be able to handle a max batch size of around 60. I am running `text-generation-launcher` with the default parameters and `text-generation-benchmark` with `--decode-length 2048` and `--sequence-length 256`. Here is my calculation for reference:

KV cache size = 2 * 2 * 32 * 4096 / 1000000000 = 0.000524 GB per token
KV cache tokens = (80 - 14) / 0.000524 = 125885.01
Max batch size = 125885.01 / 2048 = 61.47

Am I doing the calculation wrong here, or could there be something amiss with my setup?
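Spelling the same arithmetic out as a quick Python sketch (my reading of the factors, 2 tensors for K and V, 2 bytes for fp16, 32 layers, 4096 hidden size, and ~14 GB of fp16 weights, is an assumption):

```python
# Back-of-the-envelope max batch size from the KV-cache memory budget.
GPU_MEM_GB = 80          # A100 80GB
WEIGHTS_GB = 14          # assumed: ~7B params * 2 bytes (fp16)
DECODE_LENGTH = 2048     # --decode-length

# 2 tensors (K and V) * 2 bytes (fp16) * 32 layers * 4096 hidden size
kv_gb_per_token = 2 * 2 * 32 * 4096 / 1e9

kv_token_budget = (GPU_MEM_GB - WEIGHTS_GB) / kv_gb_per_token
max_batch_size = kv_token_budget / DECODE_LENGTH

print(f"KV cache per token: {kv_gb_per_token:.6f} GB")    # 0.000524
print(f"KV-cache token budget: {kv_token_budget:.2f}")    # ~125885
print(f"Estimated max batch size: {max_batch_size:.2f}")  # ~61.47
```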
-
I’m very interested in this. Of course, if you take different attention mechanisms and other optimizations into account, inference can do better than this calculation, but that shouldn’t be relevant here: the calculated max batch size is a baseline that is already much larger than what the benchmark is able to achieve. Any ideas? @OlivierDehaene
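For instance, here is a sketch of how grouped-query attention alone changes the per-token figure, and note that it moves the estimate toward more headroom, not less (the shapes, 32 layers, 128 head dim, 32 query heads, 8 KV heads, are assumptions taken from Mistral-7B, which zephyr-7b-beta is fine-tuned from):

```python
# Per-token KV-cache size under full multi-head attention (MHA)
# vs. grouped-query attention (GQA).
def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_value: int = 2) -> int:
    """Bytes of KV cache that one token occupies across all layers."""
    # Factor 2 because both a key and a value are cached per layer.
    return 2 * bytes_per_value * n_layers * n_kv_heads * head_dim

mha = kv_bytes_per_token(n_layers=32, n_kv_heads=32, head_dim=128)
gqa = kv_bytes_per_token(n_layers=32, n_kv_heads=8, head_dim=128)

print(f"MHA: {mha / 1e9:.6f} GB/token")  # 0.000524, the figure in the calc
print(f"GQA: {gqa / 1e9:.6f} GB/token")  # 0.000131, i.e. 4x smaller
```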
-
This is related to my issue: #1831. From my POV, there is also something odd here: not only does the math lead you to different numbers; so does the value the launcher infers. Some useful links I have used: