[Bug]: Number of available GPU blocks drop significantly for Phi3-vision #6124

Closed
CatherineSue opened this issue Jul 4, 2024 · 7 comments
Labels: bug (Something isn't working)

Comments

CatherineSue (Contributor) commented Jul 4, 2024

Your current environment

Two Docker containers, based on images built from vLLM source at commits 3de6e6a and 3f3b6b2.

🐛 Describe the bug

I passed the same model, Phi-3-vision-128k-instruct, to each Docker container:

--tensor-parallel-size=1 \
--model=/models/Phi-3-vision-128k-instruct \

For the version that needs VLMConfig, here are the parameters:

--image-input-type="pixel_values" \
--image-feature-size=1921 \
--image-token-id=32044 \
--image-input-shape="1, 3, 1008, 1344" 
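For context, a sketch of how the flags above might be combined into a single launch command for the older, VLMConfig-based build; the api_server entrypoint and --trust-remote-code are assumptions on top of what the report shows:

python -m vllm.entrypoints.openai.api_server \
    --model=/models/Phi-3-vision-128k-instruct \
    --tensor-parallel-size=1 \
    --trust-remote-code \
    --image-input-type="pixel_values" \
    --image-feature-size=1921 \
    --image-token-id=32044 \
    --image-input-shape="1, 3, 1008, 1344"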

With the container based on the more recent commit 3de6e6a, it raises this error:

INFO 07-04 01:04:14 gpu_executor.py:84] # GPU blocks: 5970, # CPU blocks: 682
[rank0]: ValueError: The model's max seq len (131072) is larger than the maximum number of tokens that can be stored in KV cache (95520). Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.

But the container based on the older commit 3f3b6b2 starts without error:

INFO 07-04 01:40:03 gpu_executor.py:83] # GPU blocks: 8825, # CPU blocks: 682
INFO 07-04 01:40:05 model_runner.py:906] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
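The two block counts explain the different outcomes. Assuming vLLM's default KV-cache block size of 16 tokens per block (an assumption, not read from these logs), the usable KV-cache capacity is roughly the GPU block count times 16:

echo $((5970 * 16))   # 95520  < 131072 -> triggers the ValueError above
echo $((8825 * 16))   # 141200 > 131072 -> the 3f3b6b2 build starts normally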
CatherineSue added the bug label on Jul 4, 2024
CatherineSue (Contributor, Author) commented:

@ywang96 Can you share some insight? Does it have something to do with the recent changes in VLM support?

DarkLight1337 (Member) commented Jul 4, 2024

There used to be a bug in the model's memory profiling where it didn't actually pass in images, so the model's memory usage was underestimated. During inference, this underestimation could have caused OOM.

After the fix, the available block count is reduced significantly, which better reflects the true memory usage of the model. Regarding your problem, this is expected since the model has a 128k context length. If it can't fit in your GPU memory, try reducing the context length via max_model_len or the sequence count via max_num_seqs.
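To illustrate that advice, a hedged launch sketch with both knobs lowered; --max-model-len, --max-num-seqs, and --gpu-memory-utilization are standard vLLM CLI options, but the concrete values below are illustrative guesses rather than tuned recommendations:

python -m vllm.entrypoints.openai.api_server \
    --model=/models/Phi-3-vision-128k-instruct \
    --tensor-parallel-size=1 \
    --max-model-len=65536 \
    --max-num-seqs=16 \
    --gpu-memory-utilization=0.95   # default is 0.9; raising it is the other lever the error message mentions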

CatherineSue (Contributor, Author) commented:

Thanks for the explanation, @DarkLight1337!

ywang96 (Member) commented Jul 4, 2024

Just for future reference - the bug was discovered and fixed in #5888 and #5214.

We have also updated examples/phi3v_example.py. The current profiling strategy is rather conservative, but improving it is definitely part of the next milestone!

2U1 commented Jul 8, 2024

@ywang96 I get the same error even with max_num_seqs=1.

Is there some way to fix it?

ValueError: The model's max seq len (131072) is larger than the maximum number of tokens that can be stored in KV cache (4544). Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.

DarkLight1337 (Member) commented:

As stated in the error message, you may have to decrease max_model_len (e.g., 64k instead of 128k).

2U1 commented Jul 8, 2024

@DarkLight1337 Thanks, decreasing max_model_len solved the problem!
