llava-hf/llava-1.5-7b-hf: error when multi-turn chat with multi-images #12288

Johere · 2024-10-29T06:12:42Z

from ipex_llm import optimize_model
from transformers import LlavaForConditionalGeneration
# Load the model on CPU first
model = LlavaForConditionalGeneration.from_pretrained('llava-hf/llava-1.5-7b-hf', device_map="cpu")
# Apply ipex-llm low-bit (sym_int4) optimization
model = optimize_model(model, low_bit='sym_int4')
# Move the optimized model to the Intel GPU
model = model.eval().to('xpu')

The multi-turn chat looks like this:

1st-round:
http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg
What is this?
2nd-round:
http://farm5.staticflickr.com/4031/4440753665_631134eaa4_z.jpg
What are the differences between these two images?
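
For reference, the two rounds above could be written as a chat-template message list along these lines. This is only a sketch: the helper names (`user_turn`, `assistant_turn`, `count_images`) are hypothetical, the assistant reply is a placeholder, and the message layout assumes the Hugging Face chat-template format used in the llava example. The point is that the number of `{"type": "image"}` entries has to match the number of images passed to the processor.

```python
from typing import Dict, List


def user_turn(text: str, with_image: bool = True) -> Dict:
    """Build one user message, optionally preceded by an image placeholder."""
    content = [{"type": "image"}] if with_image else []
    content.append({"type": "text", "text": text})
    return {"role": "user", "content": content}


def assistant_turn(text: str) -> Dict:
    """Build one assistant message containing only text."""
    return {"role": "assistant", "content": [{"type": "text", "text": text}]}


def count_images(messages: List[Dict]) -> int:
    """Count the image placeholders across all messages."""
    return sum(item["type"] == "image" for m in messages for item in m["content"])


# The two image URLs from the rounds above, in order.
image_urls = [
    "http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg",
    "http://farm5.staticflickr.com/4031/4440753665_631134eaa4_z.jpg",
]

messages = [
    user_turn("What is this?"),
    assistant_turn("<first-round answer goes here>"),  # placeholder reply
    user_turn("What are the differences between these two images?"),
]

# One placeholder per image, otherwise the processor inputs will not line up.
assert count_images(messages) == len(image_urls)
```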

Error logs:

Traceback (most recent call last): 
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner 
    self.run() 
  File "/usr/lib/python3.10/threading.py", line 953, in run 
    self._target(*self._args, **self._kwargs) 
  File "/home/ipex-llm-serving/dependency/model_worker.py", line 85, in model_generate 
    raise NotImplementedError(f"Unsupported model: {self.model_name}, error: {error}") 
NotImplementedError: Unsupported model: llava-hf/llava-1.5-7b-hf, error: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

The error originates at:
/usr/lib/python3.10/site-packages/ipex_llm/transformers/low_bit_linear.py:729
x_2d = x.view(-1, x_shape[-1])

If I change this line to x_2d = x.contiguous().view(-1, x_shape[-1]), everything works.
I think the issue is related to the LLaVA model's vision_feature_select_strategy (vision_feature_select_strategy=default), which may make the tensor non-contiguous.
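
A minimal sketch of how this could happen, using plain PyTorch on CPU. It assumes (this is my reading, not confirmed here) that the default strategy drops the first token with a slice like `features[:, 1:]`. The function names `select_default` and `flatten_to_2d` are made up for illustration, and the tensor sizes are tiny stand-ins for the real feature shapes. With one image the sliced tensor can still be viewed as 2D, but with two images the batch dimension no longer lines up with the strides, so `.view` fails.

```python
import torch


def select_default(features: torch.Tensor) -> torch.Tensor:
    # Assumption: the "default" strategy drops the first token by slicing.
    return features[:, 1:]


def flatten_to_2d(x: torch.Tensor) -> torch.Tensor:
    # reshape returns a view when the strides allow it and copies otherwise,
    # so it is safe for non-contiguous inputs.
    return x.reshape(-1, x.shape[-1])


# (num_images, num_tokens, hidden_size), tiny sizes for illustration only
one_image = select_default(torch.randn(1, 5, 4))
two_images = select_default(torch.randn(2, 5, 4))

print("one image contiguous: ", one_image.is_contiguous())
print("two images contiguous:", two_images.is_contiguous())

# Single image: the size-1 batch dimension does not block the view.
one_image.view(-1, 4)

# Two images: the slice leaves a gap between the batches, so view raises.
try:
    two_images.view(-1, 4)
    view_failed = False
except RuntimeError as err:
    view_failed = True
    print("view failed:", err)

# Either of these works on the non-contiguous tensor.
flat = flatten_to_2d(two_images)
flat_alt = two_images.contiguous().view(-1, 4)
```

This would also be consistent with the single-image chat working and only the multi-image round failing.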

Can anyone help with this issue? Thanks!

Python packages:
ipex-llm 2.2.0b20241011
transformers 4.45.2

Oscilloscope98 · 2024-10-30T02:06:21Z

Hi @Johere, we are working on reproducing this issue and will post any progress here :)

JinheTang · 2024-11-01T09:21:52Z

Hi @Johere, we have updated our llava example for llava-hf/llava-1.5-7b-hf. Please follow the instructions in the latest llava example and see if it works.

If the issue continues, could you please share the scripts you're using to run the multi-turn chat, along with the output from our env-check scripts, to help us gather more details? :)

Johere · 2024-11-05T06:24:28Z

Hi @JinheTang, thanks for your reply. The problem still exists. To reproduce it, please apply the following changes to the latest llava example:

diff --git a/python/llm/example/GPU/PyTorch-Models/Model/llava/generate.py b/python/llm/example/GPU/PyTorch-Models/Model/llava/generate.py
index b70e22541a..c3b35ee2d8 100644
--- a/python/llm/example/GPU/PyTorch-Models/Model/llava/generate.py
+++ b/python/llm/example/GPU/PyTorch-Models/Model/llava/generate.py
@@ -56,8 +56,23 @@ if __name__ == '__main__':
                 {"type": "image"},
                 {"type": "text", "text": prompt}
             ]
+        },
+        # mimic a multi-round chat
+        {
+            'role': 'assistant',
+            'content': [
+                {'type': 'text', 'text': 'The image features a young girl holding a stuffed teddy bear.'}
+            ]
+        },
+        {
+            "role": "user",
+            "content": [
+                {"type": "image"},
+                {"type": "text", "text": "Describe the differences between these two images."}
+            ]
         }
     ]
+
     text = processor.apply_chat_template(messages, add_generation_prompt=True)

     if os.path.exists(image_path):
@@ -65,7 +80,10 @@ if __name__ == '__main__':
     else:
        image = Image.open(requests.get(image_path, stream=True).raw)

-    inputs = processor(text=text, images=image, return_tensors="pt").to('xpu')
+    # inputs = processor(text=text, images=image, return_tensors="pt").to('xpu')
+    # multi-image chat debug
+    image_2 = Image.open(requests.get("http://farm5.staticflickr.com/4031/4440753665_631134eaa4_z.jpg", stream=True).raw)
+    inputs = processor(text=text, images=[image, image_2], return_tensors="pt").to('xpu')

The env-check output log is attached:
env-check.txt

JinheTang · 2024-11-07T02:01:19Z

Hi @Johere, thanks for the script. We will try to reproduce the issue with it.

JinheTang · 2024-11-08T01:54:57Z

Hi @Johere, we have reproduced the issue. We will let you know as soon as there is an update.

Oscilloscope98 assigned lzivan and unassigned lzivan Oct 30, 2024

Oscilloscope98 assigned JinheTang Nov 1, 2024

glorysdj assigned JinBridger Nov 6, 2024

Oscilloscope98 unassigned JinBridger Nov 7, 2024

jason-dai added the user issue label Nov 7, 2024

Oscilloscope98 mentioned this issue Nov 12, 2024

Fix llava with multi-image inputs #12384

Merged
