llava-hf/llava-1.5-7b-hf: error when multi-turn chat with multi-images #12288

Open
Johere opened this issue Oct 29, 2024 · 5 comments

Johere commented Oct 29, 2024

from ipex_llm import optimize_model
from transformers import LlavaForConditionalGeneration
model = LlavaForConditionalGeneration.from_pretrained('llava-hf/llava-1.5-7b-hf', device_map="cpu")
model = optimize_model(model, low_bit='sym_int4')
model = model.eval().to('xpu')

The multi-turn chat looks like this:

Error logs:

Traceback (most recent call last): 
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner 
    self.run() 
  File "/usr/lib/python3.10/threading.py", line 953, in run 
    self._target(*self._args, **self._kwargs) 
  File "/home/ipex-llm-serving/dependency/model_worker.py", line 85, in model_generate 
    raise NotImplementedError(f"Unsupported model: {self.model_name}, error: {error}") 
NotImplementedError: Unsupported model: llava-hf/llava-1.5-7b-hf, error: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead. 

The error is raised at:
/usr/lib/python3.10/site-packages/ipex_llm/transformers/low_bit_linear.py:729
x_2d = x.view(-1, x_shape[-1])

If I change this line to x_2d = x.contiguous().view(-1, x_shape[-1]), everything works.
I think the issue is related to the LLaVA model's vision_feature_select_strategy (vision_feature_select_strategy=default), which may make the tensor non-contiguous.
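
To make the hypothesis above concrete, here is a minimal pure-Python sketch of the stride rule documented for torch.Tensor.view: dimensions can be merged without a copy only if each outer stride equals the inner size times the inner stride. The shapes (577 tokens = 1 CLS + 576 patches, 1024 channels) and the helper names are assumptions for illustration, not taken from ipex-llm or transformers, and the size-1 handling is a simplification of what PyTorch does.

```python
from typing import Sequence, Tuple


def contiguous_strides(shape: Sequence[int]) -> Tuple[int, ...]:
    """Row-major strides (in elements) of a freshly allocated tensor."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def can_flatten_leading_dims(shape: Sequence[int], strides: Sequence[int]) -> bool:
    """Rough check: could x.view(-1, shape[-1]) succeed without copying?"""
    # Simplification: dimensions of size 1 are ignored, since their stride
    # never affects which element is addressed.
    dims = [(n, s) for n, s in zip(shape[:-1], strides[:-1]) if n != 1]
    return all(
        outer_stride == inner_size * inner_stride
        for (_, outer_stride), (inner_size, inner_stride) in zip(dims, dims[1:])
    )


# Hypothetical vision features for two images: (images, tokens, channels).
full_shape = (2, 577, 1024)
full_strides = contiguous_strides(full_shape)

# Dropping the CLS token with features[:, 1:] keeps the original strides
# but shrinks the token dimension from 577 to 576.
sliced_shape = (2, 576, 1024)
single_image_shape = (1, 576, 1024)

print(can_flatten_leading_dims(full_shape, full_strides))          # True
print(can_flatten_leading_dims(sliced_shape, full_strides))        # False
print(can_flatten_leading_dims(single_image_shape, full_strides))  # True
```

If this is what happens, it might also explain why the error only shows up with more than one image. Both x.contiguous().view(...) and x.reshape(...) avoid the error because they copy the data when the layout requires it; the error message itself suggests the latter.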

Can anyone help with this issue? Thanks!

Python packages:
ipex-llm 2.2.0b20241011
transformers 4.45.2

@Oscilloscope98
Contributor

Hi @Johere, we are reproducing this issue. We will post any progress here :)

Oscilloscope98 assigned lzivan and unassigned lzivan Oct 30, 2024

@JinheTang
Contributor

JinheTang commented Nov 1, 2024

Hi @Johere, we have updated our llava example for llava-hf/llava-1.5-7b-hf. Please follow the instructions in the latest llava example to see if it works.

If the issue continues, could you please share the scripts you're using to run the multi-turn chat, along with the output from our env-check scripts, to help us gather more details? :)

@Johere
Author

Johere commented Nov 5, 2024

> Hi @Johere, we have updated our llava example for llava-hf/llava-1.5-7b-hf. Please follow the instructions in the latest llava example to see if it works.
>
> If the issue continues, could you please share the scripts you're using to run the multi-turn chat, along with the output from our env-check scripts, to help us gather more details? :)

Hi @JinheTang, thanks for your reply. The problem still exists. To reproduce it, please apply the following changes to the latest llava example:

diff --git a/python/llm/example/GPU/PyTorch-Models/Model/llava/generate.py b/python/llm/example/GPU/PyTorch-Models/Model/llava/generate.py
index b70e22541a..c3b35ee2d8 100644
--- a/python/llm/example/GPU/PyTorch-Models/Model/llava/generate.py
+++ b/python/llm/example/GPU/PyTorch-Models/Model/llava/generate.py
@@ -56,8 +56,23 @@ if __name__ == '__main__':
                 {"type": "image"},
                 {"type": "text", "text": prompt}
             ]
+        },
+        # mimic a multi-round chat
+        {
+            'role': 'assistant',
+            'content': [
+                {'type': 'text', 'text': 'The image features a young girl holding a stuffed teddy bear.'}
+            ]
+        },
+        {
+            "role": "user",
+            "content": [
+                {"type": "image"},
+                {"type": "text", "text": "Describe the differences between these two images."}
+            ]
         }
     ]
+
     text = processor.apply_chat_template(messages, add_generation_prompt=True)

     if os.path.exists(image_path):
@@ -65,7 +80,10 @@ if __name__ == '__main__':
     else:
        image = Image.open(requests.get(image_path, stream=True).raw)

-    inputs = processor(text=text, images=image, return_tensors="pt").to('xpu')
+    # inputs = processor(text=text, images=image, return_tensors="pt").to('xpu')
+    # multi-image chat debug
+    image_2 = Image.open(requests.get("http://farm5.staticflickr.com/4031/4440753665_631134eaa4_z.jpg", stream=True).raw)
+    inputs = processor(text=text, images=[image, image_2], return_tensors="pt").to('xpu')
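
When extending the example to several turns like this, the number of {"type": "image"} placeholders in messages should line up with the number of images passed to the processor. A small sanity-check sketch, assuming the message layout from the diff above; count_image_placeholders, the prompt text, and the file names are hypothetical and not part of transformers or ipex-llm:

```python
def count_image_placeholders(messages):
    """Count {"type": "image"} entries across all turns of a chat."""
    return sum(
        1
        for message in messages
        for part in message["content"]
        if part.get("type") == "image"
    )


# Same structure as the multi-round chat in the diff; the first prompt is a placeholder.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What is in this image?"},
        ],
    },
    {
        "role": "assistant",
        "content": [
            {"type": "text", "text": "The image features a young girl holding a stuffed teddy bear."},
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe the differences between these two images."},
        ],
    },
]

# Hypothetical stand-ins for the two PIL images passed as images=[image, image_2].
images = ["first.jpg", "second.jpg"]

assert count_image_placeholders(messages) == len(images)
```

Running a check like this before calling the processor helps separate a mismatch between prompts and images from the tensor layout error reported here.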

The env-check output log is attached:
env-check.txt

@JinheTang
Contributor

Hi @Johere, thanks for the script. We will try to reproduce it.

@JinheTang
Contributor

Hi @Johere, we have reproduced the issue. We will let you know if there is any update.
