shape error #7

Alen334 · 2023-12-13T03:31:16Z

Hi, when I run demo_maxvqa.py as a test, I get a shape error:

Traceback (most recent call last):
File "E:/MaxVQA-master/demo_maxvqa.py", line 167, in <module>
a = inference(video)
File "E:/MaxVQA-master/demo_maxvqa.py", line 160, in inference
vis_feats = visual_encoder(data["aesthetic"].to(device), data["technical"].to(device))
File "D:\tools\Anaconda\set\envs\python37tf\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "E:\MaxVQA-master\model\visual.py", line 19, in forward
clip_feats = clip_feats[1:].reshape(7,7,-1,1024).permute(3,2,0,1)
RuntimeError: shape '[7, 7, -1, 1024]' is invalid for input of size 64512

The failing call is:
vis_feats = visual_encoder(data["aesthetic"].to(device), data["technical"].to(device))
with these input shapes:
data["aesthetic"]: [3, 64, 224, 224]
data["technical"]: [3, 128, 224, 224]

The problem is in the following two lines of model/visual.py:
clip_feats = self.clip_visual(x_aes)
clip_feats = clip_feats[1:].reshape(7,7,-1,1024).permute(3,2,0,1)
The shape of clip_feats is [64, 1024], so clip_feats[1:] holds 63 * 1024 = 64512 elements, which cannot be reshaped to [7, 7, -1, 1024].
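
To make the arithmetic behind the error concrete, here is a minimal sketch in plain Python (no torch needed). The helper reshape_ok is hypothetical, and the [50, 64, 1024] shape is an assumption about what the visual encoder would return if its attention pool kept all tokens instead of only the first one; it is not confirmed anywhere in this thread.

```python
from math import prod

def reshape_ok(shape, target):
    """Return True if a tensor of `shape` can be reshaped to `target` (one -1 allowed)."""
    total = prod(shape)
    known = prod(d for d in target if d != -1)
    if -1 in target:
        return total % known == 0
    return total == known

target = (7, 7, -1, 1024)

# Shape reported in this issue: clip_feats is [64, 1024], and [1:] drops one row.
pooled_after_slice = (63, 1024)

# Assumed shape when all tokens are returned: [50, 64, 1024], and [1:] drops one token.
tokens_after_slice = (49, 64, 1024)

print(prod(pooled_after_slice))                # 64512, the size in the error message
print(reshape_ok(pooled_after_slice, target))  # False
print(reshape_ok(tokens_after_slice, target))  # True
```

If the second case holds, the reshape gives [7, 7, 64, 1024] and the permute gives [1024, 64, 7, 7], which is presumably what the rest of visual.py expects.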

tom-bbc · 2024-02-20T16:37:14Z

+1. Did you end up finding a way to resolve this issue and run the demo?

BehnooshParsa · 2024-03-02T01:31:53Z

Follow the repo's instructions for installing open_clip and you should not get this error. If you install it with pip install open-clip-torch instead, you will get this error, because the repo relies on a modified open_clip.

tom-bbc · 2024-03-04T13:34:54Z

For reference, yes the installation of open_clip was the issue, but it was caused by the sed command used in the installation not working exactly as described by the documentation on macOS. Changing the command to
sed -i "" "92s/return x\[0\]/return x/" src/open_clip/modified_resnet.py
made this work for me (or you can just manually edit line 92 of modified_resnet.py to remove the square brackets).

narutothet · 2024-03-07T14:08:38Z

Hello, I removed the square brackets on line 92 but still get the shape error. Why? These are the commands I ran:
git clone https://github.com/mlfoundations/open_clip.git
cd open_clip
sed - i'92s/return x \ [0 ]/return x/'src/open_clip/modified_resnet.py
pip install - e.
Did you follow the steps in the README to install open_clip? Is there any other solution to this shape error problem?

tom-bbc · 2024-03-08T14:22:22Z

Not sure if this is just a pasting error in your question, but if you ran the command exactly as written there, the sed command is malformed. You want either:

sed -i "92s/return x\[0\]/return x/" src/open_clip/modified_resnet.py (original)
or
sed -i "" "92s/return x\[0\]/return x/" src/open_clip/modified_resnet.py (what I laid out above for macOS)
then
pip install -e .
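
For anyone without a working sed (for example on Windows), the same one-line edit can be sketched in Python. This is only a sketch, not part of the repo: patch_line is a hypothetical helper, and it assumes that line 92 of src/open_clip/modified_resnet.py still contains return x[0], as the sed commands above imply. Run it from the open_clip checkout before pip install -e .

```python
from pathlib import Path

def patch_line(path, lineno, old="return x[0]", new="return x"):
    """Replace `old` with `new` on the 1-indexed line `lineno`.

    Returns True if the file was changed, False if that line did not contain `old`.
    """
    p = Path(path)
    lines = p.read_text(encoding="utf-8").splitlines(keepends=True)
    if lineno < 1 or lineno > len(lines) or old not in lines[lineno - 1]:
        return False
    lines[lineno - 1] = lines[lineno - 1].replace(old, new, 1)
    p.write_text("".join(lines), encoding="utf-8")
    return True

if __name__ == "__main__":
    # Path and line number taken from the sed commands in this thread.
    target = Path("src/open_clip/modified_resnet.py")
    if target.exists():
        if patch_line(target, 92):
            print("patched line 92")
        else:
            print("line 92 does not contain 'return x[0]'; nothing changed")
```

A False result means the line number no longer matches the installed version of the file, in which case it is safer to find the return x[0] statement by hand than to guess.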

BubbleYu-ZJU · 2024-07-04T06:45:15Z

Traceback (most recent call last):
File "demo_maxvqa.py", line 105, in <module>
maxvqa = MaxVQA(text_tokens, embedding, text_encoder, share_ctx=True).cuda()
File "/ExplainableVQA-master/model/maxvqa.py", line 71, in __init__
self.text_feats = text_encoder(n_prompts.cuda(), self.tokenized_prompts)
File "/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/ExplainableVQA-master/model/maxvqa.py", line 28, in forward
x = self.transformer(x, attn_mask=self.attn_mask)
File "miniconda3/envs/tf_torch_btx/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "ExplainableVQA/open_clip/src/open_clip/transformer.py", line 363, in forward
x = r(x, attn_mask=attn_mask)
File "lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "ExplainableVQA/open_clip/src/open_clip/transformer.py", line 263, in forward
x = q_x + self.ls_1(self.attention(q_x=self.ln_1(q_x), k_x=k_x, v_x=v_x, attn_mask=attn_mask))
File "ExplainableVQA/open_clip/src/open_clip/transformer.py", line 250, in attention
return self.attn(
File "lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "lib/python3.8/site-packages/torch/nn/modules/activation.py", line 1205, in forward
attn_output, attn_output_weights = F.multi_head_attention_forward(
File "lib/python3.8/site-packages/torch/nn/functional.py", line 5251, in multi_head_attention_forward
raise RuntimeError(f"The shape of the 2D attn_mask is {attn_mask.shape}, but should be {correct_2d_size}.")
RuntimeError: The shape of the 2D attn_mask is torch.Size([77, 77]), but should be (32, 32).

I got this error. Do you know why?

Radovid-Lab · 2024-07-16T11:57:57Z

I reckon it could be related to the "batch_first" argument in newer versions of torch. You can try removing the two "permute" operations in TextEncoder's forward function, at model/maxvqa.py lines 27 and 29.
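
A small sketch of why the layout matters, in plain Python. torch.nn.MultiheadAttention reads the sequence length from dimension 0 when batch_first=False and from dimension 1 when batch_first=True, and a 2D attn_mask must be (L, L) for that length. The helper expected_mask_size is hypothetical, the 32 is inferred from the error message, the width of 512 is a placeholder, and permute(1, 0, 2) is an assumption about what the two permute calls do.

```python
def expected_mask_size(input_shape, batch_first):
    """Return the (L, L) self-attention mask size implied by a 3-D input shape."""
    if len(input_shape) != 3:
        raise ValueError("expected a 3-D shape")
    seq_len = input_shape[1] if batch_first else input_shape[0]
    return (seq_len, seq_len)

# 77 is the mask size in the error; 32 is inferred from "should be (32, 32)".
N_PROMPTS, CONTEXT, WIDTH = 32, 77, 512  # WIDTH is a placeholder value

batch_major = (N_PROMPTS, CONTEXT, WIDTH)  # layout before any permute
seq_major = (CONTEXT, N_PROMPTS, WIDTH)    # layout after an assumed permute(1, 0, 2)

print(expected_mask_size(seq_major, batch_first=False))   # (77, 77): mask fits
print(expected_mask_size(seq_major, batch_first=True))    # (32, 32): the reported mismatch
print(expected_mask_size(batch_major, batch_first=True))  # (77, 77): mask fits again
```

Under these assumptions, the middle case reproduces the error: the input is still permuted to sequence-first while the attention layer expects batch-first, which is why dropping the permutes would make the [77, 77] mask valid again.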
