Matrix Size Mismatch Error in 'make_captions.py' When Using --num_beams Not Equal to 1 #1149

Closed
INFCode opened this issue Mar 3, 2024 · 2 comments

INFCode commented Mar 3, 2024

I am using finetune/make_captions.py to create captions with BLIP. However, setting --num_beams to anything larger than 1 results in a matrix size mismatch.

Here's the error log:

> python3 "finetune/make_captions.py" --batch_size="1" --num_beams="5" --top_p="0.9" --max_length="75" --min_length="5" --beam_search \
                         --caption_extension=".txt" "/home/infcode/code/git/ai/dataset/hyh/all_original_caption/" \
                         --caption_weights="https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_large_caption.pth"
2024-03-02 22:58:18.607746: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-02 22:58:18.607782: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-02 22:58:18.607818: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-03-02 22:58:18.613707: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-02 22:58:19.716504: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
get_preferred_device() -> cuda
2024-03-02 22:58:20 INFO     Current Working Directory is: /home/infcode/code/git/kohya_ss                                                                         make_captions.py:85
                    INFO     load images from /home/infcode/code/git/ai/dataset/hyh/all_original_caption                                                           make_captions.py:90
                    INFO     found 53 images.                                                                                                                      make_captions.py:93
                    INFO     loading BLIP caption: https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_large_caption.pth                 make_captions.py:95
2024-03-02 22:58:29 INFO     load checkpoint from https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_large_caption.pth                          blip.py:242
                    INFO     BLIP loaded                                                                                                                           make_captions.py:99
  0%|                                                                                                                                                          | 0/53 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/infcode/code/git/kohya_ss/finetune/make_captions.py", line 210, in <module>
    main(args)
  File "/home/infcode/code/git/kohya_ss/finetune/make_captions.py", line 154, in main
    run_batch(b_imgs)
  File "/home/infcode/code/git/kohya_ss/finetune/make_captions.py", line 107, in run_batch
    captions = model.generate(
  File "/home/infcode/code/git/kohya_ss/finetune/blip/blip.py", line 162, in generate
    outputs = self.text_decoder.generate(input_ids=input_ids,
  File "/home/infcode/code/git/kohya_ss/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/infcode/code/git/kohya_ss/venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 1797, in generate
    return self.beam_search(
  File "/home/infcode/code/git/kohya_ss/venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 3181, in beam_search
    outputs = self(
  File "/home/infcode/code/git/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/infcode/code/git/kohya_ss/finetune/blip/med.py", line 886, in forward
    outputs = self.bert(
  File "/home/infcode/code/git/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/infcode/code/git/kohya_ss/finetune/blip/med.py", line 781, in forward
    encoder_outputs = self.encoder(
  File "/home/infcode/code/git/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/infcode/code/git/kohya_ss/finetune/blip/med.py", line 445, in forward
    layer_outputs = layer_module(
  File "/home/infcode/code/git/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/infcode/code/git/kohya_ss/finetune/blip/med.py", line 361, in forward
    cross_attention_outputs = self.crossattention(
  File "/home/infcode/code/git/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/infcode/code/git/kohya_ss/finetune/blip/med.py", line 277, in forward
    self_outputs = self.self(
  File "/home/infcode/code/git/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/infcode/code/git/kohya_ss/finetune/blip/med.py", line 178, in forward
    attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: The size of tensor a (5) must match the size of tensor b (25) at non-singleton dimension 0

After trying different values of num_beams, it seems that the size of key_layer.transpose(-1, -2) at dimension 0 is the square of the size of query_layer at dimension 0 (num_beams² vs. num_beams), which explains why num_beams=1 does not trigger the error.
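
If I am reading finetune/blip/blip.py right, generate() already calls image_embeds.repeat_interleave(num_beams, dim=0) before handing the embeddings to text_decoder.generate(), and newer transformers releases appear to expand the encoder states again inside beam_search(), so the cross-attention keys would end up with num_beams² rows while the queries only have num_beams. Here is a toy sketch of that hypothesis (all shapes below are made up for illustration; only the batch dimension matters):

import torch

# Hypothetical shapes, for illustration only; dim 0 is what matters here.
batch_size, num_beams = 1, 5
heads, q_len, img_len, head_dim = 12, 1, 197, 64

# Decoder queries: beam search expands the batch once -> batch_size * num_beams
query_layer = torch.randn(batch_size * num_beams, heads, q_len, head_dim)

# Encoder (image) states expanded twice -- once by the repeat_interleave in
# blip.py and once inside beam_search -> batch_size * num_beams**2
key_layer = torch.randn(batch_size * num_beams**2, heads, img_len, head_dim)

# Same matmul as finetune/blip/med.py line 178; this raises
# "The size of tensor a (5) must match the size of tensor b (25) at non-singleton dimension 0"
attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))

If that is really what happens, removing or version-guarding the manual repeat_interleave should make the shapes line up again, but I have not confirmed that this is the right fix.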

TeKett commented Mar 29, 2024

I got the exact same issue; I'm using the GUI version.

10:38:16-980477 INFO     Version: v22.3.1

10:38:16-987458 INFO     nVidia toolkit detected
10:38:18-560252 INFO     Torch 2.0.1+cu118
10:38:18-584160 INFO     Torch backend: nVidia CUDA 11.8 cuDNN 8700
10:38:18-586155 INFO     Torch detected GPU: NVIDIA GeForce RTX 4070 Ti VRAM 12281 Arch (8, 9) Cores 60
10:38:18-588150 INFO     Verifying modules installation status from requirements_windows_torch2.txt...
10:38:18-593163 INFO     Verifying modules installation status from requirements.txt...
10:38:21-600523 INFO     headless: False
10:38:21-607532 INFO     Load CSS...
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
10:51:20-341524 INFO     Captioning files in D:\Pictures\New folder (3)...
10:51:20-342520 INFO     ./venv/Scripts/python.exe "finetune/make_captions.py" --batch_size="1" --num_beams="2"
                         --top_p="0.9" --max_length="75" --min_length="35" --beam_search --caption_extension=".txt"
                         "D:\Pictures\New folder (3)"
                         --caption_weights="https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/mode
                         l_large_caption.pth"
Current Working Directory is:  C:\Train\Kohya
load images from D:\Pictures\New folder (3)
found 2 images.
loading BLIP caption: https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_large_caption.pth
load checkpoint from https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_large_caption.pth
BLIP loaded
  0%|                                                                                            | 0/2 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "C:\Train\Kohya\finetune\make_captions.py", line 200, in <module>
    main(args)
  File "C:\Train\Kohya\finetune\make_captions.py", line 144, in main
    run_batch(b_imgs)
  File "C:\Train\Kohya\finetune\make_captions.py", line 97, in run_batch
    captions = model.generate(
  File "C:\Train\Kohya\finetune\blip\blip.py", line 158, in generate
    outputs = self.text_decoder.generate(input_ids=input_ids,
  File "C:\Train\Kohya\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Train\Kohya\venv\lib\site-packages\transformers\generation\utils.py", line 1611, in generate
    return self.beam_search(
  File "C:\Train\Kohya\venv\lib\site-packages\transformers\generation\utils.py", line 2909, in beam_search
    outputs = self(
  File "C:\Train\Kohya\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Train\Kohya\finetune\blip\med.py", line 886, in forward
    outputs = self.bert(
  File "C:\Train\Kohya\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Train\Kohya\finetune\blip\med.py", line 781, in forward
    encoder_outputs = self.encoder(
  File "C:\Train\Kohya\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Train\Kohya\finetune\blip\med.py", line 445, in forward
    layer_outputs = layer_module(
  File "C:\Train\Kohya\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Train\Kohya\finetune\blip\med.py", line 361, in forward
    cross_attention_outputs = self.crossattention(
  File "C:\Train\Kohya\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Train\Kohya\finetune\blip\med.py", line 277, in forward
    self_outputs = self.self(
  File "C:\Train\Kohya\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Train\Kohya\finetune\blip\med.py", line 178, in forward
    attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: The size of tensor a (2) must match the size of tensor b (4) at non-singleton dimension 0

kohya-ss (Owner) commented

I finally found the fix for this :)
