Qwen2-VL fine-tuning: is inputting video and image at the same time unsupported? #5822

zhang122994917 opened this issue Oct 25, 2024 · 0 comments
Labels
pending This problem is yet to be addressed


@zhang122994917

Reminder

  • I have read the README and searched the existing issues.

System Info

transformers==4.45.1

Reproduction

When the input contains both video and image, tokenization fails with the following error:

Converting format of dataset (num_proc=128): 100%|_________________________________________________________________________| 49996/49996 [00:02<00:00, 20519.51 examples/s]
Running tokenizer on dataset (num_proc=128): 0%| | 0/49996 [02:06<?, ? examples/s]

[rank0]: result = (True, func(*args, **kwds))
[rank0]: File "/usr/local/lib/python3.8/dist-packages/datasets/utils/py_utils.py", line 678, in _write_generator_to_queue
[rank0]: for i, result in enumerate(func(**kwargs)):
[rank0]: File "/usr/local/lib/python3.8/dist-packages/datasets/arrow_dataset.py", line 3558, in _map_single
[rank0]: batch = apply_function_on_filtered_inputs(
[rank0]: File "/usr/local/lib/python3.8/dist-packages/datasets/arrow_dataset.py", line 3427, in apply_function_on_filtered_inputs
[rank0]: processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
[rank0]: File "./LLaMA-Factory/src/llamafactory/data/processors/supervised.py", line 105, in preprocess_supervised_dataset
[rank0]: input_ids, labels = _encode_supervised_example(
[rank0]: File "./LLaMA-Factory/src/llamafactory/data/processors/supervised.py", line 48, in _encode_supervised_example
[rank0]: messages = template.mm_plugin.process_messages(prompt + response, images, videos, processor)
[rank0]: File "./LLaMA-Factory/src/llamafactory/data/mm_plugin.py", line 496, in process_messages
[rank0]: raise ValueError("len(images) is less than the number of {} tokens.".format(IMAGE_PLACEHOLDER))
[rank0]: ValueError: len(images) is less than the number of <image> tokens.
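The raise at mm_plugin.py line 496 looks like a count of placeholder tokens against the media lists, along these lines (a minimal sketch, not the actual LLaMA-Factory code; the function name is assumed):

```python
# Hedged sketch of a placeholder-vs-media count check like the one that
# raises above; the placeholder strings mirror the error message.
IMAGE_PLACEHOLDER = "<image>"
VIDEO_PLACEHOLDER = "<video>"

def validate_media_counts(messages, images, videos):
    # Count placeholder tokens across all message contents.
    num_image_tags = sum(m["content"].count(IMAGE_PLACEHOLDER) for m in messages)
    num_video_tags = sum(m["content"].count(VIDEO_PLACEHOLDER) for m in messages)
    # Each placeholder must be backed by a media item of the same type.
    if num_image_tags > len(images):
        raise ValueError("len(images) is less than the number of {} tokens.".format(IMAGE_PLACEHOLDER))
    if num_video_tags > len(videos):
        raise ValueError("len(videos) is less than the number of {} tokens.".format(VIDEO_PLACEHOLDER))
```

The sample below has one `<image>` and one `<video>` with one entry in each list, so a per-type count like this would pass; the failure presumably comes from how the plugin pairs placeholders with media when both types appear in one example.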

The data format is as follows; the messages contain both `<video>` and `<image>` placeholders:

    {
        "messages": [
            {
                "content": "<video><image>\nThe above is a xxx",
                "role": "user"
            },
            {
                "content": "",
                "role": "assistant"
            }
        ],
        "images": [
            "xxxx"
        ],
        "videos": [
            "xxxx"
        ]
    },

After modifying the code as follows, it runs, but the tokenizer is extremely slow and I have not yet found out why:

        if image_processor != video_processor:
            if input_dict.get("images") is not None:
                mm_inputs.update(image_processor(input_dict["images"], return_tensors="pt"))
            if input_dict.get("videos") is not None:
                mm_inputs.update(video_processor(input_dict["videos"], return_tensors="pt"))
        elif input_dict.get("images") is not None or input_dict.get("videos") is not None:  # same processor (qwen2-vl)
            # Originally: mm_inputs.update(image_processor(**input_dict, return_tensors="pt"))
            # Split into two calls so images and videos can coexist in one example.
            images = input_dict.get("images")
            videos = input_dict.get("videos")

            if images is not None:
                image_inputs = image_processor(images=images, videos=None, return_tensors="pt")
                image_grid_thw = image_inputs["image_grid_thw"]
            else:
                image_inputs = {}
                image_grid_thw = None

            if videos is not None:
                videos_inputs = image_processor(images=None, videos=videos, return_tensors="pt")
                video_grid_thw = videos_inputs["video_grid_thw"]
            else:
                videos_inputs = {}
                video_grid_thw = None

            # Only the grid shapes are merged here; the remaining tensors in
            # image_inputs / videos_inputs are not copied into mm_inputs.
            mm_inputs["image_grid_thw"] = image_grid_thw
            mm_inputs["video_grid_thw"] = video_grid_thw

        return mm_inputs
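One thing worth double-checking in the patch above: only the `*_grid_thw` entries are copied into `mm_inputs`, while the pixel tensors inside `image_inputs` / `videos_inputs` are dropped. A hedged sketch of merging both feature dicts instead, assuming (as with Qwen2-VL's `pixel_values` vs. `pixel_values_videos`) the two dicts use disjoint keys:

```python
def merge_mm_inputs(image_inputs, videos_inputs):
    """Merge image and video feature dicts into one mm_inputs dict.

    Assumes disjoint keys (e.g. pixel_values / image_grid_thw vs.
    pixel_values_videos / video_grid_thw), so a plain dict union
    keeps every tensor from both processor calls.
    """
    mm_inputs = {}
    mm_inputs.update(image_inputs or {})
    mm_inputs.update(videos_inputs or {})
    return mm_inputs
```

Either dict may be `None`/empty when the example has only one media type.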

Expected behavior

No response

Others

No response

@github-actions github-actions bot added the pending This problem is yet to be addressed label Oct 25, 2024