
Why is group_by_modality_length set to False in the custom_finetune.sh script? #89

Open
ggcr opened this issue Jun 23, 2024 · 3 comments

ggcr commented Jun 23, 2024

Hi,

I am exploring different experiments, such as training the projector only, and I've noticed that the default script provided for custom finetuning has the group_by_modality_length flag set to False.

DATA_PATH="/home/ai/data/llava/dataset/text_files/llava_v1_5_mix665k.json"
IMAGE_PATH="/home/ai/data/llava/dataset"
MODEL_MAX_LENGTH=3072
OUTPUT_DIR="/mnt/data/sata/yinghu/checkpoints/llava_factory/custom-finetune-TinyLLaVA-Phi-2-SigLIP-3.1B-lora"
deepspeed --include localhost:0,1,2,3 --master_port 29501 tinyllava/train/custom_finetune.py \
    --deepspeed ./scripts/zero2.json \
    --data_path $DATA_PATH \
    --image_folder $IMAGE_PATH \
    --is_multimodal True \
    --conv_version phi \
    --mm_vision_select_layer -2 \
    --image_aspect_ratio square \
    --fp16 True \
    --training_recipe lora \
    --tune_type_llm lora \
    --tune_type_vision_tower frozen \
    --tune_vision_tower_from_layer 0 \
    --tune_type_connector full \
    --lora_r 128 \
    --lora_alpha 256 \
    --group_by_modality_length False \
    --pretrained_model_path "tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B" \
    --output_dir $OUTPUT_DIR

Since the data path points to llava_v1_5_mix665k.json, this is not a uni-modal dataset: it contains conversations with the <image> multimodal token.
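For illustration, the two kinds of entries in that file look roughly like this (a hand-written sketch following the LLaVA conversation schema; the ids, paths, and text are made up):

{
  "id": "000000123456",
  "image": "coco/train2017/000000123456.jpg",
  "conversations": [
    {"from": "human", "value": "<image>\nWhat is shown in the picture?"},
    {"from": "gpt", "value": "A street scene with several buses."}
  ]
}
{
  "id": "sharegpt-000001",
  "conversations": [
    {"from": "human", "value": "Explain the difference between TCP and UDP."},
    {"from": "gpt", "value": "TCP is connection-oriented and reliable; UDP is connectionless."}
  ]
}

Only the first kind carries an "image" key and the <image> token; the second kind is pure text.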

Can you clarify when we should set the group_by_modality_length flag to True?

I am running into errors that are caused by this:

E.g.

File "/root/TinyLLaVA_Factory/tinyllava/train/tinyllava_trainer.py", line 47, in get_modality_length_grouped_indices
    lang_indices, lang_lengths = zip(*[(i, -l) for i, l in enumerate(lengths) if l < 0])
ValueError: not enough values to unpack (expected 2, got 0)
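
For context, the grouped sampler encodes modality in the sign of each sample's length: positive for samples with an image, negative for text-only ones. With a purely multimodal dataset the text-only partition is empty, and unpacking zip() over an empty list raises exactly this error. A minimal standalone sketch of the failure mode (the lengths are made up):

# Sign convention used by the modality-grouped sampler:
# positive length -> multimodal sample, negative length -> text-only sample.
lengths = [128, 256, 97]  # purely multimodal dataset: no negative entries

mm = [(i, l) for i, l in enumerate(lengths) if l > 0]
lang = [(i, -l) for i, l in enumerate(lengths) if l < 0]  # empty list

mm_indices, mm_lengths = zip(*mm)        # fine
lang_indices, lang_lengths = zip(*lang)  # ValueError: not enough values to unpack (expected 2, got 0)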

Thanks in advance.

ggcr closed this as completed Jun 24, 2024
ggcr reopened this Jun 25, 2024

ggcr commented Jun 25, 2024

There is an inconsistency: in some (full) finetuning scripts it is set to False, and in others to True.


ggcr commented Jun 25, 2024

I have a dataset of purely multimodal data, that is, every question ("from": "human") contains the <image> token. Hence, this error should not occur.

The LLaVA author addresses it in this issue.

I created a PR with the same fix that the main LLaVA repo currently uses.
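
For reference, the guard in upstream LLaVA looks roughly like this (paraphrased from llava/train/llava_trainer.py; get_length_grouped_indices is the plain length-grouped sampler used when only one modality is present):

def get_modality_length_grouped_indices(lengths, batch_size, world_size, generator=None):
    assert all(l != 0 for l in lengths), "Should not have zero length."
    if all(l > 0 for l in lengths) or all(l < 0 for l in lengths):
        # Single-modality dataset: fall back to plain length grouping instead of
        # splitting into an (empty) multimodal or text-only partition.
        return get_length_grouped_indices(lengths, batch_size, world_size, generator=generator)
    mm_indices, mm_lengths = zip(*[(i, l) for i, l in enumerate(lengths) if l > 0])
    lang_indices, lang_lengths = zip(*[(i, -l) for i, l in enumerate(lengths) if l < 0])
    # ... the rest of the mixed-modality shuffling is unchanged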

ZhangXJ199 (Collaborator) commented

Thanks for your question. group_by_modality_length needs to be set according to the data type: for data that contains only multimodal QA, group_by_modality_length needs to be set to False; for data that contains both multimodal QA and pure-text QA (such as SQA), group_by_modality_length can be set to True.
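
For context on how the trainer tells the modalities apart: the dataset exposes per-sample lengths whose sign encodes the modality. The logic is roughly the following (rewritten here as a standalone function for illustration; in the trainer it lives on the dataset as a modality_lengths property):

def modality_lengths(list_data_dict):
    """Rough per-sample lengths; the sign encodes modality:
    positive for samples with an 'image' key, negative for text-only samples."""
    lengths = []
    for sample in list_data_dict:
        cur_len = sum(len(conv['value'].split()) for conv in sample['conversations'])
        lengths.append(cur_len if 'image' in sample else -cur_len)
    return lengths

With mixed data both signs occur, so the grouped sampler can batch each modality separately; with multimodal-only data every length is positive, which is exactly the case where the unguarded zip(*[]) above fails.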
