ImportError: cannot import name 'LlavaOnevisionForConditionalGeneration' from 'transformers' #1878

Closed
Lopa07 opened this issue Aug 31, 2024 · 3 comments

Comments

@Lopa07

Lopa07 commented Aug 31, 2024

Describe the bug
I am trying to fine-tune the llava-onevision-qwen2-0_5b-ov model with SFT using the following command:

swift sft \
    --model_type llava-onevision-qwen2-0_5b-ov \
    --dataset rlaif-v#1000 \
    --dataset_test_ratio 0.1 \
    --num_train_epochs 5 \
    --output_dir output

I received the error ImportError: cannot import name 'LlavaOnevisionForConditionalGeneration' from 'transformers'. The error message, along with the complete stack trace, is posted below.

(swift) m.banerjee@PHYVDGPU02PRMV:/VDIL_COREML/m.banerjee/ms-swift$ CUDA_VISIBLE_DEVICES=0,1,2,3,5 \
swift sft \
    --model_type llava-onevision-qwen2-0_5b-ov \
    --dataset rlaif-v#1000 \
    --dataset_test_ratio 0.1 \
    --num_train_epochs 5 \
    --output_dir output

run sh: `/VDIL_COREML/m.banerjee/anaconda3/envs/swift/bin/python /VDIL_COREML/m.banerjee/ms-swift/swift/cli/sft.py --model_type llava-onevision-qwen2-0_5b-ov --dataset rlaif-v#1000 --dataset_test_ratio 0.1 --num_train_epochs 5 --output_dir output`
[INFO:swift] Successfully registered `/VDIL_COREML/m.banerjee/ms-swift/swift/llm/data/dataset_info.json`
[INFO:swift] No vLLM installed, if you are using vLLM, you will get `ImportError: cannot import name 'get_vllm_engine' from 'swift.llm'`
[INFO:swift] No LMDeploy installed, if you are using LMDeploy, you will get `ImportError: cannot import name 'prepare_lmdeploy_engine_template' from 'swift.llm'`
[INFO:swift] Start time of running main: 2024-08-31 01:32:33.881287
[INFO:swift] Setting template_type: llava-onevision-qwen
[INFO:swift] Setting args.lazy_tokenize: True
[INFO:swift] Setting args.dataloader_num_workers: 1
[INFO:swift] output_dir: /VDIL_COREML/m.banerjee/ms-swift/output/llava-onevision-qwen2-0_5b-ov/v0-20240831-013234
[INFO:swift] args: SftArguments(model_type='llava-onevision-qwen2-0_5b-ov', model_id_or_path='AI-ModelScope/llava-onevision-qwen2-0.5b-ov-hf', model_revision='master', full_determinism=False, sft_type='lora', freeze_parameters=0.0, additional_trainable_parameters=[], tuner_backend='peft', template_type='llava-onevision-qwen', output_dir='/VDIL_COREML/m.banerjee/ms-swift/output/llava-onevision-qwen2-0_5b-ov/v0-20240831-013234', add_output_dir_suffix=True, ddp_backend=None, ddp_find_unused_parameters=None, ddp_broadcast_buffers=None, seed=42, resume_from_checkpoint=None, resume_only_model=False, ignore_data_skip=False, dtype='bf16', packing=False, train_backend='transformers', tp=1, pp=1, min_lr=None, sequence_parallel=False, dataset=['rlaif-v#1000'], val_dataset=[], dataset_seed=42, dataset_test_ratio=0.1, use_loss_scale=False, loss_scale_config_path='/VDIL_COREML/m.banerjee/ms-swift/swift/llm/agent/default_loss_scale_config.json', system=None, tools_prompt='react_en', max_length=2048, truncation_strategy='delete', check_dataset_strategy='none', streaming=False, streaming_val_size=0, streaming_buffer_size=16384, model_name=[None, None], model_author=[None, None], quant_method=None, quantization_bit=0, hqq_axis=0, hqq_dynamic_config_path=None, bnb_4bit_comp_dtype='bf16', bnb_4bit_quant_type='nf4', bnb_4bit_use_double_quant=True, bnb_4bit_quant_storage=None, rescale_image=-1, target_modules='^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*', target_regex=None, modules_to_save=[], lora_rank=8, lora_alpha=32, lora_dropout=0.05, lora_bias_trainable='none', lora_dtype='AUTO', lora_lr_ratio=None, use_rslora=False, use_dora=False, init_lora_weights='true', fourier_n_frequency=2000, fourier_scaling=300.0, rope_scaling=None, boft_block_size=4, boft_block_num=0, boft_n_butterfly_factor=1, boft_dropout=0.0, vera_rank=256, vera_projection_prng_key=0, vera_dropout=0.0, vera_d_initial=0.1, adapter_act='gelu', adapter_length=128, use_galore=False, galore_target_modules=None, galore_rank=128, galore_update_proj_gap=50, galore_scale=1.0, galore_proj_type='std', galore_optim_per_parameter=False, galore_with_embedding=False, galore_quantization=False, galore_proj_quant=False, galore_proj_bits=4, galore_proj_group_size=256, galore_cos_threshold=0.4, galore_gamma_proj=2, galore_queue_size=5, adalora_target_r=8, adalora_init_r=12, adalora_tinit=0, adalora_tfinal=0, adalora_deltaT=1, adalora_beta1=0.85, adalora_beta2=0.85, adalora_orth_reg_weight=0.5, ia3_feedforward_modules=[], llamapro_num_new_blocks=4, llamapro_num_groups=None, neftune_noise_alpha=None, neftune_backend='transformers', lisa_activated_layers=0, lisa_step_interval=20, reft_layers=None, reft_rank=4, reft_intervention_type='LoreftIntervention', reft_args=None, gradient_checkpointing=True, deepspeed=None, batch_size=1, eval_batch_size=1, auto_find_batch_size=False, num_train_epochs=5, max_steps=-1, optim='adamw_torch', adam_beta1=0.9, adam_beta2=0.95, adam_epsilon=1e-08, learning_rate=0.0001, weight_decay=0.1, gradient_accumulation_steps=16, max_grad_norm=1, predict_with_generate=False, lr_scheduler_type='cosine', lr_scheduler_kwargs={}, warmup_ratio=0.05, warmup_steps=0, eval_steps=50, save_steps=50, save_only_model=False, save_total_limit=2, logging_steps=5, acc_steps=1, dataloader_num_workers=1, dataloader_pin_memory=True, dataloader_drop_last=False, push_to_hub=False, hub_model_id=None, hub_token=None, hub_private_repo=False, push_hub_strategy='push_best', test_oom_error=False, disable_tqdm=False, lazy_tokenize=True, 
preprocess_num_proc=1, use_flash_attn=None, ignore_args_error=False, check_model_is_latest=True, logging_dir='/VDIL_COREML/m.banerjee/ms-swift/output/llava-onevision-qwen2-0_5b-ov/v0-20240831-013234/runs', report_to=['tensorboard'], acc_strategy='token', save_on_each_node=False, evaluation_strategy='steps', save_strategy='steps', save_safetensors=True, gpu_memory_fraction=None, include_num_input_tokens_seen=False, local_repo_path=None, custom_register_path=None, custom_dataset_info=None, device_map_config_path=None, device_max_memory=[], max_new_tokens=2048, do_sample=True, temperature=0.3, top_k=20, top_p=0.7, repetition_penalty=1.0, num_beams=1, fsdp='', fsdp_config=None, sequence_parallel_size=1, model_layer_cls_name=None, metric_warmup_step=0, fsdp_num=1, per_device_train_batch_size=None, per_device_eval_batch_size=None, eval_strategy=None, self_cognition_sample=0, train_dataset_mix_ratio=0.0, train_dataset_mix_ds=['ms-bench'], train_dataset_sample=-1, val_dataset_sample=None, safe_serialization=None, only_save_model=None, neftune_alpha=None, deepspeed_config_path=None, model_cache_dir=None, lora_dropout_p=None, lora_target_modules=[], lora_target_regex=None, lora_modules_to_save=[], boft_target_modules=[], boft_modules_to_save=[], vera_target_modules=[], vera_modules_to_save=[], ia3_target_modules=[], ia3_modules_to_save=[], custom_train_dataset_path=[], custom_val_dataset_path=[])
[INFO:swift] Global seed set to 42
device_count: 5
rank: -1, local_rank: -1, world_size: 1, local_world_size: 1
[INFO:swift] Downloading the model from ModelScope Hub, model_id: AI-ModelScope/llava-onevision-qwen2-0.5b-ov-hf
[WARNING:modelscope] Using branch: master as version is unstable, use with caution
[INFO:swift] Loading the model using model_dir: /VDIL_COREML/m.banerjee/.cache/modelscope/hub/AI-ModelScope/llava-onevision-qwen2-0___5b-ov-hf
Traceback (most recent call last):
  File "/VDIL_COREML/m.banerjee/ms-swift/swift/cli/sft.py", line 5, in <module>
    sft_main()
  File "/VDIL_COREML/m.banerjee/ms-swift/swift/utils/run_utils.py", line 32, in x_main
    result = llm_x(args, **kwargs)
  File "/VDIL_COREML/m.banerjee/ms-swift/swift/llm/sft.py", line 215, in llm_sft
    model, tokenizer = get_model_tokenizer(
  File "/VDIL_COREML/m.banerjee/ms-swift/swift/llm/utils/model.py", line 6347, in get_model_tokenizer
    model, tokenizer = get_function(model_dir, torch_dtype, model_kwargs, load_model, **kwargs)
  File "/VDIL_COREML/m.banerjee/ms-swift/swift/llm/utils/model.py", line 5889, in get_model_tokenizer_llava_onevision
    from transformers import LlavaOnevisionForConditionalGeneration
ImportError: cannot import name 'LlavaOnevisionForConditionalGeneration' from 'transformers' (/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/transformers/__init__.py)
(swift) m.banerjee@PHYVDGPU02PRMV:/VDIL_COREML/m.banerjee/ms-swift$ 

Your hardware and system info
CUDA Version: 12.4
System: Ubuntu 22.04.3 LTS
GPU
torch==2.4.0
transformers==4.45.0.dev0
trl==0.9.6
peft==0.12.0
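
The versions above include transformers==4.45.0.dev0, a development build; whether it exports the LLaVA-OneVision classes depends on the commit it was built from. A minimal diagnostic sketch (added here for illustration, not part of the original report) to check the installed build before launching swift sft:

    import transformers

    # Print the installed version and check whether the class that swift tries to import is exported.
    print("transformers:", transformers.__version__)
    try:
        from transformers import LlavaOnevisionForConditionalGeneration  # noqa: F401
    except ImportError:
        print("LlavaOnevisionForConditionalGeneration is NOT available in this build")
    else:
        print("LlavaOnevisionForConditionalGeneration is available")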

@Jintao-Huang
Collaborator

Please pay attention to this PR: huggingface/transformers#32673
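
Once that PR is merged, installing a transformers build that contains it (for example, a build from the main branch of huggingface/transformers) should expose the missing class. A quick verification sketch (an assumed upgrade check, not an official instruction from the maintainers) to run before retrying the swift sft command:

    # After upgrading transformers, confirm the class is now exported; if this runs without
    # raising ImportError, the swift sft command above should get past model loading.
    import transformers
    from transformers import LlavaOnevisionForConditionalGeneration

    print("transformers:", transformers.__version__)
    print("class defined in:", LlavaOnevisionForConditionalGeneration.__module__)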

@Lopa07
Author

Lopa07 commented Sep 1, 2024

Thank you @Jintao-Huang! I am waiting until the zucchini-nlp:llava-onevision branch for that PR is merged. Then I can test and close this issue.

@Lopa07
Author

Lopa07 commented Sep 7, 2024

The PR has been merged.

Lopa07 closed this as completed Sep 7, 2024