Strangely, LanguageBind_Image preprocessor_config.json is missing while running demo #57

Closed
OPilgrim opened this issue Dec 25, 2023 · 8 comments

Traceback (most recent call last):
File "/data/miniconda3/envs/bind/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/data/miniconda3/envs/bind/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/data/Projects/FactCheck/MultiModel/LVLMs/Video-LLaVA/llava/serve/tmp.py", line 57, in
main()
File "/data/Projects/FactCheck/MultiModel/LVLMs/Video-LLaVA/llava/serve/tmp.py", line 20, in main
tokenizer, model, processor, context_len = load_pretrained_model(model_path, None, model_name, load_8bit, load_4bit, device=device)
File "/data/Projects/FactCheck/MultiModel/LVLMs/Video-LLaVA/llava/model/builder.py", line 154, in load_pretrained_model
image_tower.load_model()
File "/data/Projects/FactCheck/MultiModel/LVLMs/Video-LLaVA/llava/model/multimodal_encoder/clip_encoder.py", line 23, in load_model
self.image_processor = CLIPImageProcessor.from_pretrained(self.vision_tower_name)
File "/data/miniconda3/envs/bind/lib/python3.10/site-packages/transformers/image_processing_utils.py", line 165, in from_pretrained
image_processor_dict, kwargs = cls.get_image_processor_dict(pretrained_model_name_or_path, **kwargs)
File "/data/miniconda3/envs/bind/lib/python3.10/site-packages/transformers/image_processing_utils.py", line 269, in get_image_processor_dict
resolved_image_processor_file = cached_file(
File "/data/miniconda3/envs/bind/lib/python3.10/site-packages/transformers/utils/hub.py", line 388, in cached_file
raise EnvironmentError(
OSError: LanguageBind/LanguageBind_Image does not appear to have a file named preprocessor_config.json. Checkout 'https://huggingface.co/LanguageBind/LanguageBind_Image/None' for available files.
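
For reference, the failure can be reproduced with a minimal call mirroring what clip_encoder.py does; this is only a sketch, using the same repo id that appears in the traceback:

```python
# Minimal sketch of the call that fails in clip_encoder.py: at the time of this report,
# the LanguageBind/LanguageBind_Image repo on the Hub had no preprocessor_config.json,
# so from_pretrained raised the OSError shown above.
from transformers import CLIPImageProcessor

image_processor = CLIPImageProcessor.from_pretrained("LanguageBind/LanguageBind_Image")
print(image_processor)
```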

@OPilgrim OPilgrim changed the title Strangely enough, LanguageBind_Image preprocessor_config.json is missing while running demo Strangely, LanguageBind_Image preprocessor_config.json is missing while running demo Dec 25, 2023

awzhgw commented Jan 6, 2024

Yes, I hit the same failure.

awzhgw commented Jan 6, 2024

@LinB203, can you resolve it?

LinB203 commented Jan 6, 2024

Sorry for the late reply. I have uploaded preprocessor_config.json. Feel free to let me know if this works.
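
A quick way to confirm the file is now on the Hub before re-running the demo (a sketch using huggingface_hub; the repo id is the one from the traceback):

```python
# Sketch: list the files in the Hub repo and check that preprocessor_config.json is present.
from huggingface_hub import list_repo_files

files = list_repo_files("LanguageBind/LanguageBind_Image")
print("preprocessor_config.json" in files)  # expected True after the upload
```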

OPilgrim commented Jan 8, 2024

Thank you for the update, but now there is a new problem. It seems that the hidden dimension of the mm_video_tower used by Video-LLaVA-7B does not match the hidden dimension Video-LLaVA-7B expects. Are you sure that the LanguageBind_Video_merge on Hugging Face is the correct version?

...
- This IS NOT expected if you are initializing LlavaLlamaForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

['Video', 'Image']
You are using a model of type LanguageBindImage to instantiate a model of type clip_vision_model. This is not supported for all configurations of models and can yield errors.
Traceback (most recent call last):
  File "/data/miniconda3/envs/bind/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/data/miniconda3/envs/bind/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/data/Projects/FactCheck/MultiModel/LVLMs/Video-LLaVA/llava/serve/cli.py", line 144, in <module>
    main(args)
  File "/data/Projects/FactCheck/MultiModel/LVLMs/Video-LLaVA/llava/serve/cli.py", line 32, in main
    tokenizer, model, processor, context_len = load_pretrained_model(args.model_path, args.model_base, model_name,
  File "/data/Projects/FactCheck/MultiModel/LVLMs/Video-LLaVA/llava/model/builder.py", line 154, in load_pretrained_model
    image_tower.load_model()
  File "/data/Projects/FactCheck/MultiModel/LVLMs/Video-LLaVA/llava/model/multimodal_encoder/clip_encoder.py", line 24, in load_model
    self.vision_tower = CLIPVisionModel.from_pretrained(self.vision_tower_name)
  File "/data/miniconda3/envs/bind/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2881, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/data/miniconda3/envs/bind/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3278, in _load_pretrained_model
    raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for CLIPVisionModel:
        size mismatch for vision_model.embeddings.class_embedding: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for vision_model.embeddings.position_ids: copying a param with shape torch.Size([1, 257]) from checkpoint, the shape in current model is torch.Size([1, 50]).
        size mismatch for vision_model.embeddings.patch_embedding.weight: copying a param with shape torch.Size([1024, 3, 14, 14]) from checkpoint, the shape in current model is torch.Size([768, 3, 32, 32]).
        size mismatch for vision_model.embeddings.position_embedding.weight: copying a param with shape torch.Size([257, 1024]) from checkpoint, the shape in current model is torch.Size([50, 768]).
        size mismatch for vision_model.pre_layrnorm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for vision_model.pre_layrnorm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for vision_model.encoder.layers.0.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
        size mismatch for vision_model.encoder.layers.0.self_attn.k_proj.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for vision_model.encoder.layers.0.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
        size mismatch for vision_model.encoder.layers.0.self_attn.v_proj.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for vision_model.encoder.layers.0.self_attn.q_proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
        size mismatch for vision_model.encoder.layers.0.self_attn.q_proj.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for vision_model.encoder.layers.0.self_attn.out_proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
        size mismatch for vision_model.encoder.layers.0.self_attn.out_proj.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for vision_model.encoder.layers.0.layer_norm1.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
...
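
For context on the shapes above: the "current model" sizes match the defaults of a plain CLIPVisionConfig (a ViT-B/32-style vision tower), while the checkpoint tensors are ViT-L/14-sized (hidden 1024, patch 14, 257 position ids), which suggests the LanguageBindImage config values were not picked up. A small sketch that reproduces the default-side numbers:

```python
# Sketch: the 768 / 32x32 / 50-position shapes in the error correspond to the defaults of
# CLIPVisionConfig, i.e. what CLIPVisionModel uses when the custom config is not understood.
from transformers import CLIPVisionConfig

default_cfg = CLIPVisionConfig()
print(default_cfg.hidden_size)                                      # 768
print(default_cfg.patch_size)                                       # 32
print((default_cfg.image_size // default_cfg.patch_size) ** 2 + 1)  # 50 position ids
```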

LinB203 commented Jan 8, 2024

Could you share your code?
The merge version is not recommended with our API.

OPilgrim commented Jan 8, 2024

I just ran the test demo:
python -m llava.serve.cli --model-path "LanguageBind/Video-LLaVA-7B" --image-file "./assets/main.jpg" --load-4bit
Then I changed "mm_image_tower" and "mm_video_tower" in LanguageBind/Video-LLaVA-7B/config.json to local paths.
[screenshot]
I also ran the "Inference for image" code directly, and the same problem occurred.
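
For reference, a minimal sketch of that config edit (the local directories below are placeholders, not actual paths):

```python
# Sketch: point the tower entries in a local copy of Video-LLaVA-7B/config.json at local
# LanguageBind checkpoints. Paths are placeholders; adjust to your own layout.
import json

cfg_path = "Video-LLaVA-7B/config.json"  # local copy of the model repo
with open(cfg_path) as f:
    cfg = json.load(f)

cfg["mm_image_tower"] = "/path/to/LanguageBind_Image"
cfg["mm_video_tower"] = "/path/to/LanguageBind_Video_merge"

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```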

LinB203 commented Jan 8, 2024

If you want to load from local paths, make sure your local path is correct. Here is a sample loaded with my local path; it works fine.

[screenshots]

Then remove some of the restrictions.

[screenshot]
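
Before loading, it may also help to sanity-check that the local tower directories contain the config files the loader looks for (a sketch; the directory names are placeholders):

```python
# Sketch: check that the local LanguageBind directories contain the config files that
# from_pretrained looked for in the tracebacks above. Paths are placeholders.
import os

towers = ["/path/to/LanguageBind_Image", "/path/to/LanguageBind_Video_merge"]
for tower in towers:
    for fname in ("config.json", "preprocessor_config.json"):
        path = os.path.join(tower, fname)
        print(path, "exists" if os.path.isfile(path) else "MISSING")
```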

OPilgrim commented Jan 8, 2024

Thank you very much. It works.
