
Video-LLM Setup Instructions


PandaGPT

  1. Follow the instructions on https://github.com/yxuansu/PandaGPT/tree/main/pretrained_ckpt#1-prepare-vicuna-checkpoint and https://github.com/yxuansu/PandaGPT?tab=readme-ov-file#2-running-pandagpt-demo-back-to-top to download the necessary weights. To use the provided config files, place the weights as follows:
  • Generate the Vicuna weights in the directories weights/vicuna/7b_v0 and weights/vicuna/13b_v0.
  • Place the ImageBind weights in weights/imagebind.
  • Place the PandaGPT delta weights in weights/pandagpt:
weights/
├── imagebind
│   └── imagebind_huge.pth
├── pandagpt
│   ├── pandagpt_13b_max_len_256
│   │   ├── README.md
│   │   └── pytorch_model.pt
│   ├── pandagpt_13b_max_len_400
│   │   ├── README.md
│   │   └── pytorch_model.pt
│   ├── pandagpt_7b_max_len_1024
│   │   ├── README.md
│   │   └── pytorch_model.pt
│   └── pandagpt_7b_max_len_512
│       ├── README.md
│       └── pytorch_model.pt
└── vicuna
    ├── 13b_v0
    │   ├── config.json
    │   ├── generation_config.json
    │   ├── model-00001-of-00006.safetensors
    │   ├── model-00002-of-00006.safetensors
    │   ├── model-00003-of-00006.safetensors
    │   ├── model-00004-of-00006.safetensors
    │   ├── model-00005-of-00006.safetensors
    │   ├── model-00006-of-00006.safetensors
    │   ├── model.safetensors.index.json
    │   ├── special_tokens_map.json
    │   ├── tokenizer.model
    │   └── tokenizer_config.json
    └── 7b_v0
        ├── config.json
        ├── generation_config.json
        ├── model-00001-of-00003.safetensors
        ├── model-00002-of-00003.safetensors
        ├── model-00003-of-00003.safetensors
        ├── model.safetensors.index.json
        ├── special_tokens_map.json
        ├── tokenizer.model
        └── tokenizer_config.json
  2. Place the PandaGPT repository in the models directory (see the download sketch below).
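
The Vicuna v0 base weights in this tree have to be generated by applying the released deltas to the original LLaMA weights, as the instructions linked in step 1 describe (they use FastChat's apply_delta tool). The ImageBind and PandaGPT checkpoints, however, can be fetched directly; the sketch below shows one way to do that with huggingface_hub. The ImageBind download URL and the openllmplayground repo ids are assumptions taken from the upstream projects, not from this wiki, so verify them against the links in step 1 before relying on them.

# Hedged sketch: fetch ImageBind and the PandaGPT delta checkpoints into the
# layout shown above. The download URL and Hugging Face repo ids below are
# assumptions based on the upstream PandaGPT/ImageBind releases.
from pathlib import Path
from urllib.request import urlretrieve

from huggingface_hub import snapshot_download

WEIGHTS = Path("weights")

# ImageBind encoder checkpoint -> weights/imagebind/imagebind_huge.pth
imagebind_dir = WEIGHTS / "imagebind"
imagebind_dir.mkdir(parents=True, exist_ok=True)
urlretrieve(
    "https://dl.fbaipublicfiles.com/imagebind/imagebind_huge.pth",  # assumed URL
    str(imagebind_dir / "imagebind_huge.pth"),
)

# PandaGPT delta checkpoints, one sub-directory per variant as in the tree above.
for variant in [
    "pandagpt_7b_max_len_512",
    "pandagpt_7b_max_len_1024",
    "pandagpt_13b_max_len_256",
    "pandagpt_13b_max_len_400",
]:
    snapshot_download(
        repo_id=f"openllmplayground/{variant}",  # assumed repo id
        local_dir=WEIGHTS / "pandagpt" / variant,
    )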

Video-LLaMA

  1. Follow the instructions on https://github.com/DAMO-NLP-SG/Video-LLaMA/tree/3ff50e53aa64afe57d5d98277546e2865f121256 to generate and download the necessary weights. To use the provided config files, place the weights as follows:
  • Generate the Vicuna weights in the directories weights/vicuna/7b_v0 and weights/vicuna/13b_v0.
  • Place the ImageBind weights in weights/imagebind.
  • Place the MiniGPT-4 weights in weights/minigpt4 (these weights can be downloaded from https://huggingface.co/spaces/zylj/MiniGPT-4/tree/main).
  • Place the fine-tuned Vicuna weights in weights/videollama:
weights/
├── imagebind
│   └── imagebind_huge.pth
├── minigpt4
│   ├── pretrained_minigpt4_13b.pth
│   └── pretrained_minigpt4_7b.pth
├── vicuna
│   ├── 13b_v0
│   │   ├── config.json
│   │   ├── generation_config.json
│   │   ├── model-00001-of-00006.safetensors
│   │   ├── model-00002-of-00006.safetensors
│   │   ├── model-00003-of-00006.safetensors
│   │   ├── model-00004-of-00006.safetensors
│   │   ├── model-00005-of-00006.safetensors
│   │   ├── model-00006-of-00006.safetensors
│   │   ├── model.safetensors.index.json
│   │   ├── special_tokens_map.json
│   │   ├── tokenizer.model
│   │   └── tokenizer_config.json
│   └── 7b_v0
│       ├── config.json
│       ├── generation_config.json
│       ├── model-00001-of-00003.safetensors
│       ├── model-00002-of-00003.safetensors
│       ├── model-00003-of-00003.safetensors
│       ├── model.safetensors.index.json
│       ├── special_tokens_map.json
│       ├── tokenizer.model
│       └── tokenizer_config.json
└── videollama
    ├── finetune-vicuna13b-v2.pth
    └── finetune-vicuna7b-v2.pth

  2. Place the Video-LLaMA repository in the models directory (a quick layout check follows below).
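
Since the provided config files reference these relative paths directly, it is worth confirming the tree before running anything. The snippet below is a minimal sanity check of the layout shown above; the file list is copied from the tree and nothing beyond it is assumed.

# Minimal sanity check for the Video-LLaMA weight layout shown above.
from pathlib import Path

WEIGHTS = Path("weights")

EXPECTED = [
    "imagebind/imagebind_huge.pth",
    "minigpt4/pretrained_minigpt4_7b.pth",
    "minigpt4/pretrained_minigpt4_13b.pth",
    "vicuna/7b_v0/config.json",
    "vicuna/7b_v0/tokenizer.model",
    "vicuna/13b_v0/config.json",
    "vicuna/13b_v0/tokenizer.model",
    "videollama/finetune-vicuna7b-v2.pth",
    "videollama/finetune-vicuna13b-v2.pth",
]

missing = [path for path in EXPECTED if not (WEIGHTS / path).exists()]
if missing:
    raise SystemExit("Missing weight files:\n" + "\n".join(missing))
print("Video-LLaMA weight layout looks complete.")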

Video-LLaMA 2

  1. Video-LLaMA 2 uses the same repository, weights, and directory layout as Video-LLaMA. Follow the instructions on https://github.com/DAMO-NLP-SG/Video-LLaMA/tree/3ff50e53aa64afe57d5d98277546e2865f121256 to generate and download the necessary weights, and arrange them exactly as shown in the Video-LLaMA section above.
  2. Place the Video-LLaMA repository in the models directory.

VTimeLLM

  1. Follow the instructions on https://github.com/huangb23/VTimeLLM/blob/673312a8f7e18caec9af716cf7b44d3b70ccebfd/docs/offline_demo.md to download the necessary weights. To use the provided config files, place the weights as follows:
  • Place the CLIP weights in weights/clip.
  • Place the Vicuna weights in weights/vicuna.
  • Place the VTimeLLM weights in weights/vtimellm:
weights/
├── clip
│   └── ViT-L-14.pt
├── vicuna
│   └── vicuna-7b-v1.5
│       ├── config.json
│       ├── generation_config.json
│       ├── pytorch_model-00001-of-00002.bin
│       ├── pytorch_model-00002-of-00002.bin
│       ├── pytorch_model.bin.index.json
│       ├── README.md
│       ├── special_tokens_map.json
│       ├── tokenizer_config.json
│       └── tokenizer.model
└── vtimellm
    ├── vtimellm-vicuna-v1-5-7b-stage1
    │   ├── config.json
    │   ├── mm_projector.bin
    │   └── trainer_state.json
    ├── vtimellm-vicuna-v1-5-7b-stage2
    │   ├── adapter_config.json
    │   ├── adapter_model.bin
    │   ├── config.json
    │   ├── non_lora_trainables.bin
    │   ├── README.md
    │   └── trainer_state.json
    └── vtimellm-vicuna-v1-5-7b-stage3
        ├── adapter_config.json
        ├── adapter_model.bin
        ├── config.json
        ├── non_lora_trainables.bin
        ├── README.md
        └── trainer_state.json
  2. Place the VTimeLLM repository in the models directory (see the download sketch below).
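
The CLIP and Vicuna checkpoints in this tree are standard public releases and can be fetched programmatically. The sketch below assumes OpenAI's clip package and the lmsys/vicuna-7b-v1.5 repository on the Hugging Face Hub, which is what the linked offline demo instructions point to; the VTimeLLM stage 1-3 checkpoints themselves should still be downloaded from the link in step 1 and unpacked into weights/vtimellm.

# Hedged sketch: fetch the CLIP ViT-L/14 and Vicuna 7B v1.5 checkpoints used by
# VTimeLLM. Assumes `pip install git+https://github.com/openai/CLIP.git` and the
# huggingface_hub package; the VTimeLLM stage1-3 checkpoints are not covered here.
from pathlib import Path

import clip  # OpenAI CLIP; clip.load() downloads ViT-L-14.pt into download_root
from huggingface_hub import snapshot_download

WEIGHTS = Path("weights")

# CLIP image encoder -> weights/clip/ViT-L-14.pt (also loads the model on CPU)
clip.load("ViT-L/14", device="cpu", download_root=str(WEIGHTS / "clip"))

# Vicuna 7B v1.5 language model -> weights/vicuna/vicuna-7b-v1.5/
snapshot_download(
    repo_id="lmsys/vicuna-7b-v1.5",
    local_dir=WEIGHTS / "vicuna" / "vicuna-7b-v1.5",
)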