MM-Vid: Advancing Video Understanding with GPT-4V(ision)

This repository contains the open-source implementation of the paper "MM-Vid: Advancing Video Understanding with GPT-4V(ision)".

Overview

This project aims to advance video understanding by leveraging GPT-4V(ision). The implementation follows the methods and experiments described in the paper and provides a framework for scene detection, video clipping, speech recognition, and coherent video description generation.
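
As a rough illustration of how the stages listed above could fit together, the following sketch turns scene-change timestamps into clips, attaches speech-recognition segments to each clip, and builds a text prompt per clip. It is a minimal sketch under stated assumptions and not the code in this repository: the names `Clip`, `boundaries_to_clips`, `assign_speech`, and `build_prompt` are hypothetical, the scene boundaries and transcript segments are hard-coded stand-ins for the output of real scene-detection and speech-recognition tools, and the prompt wording is invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Clip:
    start: float  # clip start time in seconds
    end: float    # clip end time in seconds
    speech: str   # transcript text whose midpoint falls inside this clip


def boundaries_to_clips(boundaries: List[float], duration: float) -> List[Tuple[float, float]]:
    """Turn scene-change timestamps into consecutive (start, end) intervals."""
    # Keep only unique cuts strictly inside the video, in ascending order.
    cuts = sorted(t for t in set(boundaries) if 0.0 < t < duration)
    edges = [0.0] + cuts + [duration]
    return list(zip(edges[:-1], edges[1:]))


def assign_speech(
    intervals: List[Tuple[float, float]],
    segments: List[Tuple[float, float, str]],
) -> List[Clip]:
    """Attach each (start, end, text) speech segment to the clip containing its midpoint."""
    clips = [Clip(start, end, "") for start, end in intervals]
    for seg_start, seg_end, text in segments:
        midpoint = (seg_start + seg_end) / 2
        for clip in clips:
            if clip.start <= midpoint < clip.end:
                clip.speech = (clip.speech + " " + text).strip()
                break
    return clips


def build_prompt(clip: Clip) -> str:
    """Build a hypothetical text prompt to accompany the frames sampled from one clip."""
    speech = clip.speech if clip.speech else "(no speech)"
    return (
        f"Clip {clip.start:.1f}s to {clip.end:.1f}s. "
        f"Transcript: {speech} "
        "Describe what happens in these frames."
    )


# Stand-in inputs: two detected scene changes in a 12-second video,
# plus three transcript segments with start and end times in seconds.
intervals = boundaries_to_clips([4.0, 9.5], duration=12.0)
clips = assign_speech(
    intervals,
    [(0.5, 2.0, "Hello."), (3.5, 5.5, "Welcome back."), (10.0, 11.0, "Goodbye.")],
)
for clip in clips:
    print(build_prompt(clip))
```

In a full pipeline, each prompt would be sent to the vision model together with frames sampled from the corresponding clip, and the per-clip descriptions would then be merged into one coherent description of the video.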

Installation

To use this repository, clone it and install the required dependencies:

git clone https://github.com/yongliang-wu/MM-VID.git
cd MM-VID
pip install -r requirements.txt

Then run the code:

python main.py

TODO

Input of external information is not yet supported.
