Release MIMIC-IT, Otter-Image/Video released · Luodian/Otter

🧨 Download MIMIC-IT Dataset. For more details on navigating the dataset, please refer to MIMIC-IT Dataset README.
🏎️ Run Otter Locally. You can run our model locally with at least 16 GB of GPU memory for tasks such as image/video tagging, captioning, and identifying harmful content. We fixed a bug in video inference where frame tensors were mistakenly unsqueezed into an incorrect vision_x. You can now try running it again with the updated version.

Make sure the path passed to sys.path.append("../..") points to the directory that contains the otter package, so that otter.modeling_otter can be imported when launching the model.
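A relative entry such as "../.." is resolved against the current working directory, not against the script's location, so the import can fail when the script is launched from a different folder. The following is a minimal sketch of one way to make the path independent of the working directory. The helper name `add_repo_root` and the `levels_up` parameter are hypothetical, not part of Otter, and the default of two levels is only an assumption that mirrors "../..".

```python
import sys
from pathlib import Path


def add_repo_root(script_path, levels_up=2):
    """Append the directory `levels_up` above the script's folder to sys.path.

    Hypothetical helper (not part of Otter). levels_up=2 mirrors "../..";
    adjust it to match where your script sits relative to the otter package.
    """
    root = Path(script_path).resolve().parent
    for _ in range(levels_up):
        root = root.parent
    root_str = str(root)
    # Avoid adding the same entry twice on repeated calls.
    if root_str not in sys.path:
        sys.path.append(root_str)
    return root_str
```

In a launch script this would be called as `add_repo_root(__file__)` before importing `otter.modeling_otter`. If the import still fails, printing the returned path shows which directory was actually added.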
