Releases: Luodian/Otter
OtterHD Release, Dataloading Process Refactored
[2023-11]: Supporting GPT4V's Evaluation on 8 Benchmarks; Announcing OtterHD-8B, improved from Fuyu-8B. Check out OtterHD for details.
- 🦦 Added OtterHD, a multimodal model fine-tuned from Fuyu-8B to facilitate fine-grained interpretation of high-resolution visual input without an explicit vision encoder module. All image patches are linearly transformed and processed together with text tokens. This is a very innovative and elegant exploration. Following this approach, we open-sourced the finetuning script for Fuyu-8B and improved training throughput by 4-5x with Flash-Attention-2. Try our finetuning script at OtterHD.
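To make the no-vision-encoder design concrete, here is a toy pure-Python sketch (not the actual Fuyu/OtterHD code) of the idea: split the image into patches, flatten each patch, and linearly project it into the same embedding space as the text tokens.

```python
def extract_patches(image, patch_size):
    """Split an H x W x C image (nested lists) into non-overlapping
    patch_size x patch_size patches, each flattened to a 1-D vector.
    H and W are assumed divisible by patch_size."""
    H, W = len(image), len(image[0])
    patches = []
    for i in range(0, H, patch_size):
        for j in range(0, W, patch_size):
            flat = []
            for di in range(patch_size):
                for dj in range(patch_size):
                    flat.extend(image[i + di][j + dj])  # channel values
            patches.append(flat)
    return patches


def linear_project(vectors, weight):
    """Apply a d_in x d_out linear map to each flattened patch,
    producing patch embeddings that can be concatenated with text
    token embeddings and fed to the decoder."""
    d_out = len(weight[0])
    return [
        [sum(v[k] * weight[k][o] for k in range(len(v))) for o in range(d_out)]
        for v in vectors
    ]


# Toy usage: a 2x2 single-channel "image" becomes one patch token.
image = [[[1], [2]], [[3], [4]]]
tokens = linear_project(extract_patches(image, 2), [[1, 0], [0, 1], [0, 0], [0, 0]])
```

Because the patch projection replaces a separate vision tower, the decoder can attend to image detail at whatever resolution the patch grid provides, which is what enables the fine-grained, high-resolution interpretation mentioned above.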
- 🔍 Added MagnifierBench, an evaluation benchmark tailored to assess whether a model can identify tiny objects (about 1% of the image size) and their spatial relationships.
- Improved pipeline for Pretrain | SFT | RLHF with (part of) current leading LMMs.
- Models: Otter | OpenFlamingo | Idefics | Fuyu
- Training Datasets Interface: (Pretrain) MMC4 | LAION2B | CC3M | CC12M, (SFT) MIMIC-IT | M3IT | LLAVAR | LRV | SVIT...
- We tested the above datasets for both pretraining and instruction tuning with OpenFlamingo and Otter, and tested them with Idefics and Fuyu for instruction tuning. We will open-source the training scripts gradually.
- Benchmark Interface: MagnifierBench/MMBench/MM-VET/MathVista/POPE/MME/ScienceQA/SeedBench. They can be run in one click; please see Benchmark for details. For example:
```yaml
datasets:
  - name: magnifierbench
    split: test
    prompt: Answer with the option's letter from the given choices directly.
    api_key: [Your API Key] # GPT4 or GPT3.5 to evaluate the answers against the ground truth.
    debug: true # debug=true saves the model responses in the log file.
  - name: mme
    split: test
    debug: true
  - name: mmbench
    split: test
    debug: true

models:
  - name: gpt4v
    api_key: [Your API Key] # to call the GPT4V model.
```
- Refactored the code for organizing multiple groups of datasets with an integrated yaml file; see details at managing datasets in MIMIC-IT format. This is a major change and will make previous code unrunnable, so please check the details. For example:

```yaml
IMAGE_TEXT: # Group name should be one of [IMAGE_TEXT, TEXT_ONLY, IMAGE_TEXT_IN_CONTEXT]
  LADD: # Dataset name can be any name you want
    mimicit_path: azure_storage/json/LA/LADD_instructions.json # Path of the instruction json file
    images_path: azure_storage/Parquets/LA.parquet # Path of the image parquet file
    num_samples: -1 # Number of samples to use; -1 (the default) means all samples
  M3IT_CAPTIONING:
    mimicit_path: azure_storage/json/M3IT/captioning/coco/coco_instructions.json
    images_path: azure_storage/Parquets/coco.parquet
    num_samples: 20000
```
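After such a yaml is parsed (e.g. with `yaml.safe_load`), the grouped structure can be validated and defaulted along these lines. This is a hypothetical helper illustrating the structure above, not the repo's actual loader:

```python
# Group names allowed by the MIMIC-IT-format dataset config (from the yaml comment above).
ALLOWED_GROUPS = {"IMAGE_TEXT", "TEXT_ONLY", "IMAGE_TEXT_IN_CONTEXT"}


def normalize_dataset_groups(config):
    """Validate group names and fill in the num_samples default (-1 = all).
    `config` is the dict produced by parsing the grouped yaml file."""
    normalized = {}
    for group, datasets in config.items():
        if group not in ALLOWED_GROUPS:
            raise ValueError(f"Unknown dataset group: {group}")
        normalized[group] = {}
        for name, spec in datasets.items():
            normalized[group][name] = {
                "mimicit_path": spec["mimicit_path"],
                "images_path": spec["images_path"],
                "num_samples": spec.get("num_samples", -1),  # -1 means use all samples
            }
    return normalized
```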
MIMIC-IT, Otter-Image/Video released
- 🧨 Download MIMIC-IT Dataset. For more details on navigating the dataset, please refer to MIMIC-IT Dataset README.
- 🏎️ Run Otter Locally. You can run our model locally with at least 16 GB of GPU memory for tasks like image/video tagging, captioning, and identifying harmful content. We fixed a bug in video inference where frame tensors were mistakenly unsqueezed into the wrong vision_x shape. You can now try running it again with the updated version.
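For context on the fix: Otter follows the OpenFlamingo convention where vision_x is a 6-D tensor of shape (batch, num_media, num_frames, channels, height, width), so all frames of one video share a single media slot. The function names below are illustrative, not the repo's API; the "buggy" variant shows one way frames can end up misplaced if each frame is unsqueezed into its own media slot.

```python
def video_vision_x_shape(num_frames, channels=3, height=224, width=224):
    """Correct layout for one video in a batch of one: all frames
    occupy the num_frames axis of a single media slot."""
    return (1, 1, num_frames, channels, height, width)


def buggy_vision_x_shape(num_frames, channels=3, height=224, width=224):
    """A plausible form of the old bug: each frame unsqueezed into
    its own media slot, leaving the frames axis at size 1."""
    return (1, num_frames, 1, channels, height, width)
```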
Make sure to adjust sys.path.append("../..") correctly so that otter.modeling_otter is importable before launching the model.
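A hypothetical helper sketching that path adjustment: the "../.." assumes your launch script sits two directories below the repo root, so adjust levels_up for your checkout layout.

```python
import os
import sys


def add_repo_root_to_path(script_dir, levels_up=2):
    """Resolve the directory `levels_up` levels above `script_dir`
    (the repo root, for a script two directories deep) and append it
    to sys.path so the `otter` package becomes importable. Equivalent
    to sys.path.append("../..") but independent of the current
    working directory."""
    repo_root = os.path.abspath(os.path.join(script_dir, *([".."] * levels_up)))
    if repo_root not in sys.path:
        sys.path.append(repo_root)
    return repo_root


# Usage from a demo script inside the repo:
# add_repo_root_to_path(os.path.dirname(os.path.abspath(__file__)))
# from otter.modeling_otter import OtterForConditionalGeneration
```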
v0.1.0 - Initial Release
We are excited to announce the initial release of 🦦 Otter!