Data directory structure:

Please organize the datasets as follows; otherwise, you will need to revise the write_*.py files to match your dataset paths and file names.

MM-IMDb

MM-IMDb (archive.org mirror)

root
├── images            
│   ├── 00000005.jpeg 
│   ├── 00000008.jpeg   
│   └── ...        
├── labels          
│   ├── 00000005.json 
│   ├── 00000008.json   
│   └── ...        
└── split.json 
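
Before running the write_*.py scripts, it can help to confirm that the folder matches the tree above. The following is a minimal sketch, not part of the repository: the function name `check_mmimdb_layout` is hypothetical, and it assumes only what the tree shows (images as `.jpeg`, labels as `.json` with matching file stems, and a `split.json` at the root).

```python
from pathlib import Path


def check_mmimdb_layout(root):
    """Compare images/ and labels/ by file stem and look for split.json.

    Hypothetical helper: it relies only on the layout shown in the tree,
    and makes no assumption about the contents of any file.
    """
    root = Path(root)
    image_ids = {p.stem for p in (root / "images").glob("*.jpeg")}
    label_ids = {p.stem for p in (root / "labels").glob("*.json")}
    return {
        # Images that have no labels/<id>.json next to them.
        "images_without_label": sorted(image_ids - label_ids),
        # Label files that have no images/<id>.jpeg next to them.
        "labels_without_image": sorted(label_ids - image_ids),
        "has_split_file": (root / "split.json").is_file(),
    }
```

Calling `check_mmimdb_layout("root")` on a correctly organized folder should give two empty lists and `has_split_file` set to `True`.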

Food101

UPMC Food-101 (Kaggle)

root
├── images            
│   ├── train                
│   │   ├── apple_pie
│   │   │   ├── apple_pie_0.jpg        
│   │   │   └── ...         
│   │   ├── baby_back_ribs  
│   │   │   ├── baby_back_ribs_0.jpg        
│   │   │   └── ...    
│   │   └── ...
│   ├── test                
│   │   ├── apple_pie
│   │   │   ├── apple_pie_0.jpg        
│   │   │   └── ...         
│   │   ├── baby_back_ribs  
│   │   │   ├── baby_back_ribs_0.jpg        
│   │   │   └── ...    
│   │   └── ...
├── texts          
│   ├── train_titles.csv            
│   └── test_titles.csv         
├── class_idx.json         
├── text.json         
└── split.json
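
The Food101 layout has more moving parts, so a quick sanity check may be useful here as well. The sketch below is an illustration rather than code from the repository: `count_food101_images` and `missing_food101_files` are hypothetical names, and the expected file list is copied from the tree above. Nothing is assumed about what the CSV or JSON files contain.

```python
from pathlib import Path

# Non-image files listed in the tree above, relative to the dataset root.
EXPECTED_FILES = (
    "texts/train_titles.csv",
    "texts/test_titles.csv",
    "class_idx.json",
    "text.json",
    "split.json",
)


def count_food101_images(root):
    """Return {split: {class_name: number of .jpg files}} for train and test."""
    root = Path(root)
    counts = {}
    for split in ("train", "test"):
        split_dir = root / "images" / split
        if not split_dir.is_dir():
            # A missing split folder is reported as an empty mapping.
            counts[split] = {}
            continue
        counts[split] = {
            class_dir.name: sum(1 for _ in class_dir.glob("*.jpg"))
            for class_dir in sorted(split_dir.iterdir())
            if class_dir.is_dir()
        }
    return counts


def missing_food101_files(root):
    """List the expected non-image files that are absent under root."""
    root = Path(root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

A class that appears under `train` but not under `test` (or the reverse), or a non-empty result from `missing_food101_files`, usually means the folder was not unpacked as shown.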

Hateful Memes

Hateful Memes

Update: The dataset we used comes from (Kaggle); however, the test.jsonl there does not contain label information. To make the labels available for evaluation, we downloaded test_seen.jsonl from (Kaggle2), which covers the same test set and includes the labels. You can download the test_seen.jsonl file here and use it directly (either rename it to test.jsonl or revise the write_*.py file accordingly).

root
├── img           
│   ├── xxxxx.png 
│   ├── xxxxx.png   
│   └── ...        
├── train.jsonl          
├── dev.jsonl           
└── test.jsonl
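
The renaming step described in the update can be sketched as follows. This is an illustrative helper, not part of the repository: the name `install_labeled_test_file` is hypothetical, and the field name `"label"` is an assumption about the JSON Lines records, so adjust `label_key` if your file uses a different key. The file is copied only if every record carries a label.

```python
import json
import shutil
from pathlib import Path


def install_labeled_test_file(downloaded, root, label_key="label"):
    """Copy a labeled test_seen.jsonl to <root>/test.jsonl.

    `label_key` is an assumed field name; raises ValueError when any
    record lacks it, so an unlabeled file is never installed.
    Returns the number of records in the file.
    """
    downloaded = Path(downloaded)
    target = Path(root) / "test.jsonl"

    # JSON Lines: one JSON object per non-empty line.
    with downloaded.open(encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]

    unlabeled = sum(1 for record in records if label_key not in record)
    if unlabeled:
        raise ValueError(
            f"{unlabeled} of {len(records)} records have no '{label_key}' field"
        )

    shutil.copyfile(downloaded, target)
    return len(records)
```

For example, `install_labeled_test_file("test_seen.jsonl", "root")` overwrites any existing `root/test.jsonl`, so keep a backup of the unlabeled file if you still need it.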