This is the code for "Prompt Tuning for Generative Multimodal Pretrained Models". Check out our paper on arXiv. The paper explores prompt tuning for generative multimodal pretrained models, rather than for contrastive learning models. We specifically focus on the unified sequence-to-sequence learning framework and implement our method on the OFA models.
- python 3.7.4
- pytorch 1.8.1
- torchvision 0.9.1
- JAVA 1.8 (for COCO evaluation)
pip install -r requirements.txt
See datasets.md and checkpoints.md.
We provide a demo script (run_scripts/refcoco/train_refcoco_prefix.sh) that has all the required parts for training.
sh ./run_scripts/refcoco/train_refcoco_prefix.sh
A few options of note:
--encoder-prompt :: whether to insert prompts to the encoder
--decoder-prompt :: whether to insert prompts to the decoder
--encoder-prompt-length :: encoder prompt length
--decoder-prompt-length :: decoder prompt length
--bitfit :: whether to use bitfit
--adapter :: whether to use adapter
--adapter-dim :: adapter projection dim
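For illustration, below is a minimal sketch of how these options might be combined when launching training. The flag names come from the list above; the data path, prompt lengths, and adapter dimension are illustrative assumptions, not the script's actual defaults.

```bash
# Hypothetical sketch of a train.py invocation with prompt tuning enabled.
# The flag names follow the list above; $data, the prompt lengths, and the
# adapter dimension are illustrative assumptions, not verified defaults.
python3 train.py $data \
    --encoder-prompt --encoder-prompt-length 64 \
    --decoder-prompt --decoder-prompt-length 64

# Alternative parameter-efficient settings named above:
#   --bitfit                      # tune bias terms only
#   --adapter --adapter-dim 64    # insert adapters with a 64-dim projection
```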
We recommend organizing your workspace directory as follows:
OFA/
├── checkpoints/
│   ├── ofa_base.pt
│   ├── ofa_large.pt
│   └── ...
├── criterions/
├── data/
├── dataset/
│   ├── caption_data/
│   ├── refcoco_data/
│   └── ...
├── fairseq/
├── models/
├── run_scripts/
├── tasks/
├── train.py
├── trainer.py
└── utils/