This repository provides programs to (1) fine-tune a ViT + GPT2 model for image captioning and (2) demonstrate image captioning with the fine-tuned models.
Modal builds amazing infrastructure for data/ML apps in the cloud: you can build and use microservices in the cloud as if you were writing and running Python code locally. I was amazed by the ease of use and the design of this new service. I like Modal.
To run these programs, please register an account with Modal.
- Base model: nlpconnect/vit-gpt2-image-captioning
- Dataset for fine-tuning:
Download the COCO dataset from the official repositories to a shared volume on Modal.
The name of the shared volume is specified in model_training/config.py (default: image-caption-vol); a hypothetical sketch of this config appears after the listing below.
$ modal run model_training/download_coco_dataset.py
Check the shared volume.
$ modal volume ls image-caption-vol /coco
Directory listing of 'coco' in 'image-caption-vol'
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━┓
┃ filename                     ┃ type ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━┩
│ train2017                    │ dir  │
│ annotations                  │ dir  │
│ val2017                      │ dir  │
└──────────────────────────────┴──────┘
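For reference, here is a minimal sketch of what model_training/config.py might contain; the variable names and values below are assumptions, so check the actual file in the repository.

# model_training/config.py -- hypothetical sketch, not the repository's
# actual contents.

# Name of the Modal shared volume holding datasets and checkpoints.
SHARED_VOLUME_NAME = "image-caption-vol"

# Mount point of the shared volume inside Modal containers.
SHARED_VOLUME_PATH = "/vol"

# Base model to fine-tune.
BASE_MODEL = "nlpconnect/vit-gpt2-image-captioning"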
- Split the RedCaps dataset into training, validation, and test sets, then save them to the shared volume and/or your Hugging Face repository (a sketch of how such a split can be produced appears after the listing below).
The name of the shared volume is specified in model_training/config.py (default: image-caption-vol).
$ modal run model_training/split_dataset.py --save-dir=red_caps --push-hub-rep=[YOUR-HF-ACCOUNT]/red_caps
Check the shared volume.
$ modal volume ls image-caption-vol /red_caps
Directory listing of '/red_caps' in 'image-caption-vol'
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━┓
┃ filename          ┃ type ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━┩
│ test              │ dir  │
│ train             │ dir  │
│ dataset_dict.json │ file │
│ val               │ dir  │
└───────────────────┴──────┘
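For intuition, a split like the one above can be produced with the Hugging Face datasets library. The following is a rough sketch under assumed ratios, seed, and paths, not the actual contents of model_training/split_dataset.py.

from datasets import DatasetDict, load_dataset

# Load RedCaps and carve out train/val/test splits.
# The "all" config, 90/5/5 ratios, and seed are illustrative assumptions.
raw = load_dataset("red_caps", "all", split="train")
train_rest = raw.train_test_split(test_size=0.1, seed=42)
val_test = train_rest["test"].train_test_split(test_size=0.5, seed=42)

splits = DatasetDict({
    "train": train_rest["train"],
    "val": val_test["train"],
    "test": val_test["test"],
})

# Save to the shared volume mount; optionally push to the Hugging Face Hub.
splits.save_to_disk("/vol/red_caps")
# splits.push_to_hub("[YOUR-HF-ACCOUNT]/red_caps")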
- Build a subset of the dataset in the shared volume on Modal (a sketch follows the command below).
$ modal run model_training/build_dataset_subset.py --from-dataset-path=red_caps \
    --to-dataset-path=red-caps-5k-01 --num-train=3500 --num-val=500 --num-test=1000
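Conceptually, building the subset amounts to taking the first N examples of each saved split. A minimal sketch, assuming the splits live under the shared volume mount:

from datasets import DatasetDict, load_from_disk

# Load the saved splits from the shared volume (path is an assumption).
full = load_from_disk("/vol/red_caps")

# Keep only the first N examples per split; shuffle first if a random
# subset is preferred.
subset = DatasetDict({
    "train": full["train"].select(range(3500)),
    "val": full["val"].select(range(500)),
    "test": full["test"].select(range(1000)),
})

subset.save_to_disk("/vol/red-caps-5k-01")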
- Start the training on Modal (an outline of the training entry point follows the command below).
The default stub name is vit-gpt2-image-caption-train. Machine usage (e.g., GPU memory in use) is shown at https://modal.com/apps.
$ modal run model_training/train.py
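In outline, the training entry point is a Modal function attached to a stub. The sketch below assumes a GPU type, image contents, and a Modal API generation (modal.Stub / SharedVolume, which newer releases rename), so treat it as illustrative of model_training/train.py rather than its actual code.

import modal

# The stub name is what shows up at https://modal.com/apps.
stub = modal.Stub("vit-gpt2-image-caption-train")

image = modal.Image.debian_slim().pip_install(
    "transformers", "datasets", "torch", "evaluate"
)
volume = modal.SharedVolume().persist("image-caption-vol")

@stub.function(
    image=image,
    gpu="A10G",                       # GPU type is an assumption
    shared_volumes={"/vol": volume},  # dataset/checkpoint volume
    timeout=8 * 60 * 60,
)
def train():
    # Fine-tune nlpconnect/vit-gpt2-image-captioning on the prepared
    # dataset, e.g. with transformers' Seq2SeqTrainer (details elided).
    ...

@stub.local_entrypoint()
def main():
    train.remote()  # older Modal releases use train.call()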
- Check the training status with TensorBoard.
$ modal deploy model_training/tfboard_webapp.py
Access the URL displayed as "Created tensorboard_app => https://XXXXXX.modal.run" to open TensorBoard.
- Deploy the web endpoints for the demo.
$ modal deploy demo/vit_gpt2_image_caption.py
$ modal deploy demo/vit_gpt2_image_caption_webapp.py
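In outline, demo/vit_gpt2_image_caption.py wraps the captioning model in a Modal web endpoint. The sketch below is a guess at the shape: the decorator names vary across Modal versions, the JSON request format is an assumption, and the real demo would load the fine-tuned checkpoint rather than the base model.

import modal

stub = modal.Stub("vit-gpt2-image-caption")

image = modal.Image.debian_slim().pip_install(
    "transformers", "torch", "Pillow", "requests"
)

@stub.function(image=image, gpu="any")
@modal.web_endpoint(method="POST")
def generate_caption(item: dict):
    # For brevity the pipeline is built per request; a real deployment
    # would load the model once per container.
    import io
    import requests
    from PIL import Image as PILImage
    from transformers import pipeline

    # The demo would point at the fine-tuned checkpoint on the shared
    # volume instead of the base model named here.
    captioner = pipeline(
        "image-to-text", model="nlpconnect/vit-gpt2-image-captioning"
    )
    image_bytes = requests.get(item["image_url"]).content
    result = captioner(PILImage.open(io.BytesIO(image_bytes)))
    return {"caption": result[0]["generated_text"]}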
- Open the website and try the demo.
"Created wrapper => https://[YOUR_ACCOUNT]--vit-gpt2-image-caption-webapp-wrapper.modal.run"