This repository contains my implementation of an image captioning model. The model takes an image as input and generates a descriptive English caption.
- I experimented with several model architectures, including CNN-LSTMs and CNN-Transformers.
- The project uses the MSCOCO2017 dataset. I initially used Flickr30k, but my captioning results were much better on MSCOCO2017, most likely because it has more data.
The model achieved a maximum BLEU-4 caption score of 11.0.
Original Caption: polar bear swimming in the water by wall
Generated Caption: polar bear swimming by large wave
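
For readers unfamiliar with BLEU-4, here is a minimal sketch of how the metric can be computed for a single caption pair. It is an illustration only, not the evaluation code used in this repository: the function names (`ngrams`, `modified_precision`, `bleu4`) are hypothetical, it assumes whitespace tokenization and a single reference caption, and it applies no smoothing.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(reference, hypothesis, n):
    """Return (clipped matches, total hypothesis n-grams) for order n."""
    hyp_counts = ngrams(hypothesis, n)
    ref_counts = ngrams(reference, n)
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return clipped, sum(hyp_counts.values())


def bleu4(reference, hypothesis):
    """Unsmoothed sentence-level BLEU-4 against one reference (assumed setup)."""
    precisions = []
    for n in range(1, 5):
        clipped, total = modified_precision(reference, hypothesis, n)
        if total == 0 or clipped == 0:
            return 0.0  # any zero precision makes the geometric mean zero
        precisions.append(clipped / total)
    # Brevity penalty: penalize hypotheses that are not longer than the reference.
    r, c = len(reference), len(hypothesis)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(math.log(p) for p in precisions) / 4)


reference = "polar bear swimming in the water by wall".split()
hypothesis = "polar bear swimming by large wave".split()

for n in range(1, 5):
    print(n, modified_precision(reference, hypothesis, n))
print(bleu4(reference, hypothesis))
```

For the caption pair above, the generated caption shares four unigrams, two bigrams, and one trigram with the original, but no 4-gram, so the unsmoothed sentence-level score is 0. This is why BLEU-4 is usually aggregated over a whole evaluation set (or smoothed) rather than judged on individual captions.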
This project was inspired by the following papers: