This project takes an input image and generates a sentence summarizing the content of the image. Inspired by the work in ImageCaption, it focuses on using deep learning models to generate accurate captions for images.
Results: sample captions from the LSTM and GPT-1 decoders, shown side by side.
The project utilizes the Flickr30k dataset from Kaggle. This dataset provides a rich set of images with corresponding captions, which are used for training and testing the model.
The image encoder uses a pre-trained ResNet-50 model. The last layer of ResNet-50 is removed and replaced with a linear layer to map the image embeddings to the same size as the word embeddings. The output of the image encoder is used as the first token for the text decoder.
For text generation, two different architectures were implemented:
- LSTM (Long Short-Term Memory) for sequential text decoding.
- Stacked GPT-like Transformer blocks, implemented as a Transformer encoder with a causal mask to emulate the GPT architecture.
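The GPT-like variant can be sketched as below: a plain `nn.TransformerEncoder` made autoregressive with a causal mask, with the projected image embedding prepended as the first token. All names and hyperparameters here are illustrative assumptions, not the repository's actual code:

```python
import torch
import torch.nn as nn

class GPTLikeDecoder(nn.Module):
    """Transformer encoder + causal mask, emulating a GPT-style decoder."""

    def __init__(self, vocab_size, embed_dim=256, n_heads=8, n_layers=4, max_len=64):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, embed_dim)
        self.pos = nn.Embedding(max_len, embed_dim)
        layer = nn.TransformerEncoderLayer(embed_dim, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(embed_dim, vocab_size)

    def forward(self, image_embed: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        B, T = tokens.shape
        x = self.tok(tokens) + self.pos(torch.arange(T, device=tokens.device))
        # Prepend the image embedding as the first "token" of the sequence.
        x = torch.cat([image_embed.unsqueeze(1), x], dim=1)  # (B, T+1, D)
        # Causal mask: each position may only attend to earlier positions.
        mask = nn.Transformer.generate_square_subsequent_mask(T + 1)
        x = self.blocks(x, mask=mask)
        return self.head(x)  # (B, T+1, vocab_size)
```

The causal mask is what turns a bidirectional encoder stack into a left-to-right language model.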
The models are defined in the `model.py` file.
To train the model, follow the step-by-step instructions in the `ImageSummarize.ipynb` notebook. During training:
- The models reached approximately 40-43% accuracy over 5-10 epochs of training.
- Both LSTM and GPT models showed similar performance in generating image captions.
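The notebook's exact accuracy metric is not spelled out here; a common choice for caption models, shown as an assumed sketch, is per-token accuracy over non-padding positions:

```python
import torch

def token_accuracy(logits: torch.Tensor, targets: torch.Tensor, pad_id: int = 0) -> float:
    """Fraction of non-padding target tokens predicted correctly.

    logits:  (B, T, vocab_size) raw decoder outputs
    targets: (B, T) ground-truth token ids, padded with pad_id
    """
    preds = logits.argmax(dim=-1)          # (B, T) most probable token per step
    mask = targets != pad_id               # ignore padding positions
    correct = (preds == targets) & mask
    return correct.sum().item() / mask.sum().clamp(min=1).item()
```

A metric like this counts each position independently, so ~40% token accuracy can still correspond to fluent-looking captions.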
Accuracy plots: LSTM vs. GPT-1 accuracy for 5 epochs and for 10 epochs.
- Configuration file paths and variables must be properly set before training.
- Initial training on a MacBook M1 was slow due to hardware limitations.
- Training was then moved to Google Colab for its GPU resources, but sessions later fell back to CPU, which was also slow.
- Training was finally completed on Kaggle's T4 x2 GPUs, where the models achieved about 40% accuracy.
An Image Summarizer Generator App is provided to generate captions for images interactively. To run the app:
- Navigate to the `app` folder.
- Run the `app.py` file with the command `python app.py`.
- Upload Image: You can choose to upload an image for caption generation.
- Load Sample Image: The app can randomly load a sample image using the Unsplash API.
- The uploaded or generated image is previewed in the GUI.
- Users can choose between the LSTM or GPT-1 models for caption generation.
- The generated caption is displayed, and both the image and its caption can be saved to a folder.
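Caption generation with either model typically boils down to autoregressive decoding. The following is a hedged sketch of greedy decoding, assuming an encoder/decoder pair shaped like the ones described above (the function and parameter names are illustrative, not the app's actual API):

```python
import torch

@torch.no_grad()
def greedy_caption(encoder, decoder, image, itos, eos_id, max_len=20):
    """Greedy caption generation: start from the image embedding and
    repeatedly append the most probable next token until EOS.

    itos maps token ids back to words.
    """
    img_embed = encoder(image.unsqueeze(0))        # (1, D)
    tokens = torch.empty(1, 0, dtype=torch.long)   # empty caption so far
    for _ in range(max_len):
        logits = decoder(img_embed, tokens)        # (1, T+1, vocab_size)
        next_id = logits[:, -1].argmax(dim=-1)     # most probable next token
        if next_id.item() == eos_id:
            break
        tokens = torch.cat([tokens, next_id.unsqueeze(1)], dim=1)
    return " ".join(itos[i] for i in tokens[0].tolist())
```

Swapping `argmax` for sampling from the softmax distribution would give more varied captions at the cost of determinism.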
App output examples: LSTM output vs. GPT-1 output.
- Ensure all configurations (paths and variables) are correctly set before training or running the app.
- Performance may vary depending on the computational resources available during training.