image captioning added #667

Merged
merged 7 commits into from
Jun 5, 2024
1 change: 1 addition & 0 deletions Image Captioning/Model Files/Readme.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
**These files are expected to be created by a successful run of the entire notebook.**
Binary file added Image Captioning/Model Files/Tokenizer.pkl
Binary file not shown.
Binary file added Image Captioning/Model Files/captioning.rar
Binary file not shown.
Binary file added Image Captioning/Model Files/captioning_tf.rar
Binary file not shown.
Binary file added Image Captioning/Model Files/features_flickr.rar
Binary file not shown.
Binary file not shown.
1 change: 1 addition & 0 deletions Image Captioning/Notebook/image-captioning.ipynb

Large diffs are not rendered by default.

96 changes: 96 additions & 0 deletions Image Captioning/Readme.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,96 @@
# Image Captioning

Image captioning is the process of recognizing the context of an image and annotating it with a relevant caption using deep learning and computer vision.

This involves image processing and feature extraction alongside text processing.

In this repo, we'll use a VGG16 model architecture to extract the features of the images.
Then we'll create a tokenizer for the textual data.
Finally, we'll build a deep learning architecture that is trained on these images (features) and texts (tokenized).

The trained model can then take an image as input and annotate it, i.e. generate a caption in the context of that image.

## Table of Contents
- [Overview](#overview)
- [Installation/Getting Started](#installation)
- [Dataset](#dataset)
- [Model Architecture](#model-architecture)
- [Evaluation](#evaluation-bleu-score-for-machine-translation)

## Overview

I've sectioned the process into 4 stages.

1. **Image Processing**: Utilizing VGG16 model architecture to extract features from images.
2. **Text Processing**: Creating a tokenizer for textual data to prepare it for the model.
3. **Model Creation & Training**: Training a deep learning model on the extracted image features and tokenized text data to generate captions.
4. **Pipelining**: Creating the pipeline, i.e. the flow by which an image is taken as input and a caption is generated.

Once trained, the model can take an image as input and generate a relevant caption describing the content of the image.
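The pipelining stage can be pictured as a greedy decoding loop: start from a start marker, ask the model for the next word, and stop at an end marker or a length limit. The sketch below is only an illustration under stated assumptions. The marker tokens `startseq` / `endseq`, the function names, and the stand-in predictor are all hypothetical and are not taken from the notebook.

```python
from typing import Callable, List

# Assumed marker tokens; the notebook may use different ones.
START, END = "startseq", "endseq"


def generate_caption(
    image_features: List[float],
    predict_next: Callable[[List[float], List[str]], str],
    max_length: int = 20,
) -> str:
    """Greedy decoding: repeatedly ask the model for the next word."""
    words = [START]
    for _ in range(max_length):
        next_word = predict_next(image_features, words)
        if next_word == END:
            break
        words.append(next_word)
    return " ".join(words[1:])  # drop the start marker


def fake_predict_next(image_features: List[float], words: List[str]) -> str:
    """Hypothetical stand-in for a trained model: replays a fixed caption."""
    script = ["a", "dog", "runs", END]
    return script[len(words) - 1]


caption = generate_caption([0.1, 0.2], fake_predict_next)
print(caption)
```

In a real pipeline, `predict_next` would wrap the trained model: encode the words so far with the tokenizer, run the model on the image features and the encoded prefix, and map the most likely index back to a word.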

## Installation

* Clone the repo or download the files.
* To clone, run the following in a terminal.

```
git clone <link here>
```

* To get started, make sure you have all the required libraries installed.
* You can install them as you go, or run the following to install them all at once.

```
pip install -r requirements.txt
```

## Dataset
You can download the data from here:
https://www.kaggle.com/datasets/adityajn105/flickr8k

For convenience, run the notebook on Kaggle.

## Model Architecture

### Prerequisite
What is a CNN?
A CNN (convolutional neural network) is a type of deep neural network specialized for recognizing and classifying images. It processes data laid out as a 2D grid, such as the pixels of an image. It slides small filters across the image from left to right and top to bottom to extract relevant features, which makes it robust to where an object appears in the frame; handling scaled or rotated images generally depends on the training data covering such variations. Finally, it combines the extracted features to classify the image.

What is an LSTM?
An LSTM (long short-term memory) network is a type of RNN (recurrent neural network) suited to sequence prediction problems. A common use is next-word prediction, like the suggestions a search engine shows based on the text typed so far. While processing its inputs, an LSTM carries relevant information forward and discards irrelevant information.
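To make next-word prediction concrete for captioning, a caption can be split into (prefix, next word) training pairs, with each word mapped to an integer id. The following is a minimal pure-Python sketch, not the notebook's code: the function names, the `startseq` / `endseq` markers, and the choice of left padding with `0` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def build_word_index(captions: List[str]) -> Dict[str, int]:
    """Assign each distinct word an integer id starting at 1 (0 is kept for padding)."""
    index: Dict[str, int] = {}
    for caption in captions:
        for word in caption.lower().split():
            if word not in index:
                index[word] = len(index) + 1
    return index


def make_training_pairs(
    caption: str, word_index: Dict[str, int], max_length: int
) -> List[Tuple[List[int], int]]:
    """Split one caption into (left-padded prefix, next word id) pairs."""
    ids = [word_index[w] for w in caption.lower().split()]
    pairs = []
    for i in range(1, len(ids)):
        prefix = ids[:i][-max_length:]  # keep at most max_length ids
        padded = [0] * (max_length - len(prefix)) + prefix
        pairs.append((padded, ids[i]))
    return pairs


captions = ["startseq a dog runs endseq"]
word_index = build_word_index(captions)
pairs = make_training_pairs(captions[0], word_index, max_length=4)
for prefix, target in pairs:
    print(prefix, "->", target)
```

Each pair asks the model to predict one word given everything before it, which is the sequence-prediction setup an LSTM is designed for.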

### The architecture we've used

* To let the model generate captions for images, we need to give it both the image and the caption as inputs during training.
* This is done by merging the two inputs. You can see how it's done in the notebook using the `add()` function, which sums the image and text representations element-wise rather than concatenating them.

![Alt text](model_plot.png)
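In Keras, an `add` merge sums two equal-length vectors element-wise, whereas concatenation joins them end to end. The pure-Python sketch below only illustrates that difference with made-up vectors; the function names and values are hypothetical, and the real model operates on learned tensors rather than Python lists.

```python
from typing import List


def add_merge(image_vec: List[float], text_vec: List[float]) -> List[float]:
    """Element-wise sum; both inputs must have the same length."""
    if len(image_vec) != len(text_vec):
        raise ValueError("add requires vectors of equal length")
    return [a + b for a, b in zip(image_vec, text_vec)]


def concat_merge(image_vec: List[float], text_vec: List[float]) -> List[float]:
    """Concatenation; the output length is the sum of the input lengths."""
    return image_vec + text_vec


image_vec = [1.0, 2.0, 3.0]  # made-up image representation
text_vec = [0.5, 0.5, 0.5]   # made-up text representation

merged_sum = add_merge(image_vec, text_vec)     # length stays 3
merged_cat = concat_merge(image_vec, text_vec)  # length becomes 6
print(merged_sum, len(merged_cat))
```

One consequence of summing is that both branches must be projected to the same size before they are merged.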



### Evaluation: BLEU Score for Machine Translation

**BLEU score** is a widely used metric for assessing the quality of machine translation output in Natural Language Processing (NLP). The same idea applies to captioning, where a generated caption is compared against reference captions.

#### Computing the BLEU Score

Let's illustrate the computation process using two reference translations, R1 and R2, generated by human experts, and a candidate translation, C1, produced by our translation system.

Reference Translations:
- R1: The cat is on the mat.
- R2: There is a cat on the mat.

Candidate Translation:
- C1: The cat and the dog.

To quantify the quality of our translation using a metric, we count how many words in the candidate translation C1 match those in the reference translations R1 and R2. We then divide this count by the total number of words in C1 to obtain a percentage. This metric, which we'll denote as BLEU*, ranges from 0.0 (worst) to 1.0 (perfect).

In C1, three words ("the", "cat", "the") match those in the reference translations. Thus:

**BLEU*(C1) = 3 / 5 = 0.6**

A BLEU* score of 0.6 means that 60% of the candidate's words appear in the references, so there is room for improvement.
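The BLEU* calculation above can be reproduced in a few lines of Python. This is a sketch of the simplified metric described in this section only. The function names and the tokenization (lowercasing and stripping punctuation) are assumptions made for illustration.

```python
import string
from typing import List


def tokenize(sentence: str) -> List[str]:
    """Lowercase, strip punctuation, and split on whitespace."""
    cleaned = sentence.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


def bleu_star(candidate: str, references: List[str]) -> float:
    """Fraction of candidate words that appear in at least one reference."""
    reference_words = set()
    for ref in references:
        reference_words.update(tokenize(ref))
    candidate_words = tokenize(candidate)
    if not candidate_words:
        return 0.0
    matches = sum(1 for w in candidate_words if w in reference_words)
    return matches / len(candidate_words)


score = bleu_star(
    "The cat and the dog.",
    ["The cat is on the mat.", "There is a cat on the mat."],
)
print(score)  # 0.6
```

The full BLEU metric goes further than this: it clips repeated matches, also scores longer n-grams, and applies a brevity penalty for candidates that are too short.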

To learn more: https://medium.com/nlplanet/two-minutes-nlp-learn-the-bleu-metric-by-examples-df015ca73a86

* Here is another reference example

![Alt text](bleu.jpg)
Binary file added Image Captioning/bleu.jpg
Binary file added Image Captioning/model_plot.png
107 changes: 1 addition & 106 deletions README.md
Original file line number Diff line number Diff line change
@@ -156,112 +156,7 @@ The six major areas of data science include the following:
<!-- Projects start -->
| Content List |
| --------------- |
| [Advanced Visualizations](Advanced%20Visualizations) |
| [Alzheimer's Disease Predictor](Alzheimer's%20Disease%20Predictor) |
| [Analysis And predict_Black_friday_sale](Analysis_&_predict_Black_friday_sale) |
| [Audio Classification](Audio%20Classification) |
| [Automatic Summarization of Scientific Papers](Automatic%20Summarization%20of%20Scientific%20Papers) |
| [Basics of ML and DL](Basics%20of%20ML%20and%20DL) |
| [Basics of Power Bi](Basics%20of%20Power%20Bi) |
| [Basics of the Python](Basics%20of%20the%20Python) |
| [Bidirectional LSTM](Bidirectional%20LSTM) |
| [Bird Species Classification Web App](Bird%20Species%20Classification%20Web%20App) |
| [Bitcoin Price Prediction Web App](Bitcoin%20Price%20Prediction%20Web%20App) |
| [Bitcoin Price Predictor](Bitcoin%20Price%20Predictor) |
| [CBT_ChatBot](CBT_ChatBot) |
| [COVID_19-DATA-ANALYSIS](COVID_19-DATA-ANALYSIS) |
| [Cheat Sheets](Cheat%20Sheets) |
| [Class Imbalance problem](Class%20Imbalance%20problem) |
| [Classification Algorithms](Classification%20Algorithms) |
| [Cloud Details](Cloud%20Details) |
| [Covid19 forecasting with prophet](Covid19%20forecasting%20with%20prophet) |
| [Covid_Third_Wave_Forecasting](Covid_Third_Wave_Forecasting) |
| [CrowdAI Plant Disease](CrowdAI%20Plant%20Disease) |
| [Customer Segmentation using Machine Learning](Customer%20Segmentation%20using%20Machine%20Learning) |
| [Data Cleaning Techniques](Data%20Cleaning%20Techniques) |
| [Data Filling and Cleaning Techniques](Data%20Filling%20and%20Cleaning%20Techniques) |
| [Different types of Clustering](Different%20types%20of%20Clustering) |
| [Different types of feature selection techniques](Different%20types%20of%20feature%20selection%20techniques) |
| [Different_types_of_scaling_Method](Different_types_of_scaling_Method) |
| [Driver_Drowsiness_Detection](Driver_Drowsiness_Detection) |
| [EDA-and-Perform-Modelling-on-Ionosphere-Dataset-main](EDA-and-Perform-Modelling-on-Ionosphere-Dataset-main) |
| [Email Classifier](Email%20Classifier) |
| [Emotion Recognition Based on NLP](Emotion%20Recognition%20Based%20on%20NLP) |
| [Ensemble Methds in ML](Ensemble%20Methds%20in%20ML) |
| [Explaination and Example for P value with code](Explaination%20and%20Example%20for%20P%20value%20with%20code) |
| [Exploratory-data-analysis](Exploratory-data-analysis) |
| [Extract_Text_from_PDF_using_Python](Extract_Text_from_PDF_using_Python) |
| [Fake_News_Detection](Fake_News_Detection) |
| [File of SQL Commands](File%20of%20SQL%20Commands) |
| [Fish-Weight-Estimation](Fish-Weight-Estimation) |
| [Flight_delay_prediction_project](Flight_delay_prediction_project) |
| [GDP Prediction](GDP%20Prediction) |
| [GUI-JARVIS](GUI-JARVIS) |
| [Gender Pay Gap Analysis](Gender%20Pay%20Gap%20Analysis) |
| [Google Teachable Machine](Google%20Teachable%20Machine) |
| [Handwritten Equation Solver using CNN](Handwritten%20Equation%20Solver%20using%20CNN) |
| [Handwritten character recognition](Handwritten%20character%20recognition) |
| [Heart_Predection](Heart_Predection) |
| [HollywoodMarketSynopsis](HollywoodMarketSynopsis) |
| [IMDB Box Office Prediction](IMDB%20Box%20Office%20Prediction) |
| [LanguageDetection](LanguageDetection) |
| [Medical Charges for Smokers and Non-smoker](Medical%20Charges%20for%20Smokers%20and%20Non-smoker) |
| [Medical_Help_Chatbot](Medical_Help_Chatbot) |
| [Meteorite Landing Data Analysis](Meteorite%20Landing%20Data%20Analysis) |
| [Movie-Recommendation-System](Movie-Recommendation-System) |
| [Movie-Recommender-System using python](Movie-Recommender-System%20using%20python) |
| [Nasa-Asteroids-Dataset-Analysis](Nasa-Asteroids-Dataset-Analysis) |
| [NumPy - Basics](NumPy%20-%20Basics) |
| [Number_of_people_counter](Number_of_people_counter) |
| [OCR-Medicine-Reader](OCR-Medicine-Reader) |
| [Object Detection](Object%20Detection) |
| [Ola Bike Ride Request Demand Forecast](Ola%20Bike%20Ride%20Request%20Demand%20Forecast) |
| [Optical character recognition (OCR)](Optical%20character%20recognition%20(OCR)) |
| [Plant Seedlings Classification](Plant%20Seedlings%20Classification) |
| [R language](R%20language) |
| [Random forest from scratch](Random%20forest%20from%20scratch) |
| [Random forest test](Random%20forest%20test) |
| [Rock Paper Scissors Python Game](Rock%20Paper%20Scissors%20Python%20Game) |
| [Sentiment analysis for depression based on social media posts](Sentiment%20analysis%20for%20depression%20based%20on%20social%20media%20posts) |
| [Sentiment-Analysis](Sentiment-Analysis) |
| [Skin Disease Predictor](Skin%20Disease%20Predictor) |
| [Spam Mail Detection](Spam%20Mail%20Detection) |
| [Speech_Emotion_Recognition](Speech_Emotion_Recognition) |
| [Spelling Corrector](Spelling%20Corrector) |
| [Sports Analytics Project](Sports%20Analytics%20Project) |
| [Startup_Profit_Prediction](Startup_Profit_Prediction) |
| [Stock Price Analysis](Stock%20Price%20Analysis) |
| [Sudoku Solver using CNN](Sudoku%20Solver%20using%20CNN) |
| [Tensorflow.js Demo](Tensorflow.js%20Demo) |
| [Time Series Forecasting with Python](Time%20Series%20Forecasting%20with%20Python) |
| [Time-Series LSTM Model](Time-Series%20LSTM%20Model) |
| [Unique Chatbot](Unique%20Chatbot) |
| [Various Plots using Matplot,Seaborn,Pandas](Various%20Plots%20using%20Matplot%2CSeaborn%2CPandas) |
| [Vehicles and Pedestrian Detection](Vehicles%20and%20Pedestrian%20Detection) |
| [Weather Prediction](Weather%20Prediction) |
| [Web-Scraping-with-Beautiful-Soup-master](Web-Scraping-with-Beautiful-Soup-master) |
| [XgBoost_Algorithm](XgBoost_Algorithm) |
| [ensemble-methods-notebooks-master](ensemble-methods-notebooks-master) |
| [heart failure](heart%20failure) |
| [job_Advertisement_detection](job_Advertisement_detection) |
| [logistic_regression_scratch](logistic_regression_scratch) |
| [recommendation_system](recommendation_system) |
| [.DS_Store](.DS_Store) |
| [Analysis_of_Temperature_Rise_in_PMSM.ipynb](Analysis_of_Temperature_Rise_in_PMSM.ipynb) |
| [Beautiful Soup.ipynb](Beautiful%20Soup.ipynb) |
| [Ensemble learning.docx](Ensemble%20learning.docx) |
| [Ensemble-Learning (Stacking)](Ensemble-Learning%20(Stacking)) |
| [Machine Hack -1.ipynb](Machine%20Hack%20-1.ipynb) |
| [README.md updated file](README.md%20updated%20file) |
| [Role_from_Resume.ipynb](Role_from_Resume.ipynb) |
| [Sql](Sql) |
| [Statistics- Basics.ipynb](Statistics-%20Basics.ipynb) |
| [Test Task_NIket.ipynb](Test%20Task_NIket.ipynb) |
| [UBER_DATA_ANALYSIS.ipynb](UBER_DATA_ANALYSIS.ipynb) |
| [Various_Plots_in_Matplotlib.ipynb](Various_Plots_in_Matplotlib.ipynb) |
| [Visualization with Seaborn & Matplotlib.ipynb](Visualization%20%20with%20Seaborn%20%26%20Matplotlib.ipynb) |
| [buyer_s_time234.ipynb](buyer_s_time234.ipynb) |
| [random_forest.py](random_forest.py) |

<!-- Projects end -->

### Note: