Merge pull request #667 from HazSyl1/master
image captioning added
Niketkumardheeryan authored Jun 5, 2024
2 parents 00e23c2 + 1951a03 commit 081f244
Showing 11 changed files with 99 additions and 106 deletions.
1 change: 1 addition & 0 deletions Image Captioning/Model Files/Readme.md
@@ -0,0 +1 @@
**These files are expected to be created by a successful run of the entire notebook.**
Binary file added Image Captioning/Model Files/Tokenizer.pkl
Binary file not shown.
Binary file added Image Captioning/Model Files/captioning.rar
Binary file not shown.
Binary file added Image Captioning/Model Files/captioning_tf.rar
Binary file not shown.
Binary file added Image Captioning/Model Files/features_flickr.rar
Binary file not shown.
Binary file not shown.
1 change: 1 addition & 0 deletions Image Captioning/Notebook/image-captioning.ipynb

Large diffs are not rendered by default.

96 changes: 96 additions & 0 deletions Image Captioning/Readme.md
@@ -0,0 +1,96 @@
# Image Captioning

Image captioning is the process of recognizing the context of an image and annotating it with a relevant caption using deep learning and computer vision.

This involves image processing and feature extraction alongside text processing.

In this repo, we'll use the VGG16 model architecture to extract features from the images.
Then we'll create a tokenizer for the textual data.
Finally, we'll build a deep learning architecture that is trained on these images (as features) and texts (as tokens).

The trained model can then take an image as input and generate a caption that fits the context of that image.

## Table of Contents
- [Overview](#overview)
- [Installation/Getting Started](#installation)
- [Dataset](#dataset)
- [Model Architecture](#model-architecture)
- [Evaluation](#evaluation-bleu-score-for-machine-translation)

## Overview

I've divided the process into 4 stages.

1. **Image Processing**: Using the VGG16 model architecture to extract features from images.
2. **Text Processing**: Creating a tokenizer for the textual data to prepare it for the model.
3. **Model Creation & Training**: Training a deep learning model on the extracted image features and tokenized text data to generate captions.
4. **Pipelining**: Creating the pipeline, i.e. the flow by which an image is taken as input and a caption is generated.

Once trained, the model can take an image as input and generate a relevant caption describing the content of the image.
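
The text-processing and pipelining stages can be sketched in plain Python. This is a minimal illustration, not the notebook's code: the `startseq`/`endseq` markers, the helper names, the left-padding choice, and the `predict_next` stand-in for the trained model are all assumptions made for the example.

```python
# Illustrative sketch only: every name here is hypothetical, not taken from the notebook.
START, END = "startseq", "endseq"

def build_vocab(captions):
    """Map each distinct word to an integer id, reserving 0 for padding."""
    word_to_id = {}
    for caption in captions:
        for word in caption.lower().split():
            if word not in word_to_id:
                word_to_id[word] = len(word_to_id) + 1
    return word_to_id

def encode(caption, word_to_id, max_len):
    """Turn a caption into a list of ids, left-padded with zeros to max_len."""
    ids = [word_to_id[w] for w in caption.lower().split() if w in word_to_id]
    ids = ids[-max_len:]  # keep only the most recent words if too long
    return [0] * (max_len - len(ids)) + ids

def generate_caption(predict_next, features, word_to_id, max_len):
    """Greedy decoding: feed the caption so far back in until END or max_len."""
    words = [START]
    for _ in range(max_len):
        sequence = encode(" ".join(words), word_to_id, max_len)
        next_word = predict_next(features, sequence)
        words.append(next_word)
        if next_word == END:
            break
    return " ".join(w for w in words if w not in (START, END))

captions = ["startseq a dog runs endseq", "startseq a cat sleeps endseq"]
vocab = build_vocab(captions)

# Stand-in for the trained model: ignores its inputs and replays a fixed caption.
script = iter(["a", "dog", "runs", END])
features = [0.0, 0.0, 0.0, 0.0]  # placeholder for extracted image features
caption = generate_caption(lambda feats, seq: next(script), features, vocab, max_len=6)
print(caption)
```

In a real pipeline, `predict_next` would call the trained model with the image features and the encoded sequence, then map the most probable id back to a word.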

## Installation

* Clone the repo or download the files.
* To clone, run the following in a terminal (e.g. cmd).

```
git clone <link here>
```

* To get started, ensure you have all the libraries installed.
* You can install libraries as you go, or just run the following.

```
pip install -r requirements.txt
```

## Dataset
You can download the data from here:
https://www.kaggle.com/datasets/adityajn105/flickr8k

For convenience, run the notebook on Kaggle.

## Model Architecture

### Prerequisite
What is CNN?
A CNN (convolutional neural network) is a specialized type of deep neural network used to recognize and classify images. It processes data represented as 2D matrices, such as images. With suitable training, a CNN can cope to a degree with scaled, translated, and rotated imagery. It analyzes an image by sliding small filters across it from left to right and top to bottom, extracting relevant features. Finally, it combines these features for image classification.
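
The left-to-right, top-to-bottom scan can be illustrated with a small hand-written filter. This is a sketch in plain Python, assuming a tiny made-up image and a hypothetical vertical-edge kernel. A real CNN such as VGG16 learns its filter values during training and stacks many such layers.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):            # top to bottom
        row = []
        for c in range(out_cols):        # left to right
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# Made-up 3x4 image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical filter that responds where brightness increases from left to right.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, vertical_edge)
print(feature_map)  # [[0, 2, 0], [0, 2, 0]]
```

The non-zero column in the output marks the position of the edge, which is the kind of "relevant feature" the paragraph above refers to.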

What is LSTM?
LSTM (long short-term memory) is a type of RNN (recurrent neural network) that is well suited to sequence prediction problems. A common use is next-word prediction, as when Google Search suggests the next word based on the text typed so far. While processing its inputs, an LSTM carries relevant information forward and discards irrelevant information.
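
How an LSTM keeps or discards information can be shown with a single-unit cell written in plain Python. This sketch uses the standard LSTM gate equations with scalar values. The parameter dictionaries are hand-picked for illustration, not learned, and none of this is taken from the notebook.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One step of a single-unit LSTM cell with scalar input and state.

    params maps each gate name to (input weight, recurrent weight, bias).
    """
    def pre_activation(gate):
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b

    forget = sigmoid(pre_activation("forget"))        # how much old memory to keep
    write = sigmoid(pre_activation("input"))          # how much new content to store
    candidate = math.tanh(pre_activation("candidate"))
    output = sigmoid(pre_activation("output"))

    c = forget * c_prev + write * candidate           # updated cell state
    h = output * math.tanh(c)                         # updated hidden state
    return h, c

# Hand-picked parameters: large biases push a gate almost fully open or closed.
remember = {"forget": (0, 0, 10), "input": (0, 0, -10),
            "candidate": (1, 0, 0), "output": (0, 0, 10)}
overwrite = {"forget": (0, 0, -10), "input": (0, 0, 10),
             "candidate": (1, 0, 0), "output": (0, 0, 10)}

_, c_kept = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, params=remember)
_, c_new = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, params=overwrite)
print(round(c_kept, 3), round(c_new, 3))
```

With the `remember` parameters the cell state stays close to its previous value of 2.0. With `overwrite` it is replaced by roughly tanh(1), the new candidate value.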

### The Architecture we've used

* For the model to generate captions for images, we need to give it both the image and the caption as input to train on.
* This is done by merging the two. You can see how it's done in the notebook using the `add()` function.
![Alt text](model_plot.png)
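
The difference between merging by element-wise addition and merging by concatenation can be shown with plain Python lists. This sketch assumes that the notebook's `add()` sums two equal-length vectors element by element, as a Keras-style add layer does. The vectors and helper names below are made up for illustration.

```python
def add_merge(image_vec, text_vec):
    """Element-wise sum: both inputs must have the same length, which is preserved."""
    if len(image_vec) != len(text_vec):
        raise ValueError("element-wise add needs equal-length vectors")
    return [a + b for a, b in zip(image_vec, text_vec)]

def concat_merge(image_vec, text_vec):
    """Concatenation: the output length is the sum of the input lengths."""
    return list(image_vec) + list(text_vec)

image_vec = [0.5, 1.0, -2.0]   # stand-in for a projected image feature vector
text_vec = [1.5, 0.0, 2.0]     # stand-in for an encoded caption prefix

added = add_merge(image_vec, text_vec)
concatenated = concat_merge(image_vec, text_vec)
print(added)              # [2.0, 1.0, 0.0]
print(len(concatenated))  # 6
```

Addition keeps the merged vector the same size as each branch, so both branches must first be projected to a common dimension. Concatenation doubles the size but places no such constraint on the inputs.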



### Evaluation: BLEU Score for Machine Translation

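**BLEU Score** is a widely used evaluation metric for assessing the quality of Machine Translation output in Natural Language Processing (NLP). The same idea carries over to captioning: a generated caption is compared against reference captions.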

#### Computing the BLEU Score

Let's illustrate the computation process using two reference translations, R1 and R2, generated by human experts, and a candidate translation, C1, produced by our translation system.

Reference Translations:
R1: The cat is on the mat.
R2: There is a cat on the mat.

Candidate Translation:
C1: The cat and the dog.

To quantify the quality of our translation, we count how many words in the candidate translation C1 also appear in the reference translations R1 and R2. We then divide this count by the total number of words in C1 to obtain a fraction. This metric, which we'll denote as BLEU\*, ranges from 0.0 (worst) to 1.0 (perfect).

In C1, three of the five words ("the", "cat", "the") match words in the reference translations. Thus:

**BLEU\*(C1) = 3 / 5 = 0.6**

A BLEU\* score of 0.6 means that 60% of the candidate's words appear in the references, which leaves room for improvement.
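
The BLEU\* count above can be reproduced with a few lines of Python. This is a sketch of the simplified word-matching metric only, not of full BLEU, which also uses longer n-grams and a brevity penalty. The helper names are hypothetical, and the clipping step (a word is credited at most as many times as it occurs in any single reference) is borrowed from standard BLEU as an assumption. It does not change the result for this example.

```python
import string
from collections import Counter

def tokenize(sentence):
    """Lowercase, strip punctuation, and split on whitespace."""
    cleaned = sentence.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()

def unigram_precision(candidate, references):
    """Fraction of candidate words that also appear in the references (clipped)."""
    cand_counts = Counter(tokenize(candidate))
    max_ref_counts = Counter()
    for ref in references:
        for word, count in Counter(tokenize(ref)).items():
            max_ref_counts[word] = max(max_ref_counts[word], count)
    matched = sum(min(count, max_ref_counts[word])
                  for word, count in cand_counts.items())
    total = sum(cand_counts.values())
    return matched / total if total else 0.0

references = ["The cat is on the mat.", "There is a cat on the mat."]
score = unigram_precision("The cat and the dog.", references)
print(score)  # 0.6
```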

To learn more: https://medium.com/nlplanet/two-minutes-nlp-learn-the-bleu-metric-by-examples-df015ca73a86

* Here is another reference example

![Alt text](bleu.jpg)
Binary file added Image Captioning/bleu.jpg
Binary file added Image Captioning/model_plot.png
107 changes: 1 addition & 106 deletions README.md
@@ -156,112 +156,7 @@ The six major areas of data science include the following:
<!-- Projects start -->
| Content List |
| --------------- |
| [Advanced Visualizations](Advanced%20Visualizations) |
| [Alzheimer's Disease Predictor](Alzheimer's%20Disease%20Predictor) |
| [Analysis And predict_Black_friday_sale](Analysis_&_predict_Black_friday_sale) |
| [Audio Classification](Audio%20Classification) |
| [Automatic Summarization of Scientific Papers](Automatic%20Summarization%20of%20Scientific%20Papers) |
| [Basics of ML and DL](Basics%20of%20ML%20and%20DL) |
| [Basics of Power Bi](Basics%20of%20Power%20Bi) |
| [Basics of the Python](Basics%20of%20the%20Python) |
| [Bidirectional LSTM](Bidirectional%20LSTM) |
| [Bird Species Classification Web App](Bird%20Species%20Classification%20Web%20App) |
| [Bitcoin Price Prediction Web App](Bitcoin%20Price%20Prediction%20Web%20App) |
| [Bitcoin Price Predictor](Bitcoin%20Price%20Predictor) |
| [CBT_ChatBot](CBT_ChatBot) |
| [COVID_19-DATA-ANALYSIS](COVID_19-DATA-ANALYSIS) |
| [Cheat Sheets](Cheat%20Sheets) |
| [Class Imbalance problem](Class%20Imbalance%20problem) |
| [Classification Algorithms](Classification%20Algorithms) |
| [Cloud Details](Cloud%20Details) |
| [Covid19 forecasting with prophet](Covid19%20forecasting%20with%20prophet) |
| [Covid_Third_Wave_Forecasting](Covid_Third_Wave_Forecasting) |
| [CrowdAI Plant Disease](CrowdAI%20Plant%20Disease) |
| [Customer Segmentation using Machine Learning](Customer%20Segmentation%20using%20Machine%20Learning) |
| [Data Cleaning Techniques](Data%20Cleaning%20Techniques) |
| [Data Filling and Cleaning Techniques](Data%20Filling%20and%20Cleaning%20Techniques) |
| [Different types of Clustering](Different%20types%20of%20Clustering) |
| [Different types of feature selection techniques](Different%20types%20of%20feature%20selection%20techniques) |
| [Different_types_of_scaling_Method](Different_types_of_scaling_Method) |
| [Driver_Drowsiness_Detection](Driver_Drowsiness_Detection) |
| [EDA-and-Perform-Modelling-on-Ionosphere-Dataset-main](EDA-and-Perform-Modelling-on-Ionosphere-Dataset-main) |
| [Email Classifier](Email%20Classifier) |
| [Emotion Recognition Based on NLP](Emotion%20Recognition%20Based%20on%20NLP) |
| [Ensemble Methds in ML](Ensemble%20Methds%20in%20ML) |
| [Explaination and Example for P value with code](Explaination%20and%20Example%20for%20P%20value%20with%20code) |
| [Exploratory-data-analysis](Exploratory-data-analysis) |
| [Extract_Text_from_PDF_using_Python](Extract_Text_from_PDF_using_Python) |
| [Fake_News_Detection](Fake_News_Detection) |
| [File of SQL Commands](File%20of%20SQL%20Commands) |
| [Fish-Weight-Estimation](Fish-Weight-Estimation) |
| [Flight_delay_prediction_project](Flight_delay_prediction_project) |
| [GDP Prediction](GDP%20Prediction) |
| [GUI-JARVIS](GUI-JARVIS) |
| [Gender Pay Gap Analysis](Gender%20Pay%20Gap%20Analysis) |
| [Google Teachable Machine](Google%20Teachable%20Machine) |
| [Handwritten Equation Solver using CNN](Handwritten%20Equation%20Solver%20using%20CNN) |
| [Handwritten character recognition](Handwritten%20character%20recognition) |
| [Heart_Predection](Heart_Predection) |
| [HollywoodMarketSynopsis](HollywoodMarketSynopsis) |
| [IMDB Box Office Prediction](IMDB%20Box%20Office%20Prediction) |
| [LanguageDetection](LanguageDetection) |
| [Medical Charges for Smokers and Non-smoker](Medical%20Charges%20for%20Smokers%20and%20Non-smoker) |
| [Medical_Help_Chatbot](Medical_Help_Chatbot) |
| [Meteorite Landing Data Analysis](Meteorite%20Landing%20Data%20Analysis) |
| [Movie-Recommendation-System](Movie-Recommendation-System) |
| [Movie-Recommender-System using python](Movie-Recommender-System%20using%20python) |
| [Nasa-Asteroids-Dataset-Analysis](Nasa-Asteroids-Dataset-Analysis) |
| [NumPy - Basics](NumPy%20-%20Basics) |
| [Number_of_people_counter](Number_of_people_counter) |
| [OCR-Medicine-Reader](OCR-Medicine-Reader) |
| [Object Detection](Object%20Detection) |
| [Ola Bike Ride Request Demand Forecast](Ola%20Bike%20Ride%20Request%20Demand%20Forecast) |
| [Optical character recognition (OCR)](Optical%20character%20recognition%20(OCR)) |
| [Plant Seedlings Classification](Plant%20Seedlings%20Classification) |
| [R language](R%20language) |
| [Random forest from scratch](Random%20forest%20from%20scratch) |
| [Random forest test](Random%20forest%20test) |
| [Rock Paper Scissors Python Game](Rock%20Paper%20Scissors%20Python%20Game) |
| [Sentiment analysis for depression based on social media posts](Sentiment%20analysis%20for%20depression%20based%20on%20social%20media%20posts) |
| [Sentiment-Analysis](Sentiment-Analysis) |
| [Skin Disease Predictor](Skin%20Disease%20Predictor) |
| [Spam Mail Detection](Spam%20Mail%20Detection) |
| [Speech_Emotion_Recognition](Speech_Emotion_Recognition) |
| [Spelling Corrector](Spelling%20Corrector) |
| [Sports Analytics Project](Sports%20Analytics%20Project) |
| [Startup_Profit_Prediction](Startup_Profit_Prediction) |
| [Stock Price Analysis](Stock%20Price%20Analysis) |
| [Sudoku Solver using CNN](Sudoku%20Solver%20using%20CNN) |
| [Tensorflow.js Demo](Tensorflow.js%20Demo) |
| [Time Series Forecasting with Python](Time%20Series%20Forecasting%20with%20Python) |
| [Time-Series LSTM Model](Time-Series%20LSTM%20Model) |
| [Unique Chatbot](Unique%20Chatbot) |
| [Various Plots using Matplot,Seaborn,Pandas](Various%20Plots%20using%20Matplot%2CSeaborn%2CPandas) |
| [Vehicles and Pedestrian Detection](Vehicles%20and%20Pedestrian%20Detection) |
| [Weather Prediction](Weather%20Prediction) |
| [Web-Scraping-with-Beautiful-Soup-master](Web-Scraping-with-Beautiful-Soup-master) |
| [XgBoost_Algorithm](XgBoost_Algorithm) |
| [ensemble-methods-notebooks-master](ensemble-methods-notebooks-master) |
| [heart failure](heart%20failure) |
| [job_Advertisement_detection](job_Advertisement_detection) |
| [logistic_regression_scratch](logistic_regression_scratch) |
| [recommendation_system](recommendation_system) |
| [.DS_Store](.DS_Store) |
| [Analysis_of_Temperature_Rise_in_PMSM.ipynb](Analysis_of_Temperature_Rise_in_PMSM.ipynb) |
| [Beautiful Soup.ipynb](Beautiful%20Soup.ipynb) |
| [Ensemble learning.docx](Ensemble%20learning.docx) |
| [Ensemble-Learning (Stacking)](Ensemble-Learning%20(Stacking)) |
| [Machine Hack -1.ipynb](Machine%20Hack%20-1.ipynb) |
| [README.md updated file](README.md%20updated%20file) |
| [Role_from_Resume.ipynb](Role_from_Resume.ipynb) |
| [Sql](Sql) |
| [Statistics- Basics.ipynb](Statistics-%20Basics.ipynb) |
| [Test Task_NIket.ipynb](Test%20Task_NIket.ipynb) |
| [UBER_DATA_ANALYSIS.ipynb](UBER_DATA_ANALYSIS.ipynb) |
| [Various_Plots_in_Matplotlib.ipynb](Various_Plots_in_Matplotlib.ipynb) |
| [Visualization with Seaborn & Matplotlib.ipynb](Visualization%20%20with%20Seaborn%20%26%20Matplotlib.ipynb) |
| [buyer_s_time234.ipynb](buyer_s_time234.ipynb) |
| [random_forest.py](random_forest.py) |

<!-- Projects end -->

### Note: