EXTRACT

Introduction

EXTRACT is an optical character recognition (OCR) engine for various operating systems. It extracts text from an image and converts it to plain text.

The model is a very primitive version of the original Google Tesseract: it recognizes ONLY CAPITAL LETTERS in an image and converts them to plain text.
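
Because only capital letters are recognized, the classifier's output space can be thought of as 26 classes. The sketch below shows one way predicted class indices could be turned back into text. The index-to-letter mapping (0 is 'A', 25 is 'Z') and the name `decode_letters` are assumptions made for illustration; the actual label encoding used by the script may differ.

```python
import string

# Assumption: class index 0 maps to 'A', 1 to 'B', ..., 25 to 'Z'.
CLASSES = string.ascii_uppercase


def decode_letters(class_indices):
    """Convert a sequence of predicted class indices into a string of capital letters."""
    letters = []
    for idx in class_indices:
        if not 0 <= idx < len(CLASSES):
            raise ValueError(f"class index {idx} is outside the range 0-25")
        letters.append(CLASSES[idx])
    return "".join(letters)
```

For example, `decode_letters([7, 4, 11, 11, 14])` returns `"HELLO"` under this assumed mapping.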

Modules/Library REQUIREMENTS:

os
numpy
PIL
sys
keras
cropyble
cv2
shutil
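
Of the modules listed above, os, sys and shutil ship with Python, PIL is provided by the Pillow package, and cv2 by opencv-python. A quick way to see which modules are still missing before running the script might look like the sketch below. The helper name `missing_modules` is hypothetical and is not part of this repository.

```python
import importlib.util

# Module names copied from the requirements list above.
REQUIRED_MODULES = ["os", "numpy", "PIL", "sys", "keras", "cropyble", "cv2", "shutil"]


def missing_modules(names):
    """Return the module names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```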

How To Run the script:

NOTE1:- The trained model is not provided, so run the script as it is the very first time. Once the model is trained, COMMENT OUT 'Train_Model' on line '65' and then run the script for further use.

NOTE2:- Only some fonts were taken into account, so use the default font (Calibri) with a FONT SIZE of '72' for the text in the image; letter extraction relies on these assumptions.
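
As an alternative to commenting the training call out by hand, the train-once behaviour described in NOTE1 could be automated by checking whether a saved model file already exists. This is only a sketch of that idea and not how the script currently works: `load_or_train`, `train_fn` and `load_fn` are hypothetical names. With Keras, `load_fn` could be `keras.models.load_model`, and `train_fn` would be expected to train the model, save it to `model_path`, and return it.

```python
import os
from typing import Any, Callable


def load_or_train(model_path: str,
                  train_fn: Callable[[str], Any],
                  load_fn: Callable[[str], Any]) -> Any:
    """Load a saved model if one exists at model_path; otherwise train one.

    train_fn(model_path) must train, save the model to model_path, and return it.
    load_fn(model_path) must return the previously saved model.
    """
    if os.path.exists(model_path):
        # A model from an earlier run is present: skip training.
        return load_fn(model_path)
    # First run: no saved model yet, so train and save it.
    return train_fn(model_path)
```

On the first run the training function is called; on later runs the saved file is found and training is skipped.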

Run the script in your terminal: 'python3 tesseract.py'. The input image is:

The output is (the predicted result is at the bottom):

The input image can contain any number of words, for example:

The output is:

malikakarsh/EXTRACT

Folders and files

Latest commit

History

Repository files navigation

EXTRACT

Introduction

Modules/Library REQUIREMENTS:

How To Run the script:

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages