From 1851e5a1c44a8a77f7c3f78ff78b5a4b01810913 Mon Sep 17 00:00:00 2001
From: Seu Pedro <33115289+seupedro@users.noreply.github.com>
Date: Sat, 14 Jan 2023 15:37:22 -0300
Subject: [PATCH] Update README.md

Added a link explaining what an OCR Engine is
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 6c1518f69c..f8f006cd77 100644
--- a/README.md
+++ b/README.md
@@ -26,7 +26,7 @@
 
 This package contains an **OCR engine** - `libtesseract` and a **command line program** - `tesseract`.
 
-Tesseract 4 adds a new neural net (LSTM) based OCR engine which is focused on line recognition, but also still supports the legacy Tesseract OCR engine of Tesseract 3 which works by recognizing character patterns. Compatibility with Tesseract 3 is enabled by using the Legacy OCR Engine mode (--oem 0).
+Tesseract 4 adds a new neural net (LSTM) based [OCR engine](https://en.wikipedia.org/wiki/Optical_character_recognition) which is focused on line recognition, but also still supports the legacy Tesseract OCR engine of Tesseract 3 which works by recognizing character patterns. Compatibility with Tesseract 3 is enabled by using the Legacy OCR Engine mode (--oem 0).
 It also needs [traineddata](https://tesseract-ocr.github.io/tessdoc/Data-Files.html) files which support the legacy engine, for example those from the [tessdata](https://github.com/tesseract-ocr/tessdata) repository.
 
 Stefan Weil is the current lead developer. Ray Smith was the lead developer until 2018. The maintainer is Zdenko Podobny. For a list of contributors see [AUTHORS](https://github.com/tesseract-ocr/tesseract/blob/main/AUTHORS)