Tesseract.js is a pure JavaScript port of the popular Tesseract OCR engine. It performs all OCR tasks locally without requiring a connection to any external service.
Tesseract was originally developed at Hewlett-Packard Laboratories Bristol and at Hewlett-Packard Co, Greeley, Colorado between 1985 and 1994, with further changes in 1996 to port it to Windows and some C++izing in 1998. HP open-sourced Tesseract in 2005. Since 2006 it has been developed by Google.
This Node-RED implementation of Tesseract.js has been provided by Sjoerd van der Hoorn.
- Language: language code (see the list of available language codes).
msg.payload
- Local filename, URL, or image buffer.
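The three accepted input forms can be told apart before the message reaches the node. The following is only a minimal sketch: `describePayload` is a hypothetical helper, not part of this node, and it assumes that URLs start with `http://` or `https://`. The node's own detection logic may differ.

```javascript
// Hypothetical pre-check for msg.payload; not part of the node itself.
// Assumption: any string starting with http:// or https:// is a URL,
// and any other string is treated as a local filename.
function describePayload(payload) {
  if (Buffer.isBuffer(payload)) {
    return "buffer";
  }
  if (typeof payload === "string") {
    return /^https?:\/\//i.test(payload) ? "url" : "filename";
  }
  return "unsupported";
}
```

In a function node placed before the OCR node, a result of `"unsupported"` could be used to drop or reroute the message.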
msg.payload
- String with recognized text.
msg.tesseract
- Object with recognized text split out per line and word, plus confidence information.
{
  text: "Text from image\nSecond line",
  confidence: 87,
  lines:
  [
    {
      text: "Text from image",
      confidence: 93,
      words:
      [
        {
          text: "Text",
          confidence: 97
        },
        {
          ...
        }
      ]
    },
    {
      ...
    }
  ]
}
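A downstream function node can walk this structure to find words the engine was unsure about. The sketch below assumes only the shape shown above (`lines`, each with `words`, each with `text` and `confidence`). The helper name `lowConfidenceWords`, the threshold, and the sample values are illustrative and do not come from the node.

```javascript
// Hypothetical helper: collect words whose confidence is below a threshold.
// Assumes the msg.tesseract shape documented above.
function lowConfidenceWords(tesseract, threshold) {
  const flagged = [];
  for (const line of tesseract.lines || []) {
    for (const word of line.words || []) {
      if (word.confidence < threshold) {
        flagged.push({
          text: word.text,
          confidence: word.confidence,
          line: line.text
        });
      }
    }
  }
  return flagged;
}

// Illustrative sample in the documented shape (made-up values).
const sample = {
  text: "Text from image",
  confidence: 87,
  lines: [
    {
      text: "Text from image",
      confidence: 93,
      words: [
        { text: "Text", confidence: 97 },
        { text: "from", confidence: 62 },
        { text: "image", confidence: 95 }
      ]
    }
  ]
};

const flaggedWords = lowConfidenceWords(sample, 80);
```

Inside a function node, the same helper could be called as `lowConfidenceWords(msg.tesseract, 80)` and its result assigned to a message property before returning `msg`.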