Node-RED module that uses Tesseract.js to perform local Optical Character Recognition (OCR).

sjoerdvanderhoorn/node-red-contrib-tesseract

Tesseract

Tesseract.js is a pure JavaScript port of the popular Tesseract OCR engine. It performs all OCR tasks locally, without requiring a connection to any external service.

Tesseract was originally developed at Hewlett-Packard Laboratories Bristol and at Hewlett-Packard Co, Greeley, Colorado, between 1985 and 1994, with further changes in 1996 to port it to Windows and some C++izing in 1998. HP open sourced Tesseract in 2005, and it has been developed by Google since 2006.

Tesseract flow

This Node-RED implementation of Tesseract.js has been provided by Sjoerd van der Hoorn.

Settings

Input

  • msg.payload - Local filename, URL, or image buffer.
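
Because the node accepts three different input forms, it can help to check upstream which form a message carries. The following is a minimal sketch only: `payloadKind` is a hypothetical helper, not part of this module, and the rule that a string starting with `http://` or `https://` is a URL is an assumption made for illustration.

```javascript
// Hypothetical helper (not part of node-red-contrib-tesseract):
// reports which of the three documented input forms a payload takes.
function payloadKind(payload) {
  // Image data already loaded into memory.
  if (Buffer.isBuffer(payload)) {
    return "buffer";
  }
  if (typeof payload === "string") {
    // Assumption: only http(s) strings are treated as URLs here;
    // any other string is taken to be a local filename.
    return /^https?:\/\//i.test(payload) ? "url" : "filename";
  }
  return "unsupported";
}
```

In a function node placed before the Tesseract node, a call such as `payloadKind(msg.payload)` could be used to drop or reroute messages whose payload is not one of the supported forms.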

Output

  • msg.payload - String with recognized text.
  • msg.tesseract - Object with recognized text split out per line and word, plus confidence information.
{
	text: "Text from image\nSecond line",
	confidence: 87,
	lines: 
	[
		{
			text: "Text from image",
			confidence: 93,
			words:
			[
				{
					text: "Text",
					confidence: 97
				},
				{
					...
				}
			]
		},
		{
			...
		}
	]
}
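
The per-word confidence values make it possible to flag uncertain words for review. The sketch below walks an object shaped like the `msg.tesseract` example above; `lowConfidenceWords` and the threshold of 80 are hypothetical choices for illustration, and the sample values are made up rather than real OCR output.

```javascript
// Hypothetical helper: collect every word whose confidence is below a threshold.
// Expects an object shaped like msg.tesseract (lines -> words -> confidence).
function lowConfidenceWords(result, threshold) {
  const flagged = [];
  for (const line of result.lines || []) {
    for (const word of line.words || []) {
      if (word.confidence < threshold) {
        // Keep the surrounding line text so the word can be located later.
        flagged.push({ text: word.text, confidence: word.confidence, line: line.text });
      }
    }
  }
  return flagged;
}

// Made-up sample in the documented shape, for illustration only.
const sample = {
  text: "Text from image",
  confidence: 87,
  lines: [
    {
      text: "Text from image",
      confidence: 93,
      words: [
        { text: "Text", confidence: 97 },
        { text: "from", confidence: 95 },
        { text: "image", confidence: 62 }
      ]
    }
  ]
};

const flaggedWords = lowConfidenceWords(sample, 80);
console.log(flaggedWords);
```

In a function node placed after the Tesseract node, the same helper could be applied as `msg.lowConfidence = lowConfidenceWords(msg.tesseract, 80);` before returning `msg`.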

Additional information
