Text not detected in conversation #883
Tesseract.js includes an output option that allows you to retrieve the actual binarized image recognized by Tesseract. An example site using this option can be found here. Using it confirms that, as @Kishlay-notabot speculated, the messages in blue are being erased by the binarization process. Using this example code, you should be able to experiment with Tesseract's binarization options. These are not documented in this repo; however, you can find them in the main Tesseract project's repo, and I pasted the descriptions from the code below. I have not used these options before, so I am not sure which options (if any) would improve results with this screenshot. If none of these options work, you would need to either (1) binarize the image properly yourself before sending it to Tesseract, or (2) crop the image to specific messages before processing.
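For option (1), here is a minimal sketch of doing the binarization yourself, assuming you can get the screenshot's RGBA pixels (for example from a canvas `ImageData`). It swaps in Otsu's global threshold, a standard technique that is not necessarily what Tesseract does internally; `rgbaToGray`, `otsuThreshold` and `binarize` are hypothetical helpers, not part of Tesseract.js.

```javascript
// Sketch only: Otsu binarization in plain JavaScript.
// Assumption: input is RGBA bytes (4 per pixel), e.g. canvas ImageData.data.
// None of these helpers come from Tesseract.js.

// Convert RGBA bytes to 8-bit luma using BT.601 weights.
function rgbaToGray(rgba) {
  const gray = new Uint8Array(rgba.length / 4);
  for (let i = 0, p = 0; i < rgba.length; i += 4, p++) {
    gray[p] = Math.round(0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2]);
  }
  return gray;
}

// Otsu: pick the gray level that maximizes between-class variance.
function otsuThreshold(gray) {
  const hist = new Array(256).fill(0);
  for (const v of gray) hist[v] += 1;

  const total = gray.length;
  let sumAll = 0;
  for (let t = 0; t < 256; t++) sumAll += t * hist[t];

  let sumBg = 0;    // weighted sum of the "dark" class (levels <= t)
  let weightBg = 0; // pixel count of the "dark" class
  let bestVar = -1;
  let best = 0;
  for (let t = 0; t < 256; t++) {
    weightBg += hist[t];
    if (weightBg === 0) continue;
    const weightFg = total - weightBg;
    if (weightFg === 0) break;
    sumBg += t * hist[t];
    const meanBg = sumBg / weightBg;
    const meanFg = (sumAll - sumBg) / weightFg;
    const betweenVar = weightBg * weightFg * (meanBg - meanFg) ** 2;
    if (betweenVar > bestVar) {
      bestVar = betweenVar;
      best = t;
    }
  }
  return best;
}

// Map each pixel to 0 (at or below the threshold) or 255 (above it).
function binarize(gray, threshold = otsuThreshold(gray)) {
  return Uint8Array.from(gray, (v) => (v > threshold ? 255 : 0));
}
```

In a browser you could write the binarized values back into the `ImageData` (setting R, G and B to the same value), `putImageData` it onto a canvas, and pass that canvas to `worker.recognize`. A single global threshold may still leave the bubble text light-on-dark, so in practice this might need to be combined with inverting or cropping those regions.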
The image appears to be recognized perfectly, without changing any Tesseract.js settings, when it is inverted first. I don't know how generalizable this is, since messaging apps differ (white on black, black on white, or mixed); however, in this case inverting the image so the text is black on a light background solves the problem.
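A minimal sketch of the inversion step in plain JavaScript, assuming the screenshot has been drawn onto a canvas so its pixels are available as RGBA bytes. `invertRGBA` is a hypothetical helper, not part of the Tesseract.js API.

```javascript
// Sketch only: invert an RGBA pixel buffer in place (e.g. canvas ImageData.data).
// Assumption: 4 bytes per pixel; the alpha channel is left untouched.
function invertRGBA(data) {
  for (let i = 0; i < data.length; i += 4) {
    data[i] = 255 - data[i];         // R
    data[i + 1] = 255 - data[i + 1]; // G
    data[i + 2] = 255 - data[i + 2]; // B
    // data[i + 3] is alpha: keep it as-is
  }
  return data;
}
```

In the browser, something like `const img = ctx.getImageData(0, 0, w, h); invertRGBA(img.data); ctx.putImageData(img, 0, 0);` and then passing the canvas to `worker.recognize` should reproduce the inverted input described above; this is an untested sketch, and whether it generalizes to other chat apps is the open question raised in the comment.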
@Balearica
Tesseract.js version 5.0.4
Describe the bug
For some reason, when I use a screenshot of a conversation, the text inside the blue message containers is not detected.
To Reproduce
Steps to reproduce the behavior:
Just use the attached image.
Expected behavior
The text inside the blue containers should be detected as well.
Device Version: