OCR on cyrillic text #1364

Closed
Blightbuster opened this issue Dec 30, 2021 · 7 comments

Blightbuster commented Dec 30, 2021

Summary of your issue

Running OCR on Cyrillic text yields an empty string, even with the correct model and character whitelist.

Environment

OpenCVSharp 4.5.3.20211228

What did you do when you faced the problem?

  • Verified that it works with english model on latin characters
  • Tested the image from the example below in the console with the same model and got this result:
    "Дульный тормоз-компенсатор Зенит "ДТК-1" 762х39 и 5.45х59 для АК"

Example code:

image

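// Whitelist: upper- and lower-case letters of the Russian alphabet.
var whiteList = "АаБбВвГгДдЕеЁёЖжЗзИиЙйКкЛлМмНнОоПпРрСсТтУуФфХхЦцЧчШшЩщЪъЫыЬьЭэЮюЯя";
// Arguments: data path, language, whitelist, oem = 3 (default engine), psmode = 7 (single text line).
var model = OCRTesseract.Create("models", "rus", whiteList, 3, 7);
string text = "";
// img is the Mat holding the example image above.
model.Run(img, out text, out _, out _, out _, ComponentLevels.TextLine);
if (text == "") Console.WriteLine("Empty");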

Blightbuster (Author) commented:

Could this perhaps be an issue with how the wrapper handles the conversion of string encodings? (wild guess)
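
To make the guess concrete, here is a minimal sketch of what a lossy string conversion would do to Cyrillic text. It does not show what the wrapper actually does; `EncodingCheck` and `RoundTrip` are hypothetical names used only for illustration, and ASCII stands in for any single-byte encoding that cannot represent Cyrillic.

```csharp
using System;
using System.Text;

public static class EncodingCheck
{
    // Encodes the text to bytes with the given encoding and decodes it back.
    // If the encoding cannot represent a character, the round trip alters it.
    public static string RoundTrip(string text, Encoding encoding) =>
        encoding.GetString(encoding.GetBytes(text));

    public static void Main()
    {
        const string sample = "Дульный тормоз";

        // UTF-8 can represent every Cyrillic letter, so the text survives.
        Console.WriteLine(RoundTrip(sample, Encoding.UTF8) == sample);

        // ASCII cannot, so each Cyrillic letter is replaced with '?'
        // and the round-tripped text no longer matches the input.
        Console.WriteLine(RoundTrip(sample, Encoding.ASCII) == sample);
    }
}
```

If the whitelist or the recognized text went through a conversion like the second one on its way across the native boundary, Tesseract would never see (or return) the intended characters, which would be consistent with an empty result.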

Blightbuster (Author) commented:

I also tried using the characters exactly as they appear in the desired_characters file:
0123456789ЁАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюяё
Unfortunately with the same result...

Blightbuster (Author) commented:

Perhaps related to this:
https://stackoverflow.com/a/9983550


stale bot commented Jun 29, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Jun 29, 2022

Blightbuster (Author) commented:

As a workaround, the Tesseract package on NuGet can be used instead; it does not have this problem.
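
A rough sketch of that workaround, assuming the `Tesseract` NuGet package is installed and a `rus.traineddata` file is available. The `CyrillicOcr` class, the `ReadLine` helper, and both paths are hypothetical and only illustrate the shape of the call; this is not taken from the original report.

```csharp
using System;
using Tesseract; // NuGet package "Tesseract" (assumed to be installed)

public static class CyrillicOcr
{
    // Recognizes a single line of Russian text in the given image file.
    // tessdataDir must contain rus.traineddata (assumption).
    public static string ReadLine(string imagePath, string tessdataDir)
    {
        using var engine = new TesseractEngine(tessdataDir, "rus", EngineMode.Default);
        using var pix = Pix.LoadFromFile(imagePath);
        // SingleLine corresponds to page segmentation mode 7 used in the report.
        using var page = engine.Process(pix, PageSegMode.SingleLine);
        return page.GetText().Trim();
    }

    public static void Main(string[] args)
    {
        // Hypothetical usage: CyrillicOcr <image> <tessdata directory>
        if (args.Length < 2)
        {
            Console.WriteLine("usage: CyrillicOcr <imagePath> <tessdataDir>");
            return;
        }
        Console.WriteLine(ReadLine(args[0], args[1]));
    }
}
```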

stale bot removed the wontfix label Jul 2, 2022

stale bot commented Dec 31, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Dec 31, 2022
stale bot closed this as completed Jan 16, 2023

n0099 (Contributor) commented Mar 7, 2023

#873 (comment)
