
Couldn't import tesserocr, because locale check error #137

Closed
atuyosi opened this issue Aug 5, 2018 · 5 comments

Comments

@atuyosi

atuyosi commented Aug 5, 2018

I got an import error:

import tesserocr
!strcmp(locale, "C"):Error:Assert failed:in file baseapi.cpp, line 203
Abort trap: 6

This error is caused by a locale check.

Please see commit .

Here is a simple workaround:

import locale
locale.setlocale(locale.LC_ALL, 'C')
import tesserocr

I think this code should be added somewhere appropriate in tesserocr itself.

Environment:

  • macOS 10.13.6
  • Python 3.6.5
  • tesserocr 2.3.0
  • tesseract 4.0.0-beta.4-20-ge9b4e

In addition, I avoided an installation error by using the workaround from #129.

@sirfz
Owner

sirfz commented Aug 28, 2018

tesseract 4 requires LC_ALL, LC_CTYPE and LC_NUMERIC to be set to C: https://github.com/tesseract-ocr/tesseract/blob/4.0.0-beta.4/src/api/baseapi.cpp#L203

In my local tests this doesn't seem to cause problems with Python 2.7, but it crashes with Python 3.6 and 3.7.

I'm reluctant to hard-code this into tesserocr because I'm not sure what the effect would be on other modules or Python's behavior. Maybe someone with more knowledge about this can chip in?
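Python 3 (unlike Python 2) changes LC_CTYPE to the user's preferred locale at interpreter startup, which would be consistent with the 2.7 vs 3.6/3.7 difference described above. The sketch below checks, before importing tesserocr, whether the current process would pass that assertion. The helper name `non_c_locale_categories` is hypothetical (not part of tesserocr), and the category list is taken from the linked baseapi.cpp line rather than from any later tesseract release.

```python
import locale

# Categories that, per the linked baseapi.cpp (4.0.0-beta.4), tesseract
# asserts are exactly "C". Assumption: later releases may differ.
REQUIRED_C_CATEGORIES = {
    "LC_ALL": locale.LC_ALL,
    "LC_CTYPE": locale.LC_CTYPE,
    "LC_NUMERIC": locale.LC_NUMERIC,
}


def non_c_locale_categories():
    """Return the names of required categories not currently set to "C".

    Hypothetical diagnostic helper. Calling locale.setlocale(category)
    without a locale argument only queries the setting; it changes nothing.
    """
    return [
        name
        for name, category in REQUIRED_C_CATEGORIES.items()
        if locale.setlocale(category) != "C"
    ]


if __name__ == "__main__":
    bad = non_c_locale_categories()
    if bad:
        print("tesseract 4.0 betas would likely abort; non-C categories:", bad)
    else:
        print("all checked categories are C")
```

Running this on an unmodified Python 3 interpreter with a UTF-8 user locale would typically report LC_CTYPE (and therefore LC_ALL) as non-C.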

@atuyosi
Author

atuyosi commented Aug 28, 2018

As you said, hard-coding it is undesirable.

IMO, it may be worth asking the Cython community how to handle these locale environment variables.

cf. 24.2. locale — Internationalization services — Python 3.7.0 documentation

FYI, here is another language binding's solution:

Various fixes for Tesseract 4 beta.3 · ropensci/tesseract@2784542

@sirfz
Owner

sirfz commented Aug 29, 2018

Changing the locale before calling Init and restoring it afterwards seems reasonable, so I'll go with that (thanks for the links). Would restoring the locale to something other than C affect the results of other API methods, though?
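A minimal sketch of the "change before, restore after" approach, assuming a hypothetical context manager `c_locale()` that is not part of tesserocr. It uses only the standard `locale` module: `locale.setlocale(locale.LC_ALL)` with no second argument returns a string that can be passed back later to restore the previous state.

```python
import contextlib
import locale


@contextlib.contextmanager
def c_locale():
    """Temporarily set every locale category to "C", then restore it.

    Hypothetical helper, not part of tesserocr's API.
    """
    # Query (not change) the current setting; the result may be a composite
    # of per-category values and can be passed back to restore all of them.
    saved = locale.setlocale(locale.LC_ALL)
    locale.setlocale(locale.LC_ALL, "C")
    try:
        yield
    finally:
        # Runs even if the wrapped block raises, so the caller's locale is
        # never left switched to "C".
        locale.setlocale(locale.LC_ALL, saved)


if __name__ == "__main__":
    with c_locale():
        # Inside the block, the categories tesseract 4.0 betas check are "C".
        print(locale.setlocale(locale.LC_NUMERIC))
    print(locale.setlocale(locale.LC_NUMERIC))  # back to the saved value
```

For example, `with c_locale(): import tesserocr` mirrors the workaround from the first comment without leaving the whole process in the C locale afterwards. Whether that is sufficient depends on where tesseract performs the check (in the original report it fired during import, while the plan above targets Init), so every call that can trigger it would need to be wrapped, and the open question about other API methods still applies. Note also that `locale.setlocale` changes process-wide state and is not thread-safe on most systems.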

@sirfz
Owner

sirfz commented Aug 29, 2018

According to tesseract-ocr/tesseract/issues/1670, this requirement might only be temporary, until they replace the function calls that rely on locale settings. I'd rather wait and see how this plays out before pushing any patches.

Chilipp added a commit to Chilipp/straditize that referenced this issue Dec 9, 2018
@sirfz
Owner

sirfz commented Aug 22, 2019

This has been fixed in tesseract 4.1.

sirfz closed this as completed Aug 22, 2019