Document LC_CTYPE for language data #1532

Closed
jeroen opened this issue Apr 26, 2018 · 3 comments

@jeroen
Contributor

jeroen commented Apr 26, 2018

Environment

  • Tesseract Version: any
  • Platform: all

There is a bug report about a problem with loading Asian training data, which might point to a bigger problem. It is currently not possible to load Asian languages (jpn, kor, etc.) on a system where LANG and the locale are set to en_US.UTF-8, which is often the default.

A workaround is to set LC_CTYPE to C before calling api->Init(), but it is unclear why this is needed and what side effects it has. I also don't know whether it is safe to set it back to en_US.UTF-8 afterwards.

Some documentation on this would be great.
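The save/restore part of the workaround can be sketched with the standard C `setlocale` call. This is a minimal sketch, not Tesseract code: `ScopedCtypeLocale` and `CurrentCtype` are hypothetical helpers, and the sketch does not answer whether restoring en_US.UTF-8 after `Init()` is safe; it only makes the switch and the restore explicit. It also assumes no other thread touches the locale meanwhile, since `setlocale` changes process-wide state and is not guaranteed to be thread-safe.

```cpp
#include <clocale>
#include <string>

// Hypothetical RAII helper (not part of the Tesseract API): switches LC_CTYPE
// to `temporary` for the lifetime of the object, then restores the old value.
class ScopedCtypeLocale {
 public:
  explicit ScopedCtypeLocale(const char* temporary = "C") {
    // Passing nullptr only queries the current setting. Copy the result right
    // away, because later setlocale calls may overwrite the returned buffer.
    const char* current = std::setlocale(LC_CTYPE, nullptr);
    saved_ = current ? current : "C";
    // If `temporary` is not available, setlocale returns nullptr and leaves
    // LC_CTYPE unchanged, so the destructor's restore is then a no-op.
    std::setlocale(LC_CTYPE, temporary);
  }

  ~ScopedCtypeLocale() { std::setlocale(LC_CTYPE, saved_.c_str()); }

  ScopedCtypeLocale(const ScopedCtypeLocale&) = delete;
  ScopedCtypeLocale& operator=(const ScopedCtypeLocale&) = delete;

 private:
  std::string saved_;  // LC_CTYPE value in effect before the switch
};

// Hypothetical helper: returns the current LC_CTYPE name (empty on failure).
std::string CurrentCtype() {
  const char* current = std::setlocale(LC_CTYPE, nullptr);
  return current ? current : "";
}
```

A caller would construct the guard in a scope that encloses the `api->Init(...)` call, so LC_CTYPE returns to its previous value (for example en_US.UTF-8) as soon as that scope ends.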

@zdenop
Contributor

zdenop commented Apr 26, 2018

The problem is that people do not read the documentation. The original issue was documented back in the "code.google era":

@jeroen
Contributor Author

jeroen commented Apr 26, 2018

The link you quote talks about LC_NUMERIC. Isn't that a different issue from LC_CTYPE for Asian languages?

@stweil
Contributor

stweil commented Jun 22, 2018

Is this a duplicate of issue #1250?
