Document LC_CTYPE for language data #1532

jeroen · 2018-04-26T13:29:27Z

Environment

Tesseract Version: any
Platform: all

There is a bug report about a problem with loading Asian training data, which might point to a bigger problem. It is currently not possible to load Asian languages (jpn, kor, etc.) on a system where LANG and the locale are set to en_US.UTF-8, which is often the default.

A workaround is to set LC_CTYPE to C before calling api->Init(), but it is unclear why this is needed and what side effects it has. I also don't know whether it is safe to set it back to en_US.UTF-8 afterwards.
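The workaround can be sketched in C++ as a scoped guard. It saves the current LC_CTYPE, switches it to "C" for the duration of the call, and restores the previous value afterwards. `ScopedCtypeLocale` and `with_c_ctype` are hypothetical helpers, not part of Tesseract. Whether restoring the locale after Init() is safe is exactly the open question here, so treat this as a sketch under that assumption. Note also that setlocale() changes process-wide state and is not required to be thread-safe. This should therefore run before other threads start using locale-dependent functions.

```cpp
#include <cassert>
#include <clocale>
#include <string>
#include <utility>

// Hypothetical helper (not part of Tesseract): switches LC_CTYPE to "C"
// while the object is alive and restores the previous value on destruction.
class ScopedCtypeLocale {
 public:
  ScopedCtypeLocale() {
    // Passing nullptr only queries the category. Copy the result, because the
    // returned string may be overwritten by a later setlocale() call.
    const char* current = std::setlocale(LC_CTYPE, nullptr);
    saved_ = current ? current : "C";
    // The "C" locale always exists, so this call is not expected to fail.
    const char* applied = std::setlocale(LC_CTYPE, "C");
    assert(applied != nullptr);
    (void)applied;  // avoid an unused-variable warning when NDEBUG is set
  }
  ~ScopedCtypeLocale() { std::setlocale(LC_CTYPE, saved_.c_str()); }

  ScopedCtypeLocale(const ScopedCtypeLocale&) = delete;
  ScopedCtypeLocale& operator=(const ScopedCtypeLocale&) = delete;

 private:
  std::string saved_;
};

// Hypothetical wrapper: runs `init` (e.g. a lambda around api->Init(...))
// with LC_CTYPE temporarily set to "C" and returns whatever `init` returns.
// The guard is destroyed, and LC_CTYPE restored, after the result is computed.
template <typename F>
auto with_c_ctype(F&& init) -> decltype(std::forward<F>(init)()) {
  ScopedCtypeLocale guard;
  return std::forward<F>(init)();
}
```

Usage might look like `int rc = with_c_ctype([&] { return api->Init(datapath, "jpn"); });`, where `api` is the `tesseract::TessBaseAPI*` from the report and `datapath` is your tessdata directory. The RAII form ensures the old LC_CTYPE is restored even if the wrapped call throws.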

Some documentation on this would be great.


zdenop · 2018-04-26T13:44:47Z

The problem is that people do not read the documentation. The original issue was already documented in the "code.google era":

jeroen · 2018-04-26T14:15:43Z

The link you quote talks about LC_NUMERIC. Isn't this a different issue from LC_CTYPE for Asian languages?

stweil · 2018-06-22T15:24:02Z

Is this a duplicate of issue #1250?

stweil mentioned this issue Jun 22, 2018

recent change setlocale in baseapi.c causes Python loaded tesseract library to fail #1670 (Closed)

zdenop added the duplicate label Sep 30, 2018

zdenop closed this as completed Sep 30, 2018

amitdo added the locale label Mar 21, 2021
