Add workaround for tesseract 4.0
It needs to run in the C locale because its code relies on
locale-dependent functions for parsing trained data (e.g. sscanf).

This is really a workaround, but given that tesseract 4.0 ships
with the upcoming Debian stable, we will have to live with this for
quite some time.

Fixes #2581

Signed-off-by: Michal Čihař <[email protected]>
nijel committed Feb 27, 2019
1 parent 216710a commit 6724204
Showing 3 changed files with 42 additions and 2 deletions.
1 change: 1 addition & 0 deletions docs/changes.rst
@@ -20,6 +20,7 @@ Released on ? 2019.
 * Added check for Kashida letters.
 * Added option to squash commits based on authors.
 * Improved support for xlsx file format.
+* Compatibility with tesseract 4.0.
 
 weblate 3.4
 -----------
7 changes: 5 additions & 2 deletions weblate/screenshots/views.py
@@ -30,8 +30,11 @@

 from PIL import Image
 
+from weblate.utils.locale import c_locale
+
 try:
-    from tesserocr import PyTessBaseAPI, RIL
+    with c_locale():
+        from tesserocr import PyTessBaseAPI, RIL
     HAS_OCR = True
 except ImportError:
     HAS_OCR = False
@@ -273,7 +276,7 @@ def ocr_search(request, pk):
     results = set()
 
     # Extract and match strings
-    with PyTessBaseAPI() as api:
+    with c_locale(), PyTessBaseAPI() as api:
         for image in (original_image, scaled_image):
             for match in ocr_extract(api, image, strings):
                 results.add(sources[match])
36 changes: 36 additions & 0 deletions weblate/utils/locale.py
@@ -0,0 +1,36 @@
+# -*- coding: utf-8 -*-
+#
+# Copyright © 2012 - 2019 Michal Čihař <[email protected]>
+#
+# This file is part of Weblate <https://weblate.org/>
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <https://www.gnu.org/licenses/>.
+#
+
+from __future__ import absolute_import
+
+from locale import setlocale, getlocale, LC_ALL
+from contextlib import contextmanager
+
+
+@contextmanager
+def c_locale():
+    """Context to execute something in C locale."""
+    try:
+        currlocale = getlocale()
+    except ValueError:
+        currlocale = ('C', 'UTF-8')
+    setlocale(LC_ALL, "C")
+    yield
+    setlocale(LC_ALL, currlocale)
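
Note on the context manager above: as committed, it restores the previous locale only when the wrapped block finishes normally. The following is a minimal sketch of a more defensive variant and is not part of this commit. The name `c_locale_safe` is hypothetical. It assumes that saving the string returned by `setlocale(LC_ALL)` (a query when no new value is passed) and handing it back later is an acceptable way to restore the previous setting.

```python
from contextlib import contextmanager
from locale import LC_ALL, setlocale


@contextmanager
def c_locale_safe():
    """Run the enclosed block in the C locale, then restore the previous one.

    Hypothetical variant of c_locale(); not the code shipped in this commit.
    """
    # Calling setlocale() without a new value only queries the current setting.
    previous = setlocale(LC_ALL)
    setlocale(LC_ALL, "C")
    try:
        yield
    finally:
        # Runs even if the block raises, so the process is not left in "C".
        setlocale(LC_ALL, previous)
```

It would be used the same way as the original, for example `with c_locale_safe(), PyTessBaseAPI() as api:`. Note that `setlocale` changes the locale for the whole process, so neither version isolates concurrent threads from the temporary switch.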
