Error in boxClipToRectangle: box outside rectangle #427

Open
PedroBarcha opened this issue Sep 14, 2016 · 13 comments

@PedroBarcha

Hi there, I've got some specific images that output the following on linux:

Tesseract Open Source OCR Engine v3.05.00dev with Leptonica
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

Tesseract still OCRs the pictures successfully (though the results are not great). The bigger problem for me, however, is that OCRopus does not OCR them at all.

[Attached images: example5, ghoby30c]

Any ideas?

@amitdo
Collaborator

amitdo commented Sep 19, 2016

Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

Add a white/black frame to the image and no error messages will appear.

convert 427-1.jpg -bordercolor White -border 10x10 427-1b.jpg

Strange behaviour...
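
For anyone who needs to apply this workaround to many files, here is a minimal sketch that wraps the same `convert` invocation in Python. The helper names (`build_border_command`, `bordered_name`, `add_border`) are hypothetical and only for illustration; the flags are exactly the ones in the command above. It assumes ImageMagick's `convert` is on the PATH (newer ImageMagick releases ship the same tool as `magick`).

```python
import subprocess
from pathlib import Path


def build_border_command(src, dst, border=10, color="White"):
    """Return the ImageMagick argv that pads `src` with a solid frame."""
    return [
        "convert", str(src),
        "-bordercolor", color,
        "-border", f"{border}x{border}",
        str(dst),
    ]


def bordered_name(src, suffix="b"):
    """Derive the output name, e.g. 427-1.jpg -> 427-1b.jpg."""
    p = Path(src)
    return p.with_name(p.stem + suffix + p.suffix)


def add_border(src, border=10, color="White"):
    """Write a bordered copy next to `src` and return its path.

    Assumption: `convert` is installed; check=True raises if it fails.
    """
    dst = bordered_name(src)
    subprocess.run(build_border_command(src, dst, border, color), check=True)
    return dst
```

Calling `add_border("427-1.jpg")` would run the same command as above and return the path of the bordered copy, which can then be passed to Tesseract instead of the original.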


@amitdo
Collaborator

amitdo commented Jul 8, 2020

Similar issues #468 #1601

These error messages are produced by Leptonica.

They are triggered by a call to pixClipBoxToForeground()

https://github.com/DanBloomberg/leptonica/blob/bbe289cf3f0fe368d5b9eac64df2ccd6e9b05c56/src/pix5.c#L1956

https://github.com/tesseract-ocr/tesseract/search?q=pixClipBoxToForeground

@amitdo
Collaborator

amitdo commented Jul 8, 2020

@stweil, this seems like a bug in Tesseract, maybe you can explore it and find its cause.

@amitdo
Collaborator

amitdo commented Jul 8, 2020

https://github.com/tesseract-ocr/tesseract/search?q=pixClipBoxToForeground

I noticed that Tesseract does not check the return value from Leptonica's functions (l_ok).

@amitdo amitdo added the bug label Jul 8, 2020
@stweil
Contributor

stweil commented Jul 9, 2020

> @stweil, this seems like a bug in Tesseract, maybe you can explore it and find its cause.

It's caused by a box with width / height 0, but as always in Tesseract it is difficult to find the right fix.
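
To illustrate why a zero-sized box would end in a "box outside rectangle" style failure, here is a simplified model of clipping a box to an image. This is an assumption for illustration only, not Leptonica's or Tesseract's actual code: the `Box` type, `clip_to_image`, and `scan_region` are hypothetical names, and the rule used here (reject any box with no positive-area overlap) is deliberately stricter and simpler than a real implementation may be. It also shows the caller checking the result instead of ignoring it.

```python
from typing import NamedTuple, Optional


class Box(NamedTuple):
    x: int
    y: int
    w: int
    h: int


def clip_to_image(box: Box, img_w: int, img_h: int) -> Optional[Box]:
    """Clip `box` to an img_w x img_h image; return None if nothing is left.

    Simplified model (an assumption, not Leptonica's source): a box is
    rejected when its overlap with the image has zero or negative area.
    """
    x0 = max(box.x, 0)
    y0 = max(box.y, 0)
    x1 = min(box.x + box.w, img_w)
    y1 = min(box.y + box.h, img_h)
    if x1 <= x0 or y1 <= y0:
        return None  # zero-sized or fully outside: nothing to scan
    return Box(x0, y0, x1 - x0, y1 - y0)


def scan_region(box: Box, img_w: int, img_h: int) -> str:
    """Caller that checks the clip result before using it."""
    clipped = clip_to_image(box, img_w, img_h)
    if clipped is None:
        return "skipped: empty or out-of-bounds box"
    return f"scan {clipped.w}x{clipped.h} at ({clipped.x}, {clipped.y})"
```

In this model, `Box(0, 0, 0, 10)` is rejected on a 250x50 image just like a box that lies entirely beyond the right edge, while a box that only partly overhangs the image is trimmed to fit.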

@Nemesis77swe

This error is still present. I tried to read a 250x50 image and got the error.
After a few trials, I found that 250x51 works, so apparently there is a lower limit on the image size.

@csidirop

I have the same issue. I have software that fetches images via wget and then runs OCR on them with Tesseract. I noticed that with some images (or rather, as I found out, some resolutions) the following error occurs:

Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

I found out that this only occurs at some resolutions, so I wrote a script to check this on an example image. The script successively decreases the resolution of the image and then tries to OCR it with Tesseract. The image has a resolution of 2090x1504 pixels.

There are no errors until the height reaches 1578 pixels. After that, errors appear irregularly, and from 1502 pixels on, for nearly every image. Some images generate several of these errors, e.g.:

h: 1094
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

Unlike @Nemesis77swe, who wrote:

> there's a limit for the smallest size of image

I don't think there is a limit. I think it may be an arithmetic issue somewhere in the code that produces a box with a width / height of 0, as @stweil stated.

I attached the script and the output and this is the image.
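
The attached script is not reproduced in this thread. As a rough sketch of that kind of sweep, assuming ImageMagick's `convert` and the `tesseract` CLI are both on the PATH, something like the following could be used. The function names, the step size, and the `sweep_<height>.png` file names are all assumptions for illustration, not taken from the original script.

```python
import subprocess

ERROR_MARKER = "Error in boxClipToRectangle"


def scaled_sizes(width, height, step=2, min_height=2):
    """Yield (w, h) pairs with shrinking height, keeping the aspect ratio."""
    for h in range(height, min_height - 1, -step):
        yield max(1, round(width * h / height)), h


def count_clip_errors(stderr_text):
    """Count the stderr lines that carry the boxClipToRectangle error."""
    return sum(1 for line in stderr_text.splitlines() if ERROR_MARKER in line)


def run_sweep(image, width, height, step=2):
    """Resize, OCR each copy, and return {height: error count}.

    Assumption: `convert` and `tesseract` are installed; the trailing "!"
    in the geometry forces the exact size, and "stdout" makes Tesseract
    print the text instead of writing a file.
    """
    results = {}
    for w, h in scaled_sizes(width, height, step):
        resized = f"sweep_{h}.png"  # hypothetical scratch file name
        subprocess.run(
            ["convert", image, "-resize", f"{w}x{h}!", resized], check=True
        )
        proc = subprocess.run(
            ["tesseract", resized, "stdout"], capture_output=True, text=True
        )
        results[h] = count_clip_errors(proc.stderr)
    return results
```

Collecting the counts per height in a dictionary makes it easy to see at which sizes the errors start and whether they come in groups, as in the `h: 1094` sample above.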


Platform:

Linux notebook63 5.10.102.1-microsoft-standard-WSL2 #1 SMP Wed Mar 2 00:30:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Tesseract Version:

tesseract 5.2.0-13-g74e22
 leptonica-1.79.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1
 Found AVX512BW
 Found AVX512F
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found OpenMP 201511
 Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.8 liblz4/1.9.2 libzstd/1.4.4
 Found libcurl/7.68.0 OpenSSL/1.1.1f zlib/1.2.11 brotli/1.0.7 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh/0.9.3/openssl/zlib nghttp2/1.40.0 librtmp/2.3

@csidirop

I tried this on another Windows machine in WSL, with the same results:

Ubuntu 20.04 (on both Windows machines) and Debian buster show exactly the same output.

@amitdo
Collaborator

amitdo commented Aug 15, 2022

@csidirop,

Does adding a white or black border to the image help?

#427 (comment)

If not, post an image that demonstrates the issue.

@csidirop

Indeed, there are no errors after adding a white border.
