x_fsize always 0 in hocr-output since latest commit #2110

Closed
hnesk opened this issue Dec 7, 2018 · 5 comments
@hnesk
Contributor

hnesk commented Dec 7, 2018

Environment

Current Behavior:

With current git master version:
$ ~/bin/tesseract -v
tesseract 4.0.0-86-gbee8
leptonica-1.76.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
Found AVX2
Found AVX
Found SSE

$ ~/bin/tesseract --psm 3 --oem 0 -c tessedit_create_hocr=1 -c hocr_font_info=1 -l deu page.tif page-4.0.0-86-gbee8

In page-4.0.0-86-gbee8.hocr, "x_fsize" is always 0 and "x_font" is missing completely:
<span class='ocrx_word' id='word_1_1' title='bbox 253 248 365 292; x_wconf 89; x_fsize 0'>rung</span>

Expected Behavior:

With the current Ubuntu version 4.0.0-beta.3-249-g607e:
$ tesseract -v
tesseract 4.0.0-beta.3-249-g607e
leptonica-1.76.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
Found AVX2
Found AVX
Found SSE
$ tesseract --psm 3 --oem 0 -c tessedit_create_hocr=1 -c hocr_font_info=1 -l deu page.tif page-4.0.0-beta.3-249-g607e

In page-4.0.0-beta.3-249-g607e.hocr: "x_fsize" and "x_font" have sensible values:
<span class='ocrx_word' id='word_1_1' title='bbox 253 248 365 292; x_wconf 89; x_font Times_New_Roman; x_fsize 56'>rung</span>
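
To check a generated .hocr file for this regression, the properties in a word's title attribute can be inspected programmatically. The following is only a minimal sketch: the helper name HocrProperty is hypothetical and not part of Tesseract, and it assumes the title value has already been extracted from the span and that its properties are separated by ';' as in the two outputs shown here.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not a Tesseract API): looks up `key` in an hOCR title
// string such as "bbox 253 248 365 292; x_wconf 89; x_fsize 0".
// Returns true and stores the text after the key in *value if the key exists.
bool HocrProperty(const std::string& title, const std::string& key,
                  std::string* value) {
  assert(value != nullptr);
  std::istringstream props(title);
  std::string prop;
  while (std::getline(props, prop, ';')) {
    // Drop the spaces that follow each ';' separator.
    const std::string::size_type start = prop.find_first_not_of(' ');
    if (start == std::string::npos) continue;
    prop.erase(0, start);
    // The property matches only if it is "<key> <value>".
    if (prop.size() > key.size() && prop.compare(0, key.size(), key) == 0 &&
        prop[key.size()] == ' ') {
      *value = prop.substr(key.size() + 1);
      return true;
    }
  }
  return false;
}
```

For the title from the broken output, a lookup of "x_font" fails and "x_fsize" yields "0"; for the title from the expected output, the same lookups yield "Times_New_Roman" and "56".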

Suggested Fix:

Review c9e85ab or ad40131; one of these commits fixed too much ;)

@hnesk
Contributor Author

hnesk commented Dec 7, 2018

Testcase:
page.zip

@amitdo
Collaborator

amitdo commented Dec 8, 2018

c9e85ab

if (it_->word()) {

should be:
if (it_->word() == nullptr) {
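
Why the inverted check produces exactly the reported symptoms can be illustrated with a small stand-alone sketch. Everything below is a hypothetical stand-in (the Word and Iterator types and both function names are assumptions for illustration, not Tesseract's actual code); only the two conditions are taken from this thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for illustration; not Tesseract's real types.
struct Word {
  std::string font_name;
  int pointsize;
};

struct Iterator {
  const Word* current;  // nullptr once iteration has passed the last word
  const Word* word() const { return current; }
};

// Inverted check: takes the "no word" early exit whenever a word IS present,
// so every real word reports size 0 and no font name.
const char* FontAttributesBuggy(const Iterator& it, int* pointsize) {
  assert(pointsize != nullptr);
  if (it.word()) {
    *pointsize = 0;
    return nullptr;
  }
  // Only reached without a word. Real code would go on to dereference it;
  // this sketch simply reports nothing.
  *pointsize = 0;
  return nullptr;
}

// Corrected check: the early exit is taken only when there is no word.
const char* FontAttributesFixed(const Iterator& it, int* pointsize) {
  assert(pointsize != nullptr);
  if (it.word() == nullptr) {
    *pointsize = 0;
    return nullptr;
  }
  *pointsize = it.word()->pointsize;
  return it.word()->font_name.c_str();
}
```

With a word present, the buggy variant returns no font name and a size of 0, which would match the missing "x_font" and "x_fsize 0" in the report, while the corrected variant returns the stored font name and size.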

@stweil
Contributor

stweil commented Dec 8, 2018

@hnesk and @amitdo, thank you for your reports and sorry for the regression. It is fixed now with commit 2c044df.

@stweil
Contributor

stweil commented Dec 8, 2018

x_fsize is not written by default because the hocr config file sets hocr_font_info 0, so I assume most users won't notice the bug.

@hnesk
Contributor Author

hnesk commented Dec 8, 2018

Great! Works as expected with 2c044df

@hnesk hnesk closed this as completed Dec 8, 2018
@amitdo amitdo added the bug label May 14, 2020