Tesseract creates output for missing input #1023

Closed
stweil opened this issue Jul 4, 2017 · 4 comments

Comments

@stweil
Contributor

stweil commented Jul 4, 2017

If Tesseract is called with a non-existing image file, it creates an empty output file.

I'd expect that no output is created in that case.

Examples:

# Empty output.txt is created.
tesseract nonexisting_image.tiff output
# Empty output.hocr is created.
tesseract nonexisting_image.tiff output hocr
@Shreeshrii
Collaborator

You are right. The same problem occurs with pdf output: even though error messages are shown, an empty output file is produced.

 tesseract nonexisting_image.tiff output pdf
Tesseract Open Source OCR Engine v4.00.00dev-549-g2b854e3 with Leptonica
Error in fopenReadStream: file not found
Error in findFileFormat: image file not found
Error during processing.

@Shreeshrii
Collaborator

@stweil for #1423

@zdenop
Contributor

zdenop commented Sep 28, 2018

The problem is that the output file is created during initialization of the renderer, which happens before OCR starts.

Maybe we can implement a simple check of whether the input file exists.
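
The check proposed above could look roughly like the following. This is only a minimal sketch, not the code from the actual fix: the helper name `InputFileReadable` is hypothetical, and it assumes the input is a regular file path that can be probed with `fopen` before any renderer (and therefore any output file) is created.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper for illustration; not Tesseract's actual code.
// Returns true if the file at `path` can be opened for reading.
bool InputFileReadable(const std::string& path) {
  std::FILE* f = std::fopen(path.c_str(), "rb");
  if (f == nullptr) {
    return false;  // missing file or no read permission
  }
  std::fclose(f);
  return true;
}
```

A caller would run this before initializing any renderer and exit with an error if it returns false, so that no output file is created. A real implementation would also have to allow for any special input names the tool accepts.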

zdenop closed this as completed in 1a09644 Sep 29, 2018
@zdenop
Contributor

zdenop commented Sep 29, 2018

Done.
But there could still be scenarios where empty output is created, e.g. if the input file exists but is malformed or Leptonica cannot process it. That could be solved only by changing the logic of the renderer: create the file only when there is something to write to it, or remove the file if the OCR output is empty.
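
The first option, creating the file only when there is something to write, could be sketched as below. The class `LazyOutputFile` is hypothetical and is not part of Tesseract's renderer API; it only illustrates the idea of deferring `fopen` until the first non-empty write.

```cpp
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical class for illustration; not part of Tesseract's renderer API.
// The output file is opened on the first non-empty write, so a run that
// produces no output never creates the file.
class LazyOutputFile {
 public:
  explicit LazyOutputFile(std::string path) : path_(std::move(path)) {}
  ~LazyOutputFile() {
    if (file_ != nullptr) std::fclose(file_);
  }
  LazyOutputFile(const LazyOutputFile&) = delete;
  LazyOutputFile& operator=(const LazyOutputFile&) = delete;

  // Returns false on an I/O error, true otherwise.
  bool Write(const std::string& data) {
    if (data.empty()) return true;  // nothing to write, so create nothing
    if (file_ == nullptr) {
      file_ = std::fopen(path_.c_str(), "wb");
      if (file_ == nullptr) return false;
    }
    return std::fwrite(data.data(), 1, data.size(), file_) == data.size();
  }

  // True once the file has actually been created on disk.
  bool created() const { return file_ != nullptr; }

 private:
  std::string path_;
  std::FILE* file_ = nullptr;
};
```

With this approach a malformed input that yields no OCR result would leave no file behind. The alternative mentioned above, removing the file afterwards if the output is empty, would keep the existing initialization order and add a cleanup step instead.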

zdenop added a commit that referenced this issue Oct 9, 2018
* 'master' of https://github.com/tesseract-ocr/tesseract:
  Fix CID 1164579 (Explicit null dereferenced)
  print help for tesstrain.sh; fixes #1469
  Fix CID 1395882 (Uninitialized scalar variable)
  Fix comments
  Move content of ipoints.h to points.h and remove ipoints.h
  remove duplicate help from combine_lang_model
  Fix typo.
  use tprintf instead of printf to be able disable messages by quiet option (issue #1240)
  add "sudo ldconfig" to install instruction. fixes #1212
  unittest: Replace NULL by nullptr
  unittest: Format code
  tesseract app: check if input file exists; fixes #1023
  Format code (replace ( xxx ) by (xxx))
  Simplify boolean expressions
  Win32: use the ISO C and C++ conformant name "_putenv" instead of deprecated "putenv"