Error: Illegal Parameter specification! with Tesseract4Alpha #1010

Closed
nachobit opened this issue Jun 28, 2017 · 35 comments

@nachobit

After upgrading to Tesseract 4.00.00alpha, I get this error when running OCR from my Java code:

ITesseract instance = new Tesseract();
instance.setDatapath("/usr/share/tessdata/");
instance.setLanguage("spa");
(...)
result = instance.doOCR(imageFile);


Environment

  • Tesseract Version: tesseract 4.00.00alpha
  • Leptonica Version: leptonica-1.74.4
  • Platform: CentOS 6.7
  • Server: Wildfly 10.1

Current Behavior:

Error: Illegal Parameter specification!
"Fatal error encountered!" == NULL:Error:Assert failed:in file globaloc.cpp, line 75

A fatal error has been detected by the Java Runtime Environment:

SIGSEGV (0xb) at pc=0x00007ff1b3098549, pid=25091, tid=0x00007ff29d7d7700

JRE version: OpenJDK Runtime Environment (8.0_121-b13) (build 1.8.0_121-b13)
Java VM: OpenJDK 64-Bit Server VM (25.121-b13 mixed mode linux-amd64 compressed oops)
Problematic frame:
C [libtesseract.so+0x26f549] ERRCODE::error(char const*, TessErrorLogCode, char const*, ...) const+0x129

Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

An error report file with more information is saved as:
/opt/wildfly/wildfly-10.1.0.Final/hs_err_pid25091.log

If you would like to submit a bug report, please visit:
http://bugreport.java.com/bugreport/crash.jsp
The crash happened outside the Java Virtual Machine in native code.
See problematic frame for where to report the bug.

*** JBossAS process (25091) received ABRT signal ***

Suggested Fix:

Any idea?

@Shreeshrii
Collaborator

Please use the latest source from the master branch on GitHub and report whether you still get the error.

@nachobit
Author

I'm already using the latest source. I get the same error (*) on two different operating systems.

(*) Error: Illegal Parameter specification!
"Fatal error encountered!" == NULL:Error:Assert failed:in file globaloc.cpp, line 75

@Shreeshrii
Collaborator

What version of the C++ compiler are you using?

two different OS

Which ones? I have been able to build on Ubuntu 14.04. The Travis and AppVeyor builds are passing.

@Shreeshrii
Collaborator

Also, are you able to run tesseract from the command line?

tesseract -v

Also try to OCR the sample image from the testing folder.

@nachobit
Author

nachobit commented Jun 29, 2017

@Shreeshrii I'm using g++ 7.1.1 in Arch and 4.8.2 in CentOS 6.7.
I launch Tesseract from Java. There are no problems with version 3.05, but I get the error described above with the 4.00alpha version.

tesseract 4.00.00alpha
leptonica-1.74.4
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.1) : libpng 1.6.29 : libtiff 4.0.8 : zlib 1.2.11 : libwebp 0.6.0

@Shreeshrii
Collaborator

Shreeshrii commented Jun 29, 2017

what about

tesseract --list_langs

Are you able to OCR an image from command line with the 4.0 version?

Do you have the 4.00.00alpha version of the traineddata files?

Download the 4.0 traineddata to a different folder and point to that folder.

@nachobit
Author

No problems detecting languages with tesseract --list_langs (eng, spa and osd trained files for the LSTM-based 4.00.00alpha version).

As for command-line recognition, an example from the testing folder was recognized correctly.

Perhaps, some Java code has changed from this 4Alpha version?

@Shreeshrii
Collaborator

@nachobit Please see Quan's Java JNA wrapper for Tesseract OCR API at https://github.com/nguyenq/tess4j

@nachobit
Author

nachobit commented Jun 29, 2017

The problem with version 3.05.01 is that I get different results on the two operating systems when recognizing a PDF, even though both use the same Leptonica and Tesseract versions.

Example:

0000 0340 º71º ZL (in CentOS) and 0000 0340 0710 ZL (in Arch).

For that reason I'd like to try 4.00alpha, but that is impossible because of the error described above.

@amitdo
Collaborator

amitdo commented Jun 29, 2017

If you have an issue with a wrapper to Tesseract's C/C++ API, please report the issue to the developers of that software.

@amitdo
Collaborator

amitdo commented Jun 29, 2017

I'm using g++ 7.1.1 in Arch and 4.8.2 in CentOS 6.7.

0000 0340 º71º ZL (in CentOS) and 0000 0340 0710 ZL (in Arch).

Ray said, many years ago, that you can get different results with different compilers.

@amitdo
Collaborator

amitdo commented Jun 29, 2017

Perhaps, some Java code has changed from this 4Alpha version?

Yes.
https://github.com/nguyenq/tess4j/commits/master

@nachobit
Author

After updating to the latest libs from Tess4J-3.4.0-src, I get the same error when I launch the OCR from Java code.

From 3.05.01 version, is there any solution to solve the fail recognizing "zeros" ( º instead of 0)?

@amitdo
Collaborator

amitdo commented Jun 29, 2017

3.4.0 does not include the 4.00 changes.
https://github.com/nguyenq/tess4j/commits/tess4j-3.4.0

@amitdo
Collaborator

amitdo commented Jun 29, 2017

From 3.05.01 version, is there any solution to solve the fail recognizing "zeros" ( º instead of 0)?

You can try to compile with a newer version of gcc. I can't promise that this 'solution' will help you with this issue.

@nachobit
Author

Ok, that's the problem? Tess4J-3.4.0 (Java) is not supported by 4.00Alpha release? Then I will try compiling with a newer version of GCC.

@amitdo
Collaborator

amitdo commented Jun 29, 2017

Ok, that's the problem? Tess4J-3.4.0 (Java) is not supported by 4.00Alpha release?

I assume that's the source of the problem (It's Tess4J 3.4.0 that seems to not have support for Tesseract 4.00, not vice versa). To be sure, ask the developer.

https://sourceforge.net/p/tess4j/discussion/1202294/
https://github.com/nguyenq/tess4j/issues

@Shreeshrii
Collaborator

Shreeshrii commented Jun 29, 2017 via email

@nguyenq
Contributor

nguyenq commented Jun 29, 2017

tess4j's master branch is for Tesseract 4.0alpha and includes the latest Tesseract 4.0alpha Windows binary. All of its unit tests passed on Windows 10. We have not tested on Linux OS yet.

Since you link against Leptonica 1.74.4, make sure you use lept4j-1.6.0.

@zdenop
Contributor

zdenop commented Jun 29, 2017

We do not support third-party software, including Tesseract wrappers. Please reproduce the error with C++.

@zdenop zdenop closed this as completed Jun 29, 2017
@banksone

banksone commented Aug 6, 2017

Hi, it seems you just have to set the environment variable LC_NUMERIC="C" and it works. :)

@AlexanderHugestrand

AlexanderHugestrand commented Sep 27, 2017

I dug into Tesseract's code and found that the string "Illegal Parameter specification" exists in only one place, namely in the file classify/clusttool.cpp. After some debugging I realised that the function ReadParamDesc() calls sscanf() at line 82 (for git commit hash 2b854e3), which is locale dependent. It fails because the numeric input (two floating point values) is written with dots (example: 1.23), but a locale other than en_US for LC_NUMERIC may cause sscanf() to expect other characters, like commas (1,23).

In other words, the error is in Tesseract, which assumes a particular locale; the locale should rather be set explicitly. The workaround is to set LC_NUMERIC=en_US.UTF-8.
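
One way to avoid depending on the process-wide locale, sketched here under assumptions, is to parse with a stream that is explicitly imbued with the classic "C" locale. ParseTwoFloats is a hypothetical helper for illustration only; it is not a function in Tesseract and does not reproduce the full sscanf() format used in ReadParamDesc().

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper (not part of Tesseract): reads two floats from `text`
// using the classic "C" locale, whatever LC_NUMERIC is set to globally.
bool ParseTwoFloats(const std::string& text, float* a, float* b) {
  assert(a != nullptr && b != nullptr);
  std::istringstream stream(text);
  // In the classic locale the decimal point is always '.'.
  stream.imbue(std::locale::classic());
  // The stream converts to false if either extraction failed.
  return static_cast<bool>(stream >> *a >> *b);
}
```

With this approach an input such as "1.23 4.5" parses the same way under any LC_NUMERIC setting, while an input written with decimal commas is rejected instead of being silently misread.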

@hamduu

hamduu commented Apr 5, 2018

I am facing the same issue. Could you please share the file that has to be changed, so that I can just replace that specific file and continue creating the traineddata?

@hamduu

hamduu commented Apr 5, 2018

@nizzeberra: I see the same files you mention, but I don't know where to place the code. Could you please share that file?

@Shreeshrii
Collaborator

@stweil Is it possible to address this for final 4.0.0?

@stweil
Contributor

stweil commented Apr 5, 2018

Setting LC_NUMERIC in the Tesseract code would perhaps solve the problem, but is not a good solution for people who use the Tesseract library. They don't expect that Tesseract changes LC_NUMERIC, and perhaps they need a different value.
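
As one possible compromise for callers that need their own locale, LC_NUMERIC could be switched only around the calls that parse numbers and restored afterwards. The following is a minimal sketch, assuming single-threaded use; ScopedNumericLocale is a hypothetical class, not something Tesseract or Tess4J provides.

```cpp
#include <cassert>
#include <clocale>
#include <string>

// Hypothetical RAII guard: switches LC_NUMERIC for the lifetime of the
// object and restores the previous value on destruction.
// Caveat: setlocale() changes process-wide state and is not thread-safe.
class ScopedNumericLocale {
 public:
  explicit ScopedNumericLocale(const char* name) {
    assert(name != nullptr);
    // Querying with nullptr returns the current setting. Copy it, because
    // the returned buffer may be overwritten by later setlocale() calls.
    const char* current = std::setlocale(LC_NUMERIC, nullptr);
    if (current != nullptr) saved_ = current;
    active_ = std::setlocale(LC_NUMERIC, name) != nullptr;
  }
  ~ScopedNumericLocale() {
    if (!saved_.empty()) std::setlocale(LC_NUMERIC, saved_.c_str());
  }
  ScopedNumericLocale(const ScopedNumericLocale&) = delete;
  ScopedNumericLocale& operator=(const ScopedNumericLocale&) = delete;

  // True if the requested locale was available and is now in effect.
  bool active() const { return active_; }

 private:
  std::string saved_;
  bool active_ = false;
};
```

A caller would create the guard with "C" immediately before calling into the library and let it go out of scope afterwards. This does not address the concern for multi-threaded applications, where another thread could observe the temporary setting.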

I wonder whether the sscanf handling of %f really depends on the locale settings. It does not on my Debian GNU Linux system, nor could I find a hint in the MSDN documentation on sscanf.

@nizzeberra, which systems / C libraries show that strange behaviour? Do you have links to documentation?

PS: These code locations use %f:

classify/clusttool.cpp:        sscanf(line, "%" QUOTED_TOKENSIZE "s %" QUOTED_TOKENSIZE "s %f %f",
classify/ocrfeatures.cpp:    if (tfscanf(File, "%f", &(Feature->Params[i])) != 1)
wordrec/params_model.cpp:  if (sscanf(line + end_of_key, " %f", val) != 1)

@AlexanderHugestrand

@hamduu I'm not sure I understand what you are asking. The file and the line that I pointed out is where the error is triggered, and should probably not be changed. And LC_NUMERIC is just an environment variable that you can set manually.

@stweil I have built and tested tesseract on Linux Mint and I have no info about specific libraries right now.

@AlexanderHugestrand

Here is the man page, and it's pretty clear about the locale:

http://man7.org/linux/man-pages/man3/scanf.3.html

@stweil
Contributor

stweil commented Apr 5, 2018

Linux Mint uses Debian packages, so the result should not be much different. The man page only says that LC_NUMERIC can be used to allow separators for multiples of thousand.

Here is the test scenario which I used (maybe you can try it on Linux Mint):

$ cat sscanf-test.cpp 
#include <stdio.h>

int main(int argc, char *argv[])
{
  for (int arg = 1; arg < argc; arg++) {
    float f = 0.0f;
    sscanf(argv[arg], "%f", &f);
    printf("f[%d] = %f\n", arg, f);
  }
  return 0;
}
$ g++ -std=c++11 -Wall -Wextra sscanf-test.cpp -o sscanf-test
$ ./sscanf-test 3.14
f[1] = 3.140000
$ ./sscanf-test 3,14
f[1] = 3.000000
$ LC_NUMERIC=de_DE.UTF-8 ./sscanf-test 3,14
f[1] = 3.000000

@amitdo
Collaborator

amitdo commented Apr 5, 2018

https://en.cppreference.com/w/cpp/locale/setlocale

In the example there, 3.14 is printed as 3,14 once the locale has been set.

@stweil
Contributor

stweil commented Apr 6, 2018

That's interesting. So C/C++ programs don't use the locale which was set in the environment, but start running with the "C" locale. That is exactly what I observed in my test. Only if I set LC_NUMERIC inside of my test program, I get a different behaviour.

That implies that we have no problem for the tesseract executable or the training programs which are provided by Tesseract. Nor will external software have a problem as long as it does not set LC_NUMERIC.

Maybe Java uses the environment settings to set LC_NUMERIC internally. That would explain the reported problem.
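
The observation above can be sketched as a variant of the earlier test in which the program itself selects LC_NUMERIC before calling sscanf(). ParseFloatInLocale is a hypothetical helper written for this illustration; whether a locale such as de_DE.UTF-8 is installed depends on the system, so the helper reports failure when setlocale() rejects the name.

```cpp
#include <cassert>
#include <clocale>
#include <cstdio>
#include <string>

// Hypothetical helper: parses `text` with sscanf("%f") while LC_NUMERIC is
// set to `locale_name`, then restores the previous setting. Returns false if
// the locale is not available or no number could be read.
bool ParseFloatInLocale(const char* locale_name, const char* text,
                        float* out) {
  assert(locale_name != nullptr && text != nullptr && out != nullptr);
  // Remember the current setting (a C++ program starts in the "C" locale).
  const char* current = std::setlocale(LC_NUMERIC, nullptr);
  const std::string saved = current != nullptr ? current : "C";
  if (std::setlocale(LC_NUMERIC, locale_name) == nullptr) return false;
  const bool ok = std::sscanf(text, "%f", out) == 1;
  std::setlocale(LC_NUMERIC, saved.c_str());
  return ok;
}
```

With "C" as the locale name, "3,14" is read as 3.0, matching the output of the test program above. With an installed locale whose decimal separator is a comma, the same input would be expected to parse differently, which is consistent with the behaviour reported from Java.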

In addition to the problem with sscanf, more code is possibly affected by "wrong" locale settings, for example these lines:

classify/ocrfeatures.cpp:    fprintf (File, "%f  %f\n",
wordrec/params_model.cpp:    if (fprintf(fp, "%s %f\n", kParamsTrainingFeatureTypeName[i], weights[i])
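
For the output side, a locale-independent alternative to fprintf() with %f could look like the sketch below. FormatFloatClassic is a hypothetical helper and only an assumption about how such code might be rewritten; it is not taken from the Tesseract sources.

```cpp
#include <cassert>
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: formats `value` in fixed notation with `precision`
// digits after a '.' decimal point (printf's "%f" uses six digits),
// independent of the process-wide LC_NUMERIC.
std::string FormatFloatClassic(float value, int precision = 6) {
  assert(precision >= 0);
  std::ostringstream stream;
  // The classic locale always uses '.' and no digit grouping.
  stream.imbue(std::locale::classic());
  stream << std::fixed << std::setprecision(precision) << value;
  return stream.str();
}
```

The resulting string can then be written with fputs() or "%s", so files written under one locale remain readable under another.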

@Shreeshrii, I don't think that all that code can be found and rewritten for 4.0.0. It would be possible to report a warning when the Tesseract initialisation code detects an unsupported locale setting.
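
A warning of the kind suggested here could be based on localeconv(), which reports the decimal point of the current locale. This is a minimal sketch, assuming the check runs once during initialisation; both function names are hypothetical and this is not Tesseract's actual initialisation code.

```cpp
#include <cassert>
#include <clocale>
#include <cstdio>
#include <cstring>

// Hypothetical check: returns true if the current LC_NUMERIC uses '.' as
// the decimal point, which is what the "%f" parsing code relies on.
bool NumericLocaleUsesDot() {
  const std::lconv* conv = std::localeconv();
  return conv != nullptr && std::strcmp(conv->decimal_point, ".") == 0;
}

// Prints a warning to `out` when the decimal point is not '.'.
// Returns true if a warning was printed.
bool WarnIfUnsupportedNumericLocale(std::FILE* out) {
  assert(out != nullptr);
  if (NumericLocaleUsesDot()) return false;
  const char* name = std::setlocale(LC_NUMERIC, nullptr);
  std::fprintf(out,
               "Warning: LC_NUMERIC is \"%s\"; numbers may be parsed "
               "incorrectly. Consider setting LC_NUMERIC=C.\n",
               name != nullptr ? name : "(unknown)");
  return true;
}
```

Checking the decimal point rather than the locale name keeps the test independent of how a particular system spells its locale names.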

@Shreeshrii
Collaborator

@stweil Thanks for the investigation. Yes, please make possible changes to point the users in the right direction.

Related issue regarding locales:
#1250 (comment)

@hamduu

hamduu commented Apr 8, 2018

[Screenshot, 2018-04-08 11:05:09 AM]

[Screenshot, 2018-04-08 11:12:49 AM]

Sorry to ask about this in such detail; I have no expertise in this.
I am trying to use this command: cntraining eng.TimesNewRoman.exp0.tr
This is the error I am getting:
Error: Illegal number of feature sets!
"Fatal error encountered!" == NULL:Error:Assert failed:in file globaloc.cpp, line 75
Abort trap: 6

I even tried changing the locale from the terminal, but I still get the same error. I have added screenshots of this.

How do I make this command work? Can you please tell me where exactly I need to make changes to set the locale differently? Please help me out.

Regards

@suresh443

Hi, Tesseract works fine from a main method (Java), but when I try to run it in a web application I get the error below:


A fatal error has been detected by the Java Runtime Environment:

SIGSEGV (0xb) at pc=0x00007f96274dbac7, pid=4516, tid=0x00007f9699212700

JRE version: OpenJDK Runtime Environment (8.0_171-b11) (build 1.8.0_171-8u171-b11-0ubuntu0.16.04.1-b11)

Java VM: OpenJDK 64-Bit Server VM (25.171-b11 mixed mode linux-amd64 compressed oops)

Problematic frame:

C [libtesseract.so.3.0.4+0x9dac7] tesseract::Tesseract::recog_all_words(PAGE_RES*, ETEXT_DESC*, TBOX const*, char const*, int)+0x5e7

Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

An error report file with more information is saved as:

/home/ahextech/suresh/softwares/eclipse/hs_err_pid4516.log

If you would like to submit a bug report, please visit:

http://bugreport.java.com/bugreport/crash.jsp

The crash happened outside the Java Virtual Machine in native code.

See problematic frame for where to report the bug.


My sample code is:
Tesseract tesseract = new Tesseract();
TesseractOCRConfig config = new TesseractOCRConfig();
config.setTessdataPath("/home/ahextech/suresh/ehubWorkspace/BITBUCKETCODE/");
config.setLanguage("Eng");
String text = tesseract.doOCR(imageProcessed);

My current OS: Ubuntu 16.04
Java JDK version: 1.8
My Tomcat version: 8
Tesseract version: 3.4.4

Please help me if you can.

Thanks a Ton..!!!
