Tesseract Segmentation fault during fine tuning with fast traineddata #2255

Closed
ajinkya933 opened this issue Feb 19, 2019 · 4 comments

Comments

@ajinkya933

ajinkya933 commented Feb 19, 2019

Environment

  • Tesseract Version: 4.0.0-324-gb67f

  • Platform: Ubuntu 16.04 64-bit

Current Behavior:

I am following the TrainingTesseract 4.00 tutorial (https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00). I am at the step "Fine Tuning for ± a few characters". This is the error I observe:

ajinkya@ajinkya-H310M-S2:~/Documents/tesseract$ src/training/lstmtraining --model_output ~/tesstutorial/trainplusminus/plusminus --continue_from ~/tesstutorial/trainplusminus/eng.lstm --traineddata ~/tesstutorial/trainplusminus/eng/eng.traineddata --old_traineddata tessdata/best/eng.traineddata --train_listfile ~/tesstutorial/trainplusminus/eng.training_files.txt --max_iterations 3600
Loaded file /home/ajinkya/tesstutorial/trainplusminus/eng.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Code range changed from 111 to 111!
Num (Extended) outputs,weights in Series:
1,36,0,1:1, 0
Num (Extended) outputs,weights in Series:
C3,3:9, 0
Ft16:16, 160
Total weights = 160
[C3,3Ft16]:16, 160
Mp3,3:16, 0
Lfys64:64, 20736
Lfx96:96, 61824
Lrx96:96, 74112
Lfx512:512, 1247232
Fc111:111, 0
Total weights = 1404064
Previous null char=110 mapped to 110
Continuing from /home/ajinkya/tesstutorial/trainplusminus/eng.lstm
Loaded 72/72 pages (1-72) of document /home/ajinkya/tesstutorial/trainplusminus/eng.Arial_Bold.exp0.lstmf
Loaded 72/72 pages (1-72) of document /home/ajinkya/tesstutorial/trainplusminus/eng.Arial_Bold_Italic.exp0.lstmf
Loaded 72/72 pages (1-72) of document /home/ajinkya/tesstutorial/trainplusminus/eng.Arial.exp0.lstmf
Loaded 72/72 pages (1-72) of document /home/ajinkya/tesstutorial/trainplusminus/eng.Century_Schoolbook_L_Bold_Italic.exp0.lstmf
Loaded 72/72 pages (1-72) of document /home/ajinkya/tesstutorial/trainplusminus/eng.Arial_Italic.exp0.lstmf
Loaded 72/72 pages (1-72) of document /home/ajinkya/tesstutorial/trainplusminus/eng.Century_Schoolbook_L_Bold.exp0.lstmf
Loaded 72/72 pages (1-72) of document /home/ajinkya/tesstutorial/trainplusminus/eng.Century_Schoolbook_L_Italic.exp0.lstmf
Loaded 72/72 pages (1-72) of document /home/ajinkya/tesstutorial/trainplusminus/eng.Courier_New_Bold.exp0.lstmf
Loaded 72/72 pages (1-72) of document /home/ajinkya/tesstutorial/trainplusminus/eng.Century_Schoolbook_L_Medium.exp0.lstmf
Segmentation fault (core dumped)

To fix this error I referred to issue #1447.
I tried downloading eng.traineddata with wget from https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata and placing it at tessdata/best/eng.traineddata, but this does not resolve the error.

Is there anything else I should do to fix this error?

Expected Behavior:

Suggested Fix:

@ajinkya933 ajinkya933 changed the title from "Segmentation fault (core dumped)" to "Tesseract Segmentation fault (core dumped) during fine tuning" Feb 19, 2019
@stweil
Contributor

stweil commented Feb 19, 2019

You'll need https://github.com/tesseract-ocr/tessdata_best/raw/master/eng.traineddata.

It is a known bug that Tesseract crashes when you try to train from a fast model (from either tessdata_fast or tessdata). See issue #1573.
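
Because the three model repositories differ only by one path component in the download URL, a quick check before downloading can help catch the wrong one. The following is a minimal sketch, not part of Tesseract: the function names `traineddata_repo` and `is_best_model_url` are hypothetical, and it assumes URLs of the form `https://github.com/tesseract-ocr/<repo>/raw/...` as used in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Tesseract).

# Print the repository name from a github.com/tesseract-ocr/<repo>/... URL,
# or "unknown" if the URL does not start with that prefix.
traineddata_repo() {
  local url="$1"
  local rest="${url#https://github.com/tesseract-ocr/}"
  if [ "$rest" = "$url" ]; then
    echo "unknown"
    return
  fi
  # Keep everything before the first slash: the repository name.
  echo "${rest%%/*}"
}

# Succeed only when the URL points at tessdata_best, the repository
# recommended above for fine tuning.
is_best_model_url() {
  [ "$(traineddata_repo "$1")" = "tessdata_best" ]
}
```

For example, `is_best_model_url "$url" || echo "not a tessdata_best model: $url"` would flag the tessdata link from the original report.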

@stweil stweil added the bug label Feb 19, 2019
@stweil stweil changed the title from "Tesseract Segmentation fault (core dumped) during fine tuning" to "Tesseract Segmentation fault during fine tuning with fast traineddata" Feb 19, 2019
@ajinkya933
Author

As directed, I placed the above file at tessdata/best/eng.traineddata. However, the error persists. Should I redo all the steps from the beginning of the tutorial to solve this problem?

@stweil
Contributor

stweil commented Feb 19, 2019

Ive tried 1) downloading data from wget from https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata [...]

That's the wrong download link.

@ajinkya933
Author

I solved this error by

A) Deleting everything and then reinstalling tesseract from here:
git clone https://github.com/tesseract-ocr/tesseract.git

B) cd tesseract/tessdata

C) sudo mkdir best

D) cd best

E) wget https://github.com/tesseract-ocr/tessdata_best/raw/master/eng.traineddata
wget https://github.com/tesseract-ocr/tessdata_best/raw/master/heb.traineddata
wget https://github.com/tesseract-ocr/tessdata_best/raw/master/chi_sim.traineddata

Now you have all the files needed to run this tutorial. Geez! The files required for this tutorial are scattered all over the place. Someone please combine them into a single GitHub repo that you can git clone and then follow the instructions.
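
Steps B to E above can be collected into one small shell function. This is only a sketch under stated assumptions: the base URL and the language codes come from this thread, while the function name `fetch_best_models` and the `DRY_RUN` switch are made up for illustration. It uses `mkdir -p` so that rerunning it does not fail on an existing directory.

```shell
#!/usr/bin/env bash
# Base URL of the tessdata_best models, as used in step E.
BEST_URL="https://github.com/tesseract-ocr/tessdata_best/raw/master"

# Hypothetical helper: download <lang>.traineddata for each language code.
#   $1        target directory (e.g. tesseract/tessdata/best)
#   $2...     language codes (e.g. eng heb chi_sim)
# With DRY_RUN=1 it only prints the wget commands it would run.
fetch_best_models() {
  local dest="$1"
  shift
  local lang
  for lang in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "wget -P $dest $BEST_URL/$lang.traineddata"
    else
      mkdir -p "$dest" && wget -P "$dest" "$BEST_URL/$lang.traineddata"
    fi
  done
}

# Usage, after step A (left commented out so that sourcing this file
# does not touch the network):
#   git clone https://github.com/tesseract-ocr/tesseract.git
#   fetch_best_models tesseract/tessdata/best eng heb chi_sim
```

Running it first with `DRY_RUN=1` shows the exact URLs, which makes it easy to confirm they point at tessdata_best before anything is downloaded.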
