Segmentation fault when using integer models for LSTM training #1573
Both tessdata and tessdata_fast contain integer models, which cannot be used for lstmtraining. Of course, it should give an appropriate error message instead of crashing. @stweil Is it possible to add an error message for 4.0.0?
Thanks for your response, Shreeshrii. I did read some comments about integerizing in your documentation and should have guessed this. Still, is there a way to integerize a model fine-tuned from tessdata_best? Models from tessdata_best are too slow for our application. Dihui
The best files can be converted to integer models with the following command:
The tessdata repo contains the integer versions of the best models as well as the old legacy models.
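Some background on why the conversion only goes one way (best to integer, never back): assume, as a simplification, that an integer model stores each weight as an 8-bit integer together with a per-row scale factor. Small weights then round to zero, and the original float values that training would need to adjust cannot be recovered. The minimal C++ sketch below illustrates this with a simple symmetric int8 scheme; `QuantizedRow`, `QuantizeRow` and `Dequantize` are hypothetical names, and this is not Tesseract's actual implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical illustration of symmetric per-row int8 quantization.
// This is an assumed scheme, not Tesseract's real code.
struct QuantizedRow {
  std::vector<int8_t> weights;  // quantized weights in [-127, 127]
  float scale;                  // real weight ~= weights[i] * scale
};

// Maps the largest absolute weight in the row to +/-127 and rounds the rest.
QuantizedRow QuantizeRow(const std::vector<float>& row) {
  float max_abs = 0.0f;
  for (float w : row) max_abs = std::max(max_abs, std::fabs(w));
  QuantizedRow q;
  q.scale = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
  for (float w : row) {
    q.weights.push_back(static_cast<int8_t>(std::lround(w / q.scale)));
  }
  return q;
}

// Reconstructs approximate float weights; precision lost by rounding
// in QuantizeRow cannot be restored here.
std::vector<float> Dequantize(const QuantizedRow& q) {
  std::vector<float> out;
  for (int8_t w : q.weights) out.push_back(w * q.scale);
  return out;
}
```

For example, quantizing the row {0.5, -0.25, 0.001, 0.0042} gives a scale of 0.5/127, so 0.5 maps to 127 while 0.001 maps to 0 and comes back as 0.0 after dequantization. That detail is gone for good, which is why fine-tuning has to start from a float (best) model.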
@DihuiLai Please change the issue title to "Segmentation fault when using integer models for LSTM training".
s/trining/training/
Yes, I think so. I added the issue to the planning list. @zdenop, please add the "bug" label to this issue.
@stweil Thanks for fixing the typo :-) Good to know that it can be fixed for 4.0.0.
Changed @Shreeshrii
The problem is solved and I am closing the issue.
AFAIK this issue was not solved.
It was only clarified that it was caused by training from an integer model, which is not allowed. Although this is a bug, I think it can be fixed after 4.0.0, as most users of Tesseract won't do any training.
@stweil: can you send a PR so we can fix this for the 4.0 release?
Abort LSTM training with integer model (fixes issue #1573)
Tesseract currently cannot continue LSTM training from an integer (fast) model. Report this to users who try it nevertheless instead of crashing with an assertion. Signed-off-by: Stefan Weil <[email protected]>
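The idea behind this commit is to report the problem to the user instead of crashing on an assertion in float-only code. The sketch below shows that pattern under stated assumptions: `LoadedModel`, its `int_mode` field and `CanContinueTrainingFrom` are hypothetical stand-ins rather than Tesseract's real classes, placing the check right after loading is an assumption, and the message text is illustrative only.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical stand-in for a model loaded via --continue_from.
// Tesseract's real classes and member names differ.
struct LoadedModel {
  std::string path;
  bool int_mode;  // true for integer (fast) models, false for float (best)
};

// Returns true if LSTM training may continue from `model`.
// Integer models are rejected with an explanatory message instead of
// letting a float-only code path assert and crash later.
bool CanContinueTrainingFrom(const LoadedModel& model) {
  if (model.int_mode) {
    std::fprintf(stderr,
                 "Error: %s is an integer (fast) model; LSTM training "
                 "needs a float model such as one from tessdata_best.\n",
                 model.path.c_str());
    return false;
  }
  return true;
}
```

A training driver would call such a check right after loading the model given to --continue_from and exit with a nonzero status when it returns false, rather than letting the assertion in weightmatrix.cpp fire partway through training.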
I am running the tutorial on fine-tuning LSTM training from https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for-impact
Training works when I follow the tutorial instructions and fine-tune from the .lstm file extracted from tessdata/best/eng.traineddata. However, training fails when I extract the .lstm file from tessdata/eng.traineddata instead.
Environment
Tesseract Version: tesseract 4.0.0-beta.1-232-g45a6
Platform: Ubuntu 16.04
The command I am trying to run:
training/lstmtraining --model_output ~/tesstutorial/impact_from_full/impact --continue_from ~/tesstutorial/impact_from_full/eng.lstm --traineddata tessdata/eng.traineddata --train_listfile ~/tesstutorial/engeval/eng.training_files.txt --max_iterations 400
The eng.lstm file was extracted with:
training/combine_tessdata -e tessdata/eng.traineddata ~/tesstutorial/impact_from_full/eng.lstm
The same steps work if I use tessdata/best/eng.traineddata instead.
The error that I got:
Loaded file /home/dlai/tesstutorial/impact_from_full/eng.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Continuing from /home/dlai/tesstutorial/impact_from_full/eng.lstm
Loaded 72/72 pages (1-72) of document /home/dlai/tesstutorial/engeval/eng.FreeSans.exp0.lstmf
!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
Segmentation fault (core dumped)
Thanks very much
Dihui