Segmentation fault ("Speicherzugriffsfehler") caused by "fast" traineddata #2921
Comments
There is a known problem with parallel execution which can cause an access violation at the end of a training run, but as far as I can see that is not the case here. Since you can reproduce the problem, it would be great if you could get a stack trace which shows the exact code location. There are two ways to get such a stack trace:
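For readers unfamiliar with the process: one common way to obtain a backtrace on Linux is to run the failing command under `gdb` in batch mode. The sketch below only builds such a command line. It is an assumption about a typical setup and not necessarily either of the two ways referred to above. The helper name `gdb_backtrace_command` is hypothetical, and the `lstmtraining` arguments are placeholders that must be replaced with the ones from the failing training run.

```python
import shlex


def gdb_backtrace_command(program, args):
    """Build a gdb command line that runs `program` and prints a backtrace.

    -batch      : exit gdb after the given commands have been processed
    -ex run     : start the program
    -ex bt      : print the backtrace once the program has stopped (e.g. on SIGSEGV)
    --args      : everything after this is the program and its arguments
    """
    return ["gdb", "-batch", "-ex", "run", "-ex", "bt", "--args", program, *args]


# Placeholder arguments (assumption): use the real arguments of the failing call.
cmd = gdb_backtrace_command("lstmtraining", ["--model_output", "OUTPUT_PREFIX"])
print(shlex.join(cmd))

# To actually run it (requires gdb and the training binary to be installed):
# import subprocess
# subprocess.run(cmd, check=False)
```

The backtrace is most useful when the binaries were built with debug symbols; without them gdb typically shows only addresses or function names without line numbers.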
Which Tesseract training binaries did you use? Those from Ubuntu or self-built binaries? Which version?
tesseract: Now,
Image attached (LINES_0001_region0000_region0000_line0000.zip) |
Tesseract wants |
Please forget about the last remark. |
@stweil |
That looks good. The |
No luck so far:
It might also be a simple text file. Try |
Files is |
@stweil Thanks for your suggestions! The crash file itself contained a mix of regular ASCII data and Base64.
Hope this helps! |
There's also a slight difference regarding weight: (local working example)
(VM broken output)
Since the training data itself is the same in both environments, I wonder why the weights differ and what |
@stweil
Fails with the same error message. On this laptop I followed your latest recommendation to execute
There must be some arcane dependency missing. I didn't build tesseract myself; I used the version straight from alex-p on this machine. I'm not sure whether I built tesseract on my office PC. I'll take a look on Monday at work. |
Then it should be possible for me to reproduce the problem. Can you provide the necessary files (maybe the whole data directory)? |
Sure I can: ulb-dd-ocr-training.zip Please note: The main script is The data I've used is located in the Maybe you can place this data inside a container or an otherwise fresh and clean environment to reproduce the error. |
I'd like to directly run Still missing for that: |
After installation of tesstrain, I am now able to reproduce the crash, thanks. |
I think that I found the reason why the problem occurs on some machines while others work fine. Training uses "best" traineddata files (LSTM weights in double precision / 8 byte). The training here starts with
Debian / Ubuntu provide a package
Tesseract should be fixed to handle the wrong kind of traineddata with a user-friendly error message instead of crashing. |
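To make the difference between the two kinds of weights concrete, here is a toy sketch in Python. It assumes that "fast" models store weights quantized to 8-bit integers, while "best" models keep them in double precision as stated above. It does not reproduce Tesseract's actual quantization scheme or file format; the function names and the sample weights are made up for illustration.

```python
import struct


def quantize_int8(weights):
    """Map float weights to signed 8-bit integers using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the 8-bit values."""
    return [v * scale for v in quantized]


# Made-up sample weights (assumption, not taken from any real model).
weights = [0.8131, -0.0042, 0.3300, -1.2700]

quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Storage size: 8 bytes per double versus 1 byte per int8 value.
double_bytes = len(struct.pack(f"<{len(weights)}d", *weights))
int8_bytes = len(struct.pack(f"<{len(quantized)}b", *quantized))

# Largest rounding error introduced by the quantization.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("bytes as double:", double_bytes)
print("bytes as int8:  ", int8_bytes)
print("quantized:      ", quantized)
print("max error:      ", max_error)
```

The quantized values are much smaller on disk but cannot be turned back into the exact original weights, which is one plausible reason why such a model is fine for recognition but unsuitable as a starting point for further training.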
This is a bug (missing handling for wrong input data) in the Tesseract code, therefore I transfer the issue report to tesseract-ocr/tesseract. |
@stweil |
@stweil One really has to stick to the Fraktur model from https://github.com/tesseract-ocr/tessdata_best/raw/master/frk.traineddata |
That's expected behaviour. Like |
Duplicate of #1573. |
Hello,
I have 2 tesstrain installations, one on a local machine, the other on a VM provided by our IT-SP.
The setup contains a small sample (tif + gt) of about 10-20 lines.
The local version runs fine and produces a final model, whereas the VM fails with (excerpt from 2020-03-05-tesstrain-mem.log)
Both have the same OS (Ubuntu 18.04.03 LTS), the same tesseract () and the same Python (venv) + Pillow.
Any suggestions welcome!