Segmentation fault ("Speicherzugriffsfehler") caused by "fast" traineddata #2921

Closed · M3ssman opened this issue Mar 5, 2020 · 23 comments

@M3ssman (Contributor) commented Mar 5, 2020

Hello,

I have two tesstrain installations, one on a local machine and another on a VM provided by our IT service provider.

The setup contains a small sample (TIFF + ground truth) of about 10-20 lines.

The local version runs fine and produces a final model, whereas the VM fails with the following (excerpt from 2020-03-05-tesstrain-mem.log; the German message at the end means "Segmentation fault (core dumped)"):

Continuing from data/frk/ulbzd.lstm
Loaded 1/1 lines (1-1) of document /home/gitlab-runner/builds/tfDssL8r/0/aqayv/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0002_region0002_line0019.lstmf
Loaded 1/1 lines (1-1) of document /home/gitlab-runner/builds/tfDssL8r/0/aqayv/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0002_region0002_line0020.lstmf
Loaded 1/1 lines (1-1) of document /home/gitlab-runner/builds/tfDssL8r/0/aqayv/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0002_region0002_line0009.lstmf
Loaded 1/1 lines (1-1) of document /home/gitlab-runner/builds/tfDssL8r/0/aqayv/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0002_region0002_line0000.lstmf
Loaded 1/1 lines (1-1) of document /home/gitlab-runner/builds/tfDssL8r/0/aqayv/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0002_region0002_line0018.lstmf
Loaded 1/1 lines (1-1) of document /home/gitlab-runner/builds/tfDssL8r/0/aqayv/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0002_region0002_line0014.lstmf
Loaded 1/1 lines (1-1) of document /home/gitlab-runner/builds/tfDssL8r/0/aqayv/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0000_region0000_line0000.lstmf
Loaded 1/1 lines (1-1) of document /home/gitlab-runner/builds/tfDssL8r/0/aqayv/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0002_region0002_line0015.lstmf
Loaded 1/1 lines (1-1) of document /home/gitlab-runner/builds/tfDssL8r/0/aqayv/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0002_region0002_line0016.lstmf
Loaded 1/1 lines (1-1) of document /home/gitlab-runner/builds/tfDssL8r/0/aqayv/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0001_region0001_line.lstmf
Loaded 1/1 lines (1-1) of document /home/gitlab-runner/builds/tfDssL8r/0/aqayv/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0002_region0002_line0006.lstmf
Makefile:248: recipe for target 'data/ulbzd/checkpoints/ulbzd_checkpoint' failed
make: *** [data/ulbzd/checkpoints/ulbzd_checkpoint] Speicherzugriffsfehler (Speicherauszug erstellt)

Both machines have the same OS (Ubuntu 18.04.3 LTS), the same Tesseract () and the same Python (venv) + Pillow.

Any suggestions welcome!

@stweil (Contributor) commented Mar 5, 2020

There is a known problem with parallel execution which can cause an access violation at the end of a training, but as far as I can see that's not the case here.

As you can reproduce the problem, it would be great if you could get a stack trace which shows the exact code location. There are two ways to get such a stack trace:

  1. Attach a debugger (typically gdb) to the running process. The debugger will show the location when the error occurs, and you can request a stack trace.

  2. Create a core dump when the error occurs. This dump can later be analyzed with gdb. To enable core dumps, you must run ulimit -c unlimited before starting the training, but in the same shell (see the sketch below).
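
A minimal sketch of the second approach (assuming the core file is written as "core" in the working directory, which depends on the system's core_pattern, and that lstmtraining is installed as /usr/bin/lstmtraining):

ulimit -c unlimited                  # must be run in the same shell as the training
<re-run the training command that crashed>
gdb /usr/bin/lstmtraining core       # load the binary together with the core dump
(gdb) info stack                     # print the stack trace at the crash location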

Which Tesseract training binaries did you use? Those from Ubuntu or self-built binaries? Which version?

@M3ssman (Contributor, Author) commented Mar 5, 2020

tesseract: v4.1.1-rc2-21-gf4ef (from alex-p)
tesstrain: revision 6f74059

Now, after I removed Python's venv completely and did a fresh install, it fails even on my local machine, but with a different message, which seems to originate from Tesseract itself:

tesseract /home/hartwig/Projekte/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0001_region0001_line.tif /home/hartwig/Projekte/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0001_region0001_line --psm 6 lstm.train

=>

Tesseract Open Source OCR Engine v4.1.1-rc2-21-gf4ef with Leptonica
Page 1
Warning: Invalid resolution 0 dpi. Using 70 instead.
Failed to read boxes from /home/hartwig/Projekte/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0001_region0001_line.tif
Error during processing.

Image attached (LINES_0001_region0000_region0000_line0000.zip)

@stweil (Contributor) commented Mar 5, 2020

Tesseract wants /home/hartwig/Projekte/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0001_region0001_line.box which is missing. So that is not a crash, but a normal error report.

@M3ssman (Contributor, Author) commented Mar 5, 2020

Please forget about the last remark.
This happens if there's already a box-file present with that name, in my case an empty file.

@M3ssman (Contributor, Author) commented Mar 5, 2020

@stweil
I've located a _usr_bin_lstmtraining.0.crash file in /var/crash, but it is rather large, about 20 MB.
It looks like it's coming from the lstmtraining command ...

@stweil (Contributor) commented Mar 5, 2020

That looks good. lstmtraining caches the lstmf files, so it uses a lot of memory, and a core dump includes that memory. You can also check the timestamp of the file or run file /var/crash/_usr_bin_lstmtraining.0.crash to see whether it is the expected core dump. Now try gdb /usr/bin/lstmtraining /var/crash/_usr_bin_lstmtraining.0.crash. If that works, try info stack from the gdb command prompt.
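
In other words, roughly this sequence (a sketch of the commands mentioned above):

file /var/crash/_usr_bin_lstmtraining.0.crash                      # is it really a core dump?
gdb /usr/bin/lstmtraining /var/crash/_usr_bin_lstmtraining.0.crash
(gdb) info stack                                                   # show where the crash happened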

@M3ssman (Contributor, Author) commented Mar 5, 2020

No luck so far ("Dateiformat nicht erkannt" means the file format was not recognized):

sudo gdb /usr/bin/lstmtraining /var/crash/_usr_bin_lstmtraining.0.crash
GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/lstmtraining...(no debugging symbols found)...fertig.
"/var/crash/_usr_bin_lstmtraining.0.crash" is not a core dump: Dateiformat nicht erkannt

@stweil (Contributor) commented Mar 5, 2020

It might also be a simple text file. Try file /var/crash/_usr_bin_lstmtraining.0.crash or try to open it with your editor. Maybe it contains more information. If not, a real core dump is needed (see instructions above).

@stweil (Contributor) commented Mar 5, 2020

Files in /var/crash seem to be written by corekeeper, an optional Ubuntu package. Maybe it writes compressed core files. See https://wiki.debian.org/HowToGetABacktrace for how to analyze such files.

@M3ssman (Contributor, Author) commented Mar 6, 2020

@stweil Thanks for your suggestions!

The crash file itself contained a mix of regular ASCII data and Base64.
I had to install apport-retrace and then add the entry Package: 0 after the ExecutableTimestamp entry in the leading ASCII section.
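
The preparation was roughly this (a sketch; apport-retrace comes from the Ubuntu repositories):

sudo apt install apport-retrace
# then edit the crash file and insert the line
#   Package: 0
# directly after the ExecutableTimestamp entry in the leading ASCII section
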
Executing apport-retrace -g /var/crash/_usr_bin_lstmtraining.40266.crash, followed by info stack, gives:

(gdb) info stack
#0  0x00007f77d1e5037a in tesseract::NetworkIO::Transpose(tesseract::TransposedArray*) const () from /usr/lib/x86_64-linux-gnu/libtesseract.so.4
#1  0x00007f77d1e377fb in tesseract::LSTM::Backward(bool, tesseract::NetworkIO const&, tesseract::NetworkScratch*, tesseract::NetworkIO*) () from /usr/lib/x86_64-linux-gnu/libtesseract.so.4
#2  0x00007f77d1e6054d in tesseract::Series::Backward(bool, tesseract::NetworkIO const&, tesseract::NetworkScratch*, tesseract::NetworkIO*) () from /usr/lib/x86_64-linux-gnu/libtesseract.so.4
#3  0x00007f77d1e45c52 in tesseract::LSTMTrainer::TrainOnLine(tesseract::ImageData const*, bool) () from /usr/lib/x86_64-linux-gnu/libtesseract.so.4
#4  0x0000561c673e2602 in ?? ()
#5  0x00007f77d08a5b97 in __libc_start_main (main=0x561c673e1fd0, argc=17, argv=0x7ffc2e2b1798, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffc2e2b1788)
    at ../csu/libc-start.c:310
#6  0x0000561c673e300a in ?? ()

Hope this helps!

@M3ssman (Contributor, Author) commented Mar 6, 2020

There's also a slight difference regarding the weights:

(local working example)

Config file is optional, continuing...
Failed to read data from: data/ulbzd_latest/ulbzd_latest.config
Null char=2
lstmtraining \
  --debug_interval 0 \
  --traineddata data/ulbzd_latest/ulbzd_latest.traineddata \
  --old_traineddata /usr/share/tesseract-ocr/4.00/tessdata/frk.traineddata \
  --continue_from data/frk/ulbzd_latest.lstm \
  --model_output data/ulbzd_latest/checkpoints/ulbzd_latest \
  --train_listfile data/ulbzd_latest/list.train \
  --eval_listfile data/ulbzd_latest/list.eval \
  --max_iterations 10000
Loaded file data/frk/ulbzd_latest.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Code range changed from 99 to 99!
Num (Extended) outputs,weights in Series:
  1,48,0,1:1, 0
Num (Extended) outputs,weights in Series:
  C3,3:9, 0
  Ft16:16, 160
Total weights = 160
  [C3,3Ft16]:16, 160
  Mp3,3:16, 0
  Lfys64:64, 20736
  Lfx96:96, 61824
  Lrx96:96, 74112
  Lfx384:384, 738816
  Fc99:99, 38115
Total weights = 933763
Previous null char=98 mapped to 98
Continuing from data/frk/ulbzd_latest.lstm
Loaded 1/1 lines (1-1) of document /home/hartwig/Projekte/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0002_region0002_line0019.lstmf
Loaded 1/1 lines (1-1) of document /home/hartwig/Projekte/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0002_region0002_line0020.lstmf
Loaded 1/1 lines (1-1) of document /home/hartwig/Projekte/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0002_region0002_line0009.lstmf
Loaded 1/1 lines (1-1) of document /home/hartwig/Projekte/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0002_region0002_line0014.lstmf
Loaded 1/1 lines (1-1) of document /home/hartwig/Projekte/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0002_region0002_line0000.lstmf
Loaded 1/1 lines (1-1) of document /home/hartwig/Projekte/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0002_region0002_line0018.lstmf
Loaded 1/1 lines (1-1) of document /home/hartwig/Projekte/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0002_region0002_line0015.lstmf
Loaded 1/1 lines (1-1) of document /home/hartwig/Projekte/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0001_region0001_line.lstmf
Loaded 1/1 lines (1-1) of document /home/hartwig/Projekte/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0002_region0002_line0016.lstmf
Loaded 1/1 lines (1-1) of document /home/hartwig/Projekte/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0000_region0000_line0000.lstmf
Loaded 1/1 lines (1-1) of document /home/hartwig/Projekte/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0002_region0002_line0006.lstmf
Loaded 1/1 lines (1-1) of document /home/hartwig/Projekte/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0002_region0002_line0021.lstmf
Loaded 1/1 lines (1-1) of document /home/hartwig/Projekte/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0002_region0002_line0004.lstmf
Loaded 1/1 lines (1-1) of document /home/hartwig/Projekte/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0002_region0002_line0022.lstmf
Loaded 1/1 lines (1-1) of document /home/hartwig/Projekte/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0002_region0002_line0005.lstmf
2 Percent improvement time=33, best error was 100 @ 0
At iteration 33/100/100, Mean rms=0.528%, delta=0.649%, char train=1.912%, word train=9.581%, skip ratio=0%,  New best char error = 1.912 Transitioned to stage 1 wrote best model:data/ulbzd_latest/checkpoints/ulbzd_latest1.912_33.checkpoint wrote checkpoint.

(VM broken output)

Config file is optional, continuing...
Failed to read data from: data/ulbzd_latest/ulbzd_latest.config
Null char=2
lstmtraining \
  --debug_interval 0 \
  --traineddata data/ulbzd_latest/ulbzd_latest.traineddata \
  --old_traineddata /usr/share/tesseract-ocr/4.00/tessdata/frk.traineddata \
  --continue_from data/frk/ulbzd_latest.lstm \
  --model_output data/ulbzd_latest/checkpoints/ulbzd_latest \
  --train_listfile data/ulbzd_latest/list.train \
  --eval_listfile data/ulbzd_latest/list.eval \
  --max_iterations 10000
Loaded file data/frk/ulbzd_latest.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Code range changed from 99 to 99!
Num (Extended) outputs,weights in Series:
  1,48,0,1:1, 0
Num (Extended) outputs,weights in Series:
  C3,3:9, 0
  Ft16:16, 160
Total weights = 160
  [C3,3Ft16]:16, 160
  Mp3,3:16, 0
  Lfys64:64, 20736
  Lfx96:96, 61824
  Lrx96:96, 74112
  Lfx384:384, 738816
  Fc99:99, 0
Total weights = 895648
Previous null char=98 mapped to 98
Continuing from data/frk/ulbzd_latest.lstm
Loaded 1/1 lines (1-1) of document /home/gitlab-runner/builds/tfDssL8r/0/aqayv/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0002_region0002_line0019.lstmf
Loaded 1/1 lines (1-1) of document /home/gitlab-runner/builds/tfDssL8r/0/aqayv/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0002_region0002_line0020.lstmf
Loaded 1/1 lines (1-1) of document /home/gitlab-runner/builds/tfDssL8r/0/aqayv/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0002_region0002_line0018.lstmf
Loaded 1/1 lines (1-1) of document /home/gitlab-runner/builds/tfDssL8r/0/aqayv/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0002_region0002_line0000.lstmf
Loaded 1/1 lines (1-1) of document /home/gitlab-runner/builds/tfDssL8r/0/aqayv/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0002_region0002_line0009.lstmf
Loaded 1/1 lines (1-1) of document /home/gitlab-runner/builds/tfDssL8r/0/aqayv/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0002_region0002_line0014.lstmf
Loaded 1/1 lines (1-1) of document /home/gitlab-runner/builds/tfDssL8r/0/aqayv/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0000_region0000_line0000.lstmf
Loaded 1/1 lines (1-1) of document /home/gitlab-runner/builds/tfDssL8r/0/aqayv/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0002_region0002_line0015.lstmf
Loaded 1/1 lines (1-1) of document /home/gitlab-runner/builds/tfDssL8r/0/aqayv/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0002_region0002_line0016.lstmf
Loaded 1/1 lines (1-1) of document /home/gitlab-runner/builds/tfDssL8r/0/aqayv/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0001_region0001_line.lstmf
Loaded 1/1 lines (1-1) of document /home/gitlab-runner/builds/tfDssL8r/0/aqayv/ulb-dd-ocr-training/data/ulbzd_latest/LINES_0001_region0002_region0002_line0006.lstmf
Makefile:248: recipe for target 'data/ulbzd_latest/checkpoints/ulbzd_latest_checkpoint' failed
make: *** [data/ulbzd_latest/checkpoints/ulbzd_latest_checkpoint] Speicherzugriffsfehler (Speicherauszug erstellt)

Since the training data itself is the same in both environments, I wonder why the weights differ and what Fc stands for.

@stweil (Contributor) commented Mar 6, 2020

Fc stands for a network type (see NT_SOFTMAX in the code). The different number of weights is indeed strange. I wonder what is special about that virtual machine. Is it a 32-bit OS, and the others are 64-bit? You could also compare the output from ldd PATH/lstmtraining to see whether the expected libraries are used.
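
For example, something like this on each machine, then comparing the two outputs (a sketch; the output file names are just placeholders):

file /usr/bin/lstmtraining                        # 32-bit or 64-bit ELF binary?
ldd /usr/bin/lstmtraining > ldd-$(hostname).txt   # which libraries are actually resolved
# copy both ldd-*.txt files to one place, then:
diff ldd-local.txt ldd-vm.txt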

@M3ssman (Contributor, Author) commented Mar 7, 2020

@stweil
I tried to run it on a real Ubuntu laptop (no VM, Core i7, 16 GB RAM):

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.4 LTS"
NAME="Ubuntu"
VERSION="18.04.4 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.4 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

It fails with the same error message. On this laptop I followed your latest recommendation and executed ldd:

ldd /usr/bin/lstmtraining =>
linux-vdso.so.1 (0x00007ffc5c7e1000)
libtesseract.so.4 => /usr/lib/x86_64-linux-gnu/libtesseract.so.4 (0x00007f9d36a37000)
liblept.so.5 => /usr/lib/x86_64-linux-gnu/liblept.so.5 (0x00007f9d365be000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f9d3639f000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f9d36016000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f9d35c78000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f9d35a60000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f9d3566f000)
libarchive.so.13 => /usr/lib/x86_64-linux-gnu/libarchive.so.13 (0x00007f9d353bf000)
libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f9d35190000)
libpng16.so.16 => /usr/lib/x86_64-linux-gnu/libpng16.so.16 (0x00007f9d34f5e000)
libjpeg.so.8 => /usr/lib/x86_64-linux-gnu/libjpeg.so.8 (0x00007f9d34cf6000)
libgif.so.7 => /usr/lib/x86_64-linux-gnu/libgif.so.7 (0x00007f9d34aed000)
libtiff.so.5 => /usr/lib/x86_64-linux-gnu/libtiff.so.5 (0x00007f9d34876000)
libwebp.so.6 => /usr/lib/x86_64-linux-gnu/libwebp.so.6 (0x00007f9d3460d000)
libopenjp2.so.7 => /usr/lib/x86_64-linux-gnu/libopenjp2.so.7 (0x00007f9d343b7000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f9d3419a000)
/lib64/ld-linux-x86-64.so.2 (0x00007f9d3717f000)
libnettle.so.6 => /usr/lib/x86_64-linux-gnu/libnettle.so.6 (0x00007f9d33f64000)
libacl.so.1 => /lib/x86_64-linux-gnu/libacl.so.1 (0x00007f9d33d5c000)
liblzo2.so.2 => /lib/x86_64-linux-gnu/liblzo2.so.2 (0x00007f9d33b3a000)
liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f9d33914000)
liblz4.so.1 => /usr/lib/x86_64-linux-gnu/liblz4.so.1 (0x00007f9d336f8000)
libbz2.so.1.0 => /lib/x86_64-linux-gnu/libbz2.so.1.0 (0x00007f9d334e8000)
libxml2.so.2 => /usr/lib/x86_64-linux-gnu/libxml2.so.2 (0x00007f9d33127000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f9d32f23000)
libjbig.so.0 => /usr/lib/x86_64-linux-gnu/libjbig.so.0 (0x00007f9d32d15000)
libattr.so.1 => /lib/x86_64-linux-gnu/libattr.so.1 (0x00007f9d32b10000)
libicuuc.so.60 => /usr/lib/x86_64-linux-gnu/libicuuc.so.60 (0x00007f9d32759000)
libicudata.so.60 => /usr/lib/x86_64-linux-gnu/libicudata.so.60 (0x00007f9d30bb0000)

There must be some arcane dependency missing. I didn't build Tesseract myself; I used the version straight from alex-p on this machine. I'm not sure whether I built Tesseract on my office PC. I'll take a look on Monday at work.

@stweil (Contributor) commented Mar 7, 2020

Then it should be possible for me to reproduce the problem. Can you provide the necessary files (maybe the whole data directory)?

@M3ssman (Contributor, Author) commented Mar 7, 2020

Sure I can: ulb-dd-ocr-training.zip

Please note: the main script is run-local.sh. The main folder is usually a Git repo, but I removed the Git data to reduce its size. Second, the official tesstrain GitHub repo is included as a Git submodule of this project; I removed that to save space, too. It's the same as this project.

The data I've used is located in the main data folder.

Maybe you can place this data inside a container or some other fresh, clean environment to reproduce the error.

@stweil (Contributor) commented Mar 7, 2020

I'd like to run lstmtraining directly to avoid additional dependencies.

Still missing for that: data/frk/ulbzd_latest.lstm, data/ulbzd_latest/list.eval and data/ulbzd_latest/list.train.

@stweil (Contributor) commented Mar 7, 2020

After installing tesstrain, I am now able to reproduce the crash, thanks.

stweil changed the title from Q: What can cause a "Speicherzugriffsfehler" to Q: What can cause a segmentation fault ("Speicherzugriffsfehler") on Mar 10, 2020
@stweil (Contributor) commented Mar 10, 2020

I think I found the reason why the problem occurs on some machines while others work fine.

Training uses "best" traineddata files (LSTM weights in double precision, 8 bytes). The training here starts with /usr/share/tesseract-ocr/4.00/tessdata/frk.traineddata, so that file must be downloaded from https://github.com/tesseract-ocr/tessdata_best/raw/master/frk.traineddata. If this is done before starting the training, everything works fine.

Debian / Ubuntu provide a package tesseract-ocr-frk which also installs /usr/share/tesseract-ocr/4.00/tessdata/frk.traineddata, but that is a "fast" traineddata file (LSTM weights are one-byte integers). It cannot be used for training and causes the segmentation fault.
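
A minimal sketch of the workaround (the target path matches the one from the training log above; adjust it to your tessdata directory):

# replace the "fast" model from the Ubuntu package with the "best" one before training
sudo wget -O /usr/share/tesseract-ocr/4.00/tessdata/frk.traineddata \
    https://github.com/tesseract-ocr/tessdata_best/raw/master/frk.traineddata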

Tesseract should be fixed to handle the wrong kind of traineddata with a user-friendly error message instead of crashing.

@stweil (Contributor) commented Mar 10, 2020

This is a bug (missing handling for wrong input data) in the Tesseract code, therefore I am transferring the issue report to tesseract-ocr/tesseract.

stweil transferred this issue from tesseract-ocr/tesstrain on Mar 10, 2020
stweil changed the title from Q: What can cause a segmentation fault ("Speicherzugriffsfehler") to Segmentation fault ("Speicherzugriffsfehler") caused by wrong kind of traineddata on Mar 10, 2020
stweil changed the title from Segmentation fault ("Speicherzugriffsfehler") caused by wrong kind of traineddata to Segmentation fault ("Speicherzugriffsfehler") caused by "fast" traineddata on Mar 10, 2020
@M3ssman (Contributor, Author) commented Mar 13, 2020

@stweil
Thanks very much!
Great Job!
I can confirm that it works on my local machine the way you've pointed out!

@M3ssman (Contributor, Author) commented Mar 16, 2020

@stweil
It also breaks with the frk.traineddata model from https://github.com/tesseract-ocr/tessdata/blob/master/frk.traineddata, which is quite large (up to 21 MB).

One really has to stick to the Fraktur model from https://github.com/tesseract-ocr/tessdata_best/raw/master/frk.traineddata.

@stweil (Contributor) commented Mar 18, 2020

That's expected behaviour. Like tessdata_fast, tessdata contains fast models which don't work for training. In addition, most of those models also contain data for the old recognizer.
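
If in doubt which kind of model a file is, a rough check is to look at its size and list its components (a sketch; combine_tessdata is part of the Tesseract training tools, and the size comparison is only a heuristic, since "best" float models are considerably larger than integer "fast" models):

ls -l frk.traineddata                # "best" models are much larger than "fast" ones
combine_tessdata -d frk.traineddata  # list the components stored in the traineddata file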

@amitdo (Collaborator) commented Apr 24, 2020

Tesseract should be fixed to handle the wrong kind of traineddata with a user-friendly error message instead of crashing.

Duplicate of #1573.
