LSTM:Training - Error - ccutil/genericvector.h:696: core dumped #561

Closed
Shreeshrii opened this issue Dec 11, 2016 · 17 comments

Comments

@Shreeshrii
Collaborator

$ lstmtraining -U ~/tesstutorial/sanlayer/san.unicharset \
    --script_dir ../langdata \
    --debug_interval 0 \
    --continue_from ~/tesstutorial/san_from_layer/san.lstm \
    --append_index 5 --net_spec '[Lfx256 O1c105]' \
    --model_output ~/tesstutorial/san_from_layer/base \
    --train_listfile ~/tesstutorial/sanlayer/san.training_files.txt \
    --eval_listfile ~/tesstutorial/saneval/san.training_files.txt \
    --max_iterations 5000
Loaded file /home/shree/tesstutorial/san_from_layer/base_checkpoint, unpacking...
Successfully restored trainer from /home/shree/tesstutorial/san_from_layer/base_checkpoint
Loaded 2094/2094 pages (0-2094) of document /home/shree/tesstutorial/sanlayer/san.Chandas.exp0.lstmf
Loaded 691/691 pages (0-691) of document /home/shree/tesstutorial/saneval/san.Aksharyogini2.exp0.lstmf
Loaded 2104/2104 pages (0-2104) of document /home/shree/tesstutorial/sanlayer/san.Gargi.exp0.lstmf
Loaded 2103/2103 pages (0-2103) of document /home/shree/tesstutorial/sanlayer/san.Sahadeva.exp0.lstmf
Loaded 691/691 pages (0-691) of document /home/shree/tesstutorial/saneval/san.Amiko.exp0.lstmf
Loaded 2101/2101 pages (0-2101) of document /home/shree/tesstutorial/sanlayer/san.Nakula.exp0.lstmf
Loaded 2103/2103 pages (0-2103) of document /home/shree/tesstutorial/sanlayer/san.Lohit_Devanagari.exp0.lstmf
Loaded 2102/2102 pages (0-2102) of document /home/shree/tesstutorial/sanlayer/san.Sarai.exp0.lstmf
Loaded 2102/2102 pages (0-2102) of document /home/shree/tesstutorial/sanlayer/san.Samanata.exp0.lstmf
Loaded 2096/2096 pages (0-2096) of document /home/shree/tesstutorial/sanlayer/san.Santipur_OT_Medium.exp0.lstmf
Loaded 2102/2102 pages (0-2102) of document /home/shree/tesstutorial/sanlayer/san.Kalimati.exp0.lstmf
Loaded 2068/2103 pages (35-2103) of document /home/shree/tesstutorial/sanlayer/san.Siddhanta-Calcutta.exp0.lstmf
Loaded 2062/2097 pages (35-2097) of document /home/shree/tesstutorial/sanlayer/san.Uttara.exp0.lstmf
Loaded 2064/2099 pages (35-2099) of document /home/shree/tesstutorial/sanlayer/san.Siddhanta.exp0.lstmf
Found AVX
Found SSE
Loaded 2065/2100 pages (35-2100) of document /home/shree/tesstutorial/sanlayer/san.Siddhanta-Nepali.exp0.lstmf
Loaded 2064/2100 pages (36-2100) of document /home/shree/tesstutorial/sanlayer/san.Siddhanta-cakravat.exp0.lstmf

At iteration 600/600/600, Mean rms=0.899%, delta=49.539%, char train=102.678%, word train=100%, skip ratio=0%,  New worst char error = 102.678 wrote checkpoint.

At iteration 700/700/700, Mean rms=0.895%, delta=49.137%, char train=102.295%, word train=100%, skip ratio=0%,  New worst char error = 102.295 wrote checkpoint.

At iteration 800/800/800, Mean rms=0.893%, delta=48.75%, char train=102.008%, word train=100%, skip ratio=0%,  New worst char error = 102.008 wrote checkpoint.

At iteration 900/900/900, Mean rms=0.89%, delta=48.199%, char train=101.785%, word train=100%, skip ratio=0%,  New worst char error = 101.785 wrote checkpoint.

lstmtraining: ../ccutil/genericvector.h:696: T& GenericVector<T>::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.
Aborted (core dumped)

@theraysmith
Contributor

Can you provide a stack trace?

@Shreeshrii
Collaborator Author

Shreeshrii commented Dec 19, 2016

gdb --args \
lstmtraining -U ~/tesstutorial/vedic/san.unicharset \
--script_dir ../langdata --debug_interval 0 \
--continue_from ~/tesstutorial/san_vedic/san.lstm \
--append_index 5 --net_spec '[Lfx384 O1c6000]' \
--model_output ~/tesstutorial/san_vedic/base \
--train_listfile ~/tesstutorial/nonvedic/san.training_files.txt \
--eval_listfile ~/tesstutorial/nonvedic/san.training_files.txt \
--max_iterations 50000

At iteration 900/900/900, Mean rms=0.841%, delta=46.594%, char train=101.075%, word train=100%, skip ratio=0%,  New worst char error = 101.075 wrote checkpoint.
[Thread 0x7f47352d0700 (LWP 422) exited]
[New Thread 0x7f47352d0700 (LWP 423)]
lstmtraining: ../ccutil/genericvector.h:696: T& GenericVector<T>::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7f47352d0700 (LWP 423)]
0x00007f473f626c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) backtrace
#0  0x00007f473f626c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f473f62a028 in __GI_abort () at abort.c:89
#2  0x00007f473f61fbf6 in __assert_fail_base (fmt=0x7f473f7703b8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion@entry=0x7f47404f9590 "index >= 0 && index < size_used_", file=file@entry=0x7f47404f8fc8 "../ccutil/genericvector.h", line=line@entry=696,
    function=function@entry=0x7f474051cf20 <_ZZNK13GenericVectorIcEixEiE19__PRETTY_FUNCTION__> "T& GenericVector<T>::operator[](int) const [with T = char]")
    at assert.c:92
#3  0x00007f473f61fca2 in __GI___assert_fail (assertion=0x7f47404f9590 "index >= 0 && index < size_used_", file=0x7f47404f8fc8 "../ccutil/genericvector.h", line=696,
    function=0x7f474051cf20 <_ZZNK13GenericVectorIcEixEiE19__PRETTY_FUNCTION__> "T& GenericVector<T>::operator[](int) const [with T = char]") at assert.c:101
#4  0x00007f4740457553 in GenericVector<char>::operator[] (this=<optimized out>, this=<optimized out>, index=0) at ../ccutil/genericvector.h:696
#5  0x00007f4740457d28 in operator[] (this=0x7fffca010860, this=0x7fffca010860, index=0) at lstmtrainer.cpp:919
#6  tesseract::LSTMTrainer::ReadTrainingDump (this=this@entry=0x7f47352cf570, data=..., trainer=trainer@entry=0x7f47352cf570) at lstmtrainer.cpp:920
#7  0x000000000040b4fe in tesseract::LSTMTester::RunEvalSync (this=this@entry=0x7fffca0107f0, iteration=0, training_errors=<optimized out>, model_data=...,
    training_stage=0) at lstmtester.cpp:87
#8  0x000000000040ba39 in tesseract::LSTMTester::ThreadFunc (lstmtester_void=0x7fffca0107f0) at lstmtester.cpp:124
#9  0x00007f473d4f8184 in start_thread (arg=0x7f47352d0700) at pthread_create.c:312
#10 0x00007f473f6ea37d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

(gdb) frame 2
#2  0x00007f473f61fbf6 in __assert_fail_base (fmt=0x7f473f7703b8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion@entry=0x7f47404f9590 "index >= 0 && index < size_used_", file=file@entry=0x7f47404f8fc8 "../ccutil/genericvector.h", line=line@entry=696,
    function=function@entry=0x7f474051cf20 <_ZZNK13GenericVectorIcEixEiE19__PRETTY_FUNCTION__> "T& GenericVector<T>::operator[](int) const [with T = char]")
    at assert.c:92
92      in assert.c
(gdb) frame 3
#3  0x00007f473f61fca2 in __GI___assert_fail (assertion=0x7f47404f9590 "index >= 0 && index < size_used_", file=0x7f47404f8fc8 "../ccutil/genericvector.h", line=696,
    function=0x7f474051cf20 <_ZZNK13GenericVectorIcEixEiE19__PRETTY_FUNCTION__> "T& GenericVector<T>::operator[](int) const [with T = char]") at assert.c:101
101     in assert.c
(gdb) frame 4
#4  0x00007f4740457553 in GenericVector<char>::operator[] (this=<optimized out>, this=<optimized out>, index=0) at ../ccutil/genericvector.h:696
696       assert(index >= 0 && index < size_used_);
(gdb) frame 5
#5  0x00007f4740457d28 in operator[] (this=0x7fffca010860, this=0x7fffca010860, index=0) at lstmtrainer.cpp:919
919                                        LSTMTrainer* trainer) {
(gdb) frame 6
#6  tesseract::LSTMTrainer::ReadTrainingDump (this=this@entry=0x7f47352cf570, data=..., trainer=trainer@entry=0x7f47352cf570) at lstmtrainer.cpp:920
920       return trainer->ReadSizedTrainingDump(&data[0], data.size());
(gdb) frame 7
#7  0x000000000040b4fe in tesseract::LSTMTester::RunEvalSync (this=this@entry=0x7fffca0107f0, iteration=0, training_errors=<optimized out>, model_data=...,
    training_stage=0) at lstmtester.cpp:87
87        if (!trainer.ReadTrainingDump(model_data, &trainer)) {
(gdb) frame 8
#8  0x000000000040ba39 in tesseract::LSTMTester::ThreadFunc (lstmtester_void=0x7fffca0107f0) at lstmtester.cpp:124
124       lstmtester->test_result_ = lstmtester->RunEvalSync(
(gdb) frame 9
#9  0x00007f473d4f8184 in start_thread (arg=0x7f47352d0700) at pthread_create.c:312
312     pthread_create.c: No such file or directory.
(gdb) frame 10
#10 0x00007f473f6ea37d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
111     ../sysdeps/unix/sysv/linux/x86_64/clone.S: No such file or directory. 
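
Reading the trace: frame 4 reports `index=0`, and the failing assertion is `index >= 0 && index < size_used_`, so the check can only fail if the vector holds no elements. In other words, `data` was empty when frame 6 evaluated `&data[0]`. The sketch below reproduces that pattern in isolation and shows one possible guard. `CheckedBuffer` and `ReadDumpGuarded` are hypothetical names made up for illustration; this is not Tesseract's `GenericVector`, and not necessarily how the problem was addressed upstream.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a bounds-checked byte vector.
// It only mimics the assertion seen in the trace; it is not GenericVector.
class CheckedBuffer {
 public:
  void push_back(char c) { bytes_.push_back(c); }
  int size() const { return static_cast<int>(bytes_.size()); }

  // Same predicate as the assertion that fails in the trace.
  bool InBounds(int index) const { return index >= 0 && index < size(); }

  // Aborts (in debug builds) when the index is out of range,
  // which includes index 0 on an empty buffer.
  const char& operator[](int index) const {
    assert(InBounds(index));
    return bytes_[static_cast<std::size_t>(index)];
  }

 private:
  std::vector<char> bytes_;
};

// Hypothetical reader that refuses empty input instead of evaluating
// &data[0], the expression shown in frame 6 of the trace.
bool ReadDumpGuarded(const CheckedBuffer& data, int* bytes_seen) {
  if (data.size() == 0) {
    *bytes_seen = 0;
    return false;  // Nothing to deserialize; report failure to the caller.
  }
  const char* start = &data[0];  // Safe here: at least one element exists.
  (void)start;                   // A real reader would parse from start.
  *bytes_seen = data.size();
  return true;
}
```

With an empty buffer, `ReadDumpGuarded` returns false rather than aborting; with one or more bytes it takes the pointer exactly as the unguarded code would.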
  

@Shreeshrii
Collaborator Author

Still getting this error with the latest build, configured with --enable-debug:

Iteration 699: ALIGNED TRUTH : नियन कुदकि के टपि जा ... हइऽऽऽयाँ ! पैदल तऽ हइयाँ, बकि
Iteration 699: BEST OCR TEXT : नियन कदक के पि जा हाल त हइया बकि
File /tmp/tmp.21BVvgsmzO/bih/bih.SakalBharati.exp0.lstmf page 28 :
Mean rms=4.633%, delta=42.555%, train=94.938%(98.768%), skip ratio=0%
[Thread 0x7f5cbb4d0700 (LWP 17917) exited]
[New Thread 0x7f5cbb4d0700 (LWP 17918)]
lstmtraining: ../ccutil/genericvector.h:697: T& GenericVector<T>::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7f5cbb4d0700 (LWP 17918)]
0x00007f5cc0486c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) backtrace
#0  0x00007f5cc0486c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f5cc048a028 in __GI_abort () at abort.c:89
#2  0x00007f5cc047fbf6 in __assert_fail_base (fmt=0x7f5cc05d03b8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion@entry=0x7f5cc130e4a0 "index >= 0 && index < size_used_", file=file@entry=0x7f5cc130df28 "../ccutil/genericvector.h", line=line@entry=697,
    function=function@entry=0x7f5cc132f480 <_ZZNK13GenericVectorIcEixEiE19__PRETTY_FUNCTION__> "T& GenericVector<T>::operator[](int) const [with T = char]")
    at assert.c:92
#3  0x00007f5cc047fca2 in __GI___assert_fail (assertion=0x7f5cc130e4a0 "index >= 0 && index < size_used_", file=0x7f5cc130df28 "../ccutil/genericvector.h", line=697,
    function=0x7f5cc132f480 <_ZZNK13GenericVectorIcEixEiE19__PRETTY_FUNCTION__> "T& GenericVector<T>::operator[](int) const [with T = char]") at assert.c:101
#4  0x00007f5cc128fb63 in GenericVector<char>::operator[] (this=<optimized out>, this=<optimized out>, index=0) at ../ccutil/genericvector.h:697
#5  0x00007f5cc1290318 in operator[] (this=0x7fffd06bde40, this=0x7fffd06bde40, index=0) at lstmtrainer.cpp:920
#6  tesseract::LSTMTrainer::ReadTrainingDump (this=this@entry=0x7f5cbb4cf640, data=..., trainer=trainer@entry=0x7f5cbb4cf640) at lstmtrainer.cpp:921
#7  0x000000000040b03e in tesseract::LSTMTester::RunEvalSync (this=this@entry=0x7fffd06bddd0, iteration=0, training_errors=<optimized out>, model_data=...,
    training_stage=0) at lstmtester.cpp:86
#8  0x000000000040b539 in tesseract::LSTMTester::ThreadFunc (lstmtester_void=0x7fffd06bddd0) at lstmtester.cpp:123
#9  0x00007f5cbe348184 in start_thread (arg=0x7f5cbb4d0700) at pthread_create.c:312
#10 0x00007f5cc054a37d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb)

(gdb) frame 1
#1  0x00007f5cc048a028 in __GI_abort () at abort.c:89
89      abort.c: No such file or directory.
(gdb) frame 2
#2  0x00007f5cc047fbf6 in __assert_fail_base (fmt=0x7f5cc05d03b8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion@entry=0x7f5cc130e4a0 "index >= 0 && index < size_used_", file=file@entry=0x7f5cc130df28 "../ccutil/genericvector.h", line=line@entry=697,
    function=function@entry=0x7f5cc132f480 <_ZZNK13GenericVectorIcEixEiE19__PRETTY_FUNCTION__> "T& GenericVector<T>::operator[](int) const [with T = char]")
    at assert.c:92
92      assert.c: No such file or directory.
(gdb) frame 3
#3  0x00007f5cc047fca2 in __GI___assert_fail (assertion=0x7f5cc130e4a0 "index >= 0 && index < size_used_", file=0x7f5cc130df28 "../ccutil/genericvector.h", line=697,
    function=0x7f5cc132f480 <_ZZNK13GenericVectorIcEixEiE19__PRETTY_FUNCTION__> "T& GenericVector<T>::operator[](int) const [with T = char]") at assert.c:101
101     in assert.c
(gdb) frame 4
#4  0x00007f5cc128fb63 in GenericVector<char>::operator[] (this=<optimized out>, this=<optimized out>, index=0) at ../ccutil/genericvector.h:697
697       assert(index >= 0 && index < size_used_);
(gdb) frame 5
#5  0x00007f5cc1290318 in operator[] (this=0x7fffd06bde40, this=0x7fffd06bde40, index=0) at lstmtrainer.cpp:920
920                                        LSTMTrainer* trainer) {
(gdb) frame 6
#6  tesseract::LSTMTrainer::ReadTrainingDump (this=this@entry=0x7f5cbb4cf640, data=..., trainer=trainer@entry=0x7f5cbb4cf640) at lstmtrainer.cpp:921
921       return trainer->ReadSizedTrainingDump(&data[0], data.size());
(gdb) frame 7
#7  0x000000000040b03e in tesseract::LSTMTester::RunEvalSync (this=this@entry=0x7fffd06bddd0, iteration=0, training_errors=<optimized out>, model_data=...,
    training_stage=0) at lstmtester.cpp:86
86        if (!trainer.ReadTrainingDump(model_data, &trainer)) {
(gdb) frame 8
#8  0x000000000040b539 in tesseract::LSTMTester::ThreadFunc (lstmtester_void=0x7fffd06bddd0) at lstmtester.cpp:123
123       lstmtester->test_result_ = lstmtester->RunEvalSync(
(gdb) frame 9
#9  0x00007f5cbe348184 in start_thread (arg=0x7f5cbb4d0700) at pthread_create.c:312
312     pthread_create.c: No such file or directory.
(gdb)

@theraysmith
Contributor

theraysmith commented May 8, 2017 via email

@Shreeshrii
Collaborator Author

Still getting the error with the build below:

tesseract -v
tesseract 4.00.00alpha-460-gb86b4fa
 leptonica-1.74.1
  libgif 5.0.5 : libjpeg 8d (libjpeg-turbo 1.3.0) : libpng 1.2.50 : libtiff 4.0.3 : zlib 1.2.8 : libwebp 0.4.0 : libopenjp2 2.1.0

 Found AVX
 Found SSE
Iteration 699: ALIGNED TRUTH : कृषिक्षॆत्रंचित्रपटाने साथ नयी पोस्टेड से ओलंपिक सेभोजनादिकम् सीप
Iteration 699: BEST OCR TEXT : किससिसिस
File /tmp/tmp.FOELAYZPOv/bih/bih.Samanata.exp0.lstmf page 215 :
Mean rms=4.813%, delta=46.035%, train=99.32%(99.97%), skip ratio=0%
[Thread 0x7f618d0e0700 (LWP 170) exited]
[New Thread 0x7f618d0e0700 (LWP 171)]
lstmtraining: ../ccutil/genericvector.h:697: T& GenericVector<T>::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7f618d0e0700 (LWP 171)]
0x00007f6191886c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) backtrace
#0  0x00007f6191886c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f619188a028 in __GI_abort () at abort.c:89
#2  0x00007f619187fbf6 in __assert_fail_base (fmt=0x7f61919d03b8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion@entry=0x7f619270d6c0 "index >= 0 && index < size_used_", file=file@entry=0x7f619270d148 "../ccutil/genericvector.h", line=line@entry=697,
    function=function@entry=0x7f619272ea60 <_ZZNK13GenericVectorIcEixEiE19__PRETTY_FUNCTION__> "T& GenericVector<T>::operator[](int) const [with T = char]")
    at assert.c:92
#3  0x00007f619187fca2 in __GI___assert_fail (assertion=0x7f619270d6c0 "index >= 0 && index < size_used_", file=0x7f619270d148 "../ccutil/genericvector.h", line=697,
    function=0x7f619272ea60 <_ZZNK13GenericVectorIcEixEiE19__PRETTY_FUNCTION__> "T& GenericVector<T>::operator[](int) const [with T = char]") at assert.c:101
#4  0x00007f619268c803 in GenericVector<char>::operator[] (this=<optimized out>, this=<optimized out>, index=0) at ../ccutil/genericvector.h:697
#5  0x00007f619268cfb8 in operator[] (this=0x7fffe8f33fc0, this=0x7fffe8f33fc0, index=0) at lstmtrainer.cpp:920
#6  tesseract::LSTMTrainer::ReadTrainingDump (this=this@entry=0x7f618d0df640, data=..., trainer=trainer@entry=0x7f618d0df640) at lstmtrainer.cpp:921
#7  0x000000000040b03e in tesseract::LSTMTester::RunEvalSync (this=this@entry=0x7fffe8f33f50, iteration=0, training_errors=<optimized out>, model_data=...,
    training_stage=0) at lstmtester.cpp:86
#8  0x000000000040b539 in tesseract::LSTMTester::ThreadFunc (lstmtester_void=0x7fffe8f33f50) at lstmtester.cpp:123
#9  0x00007f618f748184 in start_thread (arg=0x7f618d0e0700) at pthread_create.c:312
#10 0x00007f619194a37d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) quit

@Shreeshrii
Collaborator Author

Shreeshrii commented May 9, 2017

I tried a different set of input files for training and eval and got the same error. It seems to happen at the first checkpoint write after the train error percentage falls below 100.

Iteration 496: ALIGNED TRUTH : ग़, ज़, और फ़। इसलिए आपको केवल इन पाँचों पर ध्यान
Iteration 496: BEST OCR TEXT :  ज र  सलि आपको केवल न पावो पर ध्यान
File /tmp/tmp.VitmU3pEv2/bih/bih.Mangal.exp0.lstmf page 1889 :
Mean rms=4.735%, delta=39.487%, train=92.02%(98.26%), skip ratio=0%
Iteration 497: ALIGNED TRUTH : वर्गमीटर मरे यथाशक्ति न्यायशास्त्री फैक्स। प्रेममयी सोद्देश्यवादी
Iteration 497: BEST OCR TEXT : वमीदर मरे यधाशि नयायशा कस पेममय सोदियवादी
File /tmp/tmp.VitmU3pEv2/bih/bih.Siddhanta.exp0.lstmf page 747 :
Mean rms=4.731%, delta=39.423%, train=91.922%(98.234%), skip ratio=0%
Iteration 498: ALIGNED TRUTH : महीपतिया हो जुआर जसे हारे तइसे ले गआव मोरा पास जहंवा
Iteration 498: BEST OCR TEXT : महीपतिया हो जआर जसे हार तसे ले व मोरा पास जहवा
File /tmp/tmp.VitmU3pEv2/bih/bih.Mangal.exp0.lstmf page 30 :
Mean rms=4.726%, delta=39.356%, train=91.76%(98.129%), skip ratio=0%
Iteration 499: ALIGNED TRUTH : रद्दा। रार धडधड धड घ्रघ्र धडल्ला घ्रघर्र दरारदार ऋणाधार
Iteration 499: BEST OCR TEXT : रदा रार धध ध धरधर ला धरधर दरारदार णाधार
File /tmp/tmp.VitmU3pEv2/bih/bih.Siddhanta.exp0.lstmf page 1157 :
Mean rms=4.721%, delta=39.3%, train=91.663%(98.088%), skip ratio=0%
lstmtraining: ../ccutil/genericvector.h:697: T& GenericVector<T>::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.
Aborted (core dumped)

@Shreeshrii
Collaborator Author

Checked with eng, using the same set of input files. It fails when I pass --eval_listfile.

Only --train_listfile: NO ERROR

 lstmtraining  \
>    -U ~/tesstutorial/nyd/eng.unicharset \
>   --train_listfile ~/tesstutorial/nyd/eng.training_files.txt \
>   --script_dir ../langdata   \
>   --append_index 5 --net_spec '[Lfx256 O1c105]' \
>   --continue_from ~/tesstutorial/nydlayer/eng.lstm \
>   --model_output ~/tesstutorial/nydlayer/nyd \
>   --debug_interval -1 \
>   --target_error_rate 0.01
Loaded file /home/shree/tesstutorial/nydlayer/eng.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Continuing from /home/shree/tesstutorial/nydlayer/eng.lstm
Other case É of é is not in unicharset
Appending a new network to an old one!!Setting unichar properties
Setting properties for script Common
Setting properties for script Latin
Num outputs,weights in serial:
  Lfx256:256, 394240
  Fc105:105, 26985
Total weights = 421225
Built network:[1,0,0,1[C5,5Ft16]Mp3,3Lfys64Lfx128Lrx128Lfx256Fc105] from request [Lfx256 O1c105]
Training parameters:
  Debug interval = -1, weights = 0.1, learning rate = 0.0001, momentum=0.9
...
Mean rms=2.758%, delta=16.418%, train=35.971%(49.658%), skip ratio=0%
Iteration 999: ALIGNED TRUTH : MySQL { & Advanced her BOX me Management your new have Post by
Iteration 999: BEST OCR TEXT : MySQL ( & Advanced her BON me Management your new have Post by
File /tmp/tmp.RQfY2cDM00/eng/eng.Arial.exp-2.lstmf page 31 :
Mean rms=2.757%, delta=16.403%, train=35.942%(49.624%), skip ratio=0%
2 Percent improvement time=49, best error was 39.615 @ 789
At iteration 838/1000/1000, Mean rms=2.757%, delta=16.403%, char train=35.942%, word train=49.624%, skip ratio=0%,  New best char error = 35.942 wrote best model:/home/shree/tesstutorial/nydlayer/nyd35.942_838.lstm wrote checkpoint.

Iteration 1000: ALIGNED TRUTH : Bailey Joshua, mate, 190 Eldridge
Iteration 1000: BEST OCR TEXT : Bailey Joshua, mate, 190 Eldridge
File /home/shree/tesstutorial/nyd/eng.1852nydir.exp0.lstmf page 26 (Perfect):

ERROR, even when using the same set of files for both --train_listfile and --eval_listfile

 lstmtraining  \
>    -U ~/tesstutorial/nyd/eng.unicharset \
>   --train_listfile ~/tesstutorial/nyd/eng.training_files.txt \
>     --eval_listfile ~/tesstutorial/nyd/eng.training_files.txt \
>   --script_dir ../langdata   \
>   --append_index 5 --net_spec '[Lfx256 O1c105]' \
>   --continue_from ~/tesstutorial/nydlayer/eng.lstm \
>   --model_output ~/tesstutorial/nydlayer/nyd \
>   --debug_interval -1 \
>   --target_error_rate 0.01
Loaded file /home/shree/tesstutorial/nydlayer/eng.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Continuing from /home/shree/tesstutorial/nydlayer/eng.lstm
Other case É of é is not in unicharset
Appending a new network to an old one!!Setting unichar properties
Setting properties for script Common
Setting properties for script Latin
Num outputs,weights in serial:
  Lfx256:256, 394240
  Fc105:105, 26985
Total weights = 421225
Built network:[1,0,0,1[C5,5Ft16]Mp3,3Lfys64Lfx128Lrx128Lfx256Fc105] from request [Lfx256 O1c105]
Training parameters:
  Debug interval = -1, weights = 0.1, learning rate = 0.0001, momentum=0.9
Loaded 29/29 pages (1-29) of document /home/shree/tesstutorial/nyd/eng.1852nydir.exp0.lstmf
Loaded 8/8 pages (1-8) of document /home/shree/tesstutorial/nyd/eng.1852nydir.exp1.lstmf
Loaded 12/12 pages (1-12) of document /home/shree/tesstutorial/nyd/eng.1852nydir.exp2.lstmf
Loaded 29/29 pages (1-29) of document /home/shree/tesstutorial/nyd/eng.1852nydir.exp-1.lstmf
Loaded 29/29 pages (1-29) of document /home/shree/tesstutorial/nyd/eng.1852nydir.exp0.lstmf
Loaded 104/104 pages (1-104) of document /home/shree/tesstutorial/nyd/eng.Arial.exp-2.lstmf
Loaded 29/29 pages (1-29) of document /home/shree/tesstutorial/nyd/eng.1852nydir.exp-1.lstmf

...
Iteration 298: ALIGNED TRUTH : Bailey Julia, 167% WWooster
Iteration 298: BEST OCR TEXT : Baiiey aiia, % ooter
File /home/shree/tesstutorial/nyd/eng.1852nydir.exp2.lstmf page 11 :
Mean rms=5.44%, delta=46.985%, train=96.919%(99.93%), skip ratio=0%
Iteration 299: ALIGNED TRUTH : AA Q WW E R T Y UU II O P L KK J HH GG F D S ZZ X C V B NN MMM 0 9 8 7 7 6 5 4 3
Iteration 299: BEST OCR TEXT :     B            o      B
File /tmp/tmp.RQfY2cDM00/eng/eng.Arial.exp-2.lstmf page 81 :
Mean rms=5.435%, delta=46.865%, train=96.775%(99.92%), skip ratio=0%
lstmtraining: ../ccutil/genericvector.h:697: T& GenericVector<T>::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.
Aborted (core dumped)

@Shreeshrii
Collaborator Author

Shreeshrii commented Jun 2, 2017

Related #644

Shreeshrii reopened this Jul 15, 2017
@Shreeshrii
Collaborator Author

@theraysmith

The latest code change has reverted the fix for this issue:

Iteration 99: ALIGNED TRUTH : च छ ज झ ञ उ ऊ ट ठ ड ढ ण ऋ ॠ त थ
Iteration 99: BEST OCR TEXT :
File /tmp/tmp.m82dWGBYZW/mar/mar.Aparajita.exp0.lstmf page 3 :
Mean rms=5.427%, delta=53.011%, train=110.2%(100%), skip ratio=0%
lstmtraining: genericvector.h:713: T& GenericVector<T>::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.
Aborted (core dumped)

@theraysmith
Contributor

theraysmith commented Jul 16, 2017 via email

@Shreeshrii
Collaborator Author

Shreeshrii commented Jul 16, 2017

Still getting the error, as of commit f4f66f8:

Iteration 99: ALIGNED TRUTH : पुनर्व्यवस्थित अमेरिकी इंडोनेशिया
Iteration 99: BEST OCR TEXT :
File /tmp/tmp.7Z9YtK1Bru/hin/hin.Siddhanta.exp0.lstmf page 519 :
Mean rms=5.531%, delta=52.419%, train=109.818%(100%), skip ratio=0%
lstmtraining: ../ccutil/genericvector.h:713: T& GenericVector<T>::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.
Aborted (core dumped)

Version

shree@sanskrit:~/tesseract$ tesseract -v
tesseract f4f66f8
 leptonica-1.74.4
  libgif 4.1.6(?) : libjpeg 8d (libjpeg-turbo 1.3.0) : libpng 1.2.50 : libtiff 4.0.3 : zlib 1.2.8 : libwebp 0.4.0 : libopenjp2 2.1.2

Training command used

nice lstmtraining   -U ~/tesstutorial/hintest/hin.unicharset   --train_listfile ~/tesstutorial/hintest/hin.training_files.txt   --eval_listfile ~/tesstutorial/hineval/hin.training_files.txt   --continue_from ~/tesstutorial/hintest/hin.lstm   --model_output ~/tesstutorial/hintest/hinlayer   --script_dir ../langdata   --append_index 5   --net_spec '[Lfx384 O1c105]'   --target_error_rate 0.01   --perfect_sample_delay 19   --debug_interval -1

@Shreeshrii
Collaborator Author

The error happens later with the finetune command, using the same set of files:

nice lstmtraining    --train_listfile ~/tesstutorial/hintest/hin.training_files.txt   --eval_listfile ~/tesstutorial/hineval/hin.training_files.txt   --continue_from ~/tesstutorial/hintest/hin.lstm   --model_output ~/tesstutorial/hintest/hintune   --target_error_rate 0.01   --perfect_sample_delay 19   --debug_interval -1

Error message - core dumped

Iteration 199: ALIGNED TRUTH : मुद्राओं घुसपैठ व्हिटफोर्ड इंटरनेट
Iteration 199: BEST OCR TEXT : मुद्राओं घुसपैठ व्हिटफोर्ड इंटरनेट
File /tmp/tmp.7Z9YtK1Bru/hin/hin.FreeSans.exp0.lstmf page 905 (Perfect):
Mean rms=0.276%, delta=0.118%, train=0.285%(1.056%), skip ratio=2%
lstmtraining: ../ccutil/genericvector.h:713: T& GenericVector<T>::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.
Aborted (core dumped)

@theraysmith
Contributor

theraysmith commented Jul 16, 2017 via email

@Shreeshrii
Collaborator Author

Shreeshrii commented Jul 16, 2017

Backtrace with the finetune command:

lstmtraining: ../ccutil/genericvector.h:713: T& GenericVector<T>::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.

Program received signal SIGABRT, Aborted.
0x00007ffff628bc37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) backtrace
#0  0x00007ffff628bc37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff628f028 in __GI_abort () at abort.c:89
#2  0x00007ffff6284bf6 in __assert_fail_base (fmt=0x7ffff63d9018 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion@entry=0x7ffff70f5be0 "index >= 0 && index < size_used_",
    file=file@entry=0x7ffff70f5668 "../ccutil/genericvector.h", line=line@entry=713,
    function=function@entry=0x7ffff7116d20 <_ZZNK13GenericVectorIcEixEiE19__PRETTY_FUNCTION__> "T& GenericVector<T>::operator[](int) const [with T = char]") at assert.c:92
#3  0x00007ffff6284ca2 in __GI___assert_fail (assertion=0x7ffff70f5be0 "index >= 0 && index < size_used_",
    file=0x7ffff70f5668 "../ccutil/genericvector.h", line=713,
    function=0x7ffff7116d20 <_ZZNK13GenericVectorIcEixEiE19__PRETTY_FUNCTION__> "T& GenericVector<T>::operator[](int) const [with T = char]") at assert.c:101
#4  0x00007ffff7071983 in GenericVector<char>::operator[] (this=<optimized out>, this=<optimized out>, index=0)
    at ../ccutil/genericvector.h:713
#5  0x00007ffff7074dd5 in operator[] (this=<optimized out>, this=<optimized out>, index=<optimized out>) at lstmtrainer.cpp:1335
#6  tesseract::LSTMTrainer::UpdateErrorGraph (this=this@entry=0x7fffffffd780, iteration=iteration@entry=11,
    error_rate=error_rate@entry=0.28499999999999998, model_data=..., tester=tester@entry=0x845ce0) at lstmtrainer.cpp:1272
#7  0x00007ffff70780e7 in tesseract::LSTMTrainer::MaintainCheckpoints (this=this@entry=0x7fffffffd780, tester=tester@entry=0x845ce0,
    log_msg=log_msg@entry=0x7fffffffd350) at lstmtrainer.cpp:338
#8  0x0000000000407712 in main (argc=1, argv=0x7fffffffe408) at lstmtraining.cpp:197

@Shreeshrii
Collaborator Author

Backtrace with the replace top layer command:

(gdb) backtrace
#0  0x00007ffff628bc37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff628f028 in __GI_abort () at abort.c:89
#2  0x00007ffff6284bf6 in __assert_fail_base (fmt=0x7ffff63d9018 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion@entry=0x7ffff70f5be0 "index >= 0 && index < size_used_",
    file=file@entry=0x7ffff70f5668 "../ccutil/genericvector.h", line=line@entry=713,
    function=function@entry=0x7ffff7116d20 <_ZZNK13GenericVectorIcEixEiE19__PRETTY_FUNCTION__> "T& GenericVector<T>::operator[](int) const [with T = char]") at assert.c:92
#3  0x00007ffff6284ca2 in __GI___assert_fail (assertion=0x7ffff70f5be0 "index >= 0 && index < size_used_",
    file=0x7ffff70f5668 "../ccutil/genericvector.h", line=713,
    function=0x7ffff7116d20 <_ZZNK13GenericVectorIcEixEiE19__PRETTY_FUNCTION__> "T& GenericVector<T>::operator[](int) const [with T = char]") at assert.c:101
#4  0x00007ffff7071983 in GenericVector<char>::operator[] (this=<optimized out>, this=<optimized out>, index=0)
    at ../ccutil/genericvector.h:713
#5  0x00007ffff7074dd5 in operator[] (this=<optimized out>, this=<optimized out>, index=<optimized out>) at lstmtrainer.cpp:1335
#6  tesseract::LSTMTrainer::UpdateErrorGraph (this=this@entry=0x7fffffffd6d0, iteration=iteration@entry=100,
    error_rate=error_rate@entry=109.818, model_data=..., tester=tester@entry=0x845d00) at lstmtrainer.cpp:1272
#7  0x00007ffff70780e7 in tesseract::LSTMTrainer::MaintainCheckpoints (this=this@entry=0x7fffffffd6d0, tester=tester@entry=0x845d00,
    log_msg=log_msg@entry=0x7fffffffd2a0) at lstmtrainer.cpp:338
#8  0x0000000000407712 in main (argc=1, argv=0x7fffffffe398) at lstmtraining.cpp:197
(gdb)

@theraysmith
Contributor

theraysmith commented Jul 17, 2017 via email

@Shreeshrii
Collaborator Author

Yes, it did. Thank you, @stweil and @theraysmith.
