
LSTM: Training: Deserialize Failed #792

Closed
Shreeshrii opened this issue Mar 27, 2017 · 18 comments

Shreeshrii commented Mar 27, 2017

I added -eval_listfile /home/shree/tesstutorial/hineval/tmp.txt \ to my lstmtraining command once it had come down to less than 3% char error.

While training continues, I am getting messages saying 'Deserialize failed'.
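
For reference, the resumed command looked roughly like the sketch below. Only the --eval_listfile path is taken from this report; the checkpoint, train-list, and output paths are hypothetical placeholders based on the model paths shown later in this thread.

  lstmtraining \
    --continue_from /home/shree/tesstutorial/hinlayer_from_hin/hinlayer_checkpoint \
    --model_output /home/shree/tesstutorial/hinlayer_from_hin/hinlayer \
    --train_listfile /home/shree/tesstutorial/hintrain/hin.training_files.txt \
    --eval_listfile /home/shree/tesstutorial/hineval/tmp.txt \
    --target_error_rate 0.01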

@Shreeshrii

Lines with the error:

At iteration 23765/35800/35801, 
Mean rms=0.111%, delta=0.79%, char train=2.554%, word train=14.083%, skip ratio=0%,  
wrote checkpoint.

2 Percent improvement time=15025, best error was 4.481 @ 8791
Warning: LSTMTrainer deserialized an LSTMRecognizer!
At iteration 23816/35900/35901, 
Mean rms=0.11%, delta=0.768%, char train=2.454%, word train=13.982%, skip ratio=0%,  
New best char error = 2.454 
Deserialize failed 
wrote best model:/home/shree/tesstutorial/hinlayer_from_hin/hinlayer2.454_23816.lstm 
wrote checkpoint.

...

Loaded 61/61 pages (1-61) of document /home/shree/tesstutorial/hineval/hin.Sahitya.exp0.lstmf
2 Percent improvement time=14846, 
best error was 4.436 @ 9020
At iteration 23866/36000/36001, 
Mean rms=0.11%, delta=0.749%, char train=2.396%, word train=14.124%, skip ratio=0%,  
New best char error = 2.396
Previous test incomplete, skipping test at iteration23816 
wrote checkpoint.

...

At iteration 23914/36100/36101, 
Mean rms=0.11%, delta=0.747%, char train=2.41%, word train=14.399%, skip ratio=0%,  
New worst char error = 2.41
At iteration 23618, stage 1, 
Eval Char error rate=1.8365727, Word error rate=7.0471499 wrote checkpoint.

...


At iteration 24320/36900/36901, 
Mean rms=0.109%, delta=0.745%, char train=2.454%, word train=14.204%, skip ratio=0%,  
New worst char error = 2.454
Deserialize failed wrote checkpoint.

...

2 Percent improvement time=15369, 
best error was 4.329 @ 9223
At iteration 24592/37400/37401, 
Mean rms=0.108%, delta=0.737%, char train=2.298%, word train=13.478%, skip ratio=0%,  
New best char error = 2.298 
wrote best model:/home/shree/tesstutorial/hinlayer_from_hin/hinlayer2.298_24592.lstm wrote checkpoint.

Shreeshrii commented May 7, 2017

This is still happening with the latest code - the eval file was built using a different training text.

git log -1
commit d18931e
Date: Fri May 5 16:42:44 2017 -0700
Fixed int types for imported tf networks

  lstmtraining \
>   -U ~/tesstutorial/bihnew/bih.unicharset \
>   --train_listfile ~/tesstutorial/bihnew/bih.training_files.txt \
>   --eval_listfile ~/tesstutorial/bihtest/bih.training_files.txt \
>   --continue_from ~/tesstutorial/bihnewlayer/bih.lstm \
>   --model_output ~/tesstutorial/bihnewlayer/bihlayer \
>   --script_dir ../langdata \
>  --append_index 5 \
>  --net_spec '[Lfx384 O1c105]' \
>  --target_error_rate 0.01 \
>  --debug_interval -1
Loaded file /home/shree/tesstutorial/bihnewlayer/bih.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Continuing from /home/shree/tesstutorial/bihnewlayer/bih.lstm
Appending a new network to an old one!!Setting unichar properties
Setting properties for script Common
Setting properties for script Latin
Setting properties for script Devanagari
Setting properties for script Han
Warning: given outputs 105 not equal to unicharset of 145.
Num outputs,weights in serial:
  Lfx384:384, 787968
  Fc145:145, 55825
Total weights = 843793
Built network:[1,0,0,1[C5,5Ft16]Mp3,3Lfys64Lfx128Lrx128Lfx384Fc145] from request [Lfx384 O1c105]
Training parameters:
  Debug interval = -1, weights = 0.1, learning rate = 0.0001, momentum=0.9
Loaded 404/404 pages (1-404) of document /home/shree/tesstutorial/bihnew/bih.AA_NAGARI_SHREE_L3.exp0.lstmf
Loaded 199/199 pages (1-199) of document /home/shree/tesstutorial/bihtest/bih.Chandas.exp0.lstmf
Loaded 404/404 pages (1-404) of document /home/shree/tesstutorial/bihnew/bih.Adobe_Devanagari.exp0.lstmf
Loaded 404/404 pages (1-404) of document /home/shree/tesstutorial/bihnew/bih.CDAC-GISTSurekh.exp0.lstmf
Loaded 404/404 pages (1-404) of document /home/shree/tesstutorial/bihnew/bih.Aksharyogini2.exp0.lstmf
Loaded 411/411 pages (1-411) of document /home/shree/tesstutorial/bihnew/bih.CDAC-GISTYogesh.exp0.lstmf
Loaded 405/405 pages (1-405) of document /home/shree/tesstutorial/bihnew/bih.Annapurna_SIL.exp0.lstmf
Loaded 404/404 pages (1-404) of document /home/shree/tesstutorial/bihnew/bih.Arial_Unicode_MS.exp0.lstmf
Loaded 404/404 pages (1-404) of document /home/shree/tesstutorial/bihnew/bih.Aparajita.exp0.lstmf
Loaded 472/472 pages (1-472) of document /home/shree/tesstutorial/bihnew/bih.Chandas.exp0.lstmf
Loaded 202/202 pages (1-202) of document /home/shree/tesstutorial/bihtest/bih.FreeSerif.exp0.lstmf

At iteration 100/100/100, Mean rms=5.4%, delta=52.955%, char train=110.663%, word train=100%, skip ratio=0%,  New worst char error = 110.663 wrote checkpoint.
At iteration 200/200/200, Mean rms=5.254%, delta=50.469%, char train=105.319%, word train=100%, skip ratio=0%,  New worst char error = 105.319 wrote checkpoint.
At iteration 300/300/300, Mean rms=5.209%, delta=49.853%, char train=103.546%, word train=100%, skip ratio=0%,  New worst char error = 103.546 wrote checkpoint.
At iteration 400/400/400, Mean rms=5.189%, delta=49.729%, char train=102.636%, word train=100%, skip ratio=0%,  New worst char error = 102.636 wrote checkpoint.
At iteration 500/500/500, Mean rms=5.147%, delta=48.821%, char train=102.064%, word train=100%, skip ratio=0%,  New worst char error = 102.064 wrote checkpoint.
At iteration 600/600/600, Mean rms=5.127%, delta=48.496%, char train=101.601%, word train=99.991%, skip ratio=0%,  New worst char error = 101.601 wrote checkpoint.
At iteration 700/700/700, Mean rms=5.107%, delta=48.218%, char train=101.236%, word train=99.992%, skip ratio=0%,  New worst char error = 101.236 wrote checkpoint.
At iteration 800/800/800, Mean rms=5.081%, delta=47.738%, char train=100.547%, word train=99.986%, skip ratio=0%,  New worst char error = 100.547 wrote checkpoint.
At iteration 899/900/900, Mean rms=5.06%, delta=47.379%, char train=100.071%, word train=99.983%, skip ratio=0%,  New worst char error = 100.071 wrote checkpoint.
At iteration 999/1000/1000, Mean rms=5.038%, delta=46.991%, char train=99.425%, word train=99.985%, skip ratio=0%,  New best char error = 99.425 wrote checkpoint.
At iteration 1098/1100/1100, Mean rms=4.971%, delta=45.696%, char train=97.6%, word train=99.979%, skip ratio=0%,  New best char error = 97.6Deserialize failed wrote che
At iteration 1197/1200/1200, Mean rms=4.932%, delta=44.848%, char train=96.57%, word train=99.979%, skip ratio=0%,  New best char error = 96.57Deserialize failed wrote c
At iteration 1296/1300/1300, Mean rms=4.7%, delta=40.982%, char train=90.782%, word train=98.025%, skip ratio=0%,  New best char error = 90.782Deserialize failed wrote c
At iteration 1392/1400/1400, Mean rms=4.388%, delta=36.654%, char train=83.142%, word train=93.749%, skip ratio=0%,  New best char error = 83.142Deserialize failed wrote
At iteration 1491/1500/1500, Mean rms=4.053%, delta=32.455%, char train=74.886%, word train=88.3%, skip ratio=0%,  New best char error = 74.886Deserialize failed wrote c
At iteration 1582/1600/1600, Mean rms=3.71%, delta=28.11%, char train=66.546%, word train=82.7%, skip ratio=0%,  New best char error = 66.546Deserialize failed wrote bes
At iteration 1678/1700/1700, Mean rms=3.366%, delta=23.795%, char train=58.181%, word train=76.531%, skip ratio=0%,  New best char error = 58.181Deserialize failed wrote
At iteration 1772/1800/1800, Mean rms=3.018%, delta=19.642%, char train=49.915%, word train=70.119%, skip ratio=0%,  New best char error = 49.915Deserialize failed wrote
At iteration 1862/1900/1900, Mean rms=2.67%, delta=15.494%, char train=41.584%, word train=63.451%, skip ratio=0%,  New best char error = 41.584Deserialize failed wrote
At iteration 1955/2000/2000, Mean rms=2.321%, delta=11.413%, char train=33.335%, word train=56.515%, skip ratio=0%,  New best char error = 33.335Deserialize failed wrote
At iteration 2049/2100/2101, Mean rms=1.978%, delta=7.632%, char train=25.213%, word train=49.514%, skip ratio=0.1%,  New best char error = 25.213Deserialize failed wrot
At iteration 2137/2200/2201, Mean rms=1.631%, delta=3.903%, char train=17.289%, word train=42.271%, skip ratio=0.1%,  New best char error = 17.289Deserialize failed wrot
At iteration 2230/2300/2301, Mean rms=1.474%, delta=3.148%, char train=14.004%, word train=36.724%, skip ratio=0.1%,  New best char error = 14.004Deserialize failed wrot

@Shreeshrii

@stweil Is it related to #881 (comment)?

stweil commented May 7, 2017

Maybe, I don't know. First I have to reproduce this.

@Shreeshrii

@stweil In case you want to reproduce using the files I was using: they are for the Bihari/Hindi language, Devanagari script.

http://sanskritdocuments.org/hindi/bihtest.zip
http://sanskritdocuments.org/hindi/bihnew.zip

The zip files were too large to upload here or on my GitHub account.

stweil commented May 7, 2017

~/tesstutorial/bihnewlayer is needed, too.

Shreeshrii commented May 7, 2017

mkdir -p ~/tesstutorial/bihnewlayer

combine_tessdata -e ../tessdata/hin.traineddata \
   ~/tesstutorial/bihnewlayer/bih.lstm

I was using the Hindi traineddata from the tessdata repo as the basis for training.

@Shreeshrii

> ~/tesstutorial/bihnewlayer is needed, too.

http://sanskritdocuments.org/hindi/bihnewlayer.zip

It contains:

  • bih.lstm (lstm model extracted from hin.traineddata)
  • bihlayer_checkpoint (current status of the model)
  • bihlayer4.005_8369.lstm (latest 'best' model)

I have not included the other intermediate .lstm files, since each is 33+ MB.

stweil commented May 8, 2017

I had to fix the path in ~/tesstutorial/bihtest/bih.training_files.txt, but now lstmtraining works, and there seem to be no errors. Tested with a debug version based on the latest git master.

@Shreeshrii

Thanks for checking, @stweil. I will rebuild with the latest git master and test again, as I am still getting Deserialize Failed messages, though not for every checkpoint iteration of training.

Maybe these are also just 'info' messages!

At iteration 8306/10800/10808, Mean rms=0.765%, delta=1.311%, char train=4.216%, word train=10.221%, skip ratio=0.2%, New best char error = 4.216 wrote best model:/ho
At iteration 8369/10900/10908, Mean rms=0.753%, delta=1.217%, char train=4.005%, word train=10.036%, skip ratio=0.2%, New best char error = 4.005Deserialize failed wr
At iteration 8426/11000/11008, Mean rms=0.752%, delta=1.211%, char train=4.014%, word train=9.962%, skip ratio=0.2%, New worst char error = 4.014Deserialize failed wr
At iteration 8482/11100/11109, Mean rms=0.768%, delta=1.319%, char train=4.281%, word train=10.389%, skip ratio=0.3%, New worst char error = 4.281 wrote checkpoint.
At iteration 8548/11200/11210, Mean rms=0.773%, delta=1.355%, char train=4.347%, word train=10.64%, skip ratio=0.3%, New worst char error = 4.347 wrote checkpoint.
At iteration 8607/11300/11310, Mean rms=0.772%, delta=1.37%, char train=4.4%, word train=10.701%, skip ratio=0.3%, New worst char error = 4.4 wrote checkpoint.
At iteration 8666/11400/11410, Mean rms=0.777%, delta=1.359%, char train=4.394%, word train=10.81%, skip ratio=0.3%, New worst char error = 4.394 wrote checkpoint.
At iteration 8729/11500/11510, Mean rms=0.772%, delta=1.368%, char train=4.385%, word train=10.801%, skip ratio=0.2%, New worst char error = 4.385 wrote checkpoint.
At iteration 8794/11600/11610, Mean rms=0.772%, delta=1.343%, char train=4.354%, word train=10.848%, skip ratio=0.2%, New worst char error = 4.354 wrote checkpoint.
At iteration 8857/11700/11710, Mean rms=0.786%, delta=1.415%, char train=4.538%, word train=11.094%, skip ratio=0.2%, New worst char error = 4.538 wrote checkpoint.
At iteration 8919/11800/11810, Mean rms=0.802%, delta=1.531%, char train=4.968%, word train=11.586%, skip ratio=0.2%, New worst char error = 4.968 wrote checkpoint.
At iteration 8979/11900/11910, Mean rms=0.8%, delta=1.548%, char train=5.043%, word train=11.359%, skip ratio=0.2%, New worst char error = 5.043 wrote checkpoint.
At iteration 9034/12000/12010, Mean rms=0.79%, delta=1.526%, char train=4.924%, word train=11.199%, skip ratio=0.2%, New worst char error = 4.924 wrote checkpoint.
At iteration 9101/12100/12110, Mean rms=0.781%, delta=1.449%, char train=4.766%, word train=11.043%, skip ratio=0.1%, New worst char error = 4.766 wrote checkpoint.
At iteration 9160/12200/12210, Mean rms=0.784%, delta=1.443%, char train=4.801%, word train=11.025%, skip ratio=0%, New worst char error = 4.801 wrote checkpoint.
At iteration 9224/12300/12310, Mean rms=0.784%, delta=1.444%, char train=4.728%, word train=10.783%, skip ratio=0%, New worst char error = 4.728 wrote checkpoint.
At iteration 9285/12400/12410, Mean rms=0.782%, delta=1.477%, char train=4.773%, word train=10.693%, skip ratio=0%, New worst char error = 4.773 wrote checkpoint.
At iteration 9349/12500/12511, Mean rms=0.776%, delta=1.419%, char train=4.545%, word train=10.359%, skip ratio=0.1%, New worst char error = 4.545 wrote checkpoint.
At iteration 9405/12600/12611, Mean rms=0.769%, delta=1.421%, char train=4.498%, word train=10.172%, skip ratio=0.1%, New worst char error = 4.498 wrote checkpoint.
At iteration 9461/12700/12711, Mean rms=0.749%, delta=1.324%, char train=4.276%, word train=9.825%, skip ratio=0.1%, wrote checkpoint.
At iteration 9518/12800/12811, Mean rms=0.737%, delta=1.229%, char train=3.922%, word train=9.583%, skip ratio=0.1%, New best char error = 3.922Previous test incomple
At iteration 9576/12900/12911, Mean rms=0.733%, delta=1.198%, char train=3.746%, word train=9.473%, skip ratio=0.1%, New best char error = 3.746Previous test incomple
At iteration 9641/13000/13011, Mean rms=0.743%, delta=1.261%, char train=4.006%, word train=9.773%, skip ratio=0.1%, New worst char error = 4.006Previous test incompl
At iteration 9708/13100/13111, Mean rms=0.742%, delta=1.255%, char train=3.899%, word train=9.567%, skip ratio=0.1%, New worst char error = 3.899Previous test incompl
At iteration 9766/13200/13211, Mean rms=0.727%, delta=1.193%, char train=3.708%, word train=9.23%, skip ratio=0.1%, New best char error = 3.708Previous test incomplet
At iteration 9822/13300/13311, Mean rms=0.727%, delta=1.191%, char train=3.717%, word train=9.477%, skip ratio=0.1%, New worst char error = 3.717Previous test incompl

At iteration 9880/13400/13411, Mean rms=0.723%, delta=1.151%, char train=3.603%, word train=9.355%, skip ratio=0.1%, New best char error = 3.603At iteration 8369, stage 1, Eval Char error rate=2.0568204, Word error rate=4.3746914 wrote best model:/home/shree/tesstutorial/bihnewlayer/bihlayer3.603_9880.lstm wrote checkpoint.
At iteration 9934/13500/13511, Mean rms=0.725%, delta=1.203%, char train=3.786%, word train=9.603%, skip ratio=0%, New worst char error = 3.786Deserialize failed wrot
At iteration 9988/13600/13611, Mean rms=0.733%, delta=1.237%, char train=4.018%, word train=9.875%, skip ratio=0%, New worst char error = 4.018 wrote checkpoint.
At iteration 10036/13700/13711, Mean rms=0.737%, delta=1.256%, char train=4.052%, word train=10.005%, skip ratio=0%, New worst char error = 4.052 wrote checkpoint.

@Shreeshrii

> I had to fix the path in ~/tesstutorial/bihtest/bih.training_files.txt,

Did you need to change the path in both to match your setup?

--train_listfile ~/tesstutorial/bihnew/bih.training_files.txt
--eval_listfile ~/tesstutorial/bihtest/bih.training_files.txt \

Or did you change the path in bihtest to match bihnew?

I think the problem occurs when the training files and evaluation files are different. The lstmf files in bihnew and bihtest were created using different training texts and font combinations.

stweil commented May 8, 2017

Yes, I changed both files to match my home directory.

Shreeshrii commented May 9, 2017

Thanks, @stweil. However, I can now reproduce the error that you got when using lstmf files from a different version.

@theraysmith I am getting these errors when I use lstmf files created before the commits regarding endianness. However, the location of the error is different from the rest reported above in this thread.

 lstmtraining  \
>    -U ~/tesstutorial/nyd/eng.unicharset \
>   --train_listfile ~/tesstutorial/nyd/nyd.training_files.txt \
>   --script_dir ../langdata   \
>   --append_index 5 --net_spec '[Lfx256 O1c105]' \
>   --continue_from ~/tesstutorial/nydlayer/eng.lstm \
>   --model_output ~/tesstutorial/nydlayer/nyd \
>   --debug_interval -1 \
>   --target_error_rate 0.01
Loaded file /home/shree/tesstutorial/nydlayer/eng.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Continuing from /home/shree/tesstutorial/nydlayer/eng.lstm
Appending a new network to an old one!!Setting unichar properties
Setting properties for script Common
Setting properties for script Latin
Warning: given outputs 105 not equal to unicharset of 75.
Num outputs,weights in serial:
  Lfx256:256, 394240
  Fc75:75, 19275
Total weights = 413515
Built network:[1,0,0,1[C5,5Ft16]Mp3,3Lfys64Lfx128Lrx128Lfx256Fc75] from request [Lfx256 O1c105]
Training parameters:
  Debug interval = -1, weights = 0.1, learning rate = 0.0001, momentum=0.9

Deserialize failed: /home/shree/tesstutorial/nyd/eng.1852nydir.exp1.lstmf read 0/8 pages
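
If the .lstmf files really do predate the endianness changes, regenerating them with the current build should give lstmtraining data it can deserialize again. A sketch (not taken from this thread) of regenerating one file from its existing tif/box pair, reusing the path from the error above; --psm 6 and the lstm.train config are the usual training-wiki defaults and are assumptions here:

  tesseract /home/shree/tesstutorial/nyd/eng.1852nydir.exp1.tif \
    /home/shree/tesstutorial/nyd/eng.1852nydir.exp1 --psm 6 lstm.train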

@Shreeshrii

@stweil @theraysmith

I tried running with the latest code built with the --enable-debug option; earlier the error was 'Deserialize failed', please see #792 (comment).

With --enable-debug, I get a core dump (same as #561).

Iteration 13398: ALIGNED TRUTH : पंचाग कला की, फ़ अः ग़ुमान आलोचना छूटती के ज़् द्वा अधीन र्द् देहियाँ भजनला
Iteration 13398: BEST OCR TEXT : पंचाग कला की, फ़ अः गुमान आलोचना छूटती के ज़् द्वा अधीन र्द् देहियाँ भजनला
File /tmp/tmp.3zBjAvGc9O/bih/bih.Lohit_Devanagari.exp0.lstmf page 24 :
Mean rms=0.723%, delta=1.151%, train=3.605%(9.362%), skip ratio=0.1%
Iteration 13399: ALIGNED TRUTH : संभावना :
Iteration 13399: BEST OCR TEXT : संभावना :
File /tmp/tmp.3zBjAvGc9O/bih/bih.Mangal.exp0.lstmf page 458 (Perfect):
Mean rms=0.723%, delta=1.151%, train=3.603%(9.355%), skip ratio=0.1%
lstmtraining: ../ccutil/genericvector.h:697: T& GenericVector<T>::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.
Aborted (core dumped)

gdb output pasted below:

Iteration 13399: ALIGNED TRUTH : संभावना :
Iteration 13399: BEST OCR TEXT : संभावना :
File /tmp/tmp.3zBjAvGc9O/bih/bih.Mangal.exp0.lstmf page 458 (Perfect):
Mean rms=0.723%, delta=1.151%, train=3.603%(9.355%), skip ratio=0.1%
[Thread 0x7fe0cd3e0700 (LWP 135) exited]
[New Thread 0x7fe0cd3e0700 (LWP 136)]
lstmtraining: ../ccutil/genericvector.h:697: T& GenericVector<T>::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fe0cd3e0700 (LWP 136)]
0x00007fe0d2886c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) backtrace
#0  0x00007fe0d2886c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007fe0d288a028 in __GI_abort () at abort.c:89
#2  0x00007fe0d287fbf6 in __assert_fail_base (fmt=0x7fe0d29d03b8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion@entry=0x7fe0d370d6c0 "index >= 0 && index < size_used_", file=file@entry=0x7fe0d370d148 "../ccutil/genericvector.h", line=line@entry=697,
    function=function@entry=0x7fe0d372ea60 <_ZZNK13GenericVectorIcEixEiE19__PRETTY_FUNCTION__> "T& GenericVector<T>::operator[](int) const [with T = char]")
    at assert.c:92
#3  0x00007fe0d287fca2 in __GI___assert_fail (assertion=0x7fe0d370d6c0 "index >= 0 && index < size_used_", file=0x7fe0d370d148 "../ccutil/genericvector.h", line=697,
    function=0x7fe0d372ea60 <_ZZNK13GenericVectorIcEixEiE19__PRETTY_FUNCTION__> "T& GenericVector<T>::operator[](int) const [with T = char]") at assert.c:101
#4  0x00007fe0d368c803 in GenericVector<char>::operator[] (this=<optimized out>, this=<optimized out>, index=0) at ../ccutil/genericvector.h:697
#5  0x00007fe0d368cfb8 in operator[] (this=0x7fffcf2d0bc0, this=0x7fffcf2d0bc0, index=0) at lstmtrainer.cpp:920
#6  tesseract::LSTMTrainer::ReadTrainingDump (this=this@entry=0x7fe0cd3df640, data=..., trainer=trainer@entry=0x7fe0cd3df640) at lstmtrainer.cpp:921
#7  0x000000000040b03e in tesseract::LSTMTester::RunEvalSync (this=this@entry=0x7fffcf2d0b50, iteration=9766, training_errors=<optimized out>, model_data=...,
    training_stage=1) at lstmtester.cpp:86
#8  0x000000000040b539 in tesseract::LSTMTester::ThreadFunc (lstmtester_void=0x7fffcf2d0b50) at lstmtester.cpp:123
#9  0x00007fe0d0748184 in start_thread (arg=0x7fe0cd3e0700) at pthread_create.c:312
#10 0x00007fe0d294a37d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) up
#1  0x00007fe0d288a028 in __GI_abort () at abort.c:89
89      abort.c: No such file or directory.
(gdb) up
#2  0x00007fe0d287fbf6 in __assert_fail_base (fmt=0x7fe0d29d03b8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion@entry=0x7fe0d370d6c0 "index >= 0 && index < size_used_", file=file@entry=0x7fe0d370d148 "../ccutil/genericvector.h", line=line@entry=697,
    function=function@entry=0x7fe0d372ea60 <_ZZNK13GenericVectorIcEixEiE19__PRETTY_FUNCTION__> "T& GenericVector<T>::operator[](int) const [with T = char]")
    at assert.c:92

...

(gdb) up
#7  0x000000000040b03e in tesseract::LSTMTester::RunEvalSync (this=this@entry=0x7fffcf2d0b50, iteration=9766, training_errors=<optimized out>, model_data=...,
    training_stage=1) at lstmtester.cpp:86
86        if (!trainer.ReadTrainingDump(model_data, &trainer)) {
(gdb) print model_data
$10 = (const GenericVector<char> &) @0x7fffcf2d0bc0: {static kDefaultVectorSize = <optimized out>, size_used_ = 0, size_reserved_ = 0, data_ = 0x0, clear_cb_ = 0x0,
  compare_cb_ = 0x0}

@stweil was not able to reproduce this: #792 (comment).

The only difference I can see is that I run the program under WSL (bash on Windows 10).

Shreeshrii commented Jun 2, 2017

Duplicate/Related #644

stweil added a commit to stweil/tesseract that referenced this issue Jun 4, 2017
The new test in LSTMTrainer::UpdateErrorGraph fixes an assertion
(see issues tesseract-ocr#644, tesseract-ocr#792).

The new test in LSTMTrainer::ReadTrainingDump was added to improve
the robustness of the code.

Signed-off-by: Stefan Weil <[email protected]>
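
The warning that shows up later in this thread ("Warning: data size is zero in LSTMTrainer::ReadTrainingDump") suggests the added test is essentially a guard against the empty model_data seen in the backtrace above. A rough sketch of that kind of check, not the exact patch, using the signature visible in the gdb frames:

  // In lstmtrainer.cpp: refuse an empty dump instead of indexing data[0],
  // which is what tripped the GenericVector bounds assertion.
  bool LSTMTrainer::ReadTrainingDump(const GenericVector<char>& data,
                                     LSTMTrainer* trainer) {
    if (data.empty()) {
      tprintf("Warning: data size is zero in LSTMTrainer::ReadTrainingDump\n");
      return false;
    }
    // ... the original deserialization from &data[0], data.size() follows ...
    return true;  // placeholder for the original body's result
  }
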
Shreeshrii reopened this Jul 11, 2017
@Shreeshrii

Still getting the error:

At iteration 800/800/802, Mean rms=0.934%, delta=47.138%, char train=100.525%, word train=100%, skip ratio=0.25%,  New worst char error = 100.525 wrote checkpoint.
Compute CTC targets failed!
At iteration 900/900/903, Mean rms=0.932%, delta=46.952%, char train=100.461%, word train=99.972%, skip ratio=0.333%,  New worst char error = 100.461 wrote checkpoint.
Warning: data size is zero in LSTMTrainer::ReadTrainingDump
At iteration 1000/1000/1003, Mean rms=0.93%, delta=46.74%, char train=100.415%, word train=99.975%, skip ratio=0.3%,  New worst char error = 100.415 wrote checkpoint.
Warning: LSTMTrainer deserialized an LSTMRecognizer!
2 Percent improvement time=1098, best error was 100 @ 0
Compute CTC targets failed!
Compute CTC targets failed!
At iteration 1098/1100/1103, Mean rms=0.917%, delta=45.445%, char train=99.984%, word train=99.975%, skip ratio=0.3%,  New best char error = 99.984

Deserialize failed wrote checkpoint.

ref: https://travis-ci.org/Shreeshrii/tess4train/builds/252343478

amitdo commented Sep 6, 2017

Is this issue still present with the latest code?

@Shreeshrii

Closing this issue since the LSTM training process has changed and it is now difficult to reproduce.
