Best Traineddata Feedback - Hindi #66
See https://shreeshrii.github.io/tess4eval-san/ for accuracy reports on Hindi and Bihari language samples (not segregated). The images used can be seen from [...]. I have NOT looked at wordlists yet because I was under the impression that they do not make much difference to accuracy for LSTM models. Is that correct, @theraysmith?
Some of the recognition errors for Hindi arise because a different orthographic style is used for some of the letters. Please see https://shreeshrii.github.io/tess4eval-san/index-4-hinbest.html where the errors relate to [...]. Interestingly, these are recognized correctly by the original hin.traineddata for 4.00.00-alpha. These can be fixed by ensuring that fonts covering the different orthographies are used for training. @theraysmith If you provide a list of the Devanagari fonts used for training, I can check for this.
For wordlists/training_text for modern languages, I would also suggest using the localization lists from unicode.org. Please see http://www.unicode.org/cldr/charts/31/summary/root.html
Also see the comments on #64 for feedback regarding Sanskrit.
See the attached reports, run using https://github.com/eddieantonio/isri-ocr-evaluation-tools, which supports UTF-8 text.
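For anyone who wants a quick sanity check without the full toolchain, here is a minimal sketch of a character accuracy figure based on edit distance over Unicode code points. It is an assumption-laden approximation, not the algorithm used by the ISRI tools, and the function names `edit_distance` and `char_accuracy` are made up for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion from ref
                cur[j - 1] + 1,          # insertion into ref
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = cur
    return prev[-1]


def char_accuracy(ref, hyp):
    """(reference length - errors) / reference length; can be negative."""
    if not ref:
        return 1.0 if not hyp else 0.0
    return (len(ref) - edit_distance(ref, hyp)) / len(ref)
```

Because Devanagari combining marks are separate code points, one visually wrong conjunct can count as more than one error under this measure.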
hin.lstm-unicharset does not have the following Devanagari characters and combining marks:
ङ 1 0,255,0,255,0,0,0,0,0,0 Devanagari 129 0 129 ङ # ङ [919 ]x
ऍ | 2317 | ऍ | 090D | DEVANAGARI LETTER CANDRA E
ॅ 0 0,255,0,255,0,0,0,0,0,0 Devanagari 124 17 124 ॅ # ॅ [945 ]
ॐ 1 0,255,0,255,0,0,0,0,0,0 Devanagari 158 0 158 ॐ # ॐ [950 ]x
पङ्कज
गङ्गा
ऍण्ड
ऍक्ट
डू यू हैव अ पॅन
फॅरनहाइट
ॐ
ॐकार
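A check like the one above can be scripted. The following is a rough sketch, assuming the usual text layout of a unicharset file (an entry count on the first line, then one entry per line whose first whitespace-separated field is the unichar, with `NULL` standing in for the space entry). The helper names `load_unichars`, `missing_codepoints` and `describe` are hypothetical, and multi-code-point entries are simply flattened into individual code points, which may not match how the LSTM recoder treats them.

```python
import unicodedata


def load_unichars(lines):
    """Collect the first field of every entry line of a unicharset file."""
    entries = set()
    for line in list(lines)[1:]:            # skip the count line
        fields = line.split()
        if fields and fields[0] != "NULL":  # assumed placeholder for space
            entries.add(fields[0])
    return entries


def missing_codepoints(text, unichars):
    """Return sorted code points of `text` that occur in no entry."""
    known = set("".join(unichars))
    return sorted({ch for ch in text if not ch.isspace() and ch not in known})


def describe(ch):
    """Format one code point as 'U+XXXX NAME'."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}"
```

Passing the lines of hin.lstm-unicharset to `load_unichars` and the sample words above to `missing_codepoints` should list the code points that the model cannot output, which can then be printed with `describe`.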