Suggest 'deva' for Devanagari #41
Comments
We can add a Deva.traineddata which is trained on the training text for all of these languages taken together.
Related papers: "A Segmentation-Free Approach for Printed Devanagari Script Recognition" (2015) and "Can we build language-independent OCR using LSTM networks?". More interesting papers about LSTM for OCR:
A list of Unicode Devanagari fonts that could be used for training, if not already being used: tesseract-ocr/tesseract#561 (comment). Samples of glyphs in different fonts.
Similarly, it would be nice to have a generic traineddata for multiple Latin-script-based languages, as described in the paper I mentioned above. Likewise, you could provide a generic Cyrillic traineddata.
And maybe one based on the Arabic script.
#41 (comment)
I assume the same would be needed for Greek. Or would it be better to include Greek characters in the Latin training set? Several sciences (especially Physics and Mathematics) use single Greek characters in texts which are mostly written with Latin letters. |
This request was implemented by Ray:
Thanks! |
With LSTM training, the dictionary dawg files have become optional. In light of this, I want to suggest an additional traineddata file for Devanagari script, which can cater to all the main languages written in it.
The reason for suggesting this is that, when I tested OCR on a Marathi text, a lot of words with rakaara were not recognised correctly. However, the same page OCRed with Sanskrit recognised those words correctly, though some other words were incorrect.
So, in addition to the multiple traineddata files for the various languages written in Devanagari, a combined script-level traineddata would be useful.
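The comparison described above can be reproduced from the command line. This is a sketch, assuming Tesseract 4.x is installed with the `mar` and `san` models and the script-level `script/Devanagari` model available in the tessdata directory; `page.png` is a hypothetical sample image.

```shell
# OCR the same page with each model and compare the results.
tesseract page.png out_mar -l mar                 # Marathi-only model
tesseract page.png out_san -l san                 # Sanskrit-only model
tesseract page.png out_deva -l script/Devanagari  # script-level Devanagari model

# Tesseract can also combine languages at run time with '+':
tesseract page.png out_combo -l mar+san
```

Diffing `out_mar.txt`, `out_san.txt`, and `out_deva.txt` makes it easy to see which model handles rakaara forms correctly on a given page.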