[Suggestion] Script for installing only selected languages from github tessdata_fast #1440
Agree. See #1423 (comment).
@Shreeshrii Also, I propose adding an explanation of whether the subdirectory /script below /tessdata can be deleted. Even after reading the wiki pages, I have no idea what these scripts are good for or which files under tessdata can be deleted (for example, if only deu, fra and eng are used, what can be deleted from a fresh, full GitHub checkout?).
If you are only using deu, fra and eng, you may be interested in trying out script/Latin, which covers these three plus other languages written in the Latin alphabet (a script, not a language). For only a few files, you can try … Maybe @stweil can offer a more elegant solution :-)
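To the question of which files can be deleted: one way to see the candidates is to list every `.traineddata` model in a local tessdata directory that is not in the set you actually use. The sketch below is a hypothetical helper (`removable_models` is not part of Tesseract); it only reports files and deletes nothing. It assumes model names are the path relative to tessdata without the suffix, so `script/Latin.traineddata` is named `script/Latin`, matching how such a model is selected with `-l`. Keeping `osd` is an assumption you should check for your setup: `osd.traineddata` is used for orientation and script detection (for example `--psm 0` or `--psm 1`).

```python
"""Hypothetical helper: list tessdata models not needed for a chosen set.

It only reports candidates for deletion; it never removes anything."""
from pathlib import Path

# Assumption: keep the orientation/script detection model by default.
DEFAULT_KEEP = {"osd"}


def removable_models(tessdata: Path, keep: set[str]) -> list[Path]:
    """Return .traineddata files under `tessdata` whose model name is not in `keep`.

    The model name is the path relative to `tessdata` without the suffix,
    e.g. "deu.traineddata" -> "deu", "script/Latin.traineddata" -> "script/Latin".
    """
    candidates = []
    for path in sorted(tessdata.rglob("*.traineddata")):
        name = path.relative_to(tessdata).with_suffix("").as_posix()
        if name not in keep:
            candidates.append(path)
    return candidates


if __name__ == "__main__":
    # Dry run for a deu/fra/eng setup in ./tessdata (prints nothing if absent).
    for p in removable_models(Path("tessdata"), {"deu", "fra", "eng"} | DEFAULT_KEEP):
        print(p)
```

With `{"deu", "fra", "eng"} | DEFAULT_KEEP`, everything else, including the whole `script/` subdirectory, would be listed as a candidate; add `"script/Latin"` to the set if you decide to use that model instead.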
It would be useful to have a bash utility with these options:
Well, I could imagine many more variants.
@Wikinaut,
@stweil The problem is that this is "biased" developer knowledge. As an occasional user, I would like a short page that explains the best practice. My serious follow-up questions to your answer above:
You can select it by running
Which language is ABC? It is neither a language nor a meta-language nor a language cluster. It is a script, the Latin script in my example (German: Schrift). See the Wikipedia article for a full description.
You have to try it for your class of documents. You can combine
@stweil I did not know before that such a combination is possible.
"script" should be renamed to "font" or "fonts".
Alternatively, you can compare deu+eng+fra with script/Latin.
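A minimal sketch of such a comparison, assuming the `tesseract` command-line program is on `PATH` and both `deu`, `eng`, `fra` and `script/Latin` models are installed. It relies on the documented CLI form `tesseract IMAGE OUTPUTBASE -l LANG`, where several models are joined with `+` and the text result is written to `OUTPUTBASE.txt`. The function names (`tesseract_cmd`, `output_base`, `compare`) are hypothetical.

```python
import subprocess
from pathlib import Path


def tesseract_cmd(image: str, outbase: str, lang: str) -> list[str]:
    """Build a `tesseract IMAGE OUTPUTBASE -l LANG` command line.

    LANG may combine several models with "+", e.g. "deu+eng+fra",
    or name a script model such as "script/Latin"."""
    return ["tesseract", image, outbase, "-l", lang]


def output_base(image: str, lang: str) -> str:
    """Derive a per-model output name: page.png + deu+eng+fra -> page_deu_eng_fra."""
    safe = lang.replace("/", "_").replace("+", "_")
    return f"{Path(image).stem}_{safe}"


def compare(image: str, langs: list[str]) -> dict[str, str]:
    """Run tesseract once per language spec and return {lang: recognized text}."""
    results = {}
    for lang in langs:
        outbase = output_base(image, lang)
        # tesseract writes its plain-text result to OUTPUTBASE.txt
        subprocess.run(tesseract_cmd(image, outbase, lang), check=True)
        results[lang] = Path(outbase + ".txt").read_text(encoding="utf-8")
    return results
```

Usage: `compare("page.png", ["deu+eng+fra", "script/Latin"])` writes `page_deu_eng_fra.txt` and `page_script_Latin.txt` into the current directory and returns both texts, so you can diff them for your class of documents.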
No, font is not the same as script. Arial, Times New Roman or Helvetica are different fonts, but all of them are typically used with Latin script.
Okay. I am closing this now. Thanks.
(This issue is probably something for #1423.)
When running tesseract from source, installing all languages from GitHub tessdata_fast often seems to be overkill (in terms of download bandwidth, time and disk space). I propose a small helper script, or a configuration file, so that only a user-configurable set of languages is checked out from GitHub.
@stweil @Shreeshrii What do you think?
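A rough sketch of the proposed helper, assuming individual model files can be fetched from the tessdata_fast repository as raw files. The base URL and the branch name `main` are assumptions to verify against the repository. The script name `get_tessdata.py` and the functions `model_urls` and `download` are hypothetical. Script models are fetched from the `script/` subdirectory, mirroring the repository layout, and files already present are skipped.

```python
"""Hypothetical helper: download only selected models from tessdata_fast."""
import sys
import urllib.request
from pathlib import Path

# Assumption: raw files are served from the repository's "main" branch.
BASE_URL = "https://raw.githubusercontent.com/tesseract-ocr/tessdata_fast/main"


def model_urls(models: list[str], base_url: str = BASE_URL) -> dict[str, str]:
    """Map each model name ("deu", "script/Latin", ...) to its download URL."""
    return {m: f"{base_url}/{m}.traineddata" for m in models}


def download(models: list[str], dest: Path, base_url: str = BASE_URL) -> None:
    """Fetch each selected model into `dest`, skipping files that already exist."""
    for name, url in model_urls(models, base_url).items():
        target = dest / f"{name}.traineddata"
        # Script models live in a "script/" subdirectory; mirror that locally.
        target.parent.mkdir(parents=True, exist_ok=True)
        if target.exists():
            print(f"skip {target} (already present)")
            continue
        print(f"fetch {url}")
        urllib.request.urlretrieve(url, str(target))


def main(argv: list[str]) -> None:
    if len(argv) < 2:
        print("usage: get_tessdata.py DEST_DIR MODEL [MODEL ...]", file=sys.stderr)
        return
    download(argv[1:], Path(argv[0]))


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, `python get_tessdata.py ./tessdata deu fra eng osd` would fetch four models instead of the whole repository. A configuration file variant could simply read the model names from a text file, one per line, and pass them to `download`.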