program shutdown in tesseract step due to language files #45

Closed
Bacchushlg opened this issue Jan 22, 2018 · 8 comments

@Bacchushlg
Collaborator

The tesseract step outputs errors:

Failed loading language 'fra'
Failed loading language 'eng'
read_params_file: parameter not found:

After this Audiveris shuts down with no additional errors.

I have tried several combinations: putting the language files in a separate directory and setting TESSDATA_PREFIX to that directory, or creating a directory "C:\Program Files (x86)\tesseract-ocr\tessdata" and putting the language files there. I get the error in both cases.
The language files date from 15.01.2018.

@hbitteur
Contributor

These error messages don't originate from the Audiveris Java code, so we can assume they come from the Tesseract C++ binary code.

IIRC there is a caveat with TESSDATA_PREFIX. It is not meant to point to the language file directory but rather to the directory that contains the tessdata directory, which in turn contains your language files. Or something like that :-)

For example, see this issue tesseract-ocr/tesseract#221
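
The directory layout described above can be checked with a minimal sketch like the following. It assumes the convention recalled in this comment (the prefix is the parent of `tessdata`, which holds `<lang>.traineddata`); the function names `expected_language_file` and `missing_languages` are hypothetical helpers, not part of Audiveris or Tesseract.

```python
from pathlib import Path


def expected_language_file(prefix, lang):
    """Where the language file would sit, ASSUMING the prefix is the
    parent of the 'tessdata' directory (the convention recalled above)."""
    return Path(prefix) / "tessdata" / f"{lang}.traineddata"


def missing_languages(prefix, langs):
    """Return the language codes whose .traineddata file is not found
    under <prefix>/tessdata."""
    return [
        lang for lang in langs
        if not expected_language_file(prefix, lang).is_file()
    ]
```

Calling `missing_languages` with the value you intend to use for TESSDATA_PREFIX and the codes from the error messages (`["eng", "fra"]`) shows which files would not be found under that layout. If the prefix points at the `tessdata` directory itself, every language is reported as missing.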

@maximumspatium
Contributor

@Bacchushlg
Some more info would be helpful. Which Tesseract version is shown at Audiveris' startup?
Which Java/OS are you running?

@Bacchushlg
Collaborator Author

  • Audiveris: 5.0.0:743f229a9
  • OS: Windows 10 10.0
  • Architecture: amd64
  • Java VM: Java HotSpot(TM) 64-Bit Server VM (build 25.152-b16, mixed mode)
  • OCR Engine: Tesseract OCR, version 3.04.01

Just one more question: I have just the 4 language files in the tessdata directory: deu, eng, fra and ita.traineddata
Should there be more?

@maximumspatium
Contributor

@Bacchushlg Thanks!

Please give me a detailed description of how you installed the Tesseract language files (where they are located, where they came from, etc.)

> I have just the 4 language files in the tessdata directory: deu, eng, fra and ita.traineddata
> Should there be more?

It depends on your targets. In the minimal configuration, only eng.traineddata + config files are required. For every further language in your scores, you'll need to add an appropriate OCR language.
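
As a rough illustration of that rule, the sketch below lists the language files needed for a set of score languages, always including `eng`. The helper name `required_traineddata` is hypothetical, and the additional config files mentioned above are not enumerated here because the thread does not name them.

```python
def required_traineddata(score_languages):
    """Return the .traineddata file names needed for the given language
    codes. 'eng' is always included, following the minimal configuration
    described above; duplicates are dropped while keeping order."""
    codes = dict.fromkeys(["eng", *score_languages])
    return [f"{code}.traineddata" for code in codes]
```

For scores with French and German text, `required_traineddata(["fra", "deu"])` yields the three file names for `eng`, `fra` and `deu`.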

@Bacchushlg
Collaborator Author

I found the reason for my problem: I had downloaded the current .traineddata files, which only work with Tesseract version 4, while Audiveris uses version 3.
I have now downloaded the correct ones and it works fine.
Just one more question: is there any documentation about the Audiveris GUI? I don't understand some features; in particular, I don't understand how to train elements (e.g. how to make Audiveris understand that a certain "3" belongs to a triplet).
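
The version mismatch described in this comment can be guarded against with a small check before downloading language data. This is only a sketch: `major_version` and `data_matches_engine` are hypothetical helpers, the check does not inspect the .traineddata files themselves, and you have to know which Tesseract major release the data set was published for.

```python
def major_version(version):
    """Return the leading integer of a dotted version string,
    e.g. the engine version reported at Audiveris startup."""
    return int(version.strip().split(".")[0])


def data_matches_engine(engine_version, data_major):
    """True if language data published for major release `data_major`
    targets the same major release as the engine."""
    return major_version(engine_version) == data_major
```

With the engine version reported earlier in this thread, `data_matches_engine("3.04.01", 4)` is `False`, which corresponds to the failure seen here, while `data_matches_engine("3.04.01", 3)` is `True`.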

@maximumspatium
Contributor

maximumspatium commented Jan 22, 2018

> I downloaded the correct ones now and it works fine.

I'm glad you solved your problem!

> is there some documentation about the GUI of audiveris?

Currently no, but we'll add one very soon because v5.1 is about to be released.

> I don't understand how to train elements (e.g. make audiveris understand, that a certain "3" belongs to a triplet).

Left-click on your "3" in the score, go to the "Shape" palette in the panel on the right, click on the pedal mark (𝆮), then double-click on the TUPLET_THREE symbol. With a bit of luck, your symbol will be converted to the desired tuplet...

@maximumspatium
Contributor

I'll close this issue because the original problem has been solved.

@hbitteur
Contributor

hbitteur commented Jan 22, 2018

@Bacchushlg
Writing documentation for the UI provided by the upcoming 5.1 release is high on our to-do list, right after fixing some hot issues like the popup menu on macOS (see #2), or lyric lines (see #44), etc. (8 issues as of today).
I'm closing yours since Tesseract now works for you. So, that's 7 issues left.

Doc should be available very shortly. Keep in mind however, that the training of classifiers is a bit more complex than plain end-user actions, but we'll address it as well.
