Would it be possible to train a German model? #82
Comments
https://en.wikipedia.org/wiki/List_of_Latin-script_letters

In fact, every model can ONLY recognize characters from the character dictionary that was fixed at training time. Recognition just outputs a list of indices into that dictionary, one per recognized character. So if you match:
PaddleSharp/src/Sdcb.PaddleOCR.Models.Online/LocalDictOnlineRecognizationModel.cs, line 38 (commit 167e760)

and there's a v3 model that is trained with a:
PaddleSharp/src/Sdcb.PaddleOCR.Models.Online/LocalDictOnlineRecognizationModel.cs, line 194 (commit 167e760)

If you want to use the oldest v2 model:

public static LocalDictOnlineRecognizationModel GermanV2 => new(
    "german_mobile_v2.0_rec_infer",
    "german_dict.txt",
    new Uri("https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/german_mobile_v2.0_rec_infer.tar"),
    ModelVersion.V2);

but this shouldn't work, because all dictionaries are copied from:
PaddleSharp/src/Sdcb.PaddleOCR.Models.Shared/Sdcb.PaddleOCR.Models.Shared.csproj, lines 29 to 44 (commit 167e760)
PaddleSharp/src/Sdcb.PaddleOCR.Models.Online/LocalDictOnlineRecognizationModel.cs, line 31 (commit 167e760)
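To make the "only characters from the dictionary" point concrete, here is a minimal sketch of greedy CTC decoding in plain Python. It assumes, as in PaddleOCR's CTC label decoder, that class index 0 is the CTC blank and index i (i >= 1) maps to line i - 1 of the dictionary file; whether PaddleSharp uses exactly this layout is an assumption. The function name `ctc_greedy_decode` is hypothetical. Because every emitted character is looked up in `dictionary`, a model whose dictionary lacks "ä" can only ever output its closest available class, such as "a".

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], dictionary: List[str]) -> str:
    """Hypothetical greedy CTC decoder (a sketch, not PaddleSharp's code).

    Assumption: index 0 is the CTC blank and index i >= 1 maps to dictionary[i - 1].
    Every output character therefore comes from `dictionary`.
    """
    chars = []
    prev = -1
    for scores in frame_scores:
        # Pick the highest-scoring class for this time step (frame).
        best = max(range(len(scores)), key=lambda i: scores[i])
        # Collapse repeated indices and drop blanks, as greedy CTC decoding does.
        if best != prev and best != 0:
            chars.append(dictionary[best - 1])
        prev = best
    return "".join(chars)


# A tiny English-like dictionary without umlauts: only "a", "o", "u" can ever be emitted.
english_like = ["a", "o", "u"]
frames = [
    [0.1, 0.7, 0.1, 0.1],    # "a"
    [0.1, 0.8, 0.05, 0.05],  # "a" again, collapsed as a repeat
    [0.9, 0.05, 0.03, 0.02], # blank
    [0.1, 0.1, 0.2, 0.6],    # "u"
]
print(ctc_greedy_decode(frames, english_like))
```

With these scores the decoder prints `au`. Switching to a dictionary that contains "ä", "ö", "ü" and "ß" (together with a model trained on it) is the only way those characters can appear in the output.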
Thank you very much! I had overlooked latin_dict. The four characters I mentioned are in it, so they can be recognized as well. It already works when I select LocalFullModels.LatinV3 as the model. I'm going to optimize it, thanks!
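A quick way to check whether a given dictionary file covers the German-specific characters before choosing a model is a small coverage check like the one below. It is a sketch that assumes the usual one-character-per-line, UTF-8 dictionary format. The helper names and the local path passed to `check_dict_file` are hypothetical, not part of PaddleSharp.

```python
from typing import Iterable, Set

# The four characters from the original issue; uppercase Ä, Ö, Ü could be added as needed.
GERMAN_EXTRA = {"ä", "ö", "ü", "ß"}


def load_dictionary(lines: Iterable[str]) -> Set[str]:
    """Hypothetical helper: one character per line, assumed UTF-8.

    Only the line terminator is stripped, so a " " (space) entry survives.
    """
    return {line.rstrip("\r\n") for line in lines if line.rstrip("\r\n")}


def missing_chars(dictionary: Set[str], required: Set[str] = GERMAN_EXTRA) -> Set[str]:
    """Return the required characters that the dictionary cannot produce."""
    return required - dictionary


def check_dict_file(path: str) -> Set[str]:
    """Hypothetical helper: check a local dictionary file, e.g. a copy of latin_dict.txt."""
    with open(path, encoding="utf-8") as f:
        return missing_chars(load_dictionary(f))
```

An empty result means every listed character is at least representable by a model trained with that dictionary. It does not guarantee the model recognizes them well.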
@gitapii Hi. I want to use German as well. Did you have to fine-tune the model, or does it work out of the box?
Hi,
I recently tested this repo as a NuGet package, and it seems to be a very good PaddleOCR solution for .NET. Would it also be possible to train or fine-tune a German model (maybe locally), or to use the inference model from PaddlePaddle/PaddleOCR#1048?
German is quite similar to English, but it has 4 more characters (ä, ö, ü, ß). At the moment, the model recognizes them as (a, o, u) without the dots above. Support for them would be great.
Kind regards,