Yiddish transliteration does not conform to YIVO transliteration #5

Open
j0ma opened this issue Aug 23, 2022 · 0 comments

Thanks for a great library! While using it for Yiddish, I noticed that some of the transliterations do not conform to the YIVO romanization standard.

To pinpoint what kind of errors uroman is making, I conducted a romanization experiment using the data from Saleva (2020) and another library for Yiddish romanization called yiddish.

Here are some benchmark numbers, using accuracy and mean F1 score as defined in the Proceedings of the Seventh Named Entities Workshop:

| library | mean_f1 | accuracy |
|---------|---------|----------|
| uroman  | 0.937   | 0.458    |
| yiddish | 0.990   | 0.936    |
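For readers who want to reproduce numbers of this kind, here is a minimal sketch of the two metrics. It assumes the usual NEWS shared task formulation, as I understand it: accuracy is the exact-match rate, and F1 is computed per word from the longest common subsequence (LCS) between the candidate and the reference. It also assumes a single reference per word. The function names are mine and are not part of uroman or yiddish.

```python
from typing import Dict, List, Tuple


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def f1(candidate: str, reference: str) -> float:
    """LCS-based F1 between one candidate and one reference romanization."""
    if not candidate or not reference:
        # Assumption: two empty strings count as a perfect match.
        return 1.0 if candidate == reference else 0.0
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


def evaluate(pairs: List[Tuple[str, str]]) -> Dict[str, float]:
    """Accuracy and mean F1 over (candidate, reference) pairs."""
    n = len(pairs)
    accuracy = sum(cand == ref for cand, ref in pairs) / n
    mean_f1 = sum(f1(cand, ref) for cand, ref in pairs) / n
    return {"accuracy": accuracy, "mean_f1": mean_f1}


if __name__ == "__main__":
    # (uroman output, YIVO reference) pairs taken from the diff in this issue.
    sample = [("aforyzm", "aforizm"), ("apetyt", "apetit")]
    print(evaluate(sample))
```

This also shows why mean F1 can stay high while accuracy drops: a single wrong character makes a word count as an error for accuracy but costs only a small fraction of its F1.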

The diffs for what uroman gets wrong can be found here. Many seem to be i/y mismatches as well as Hebrew expansion errors:

-aforizm
	+aforyzm
-aparatshik
-apteyk
-apteyker
-apikoyres
	+aparatshyk
	+aptyyk
	+aptyyker
	+apykurs
-apetit
	+apetyt
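To separate the i/y mismatches from the other errors, a rough sketch like the following can help. It assumes that the `-` lines are the YIVO references, that the `+` lines are uroman's output, and that the lines inside a hunk pair up in order. The helper name `is_i_y_mismatch` is hypothetical.

```python
from typing import List, Tuple


def is_i_y_mismatch(candidate: str, reference: str) -> bool:
    """True if the strings differ, and only by i/y substitutions at the same positions."""
    if len(candidate) != len(reference) or candidate == reference:
        return False
    return all(
        c == r or {c, r} == {"i", "y"}
        for c, r in zip(candidate, reference)
    )


# (uroman output, YIVO reference), paired in order from the diff above (assumed pairing).
PAIRS: List[Tuple[str, str]] = [
    ("aforyzm", "aforizm"),
    ("aparatshyk", "aparatshik"),
    ("aptyyk", "apteyk"),
    ("aptyyker", "apteyker"),
    ("apykurs", "apikoyres"),
    ("apetyt", "apetit"),
]

i_y_only = [(cand, ref) for cand, ref in PAIRS if is_i_y_mismatch(cand, ref)]
other = [(cand, ref) for cand, ref in PAIRS if not is_i_y_mismatch(cand, ref)]

if __name__ == "__main__":
    print("i/y only:", i_y_only)
    print("other:", other)
```

Under this check, pairs such as aptyyk/apteyk and apykurs/apikoyres land in the "other" bucket, because they involve more than a plain i/y swap.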

Would it be possible to implement the -l yid flag such that the output conforms to the YIVO romanization standard?
As far as I'm aware, it's by far the most used romanization format for Yiddish.

Thanks!
