Yiddish transliteration does not conform to YIVO transliteration #5

j0ma · 2022-08-23T22:09:11Z

Thanks for a great library! While using it for Yiddish, I noticed that some of the transliterations do not conform to the YIVO romanization standard.

To pinpoint the kinds of errors uroman makes, I ran a romanization experiment on the data from Saleva (2020), comparing uroman against yiddish, another library for Yiddish romanization.

Here are some benchmark numbers, using accuracy and mean F1 score as defined in the Proceedings of the Seventh Named Entities Workshop:

| library | mean_f1 | accuracy |
|---------|---------|----------|
| uroman  | 0.937   | 0.458    |
| yiddish | 0.990   | 0.936    |
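For readers who want to reproduce numbers of this kind, here is a minimal sketch of the two metrics. It assumes a single reference per word and the LCS-based precision/recall formulation used in the NEWS shared-task evaluations; the function names are made up for illustration and this is not the actual evaluation script behind the table.

```python
from statistics import mean


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def f1(ref: str, hyp: str) -> float:
    """Character-level F1 between a reference and a hypothesis."""
    if not ref or not hyp:
        # Assumption: two empty strings count as a perfect match.
        return 1.0 if ref == hyp else 0.0
    lcs = lcs_length(ref, hyp)
    if lcs == 0:
        return 0.0
    precision = lcs / len(hyp)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def evaluate(refs: list[str], hyps: list[str]) -> dict[str, float]:
    """Exact-match accuracy and mean F1 over aligned word lists."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    return {
        "accuracy": mean(float(r == h) for r, h in zip(refs, hyps)),
        "mean_f1": mean(f1(r, h) for r, h in zip(refs, hyps)),
    }


if __name__ == "__main__":
    # Two pairs taken from the diff in this issue.
    print(evaluate(["aforizm", "apetit"], ["aforyzm", "apetyt"]))
```

This also shows why the two columns can diverge so much: a single wrong character makes a word count as an error for accuracy, while its F1 stays high.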

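The diffs for what uroman gets wrong can be found here. In the excerpt below, lines starting with - are the expected YIVO forms and lines starting with + are uroman's output. Many of the errors seem to be i/y mismatches, along with Hebrew expansion errors: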

-aforizm
+aforyzm
-aparatshik
-apteyk
-apteyker
-apikoyres
+aparatshyk
+aptyyk
+aptyyker
+apykurs
-apetit
+apetyt
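As a rough way to quantify the i/y mismatches, the sketch below flags pairs that differ only in i versus y. It assumes the `-` lines are the expected YIVO forms and the `+` lines are uroman's output, paired in order within each hunk; the helper name is hypothetical and the pair list is copied from the excerpt above.

```python
def is_iy_mismatch(ref: str, hyp: str) -> bool:
    """True if ref and hyp differ, but only by i/y substitutions."""
    return ref != hyp and ref.replace("y", "i") == hyp.replace("y", "i")


# (expected YIVO form, uroman output), pairing assumed from the diff order.
PAIRS = [
    ("aforizm", "aforyzm"),
    ("aparatshik", "aparatshyk"),
    ("apteyk", "aptyyk"),
    ("apteyker", "aptyyker"),
    ("apikoyres", "apykurs"),
    ("apetit", "apetyt"),
]

iy_only = [(ref, hyp) for ref, hyp in PAIRS if is_iy_mismatch(ref, hyp)]
other = [(ref, hyp) for ref, hyp in PAIRS if not is_iy_mismatch(ref, hyp)]

if __name__ == "__main__":
    print("i/y only:", iy_only)
    print("other:", other)
```

Pairs such as apteyk/aptyyk land in the second group because they also involve an e/y difference, so this check is deliberately conservative.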

Would it be possible to make the -l yid flag produce output that conforms to the YIVO romanization standard?
As far as I'm aware, it is by far the most widely used romanization scheme for Yiddish.

Thanks!
