Incorrect handling of plural forms for translation #23797

Open
mikhirev opened this issue Mar 29, 2023 · 5 comments

Comments

@mikhirev

Description

Hi!

The current handling of strings with multiple plural forms, which maps them to key-value pairs in the ini file, is incorrect. It allows only two plural forms, as in English (singular for one only). Languages that have only one plural form are not a problem either. But there are many languages with three or more plural forms (you may find a review here, for example). Correct translation into such languages is currently impossible.

Please consider changing the translation framework to handle pluralization properly.

Gitea Version

1.19.0

Can you reproduce the bug on the Gitea demo site?

Yes

Log Gist

No response

Screenshots

No response

Git Version

No response

Operating System

No response

How are you running Gitea?

n/a

Database

None

@delvh
Member

delvh commented Mar 29, 2023

See also #19916.

@wxiaoguang
Contributor

wxiaoguang commented Apr 5, 2023

What do you think about the approach in #23933?

  • For en: There $[is, are] %d $[item, items]
  • For lv: There $[is-0, is-1, is-other] %d $[item-0, item-1, items]
  • For ar: There ... %d $[item-0, item-1, item-2, item-few, item-many, items]

(I don't understand lv or ar, so I use English words for the demo.)

Because each language has standard defined Plural Forms (https://github.com/unicode-org/cldr/blob/main/common/supplemental/plurals.xml), we can just put an array-like candidate word list in the string.

With this approach, translators just need to fill the words for these forms.
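As a rough illustration of the candidate-list idea above, here is a minimal Go sketch. It is not the code from #23933: the function names `pluralIndex` and `pickWord` are hypothetical, and the rules are hand-written from the CLDR integer rules for English and Latvian only, assuming non-negative counts.

```go
package main

import "fmt"

// pluralIndex returns the position in the candidate list to use for count n.
// Hypothetical helper: rules are hand-copied from the CLDR integer rules for
// two languages only and assume n >= 0. A real implementation would load the
// rules from CLDR data instead.
func pluralIndex(lang string, n int) int {
	switch lang {
	case "lv": // forms in order: zero, one, other
		switch {
		case n%10 == 0 || (n%100 >= 11 && n%100 <= 19):
			return 0
		case n%10 == 1 && n%100 != 11:
			return 1
		default:
			return 2
		}
	default: // "en" and fallback; forms in order: one, other
		if n == 1 {
			return 0
		}
		return 1
	}
}

// pickWord selects one word from a translator-supplied candidate list.
// If the list is shorter than expected, the last candidate is used.
func pickWord(lang string, n int, candidates []string) string {
	i := pluralIndex(lang, n)
	if i >= len(candidates) {
		i = len(candidates) - 1
	}
	return candidates[i]
}

func main() {
	for _, n := range []int{1, 2} {
		fmt.Printf("There %s %d %s\n",
			pickWord("en", n, []string{"is", "are"}),
			n,
			pickWord("en", n, []string{"item", "items"}))
	}
}
```

The sketch makes one limitation visible: only individual words vary, while the sentence structure around them is fixed by the source string.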

I think we need a crowdin-compatible and ini-compatible syntax, because Gitea is using these systems.

And we need a translator-friendly syntax, otherwise the strings could get broken frequently (I have found a lot of broken translation strings recently...).

I haven't tried how Crowdin handles pluralization, whether it has other, better approaches, or whether there is a better translation system.

@mikhirev
Author

mikhirev commented Apr 5, 2023

@wxiaoguang, this approach will not work across different languages. E.g., in Russian we usually don't use the verb (is/are) in sentences like that. In other languages some additional words may be needed. The common practice is to let translators deal with the whole string, not separate words; good examples are ngettext and gotext plurals.

Probably the simplest way is to implement the ICU message format support. It is supported by Crowdin and there is a Go module for parsing it.
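To make the whole-string approach concrete, here is a minimal Go sketch for Russian. It does not parse ICU message syntax or use any ICU library; the names `ruCategory`, `ruItems`, and `formatItems` are hypothetical, the category rules are hand-written from the CLDR integer rules for Russian, and counts are assumed to be non-negative integers.

```go
package main

import "fmt"

// In ICU message format, a translator would write the whole string once
// per plural category, for example:
//
//   {count, plural, one {# элемент} few {# элемента} many {# элементов} other {# элемента}}
//
// The map below holds the same translations keyed by category
// (illustrative data, not taken from Gitea's locale files).
var ruItems = map[string]string{
	"one":  "%d элемент",
	"few":  "%d элемента",
	"many": "%d элементов",
}

// ruCategory returns the CLDR plural category for a non-negative integer
// in Russian. The "other" category applies only to fractions, so it never
// occurs here.
func ruCategory(n int) string {
	m10, m100 := n%10, n%100
	switch {
	case m10 == 1 && m100 != 11:
		return "one"
	case m10 >= 2 && m10 <= 4 && (m100 < 12 || m100 > 14):
		return "few"
	default:
		return "many"
	}
}

// formatItems picks the whole translated string for n and fills in the count.
func formatItems(n int) string {
	return fmt.Sprintf(ruItems[ruCategory(n)], n)
}

func main() {
	for _, n := range []int{1, 3, 5, 11, 21, 22} {
		fmt.Println(formatItems(n))
	}
}
```

Because each category carries a complete string, the translator is free to drop the verb or add words as the language requires.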

@wxiaoguang
Contributor

Thank you, then I think ICU message format is the answer.

@techknowlogick
Member

@mikhirev thank you for those details and your research into libraries. ICU format looks good. We could probably force ICU format into ini, and normally I'd be against changing the format of config files, but maybe this is an opportunity to look into getting away from ini (even if only for translations).
