LocaleFallbackPriority for Transliterator fallback #3972
("en-US", &["en-US", "en", "und-Latn"]),
(
    "az-Arab-IR",
    &["az-Arab-IR", "az-Arab", "az-IR", "az", "und-Arab"],
question: do we want `und-Arab-IR` in this chain too?
The specification says no (that chain is copy-pasted from there), but I don't know a good way to check what ICU4C is doing.
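As an illustration of the two chains in the test data above, here is a minimal sketch that reproduces them. It is not the code in this PR: `SimpleLocale` and `transliteration_chain` are hypothetical names, and the likely script of the initial locale is assumed to be supplied by the caller.

```rust
/// Hypothetical, simplified locale: a language plus optional script and region.
#[derive(Clone, Debug)]
struct SimpleLocale {
    language: &'static str,
    script: Option<&'static str>,
    region: Option<&'static str>,
}

/// Builds a transliterator-style fallback chain for `loc`.
/// `max_script` is the likely script of the *initial* locale (assumed to be
/// looked up by the caller); it is only used when `loc` has no explicit script.
fn transliteration_chain(loc: &SimpleLocale, max_script: &str) -> Vec<String> {
    let lang = loc.language;
    let mut chain = Vec::new();
    match (loc.script, loc.region) {
        (Some(s), Some(r)) => {
            chain.push(format!("{}-{}-{}", lang, s, r));
            chain.push(format!("{}-{}", lang, s));
            chain.push(format!("{}-{}", lang, r));
        }
        (Some(s), None) => chain.push(format!("{}-{}", lang, s)),
        (None, Some(r)) => chain.push(format!("{}-{}", lang, r)),
        (None, None) => {}
    }
    // The bare language, then the script-only step; no `und-<script>-<region>` step.
    chain.push(lang.to_string());
    chain.push(format!("und-{}", loc.script.unwrap_or(max_script)));
    chain
}
```

Under these assumptions, `az-Arab-IR` ends at `und-Arab` without passing through `und-Arab-IR`, which matches the chain copied from the specification.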
// 2. Remove the script if it is implied by the other subtags and the fallback priority is
// not transliteration.
if self.config.priority != FallbackPriority::Transliteration {
Thought/Suggestion: Rather than skipping this step, it seems like we could instead maximize the script at this point. Then, for example, `ru-RU` becomes `ru-Cyrl-RU` and we run the fallback chain from that reference point. If you do this then you don't need the `max_script` function; just change the body of this branch to add instead of remove the script.
`ru-Cyrl-RU` is not part of `ru-RU`'s fallback chain, and neither is `ru-Cyrl`, which would be the next step: https://unicode.org/reports/tr35/tr35-general.html#Inheritance
We need to store the maximized script outside the locale itself, because it's invalid to maximize after we've already changed the locale. It needs to be computed on the initial locale, otherwise e.g. `az` would fall back to `und-Latn`, which is the maximized script for `az` but not for `az-IR`, and the spec explicitly says `az-IR` ends up at `und-Arab` (ignoring supplemental data that ICU4C is also ignoring).
I pushed an alternative version that reuses the `normalize` code block.
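To make the ordering concern concrete, here is a small sketch with a hand-written stand-in for likely-subtags data. The table and function names are hypothetical and cover only the locales mentioned in this thread; they are not ICU4X APIs.

```rust
/// Hypothetical stand-in for likely-subtags data; only the entries discussed here.
fn likely_script(language: &str, region: Option<&str>) -> Option<&'static str> {
    match (language, region) {
        ("az", Some("IR")) => Some("Arab"),
        ("az", _) => Some("Latn"),
        ("ru", _) => Some("Cyrl"),
        _ => None,
    }
}

/// Final `und-<script>` step when the script is captured from the initial
/// locale, before any subtag has been removed.
fn last_step_from_initial(language: &str, region: Option<&str>) -> Option<String> {
    let script = likely_script(language, region)?;
    Some(format!("und-{}", script))
}

/// Final step if the script were instead maximized late, after the region
/// has already been dropped. The region is deliberately ignored here.
fn last_step_from_truncated(language: &str, _region: Option<&str>) -> Option<String> {
    let script = likely_script(language, None)?;
    Some(format!("und-{}", script))
}
```

With this toy table, `az-IR` ends at `und-Arab` only when the script is taken from the initial locale; maximizing after the region is gone yields `und-Latn` instead.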
I think the specification is inconsistent. It also says

> For either the source or the target, the fallback starts from the maximized locale ID (using the likely-subtags data).

emphasis mine. What complicates it even more is that

> Where there are multiple scripts, the maximized script is tried first, and then the other scripts associated with the language (from supplemental data).

I think we might need supplemental data for this.
Agreed, though ICU4C does what I'm doing, AFAICT.
(I checked that supplemental data is not accessed and that the fallback starts from the initial locale, not the maximized one)
This adds simple language-ID-only fallback that will be used for transliterator constructor fallback. Fallback is only performed on language, script, and region, as the examples in UTS #35 show.
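For illustration only, restricting fallback to language, script, and region could start by discarding every other subtag. The helper below is a hypothetical, simplified sketch: it does no case normalization or validation, and it is not how ICU4X parses locales.

```rust
/// Hypothetical helper: keeps only the language, script, and region subtags
/// of a BCP 47-style tag and ignores everything after them.
fn language_script_region(tag: &str) -> (String, Option<String>, Option<String>) {
    let mut parts = tag.split('-');
    // `split` always yields at least one item, so the default is just a safeguard.
    let language = parts.next().unwrap_or("und").to_ascii_lowercase();
    let mut script = None;
    let mut region = None;
    for part in parts {
        let alpha = part.chars().all(|c| c.is_ascii_alphabetic());
        let digit = part.chars().all(|c| c.is_ascii_digit());
        if script.is_none() && region.is_none() && part.len() == 4 && alpha {
            // A four-letter subtag directly after the language is the script.
            script = Some(part.to_string());
        } else if region.is_none() && ((part.len() == 2 && alpha) || (part.len() == 3 && digit)) {
            // Two letters or three digits form the region.
            region = Some(part.to_string());
        } else {
            // Variants, extensions, and private-use subtags are dropped.
            break;
        }
    }
    (language, script, region)
}
```

The resulting triple is what a language-ID-only fallback chain would then be built from.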