diff --git a/docs/ldml/tr35.md b/docs/ldml/tr35.md
index e61d5ba6caa..aa1c10898da 100644
--- a/docs/ldml/tr35.md
+++ b/docs/ldml/tr35.md
@@ -728,25 +728,38 @@ the Visual variants are distinct in appearance, but otherwise encompass a single
 and the Subsets exclude certain characters from a script.
 The Equivalents for Subsets are not as well defined, so the "Equivalents" are marked as approximate.
 
-| Group | Script | Equivalent |
+| Variant | Script | Equivalent |
 | --- | --- | --- |
-| Compounds | Jpan | ≡ Hani ∪ Hira ∪ Kana |
+| Compound | Jpan | ≡ Hani ∪ Hira ∪ Kana |
 | | Hrkt | ≡ Hira ∪ Kana |
 | | Kore | ≡ Hani ∪ Hang |
 | | Hanb | ≡ Hani ∪ Bopo |
-| Visual variants | Aran | ≡ Arab (Nastaliq variant) |
+| Visual | Aran | ≡ Arab (Nastaliq variant) |
 | | Cyrs | ≡ Cyrl (Old Church Slavonic variant) |
 | | Latf | ≡ Latn (Fraktur variant) |
 | | Latg | ≡ Latn (Gaelic variant) |
 | | Syrn | ≡ Syrc (Eastern variant) |
 | | Syre | ≡ Syrc (Estrangelo variant) |
 | | Syrj | ≡ Syrc (Western variant) |
-| Subsets (approximate) | Jamo | ≃ Hang - LVT - LV |
-| | Hans | ≃ Hani - Traditional-only |
-| | Hant | ≃ Hani - Simplified-only |
+| Subset | Jamo | ≃ Hang − LVT − LV |
+| | Hans | ≃ Hani − Traditional-only |
+| | Hant | ≃ Hani − Simplified-only |
 
-The special codes most frequently used are in the locale identifiers zh-Hans, zh-Hant, ja-Jpan, and ko-Kore.
+The special codes most frequently used are in the locale identifiers `zh-Hans`, `zh-Hant`, `ja-Jpan`, and `ko-Kore`:
+the first two are **Subsets**, and the last two are **Compounds**. These are used, for example, in [Likely Subtags](#Likely_Subtags) in LDML.
+
+The **Equivalent** values in the **Subset** variants are only approximate, _and_ the variants are also visual variants.
+Thus `Hans` is a request for:
+- Not using characters that are Traditional-only
+- Characters common between Simplified and Traditional to be given a Simplified rendering.
+
+**Visual** variant script codes (that are not **Subset** variants) can be used in a locale identifier to request a particular rendering.
+For example, `ar-Aran` could be used to request that `ar-Arab` data be used, but with a Nastaliq-style font.
+However, the few variant script codes represent only a very small fraction of the different script variants in use.
+Moreover, this feature is not widely supported, and may give unexpected results when not supported.
+For example, an implementation might not recognize `Aran` in `uz-Aran` at all, and return results for `uz-Latn`.
+
 Some of the special codes are used in other specifications, such as in [Mixed_Script_Detection](https://unicode.org/reports/tr39/#Mixed_Script_Detection).
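
As a rough illustration of how the table in this change might be consumed, the sketch below encodes the three variant groups as plain dictionaries and shows one possible data fallback for **Visual** codes. The names `COMPOUND`, `VISUAL`, `SUBSET`, `covering_scripts`, and `data_fallback` are hypothetical and are not part of LDML or any CLDR API; the mappings are copied from the table, and the **Subset** entries are only approximate.

```python
# Illustrative data only: mirrors the Variant / Script / Equivalent table.
# All names here are hypothetical, not part of LDML or any CLDR library.
COMPOUND = {
    "Jpan": {"Hani", "Hira", "Kana"},
    "Hrkt": {"Hira", "Kana"},
    "Kore": {"Hani", "Hang"},
    "Hanb": {"Hani", "Bopo"},
}
VISUAL = {
    "Aran": "Arab",
    "Cyrs": "Cyrl",
    "Latf": "Latn",
    "Latg": "Latn",
    "Syrn": "Syrc",
    "Syre": "Syrc",
    "Syrj": "Syrc",
}
# Subsets are marked approximate in the table: each maps to a superset script.
SUBSET = {"Jamo": "Hang", "Hans": "Hani", "Hant": "Hani"}


def covering_scripts(code):
    """Return (scripts, exact) for a script code.

    `scripts` is a set of regular script codes that together cover `code`;
    `exact` is False when the table marks the equivalence as approximate.
    """
    if code in COMPOUND:
        return set(COMPOUND[code]), True
    if code in VISUAL:
        return {VISUAL[code]}, True
    if code in SUBSET:
        return {SUBSET[code]}, False
    return {code}, True  # a regular script code covers itself


def data_fallback(locale_id):
    """Replace a Visual variant script subtag with its base script.

    Accepts either '-' or '_' as the separator and returns a '-' form.
    Subset and Compound codes are left unchanged.
    """
    subtags = locale_id.replace("_", "-").split("-")
    return "-".join(VISUAL.get(subtag, subtag) for subtag in subtags)
```

Under these assumptions, `data_fallback("ar_Aran")` yields `ar-Arab`, matching the Nastaliq example in the added text, while `zh-Hans` passes through unchanged because **Subset** codes are not merely a rendering request.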