

CLDR-15611 Update tr35.md for special script codes (#4061)
macchiati authored Sep 20, 2024
1 parent 630504b commit 570ba9a
Showing 1 changed file with 20 additions and 7 deletions.
docs/ldml/tr35.md
@@ -728,25 +728,38 @@ the Visual variants are distinct in appearance, but otherwise encompass a single
and the Subsets exclude certain characters from a script.
The Equivalents for Subsets are not as well defined, so the "Equivalents" are marked as approximate.

| Variant | Script | Equivalent |
| --- | --- | --- |
| Compound | Jpan | ≡ Hani ∪ Hira ∪ Kana |
| | Hrkt | ≡ Hira ∪ Kana |
| | Kore | ≡ Hani ∪ Hang |
| | Hanb | ≡ Hani ∪ Bopo |
| Visual | Aran | ≡ Arab (Nastaliq variant) |
| | Cyrs | ≡ Cyrl (Old Church Slavonic variant) |
| | Latf | ≡ Latn (Fraktur variant) |
| | Latg | ≡ Latn (Gaelic variant) |
| | Syrn | ≡ Syrc (Eastern variant) |
| | Syre | ≡ Syrc (Estrangelo variant) |
| | Syrj | ≡ Syrc (Western variant) |
| Subset | Jamo | ≃ Hang - LVT - LV |
| | Hans | ≃ Hani - Traditional-only |
| | Hant | ≃ Hani - Simplified-only |

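As an illustration, the table can be captured as data. The sketch below is a hypothetical Python encoding: the names `SpecialScript`, `SPECIAL_SCRIPTS`, and `base_scripts` are not part of LDML. It records each code's variant type, the ordinary script codes it is built from or narrows, and whether the row is an exact (≡) or approximate (≃) equivalence.

```python
from typing import NamedTuple


class SpecialScript(NamedTuple):
    variant: str      # "Compound", "Visual", or "Subset"
    base: frozenset   # ordinary script codes the special code combines or narrows
    exact: bool       # True for ≡ rows, False for approximate ≃ rows


def _row(variant, *base, exact=True):
    return SpecialScript(variant, frozenset(base), exact)


# Hypothetical encoding of the special script code table.
SPECIAL_SCRIPTS = {
    "Jpan": _row("Compound", "Hani", "Hira", "Kana"),
    "Hrkt": _row("Compound", "Hira", "Kana"),
    "Kore": _row("Compound", "Hani", "Hang"),
    "Hanb": _row("Compound", "Hani", "Bopo"),
    "Aran": _row("Visual", "Arab"),
    "Cyrs": _row("Visual", "Cyrl"),
    "Latf": _row("Visual", "Latn"),
    "Latg": _row("Visual", "Latn"),
    "Syrn": _row("Visual", "Syrc"),
    "Syre": _row("Visual", "Syrc"),
    "Syrj": _row("Visual", "Syrc"),
    "Jamo": _row("Subset", "Hang", exact=False),
    "Hans": _row("Subset", "Hani", exact=False),
    "Hant": _row("Subset", "Hani", exact=False),
}


def base_scripts(code: str) -> frozenset:
    """Return the ordinary script codes behind `code`; a non-special code maps to itself."""
    entry = SPECIAL_SCRIPTS.get(code)
    return entry.base if entry is not None else frozenset({code})
```

For **Subset** rows, `base` is only an upper bound. For example, `Hans` draws from `Hani` but excludes Traditional-only characters, so treating `base_scripts("Hans")` as an exact equivalent would accept too much.
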
The most frequently used special codes appear in the locale identifiers `zh-Hans`, `zh-Hant`, `ja-Jpan`, and `ko-Kore`:
the first two use **Subset** codes, and the last two use **Compound** codes.
These are used, for example, in [Likely Subtags](#Likely_Subtags) in LDML.
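
A minimal sketch of spotting these codes in a locale identifier follows. It assumes that the script subtag, if present, immediately follows the language subtag, and it accepts either `-` or `_` as a separator. The helper names are hypothetical and are not a CLDR API.

```python
# Variant type of each special script code, per the table above.
SPECIAL_VARIANT = {
    "Jpan": "Compound", "Hrkt": "Compound", "Kore": "Compound", "Hanb": "Compound",
    "Aran": "Visual", "Cyrs": "Visual", "Latf": "Visual", "Latg": "Visual",
    "Syrn": "Visual", "Syre": "Visual", "Syrj": "Visual",
    "Jamo": "Subset", "Hans": "Subset", "Hant": "Subset",
}


def script_subtag(locale_id: str):
    """Return the script subtag (title-cased) if it directly follows the language subtag."""
    parts = locale_id.replace("_", "-").split("-")
    if len(parts) > 1 and len(parts[1]) == 4 and parts[1].isalpha():
        return parts[1].title()
    return None


def special_variant(locale_id: str):
    """Return "Compound", "Visual", or "Subset" for a special script code, else None."""
    script = script_subtag(locale_id)
    return SPECIAL_VARIANT.get(script) if script else None


# ['Subset', 'Subset', 'Compound', 'Compound']
print([special_variant(x) for x in ("zh-Hans", "zh-Hant", "ja-Jpan", "ko-Kore")])
```

Only the position directly after the language is checked. This avoids misreading four-letter values later in the identifier (such as `thai` in `en-u-nu-thai`) as scripts.
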

The **Equivalent** values for the **Subset** variants are only approximate, _and_ the Subset variants are also **Visual** variants.
Thus `Hans` is a request for:
- Not using characters that are Traditional-only.
- Giving a Simplified rendering to characters that are common to Simplified and Traditional.
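
The two parts of such a request can be modeled separately: a set of characters to avoid, and a rendering preference for shared characters. This is a hypothetical sketch. The three character pairs are a tiny hand-picked sample, because real use would need full Simplified/Traditional data. The `Hant` entry is assumed by analogy with `Hans`.

```python
from dataclasses import dataclass

# Hypothetical sample data: three Traditional/Simplified pairs only.
TRADITIONAL_ONLY = frozenset("們門語")
SIMPLIFIED_ONLY = frozenset("们门语")


@dataclass(frozen=True)
class SubsetRequest:
    excluded: frozenset  # characters the request asks implementations to avoid
    rendering: str       # preferred rendering for characters shared by both forms


SUBSET_REQUESTS = {
    "Hans": SubsetRequest(TRADITIONAL_ONLY, "Simplified"),
    "Hant": SubsetRequest(SIMPLIFIED_ONLY, "Traditional"),  # assumed by analogy
}


def excluded_chars(text: str, script: str) -> list:
    """Return the characters in `text` that the Subset code `script` asks to avoid."""
    excluded = SUBSET_REQUESTS[script].excluded
    return [ch for ch in text if ch in excluded]
```

Characters such as 我 and 的 appear in neither sample set, so they pass the filter. Under this model, only the `rendering` field governs how they look.
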

**Visual** variant script codes (that are not **Subset** variants) can be used in a locale identifier to request a particular rendering.
For example, `ar-Aran` could be used to request `ar-Arab` data, but rendered with a Nastaliq-style font.
However, the few variant script codes represent only a very small fraction of the different script variants in use.
Moreover, this feature is not widely supported, and may give unexpected results when not supported.
For example, an implementation might not recognize `Aran` in `uz-Aran` at all, and return results for `uz-Latn` instead.
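
The hypothetical resolver below contrasts the two behaviors. The available-locale set and the default-script table are made-up stand-ins for real locale data and Likely Subtags. Returning the variant code as a "rendering hint" is an assumed way of passing the request on to font selection.

```python
# Visual variant codes (excluding Subsets) mapped to the script whose data they reuse.
VISUAL_BASE = {"Aran": "Arab", "Cyrs": "Cyrl", "Latf": "Latn", "Latg": "Latn",
               "Syrn": "Syrc", "Syre": "Syrc", "Syrj": "Syrc"}

# Hypothetical stand-ins for real locale data and Likely Subtags.
AVAILABLE = {"ar-Arab", "uz-Latn", "uz-Cyrl", "uz-Arab"}
DEFAULT_SCRIPT = {"ar": "Arab", "uz": "Latn"}


def resolve(locale_id: str, knows_visual_variants: bool):
    """Return (data locale, rendering hint or None) for a language-script ID."""
    lang, script = locale_id.split("-")
    if knows_visual_variants and script in VISUAL_BASE:
        # Reuse the base script's data and keep the variant as a rendering hint.
        return f"{lang}-{VISUAL_BASE[script]}", script
    if locale_id in AVAILABLE:
        return locale_id, None
    # Unrecognized script: fall back to the language's default script,
    # silently dropping the requested rendering.
    return f"{lang}-{DEFAULT_SCRIPT[lang]}", None
```

With `knows_visual_variants=False`, `uz-Aran` resolves to `("uz-Latn", None)`. This is the unexpected result described above. With `True`, it resolves to `("uz-Arab", "Aran")`.
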

Some of the special codes are used in other specifications,
such as in [Mixed_Script_Detection](https://unicode.org/reports/tr39/#Mixed_Script_Detection).
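
For instance, UTS #39 adds the **Compound** codes `Jpan`, `Kore`, and `Hanb` to its augmented script sets. As a result, a mix such as Han plus Hiragana still counts as single-script (`Jpan`). The sketch below is simplified. It assumes a toy script lookup over a few code point ranges instead of the Unicode `Script_Extensions` property, and it treats every other character as Common.

```python
# Toy Script lookup (ASSUMPTION): a few code point ranges stand in for the
# Unicode Script_Extensions property; anything else counts as Common.
RANGES = [
    (0x0041, 0x005A, "Latn"), (0x0061, 0x007A, "Latn"),
    (0x3041, 0x3096, "Hira"), (0x30A1, 0x30FA, "Kana"),
    (0x3105, 0x312F, "Bopo"), (0x4E00, 0x9FFF, "Hani"),
    (0xAC00, 0xD7A3, "Hang"),
]

# Compound codes added to a character's script set in the augmented set.
AUGMENT = {"Hani": {"Hanb", "Jpan", "Kore"}, "Hira": {"Jpan"},
           "Kana": {"Jpan"}, "Hang": {"Kore"}, "Bopo": {"Hanb"}}


def augmented_set(ch: str):
    """Return the augmented script set of `ch`, or None for Common (meaning ALL)."""
    cp = ord(ch)
    for lo, hi, script in RANGES:
        if lo <= cp <= hi:
            return {script} | AUGMENT.get(script, set())
    return None


def is_single_script(text: str) -> bool:
    """True if the intersection of all augmented script sets is non-empty."""
    resolved = None  # None stands for ALL scripts
    for ch in text:
        scripts = augmented_set(ch)
        if scripts is not None:
            resolved = scripts if resolved is None else resolved & scripts
    return resolved is None or len(resolved) > 0
```

Without the added codes, "日本語のテキスト" would intersect to an empty set (Hani, Hira, and Kana share nothing) and be flagged as mixed-script.
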
