Add support for rendering characters from unicode supplementary planes #4001

Open
lucaswoj opened this issue Jan 17, 2017 · 4 comments
Labels
cross-platform 📺 Requires coordination with Mapbox GL Native (style specification, rendering tests, etc.) feature 🍏

Comments

lucaswoj commented Jan 17, 2017

Migrated from mapbox/DEPRECATED-mapbox-gl#29

@lucaswoj:
We currently only support rendering characters from the Basic Multilingual Plane. We may need to support supplementary planes as we expand into markets that use non-Latin alphabets.

https://en.wikipedia.org/wiki/CJK_Unified_Ideographs#CJK_Unified_Ideographs_Extension_E notes that 98 of those characters come from “Chinese Academy of Surveying and Mapping ideographs”, so they probably will be cropping up in OSM eventually. -- @1ec5


@1ec5:
I'm satisfied that the CJK Unified Ideographs Extension E characters will take a while to make their way into OpenStreetMap, given that the block was only introduced to Unicode last year. However, note that the same work that goes into CJK E would also enable (colorless) emoji. Hopefully that’ll give this issue a bit more traction. 😉


@1ec5:
My mistake: CJK E isn’t the only CJK block that’s in the Supplementary Ideographic Plane; CJK Unified Ideographs Extension B–D are also up there. Besides historical and Vietnamese characters, CJK B includes 1,702 characters from the Hong Kong Supplementary Character Set, which apparently includes a lot of Cantonese characters used in official Hong Kong place names.
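
To make the plane terminology concrete, here is a minimal sketch of how a UTF-16 surrogate pair decodes to a code point and which plane that code point falls in. The helper names `codePointAtIndex` and `planeOf` are made up for illustration and are not part of GL JS.

```typescript
// Decode the code point that starts at `index` in a UTF-16 string.
// A supplementary-plane character is stored as a high surrogate
// (0xD800-0xDBFF) followed by a low surrogate (0xDC00-0xDFFF).
function codePointAtIndex(text: string, index: number): number {
    const high = text.charCodeAt(index);
    if (high >= 0xD800 && high <= 0xDBFF && index + 1 < text.length) {
        const low = text.charCodeAt(index + 1);
        if (low >= 0xDC00 && low <= 0xDFFF) {
            return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
        }
    }
    // BMP character (or an unpaired surrogate, returned as-is).
    return high;
}

// Each plane spans 0x10000 code points: plane 0 is the Basic Multilingual
// Plane, plane 2 is the Supplementary Ideographic Plane.
function planeOf(codePoint: number): number {
    return codePoint >> 16;
}

// U+20000 is the first code point of CJK Unified Ideographs Extension B.
const firstExtensionB = "\uD840\uDC00";
console.log(planeOf(codePointAtIndex(firstExtensionB, 0))); // 2
console.log(planeOf(0x4E00)); // 0
```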


@1ec5:
As of Unicode 9.0, the following astral-plane blocks allow ideographic breaking:

  • Meroitic Hieroglyphs
  • Egyptian Hieroglyphs
  • Anatolian Hieroglyphs
  • Ideographic Symbols and Punctuation
  • Tangut
  • Tangut Components
  • Kana Supplement
  • Tai Xuan Jing Symbols
  • Counting Rod Numerals
  • Mahjong Tiles
  • Domino Tiles
  • Playing Cards
  • Enclosed Alphanumeric Supplement
  • Enclosed Ideographic Supplement
  • Miscellaneous Symbols and Pictographs
  • Emoticons
  • Ornamental Dingbats
  • Transport and Map Symbols
  • Alchemical Symbols
  • Geometric Shapes Extended
  • Supplemental Symbols and Pictographs
  • CJK Unified Ideographs Extension B
  • CJK Unified Ideographs Extension C
  • CJK Unified Ideographs Extension D
  • CJK Unified Ideographs Extension E
  • CJK Compatibility Ideographs Supplement

As of Unicode 9.0 and revision 16 of UTR #50, the following astral-plane blocks have upright vertical orientation:

  • Meroitic Hieroglyphs
  • Siddham
  • Egyptian Hieroglyphs
  • Anatolian Hieroglyphs
  • Ideographic Symbols and Punctuation
  • Tangut
  • Tangut Components
  • Kana Supplement
  • Byzantine Musical Symbols
  • Musical Symbols
  • Tai Xuan Jing Symbols
  • Counting Rod Numerals
  • Sutton SignWriting
  • Mahjong Tiles
  • Domino Tiles
  • Playing Cards
  • Enclosed Alphanumeric Supplement
  • Enclosed Ideographic Supplement
  • Miscellaneous Symbols and Pictographs
  • Emoticons
  • Ornamental Dingbats
  • Transport and Map Symbols
  • Alchemical Symbols
  • Geometric Shapes Extended
  • Supplemental Symbols and Pictographs
  • CJK Unified Ideographs Extension B
  • CJK Unified Ideographs Extension C
  • CJK Unified Ideographs Extension D
  • CJK Unified Ideographs Extension E
  • CJK Compatibility Ideographs Supplement

The following astral-plane blocks have neutral vertical orientation:

  • Supplementary Private Use Area-A
  • Supplementary Private Use Area-B

1ec5 commented Jun 30, 2017

As of Unicode 10.0, the following astral-plane blocks also allow ideographic breaking:

  • Nushu
  • CJK Unified Ideographs Extension F

Revision 17 of UTR #50 still reflects Unicode 9, but presumably the following astral-plane blocks also have upright vertical orientation:

  • Kana Extended-A
  • Nushu
  • CJK Unified Ideographs Extension F

/cc @ChrisLoer

1ec5 commented Jun 30, 2017

OpenStreetMap does have CJK Unified Ideographs B–F characters in a number of features’ name or name:zh tags, which wind up in the Mapbox Streets source’s {name} or {name_zh} fields, respectively:

  • 🇨🇳 China (+): 1 primary road, 1 mountain peak, 1 hamlet, 1 college, 1 school, 1 hospital
  • 🇯🇵 Japan: 1 restaurant, 1 fast food restaurant, 1 pawn shop, 1 statue, 3 buildings, 1 pond
    • also 1 Shinto shrine excluded from Streets
  • 🇹🇼 Taiwan: 1 river, 1 village, 2 hamlets, 1 tourist attraction, 3 restaurants, 1 café
    • also 1 admin_level=9 boundary relation excluded from Streets
  • 🇭🇰 Hong Kong: 1 island, 1 locality, 1 gate
  • 🇻🇳 Vietnam: 1 province, 1 provincial capital city
    • also 12 admin_level=6 boundary relations excluded from Streets

There are also plenty of Cantonese place names in name:zh-yue tags, but the Mapbox Streets source omits them because it lacks dedicated support for Cantonese.

GL JS skips over any supplementary-plane character, rather than leaving a space or replacement character. For example “卡司𥰆拉樂園” is rendered as “卡司拉樂園”, even with the demo in #4895. In principle, this could lead to some unfortunate labels.
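
The difference between dropping a supplementary-plane character and substituting a visible placeholder can be sketched as follows. This is only an illustration of the two outcomes on the label above, not GL JS's actual text pipeline; `mapSupplementary` is a made-up helper.

```typescript
// Replace every supplementary-plane character (stored in UTF-16 as a
// high surrogate followed by a low surrogate) with `replacement`.
// Passing "" drops the character, which mimics the behavior described above.
function mapSupplementary(text: string, replacement: string): string {
    let out = "";
    for (let i = 0; i < text.length; i++) {
        const unit = text.charCodeAt(i);
        const isHigh = unit >= 0xD800 && unit <= 0xDBFF;
        const next = i + 1 < text.length ? text.charCodeAt(i + 1) : 0;
        const nextIsLow = next >= 0xDC00 && next <= 0xDFFF;
        if (isHigh && nextIsLow) {
            out += replacement;
            i++; // skip the low surrogate of the pair
        } else {
            out += text.charAt(i);
        }
    }
    return out;
}

const label = "卡司𥰆拉樂園";
const dropped = mapSupplementary(label, "");       // character silently removed
const marked = mapSupplementary(label, "\uFFFD");  // U+FFFD REPLACEMENT CHARACTER
console.log(dropped);
console.log(marked);
```

With a placeholder, the label at least signals that a character is missing instead of silently reading as a different name.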

/cc @ajashton @jcsg

1ec5 commented Dec 22, 2018

The analysis in #4001 (comment) mostly covered Chinese labels. Since then, Mapbox Streets has added support for Japanese names. Currently, OpenStreetMap has 11 features in Japan with unsupported characters in Japanese names: 3 buildings, 2 restaurants, 1 pond, 1 memorial, 1 shrine, and 1 supermarket.

1ec5 commented Apr 16, 2019

As of Unicode 12.1, the following astral-plane blocks also allow ideographic breaking:

  • Egyptian Hieroglyph Format Controls
  • Small Kana Extension
  • Symbols and Pictographs Extended-A

The following astral-plane blocks have upright vertical orientation:

  • Small Kana Extension
  • Symbols and Pictographs Extended-A
