Refactor grapheme cluster segmentation to properly act on clusters with more than 2 codepoints #47

Open
christianparpart opened this issue Nov 2, 2022 · 0 comments
Labels: enhancement (New feature or request), performance (Performance related topic)

Comments

@christianparpart

https://unicode.org/reports/tr29/#Grapheme_Cluster_Boundary_Rules

Specifically, I am interested in correctly segmenting a consecutive run of country flags, where each flag is a pair of RI (Regional_Indicator) codepoints (rules GB12 and GB13).
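To make the flag case concrete, here is a minimal sketch of the pairing behaviour that GB12/GB13 ask for. It covers only the regional-indicator rule and treats every other codepoint as its own cluster, so it is not a full UAX #29 segmenter. The names `is_regional_indicator` and `count_flag_clusters` are hypothetical and not taken from the existing code base.

```cpp
#include <cstddef>
#include <string_view>

// Regional indicator symbols occupy U+1F1E6..U+1F1FF.
constexpr bool is_regional_indicator(char32_t cp) noexcept
{
    return cp >= 0x1F1E6 && cp <= 0x1F1FF;
}

// Hypothetical helper: counts clusters while applying ONLY the RI pairing
// rule (GB12/GB13). All other rules are ignored in this sketch.
constexpr std::size_t count_flag_clusters(std::u32string_view text) noexcept
{
    std::size_t clusters = 0;
    bool pending_ri = false; // true if the previous codepoint is an unpaired RI

    for (char32_t cp : text)
    {
        if (pending_ri && is_regional_indicator(cp))
        {
            // Second half of a flag: it extends the current cluster.
            pending_ri = false;
            continue;
        }
        ++clusters;
        pending_ri = is_regional_indicator(cp);
    }
    return clusters;
}
```

With this sketch, four consecutive regional indicators form two clusters, and a run of three forms one complete flag followed by a lone indicator. The point is that the decision depends on the parity of the preceding RI run, not just on the two codepoints around the candidate boundary.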

Also, to make both the current implementation and any future one very fast, we should add the grapheme tokens (CR, LF, L, V, T, LV, LVT, Extend, ZWJ, Control, SpacingMark, Prepend, Extended_Pictographic, RI) as a field in the new codepoint_properties table, so that grapheme segmentation is as efficient as possible.
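One possible shape for such a field, as a sketch only: the struct layout below is an assumption and not the actual `codepoint_properties` definition, and `joins_statelessly` is a hypothetical name. It shows how the pairwise rules GB3 to GB9b reduce to comparisons on two small enum values once the token is stored per codepoint. GB11 (emoji ZWJ sequences) and GB12/GB13 (flags) need extra state and are deliberately left out, as is GB9c.

```cpp
#include <cstdint>

// Grapheme_Cluster_Break values from UAX #29, packed into one byte.
enum class grapheme_cluster_break : std::uint8_t
{
    Other, CR, LF, Control, Extend, ZWJ, Regional_Indicator,
    Prepend, SpacingMark, L, V, T, LV, LVT
};

// Assumed layout: Extended_Pictographic is a separate Unicode property,
// so this sketch keeps it as its own flag next to the break value.
struct codepoint_properties
{
    grapheme_cluster_break gcb = grapheme_cluster_break::Other;
    bool extended_pictographic = false;
};

using gcb = grapheme_cluster_break;

constexpr bool is_control_like(gcb v) noexcept
{
    return v == gcb::CR || v == gcb::LF || v == gcb::Control;
}

// Returns true if the stateless rules forbid a break between left and right.
constexpr bool joins_statelessly(gcb left, gcb right) noexcept
{
    if (left == gcb::CR && right == gcb::LF)
        return true; // GB3
    if (is_control_like(left) || is_control_like(right))
        return false; // GB4, GB5
    if (left == gcb::L
        && (right == gcb::L || right == gcb::V || right == gcb::LV || right == gcb::LVT))
        return true; // GB6
    if ((left == gcb::LV || left == gcb::V) && (right == gcb::V || right == gcb::T))
        return true; // GB7
    if ((left == gcb::LVT || left == gcb::T) && right == gcb::T)
        return true; // GB8
    if (right == gcb::Extend || right == gcb::ZWJ || right == gcb::SpacingMark)
        return true; // GB9, GB9a
    if (left == gcb::Prepend)
        return true; // GB9b
    return false; // GB11 and GB12/GB13 need state; otherwise GB999 breaks
}
```

A segmenter built on this would fetch the properties of each codepoint once, call something like `joins_statelessly` on neighbouring values, and fall back to a small state machine only for the emoji and regional-indicator rules.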
