-
Notifications
You must be signed in to change notification settings - Fork 9.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core(font-size): precise text length calculation #6973
core(font-size): precise text length calculation #6973
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thanks for the contribution @castilloandres!!
I wonder how far down the rabbit hole we should go. For other reviewers this article was a fun read on the topic.
Array.from("👩❤️💋👩").length === 8
Basic unicode coverage seems like a pretty good start though 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, very interesting looking into the fun and terrifying world of unicode and text encodings 😨
@@ -182,7 +182,7 @@ function getEffectiveFontRule({inlineStyle, matchedCSSRules, inherited}) { | |||
* @returns {number} | |||
*/ | |||
function getNodeTextLength(node) { | |||
return !node.nodeValue ? 0 : node.nodeValue.trim().length; | |||
return !node.nodeValue ? 0 : Array.from(node.nodeValue.trim()).length; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get a comment mentioning this PR or explaining why this is here. e.g.
// Use Array.from in order to more accurately count unicode characters. See: #6973
Maybe even add a TODO
for @patrickhulce's comment on future rabbit hole exploring.
@patrickhulce @exterkamp I think we should go as deep as readable text goes. Apart from emojis this should count more or less foreign character sets correctly 👍 |
@@ -182,7 +182,8 @@ function getEffectiveFontRule({inlineStyle, matchedCSSRules, inherited}) { | |||
* @returns {number} | |||
*/ | |||
function getNodeTextLength(node) { | |||
return !node.nodeValue ? 0 : node.nodeValue.trim().length; | |||
// Array.from to count JS symbols not unicode code points. See: #6973 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JS Symbols => character ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hoten depends on what counts as a character? I would use terms like: graphemes, symbols, code points, bytes and not "character" to avoid confusion. Here is a short video with some interesting use cases.
The "👩❤️💋👩" emoji you added is an emoji sequence [1] as defined in the Unicode spec.
[1] emoji_sequence: http://unicode.org/reports/tr51/#def_emoji_sequence So if you wanted to go deeper into the "count graphemes" rabbit hole, you could count emoji sequences as one.
Another step could be to merge characters followed by combining diacritics like
The two cases can be handled using the NPM package GraphemeSplitter pretty well: https://github.com/orling/grapheme-splitter
Counting or discounting 0-width characters like https://github.com/opentypejs/opentype.js Once the font data is loaded, each code point then has a width field that can be used to know by how much the cursor shifts to the left after having drawn the code point. |
Thanks very much for the explanation @Mickael-van-der-Beek! That's the example I pulled from the article I linked which came to similar conclusions as yours :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
𝐋𝐆𝐓𝐌
Summary
What kind of change does this PR introduce?
Count JavaScript string symbols not Unicode code points.
Is this a bugfix, feature, refactoring, build related change, etc?
Refactoring.
Describe the need for this change
To calculate text lengths on the font size gatherer with precision.
Link any documentation or information that would help understand this change