-
Notifications
You must be signed in to change notification settings - Fork 29.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
util: simplify util.toUSVString #39891
util: simplify util.toUSVString #39891
Conversation
@@ -57,8 +57,7 @@ const experimentalWarnings = new SafeSet(); | |||
|
|||
const colorRegExp = /\u001b\[\d\d?m/g; // eslint-disable-line no-control-regex | |||
|
|||
const unpairedSurrogateRe = | |||
/(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])/; | |||
const unpairedSurrogateRe = /\p{Surrogate}/u; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is slower, see #39814 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily. It depends on the string you use it with, just like the current solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summarizing that discussion: the performance characteristics are different before/after this change. The trade-off is as follows:
- For strings without surrogates (the common case?), performance is equivalent.
- For strings with a lone trail surrogate at the start, the patch performs better.
- For strings with a lone lead surrogate at the end, the old/current version performs better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be pedandic:
- For strings without surrogates (the common case?), performance is equivalent.
- For strings with a lone trail surrogate at the start, the patch performs sligtly better (~5%).
- For strings with a lone lead surrogate at the end, the old/current version performs much better (~50%).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I'm good with either approach but I'm happy to keep things as they are if folks think that {Surrogate}
way is too slow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you stick to the current approach, then consider renaming unpairedSurrogateRe
— the current name is misleading, since it doesn’t just match unpaired surrogates. E.g. in the case of '\u{10000}\uDC00'
(i.e. a surrogate pair followed by a lone surrogate), it matches '\uDC00\uDC00'
(i.e. the trail surrogate of the pair + the lone surrogate) as opposed to just the lone surrogate '\uDC00'
— which is fine for this specific use case, but could become problematic if someone were to re-use this pattern elsewhere. For example, in a user-land implementation:
function toUSVString(str) {
// This works fine:
return str.replaceAll(/\p{Surrogate}/gu, '\uFFFD');
// …but it subtly wouldn’t work correctly with the current `unpairedSurrogateRe`.
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m +0. Though I think performance trumps maintainability per the priorities defined in https://github.com/nodejs/next-10/blob/main/VALUES_AND_PRIORITIZATION.md. I don’t necessarily personally agree with that but that’s the current consensus.
function toUSVString(val) { | ||
const str = `${val}`; | ||
// As of V8 5.5, `str.search()` (and `unpairedSurrogateRe[@@search]()`) are | ||
// slower than `unpairedSurrogateRe.exec()`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an example of the meta-discussion we’re having. The problem with hardcoding micro-performance optimizations is that V8 keeps optimizing for idiomatic coding patterns over time, to the point where hand-written micro-optimizations often end up being slower than the idiomatic patterns. Unless there is some mechanism to track these hardcoded assumptions over time and adjust the code if necessary, short-term micro-optimizations end up hurting performance over the longer term.
This comment about search
vs. exec
might’ve been true for V8 v5.5, but in more recent versions the opposite seems to be the case. My latest commit switches to using StringPrototypeSearch
which makes the new version (even with the “slow” regular expression) ~21% faster:
before patch x 21,177,919 ops/sec ±0.64% (90 runs sampled)
after patch x 25,642,796 ops/sec ±0.72% (86 runs sampled)
(My colleague Peter Marshall once gave a nice presentation about this. Relevant segment: https://www.youtube.com/watch?v=1bZxs5J5n3M&t=708s)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree but in this specific case we are trading 50% performance for this change
/(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])/
↓
/\p{Surrogate}/u
and using String.prototype.search()
instead of RegExp.prototype.exec()
does not fix the 50% performance drop.
I understand that in a future V8 version /\p{Surrogate}/u
might be faster for all inputs, but that is not the case now. It also makes backporting stuff harder.
I think we should strive for what is better now, not what might be better in future, otherwise things like this #39778 do no make any sense.
Unless there is some mechanism to track these hardcoded assumptions over time and adjust the code if necessary, short-term micro-optimizations end up hurting performance over the longer term.
This is the key and the reason why I'm not blocking this (because there is no such mechanism).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/\p{Surrogate}/u
[...] also makes backporting stuff harder.
How so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imagine changes like this that are good in newer V8 versions but bad in older ones. Should they be backported to old but still suppported release lines? Should "dont-land-on" labels be used? Or should we simply not care?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand how that applies here — we shipped \p{...}
in V8 v6.4. Even the oldest Maintenance LTS version of Node.js (12 at the time of writing) comes with a much more recent V8 version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I did not make my point clear but I think you already answered my questions with "It doesn't matter if performance is worse in older V8 versions. All that matters is that the feature is supported".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about simply not backporting it? 🤔
ad9b1f5
to
4d54445
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't land this due to builds without intl
not being supported. @mathiasbynens do you have an idea how to address this?
This issue/PR was marked as stalled, it will be automatically closed in 30 days. If it should remain open, please leave a comment explaining why it should remain open. |
Closing this because it has stalled. Feel free to reopen if this issue/PR is still relevant, or to ping the collaborator who labelled it stalled if you have any questions. |
This patch also adds some more tests for this new feature.
Follow-up to #39814.