i18n: reduce unnecessary message formats #14030
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
alternate approach to memoization in #14026
Memoization needs to be on the
icuMessageFn
and the ICU replacement values, since messages with different replacements are going to end up different as different strings in the end. One way to avoid the extra arg bookkeeping work would be to memoize only strings with no replacements, which is most of the strings (especially during config generation, where it's mostly checking audittitle
,failureTitle
,description
, etc.The root performance problem is that
IntlMessageFormat
is relatively slow, and we call it a lot. If you look at a profile, it's spending almost all its time parsing each message (which to be fair, is apparently much faster in more recentintl-messageformat
versions and I'm the one who has resisted updating).Since ICU placeholders have to occur inside
{}
, and most of our strings don't have any replacements, this PR instead adds a simple filter that just checks if the message has a{
in it before attempting to parse it. This will catch all messages with placeholders (letting in false positives with escaped{
in them, but I don't believe we have any right now?). The result is equivalent to the "memoize only strings with no replacements" mentioned above.This is slightly less efficient than memoizing the whole thing, but almost all formatted strings have no placeholders, and of those that do, only a fraction will have repeated placeholder replacements that would get a cache hit.
75 out of 880 messages (8.5%) in
en-US.json
have ICU placeholders in them, but almost all of those are things likedisplayValue
s and the like, which are used rarely. In a real run using the command in #14026of the 16,608 messages that pass through
getIcuMessageFn
, 42 (0.25%) need a replacement, so it's probably not worth the extra effort for them.results of
which look roughly equivalent to the #14026 improvements to me