-
Notifications
You must be signed in to change notification settings - Fork 9.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
i18n: upgrade to latest icu formatter #13834
Conversation
@@ -4516,13 +4555,23 @@ intl-messageformat-parser@^1.8.1: | |||
resolved "https://registry.yarnpkg.com/intl-messageformat-parser/-/intl-messageformat-parser-1.8.1.tgz#0eb14c5618333be4c95c409457b66c8c33ddcc01" | |||
integrity sha512-IMSCKVf0USrM/959vj3xac7s8f87sc+80Y/ipBzdKy4ifBv5Gsj2tZ41EAaURVg01QU71fYr77uA8Meh6kELbg== | |||
|
|||
intl-messageformat@^4.1.2, intl-messageformat@^4.4.0: | |||
intl-messageformat@^4.1.2: | |||
version "4.4.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub ads has this old version as a dependency ... I think we could move it to their devDeps?
} catch (err) { | ||
if (err.name !== 'SyntaxError') throw err; | ||
// Improve the intl-messageformat-parser syntax error output. | ||
/** @type {Array<{text: string}>} */ | ||
const expected = err.expected; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't exist any more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the error messages definitely seem less useful (at least looking at the test changes). Is there any kind of equivalent to replace this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a "location" field with begin/end markers, could use that to point to the specific characters that are problematic.
Can't promise fancy ascii pointers like you'd see elsewhere though :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I've looked at updating intl-messageformat
in the past, I've ended up holding off because
- it's bigger
- our strings would need updating
- nothing's broken
it seems like these things are all still true, so it seems like it's still worth putting off?
The ignoreTag
option looks like it would let us dodge most of the escaping pain and just deal with escaping apostrophes, however:
- from the built files, it looks like the new version will add 18KiB to viewer and 19KiB to flow.js. If/when we want message formatting in the regular report, it'll also be included there.
- the advantages of updating don't seem that important to us. Memoized formatters are actually supported in the version we use, and while we won't have skeleton support, our current format style does support mapping
style
names toIntl.NumberFormat
options (e.g.extendedPercent
usesstyle: 'percent'
), so we can get the same thing, just without the auto-parsing from skeletons to formatting options. We can expand our formatting types (so if we wanted anarrowTenthsOfAKibibit
we could do it :)
Finally, Intl.MessageFormat
is now stage 1 in TC39 and should eventually be included without a library. It's using the (in progress) Unicode MessageFormat 2.0 syntax, though, so there's probably not a gradual migration path there.
} catch (err) { | ||
if (err.name !== 'SyntaxError') throw err; | ||
// Improve the intl-messageformat-parser syntax error output. | ||
/** @type {Array<{text: string}>} */ | ||
const expected = err.expected; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the error messages definitely seem less useful (at least looking at the test changes). Is there any kind of equivalent to replace this?
|
||
for (const option of Object.values(element.options)) { | ||
validate(option.value); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For lhl validation purposes, It's possible testing that the actual substrings were valid lhlMessages may have been intentional? option.value.elements
was available in the old version of intl-messageformat-parser
as well. I think this was checking that it would be TC compatible, not just intl-messageformat
compatible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it the same? I think that part of the grammar is pretty recursive (as in, if this lib can parse it into a value, and that field is equivalent in structure to the root field, then the raw string should be valid as a string we'd send to TC?). I'll try to inspect the original PR and see if it was brought up.
FWIW, you can see the recursive test case failing if you comment this line out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily the same. For instance what's checked right above here: the ICU docs recommend that there be no content outside of plurals, but TC requires that there not be. Unfortunately everyone uses their own subset of the ICU/message format syntax.
The original PR doesn't list it's reasoning :( However, the fact that I wrote the "Each option value must also be a valid lhlMessage" comment above it to me means either there was a known mismatch, or the invariant is just what that comment describes and we should just check it directly rather than hoping there are no corner cases where they differ. Since this is in collect-strings
, not in a lighthouse run or the report, seems reasonable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The option value is really an array of messages. if one of the messages is a plural, then the validate function checks that it is the only entry (array length is 1 === no content outside of it). It seems like we're checking that correctly, and there's no failing test to say we aren't.
I do preferred this structured approach to reparsing the options... but we could do both just in case?
What of the desire to drop the report/renderer/i18n.js formatter file and localize our units in a potentially more consistent manner? Granted we have an unresolved discussion regarding this from #13830, and should probably attempt an analysis of our strings to determine if/how often implicit display widths are respected (ie do we ever get a language's equivalent of I'm having a difficult time comparing the latest version capabilities with the newest (and the CHANGELOG was kinda of terse), are there versioned docs? Are you perhaps looking at docs packaged in node_modules? I don't find anything to disagree with out about on the individual points you've mentioned, but to me I feel that using such an old version of a package core to our i18n story is something we should resolve.
gzip or disk?
Does / will our translation pipeline support this syntax? Probably too early to tell, but yeah that'd be great if we didn't need any library, let's tag that LH 15.0 :P |
oh i see, we can do this today by configuring the formatters. skeleton syntax is just shorthand for all that. |
Would be nice to get this merged in order to get rid of this npm deprecation warning: |
I also ran into this.
|
I just wanted to bump this because I ran into a related issue again. |
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
2ad71aa
to
81b2330
Compare
(Oct 2023)
Updated to
intl-messageformat
to 10.5.3 and@formatjs/icu-messageformat-parser
to 2.6.2.(original message)
This is a big jump:
intl-messageformat
: 4.4.0 -> 9.12.0 changelog@formatjs/icu-messageformat-parser
: (1.8.1 of old package) -> 2.0.19 changelog (the entire changelog is our diff b/c new package)The new parser/ICU spec has made some of our strings invalid because:
To resolve this I replaced some apostrophes with backticks (they are technical keywords so they shouldn't be translated anyway), and added a regex to preprocess our message strings before parsing them (so we don't have to go escape-crazy). This solution should work fine up until the day we actually want to use this new syntax.
The parser emits a more easily consumable AST, ex: nested nodes in select/plural nodes are nodes themselves (I think before they were just ranges specifying start/end)
Some stuff we might like to try out: