Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i18n: handle string placeholder collisions #14432

Merged
merged 2 commits into from
Oct 6, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions core/audits/accessibility/aria-meter-name.js
Original file line number Diff line number Diff line change
Expand Up @@ -17,8 +17,8 @@ const UIStrings = {
title: 'ARIA `meter` elements have accessible names',
/** Title of an accessibility audit that evaluates if meter HTML elements do not have accessible names. This title is descriptive of the failing state and is shown to users when there is a failure that needs to be addressed. */
failureTitle: 'ARIA `meter` elements do not have accessible names.',
/** Description of a Lighthouse audit that tells the user *why* they should have accessible names for HTML elements. This is displayed after a user expands the section to see more. No character length limits. 'Learn More' becomes link text to additional documentation. */
description: 'When an element doesn\'t have an accessible name, screen readers announce it with a generic name, making it unusable for users who rely on screen readers. [Learn how to name `meter` elements](https://dequeuniversity.com/rules/axe/4.4/aria-meter-name).',
/** Description of a Lighthouse audit that tells the user *why* they should have accessible names for HTML 'meter' elements. This is displayed after a user expands the section to see more. No character length limits. 'Learn how...' becomes link text to additional documentation. */
description: 'When a meter element doesn\'t have an accessible name, screen readers announce it with a generic name, making it unusable for users who rely on screen readers. [Learn how to name `meter` elements](https://dequeuniversity.com/rules/axe/4.4/aria-meter-name).',
};

const str_ = i18n.createIcuMessageFn(import.meta.url, UIStrings);
Expand Down
4 changes: 2 additions & 2 deletions core/audits/accessibility/aria-tooltip-name.js
Original file line number Diff line number Diff line change
Expand Up @@ -17,8 +17,8 @@ const UIStrings = {
title: 'ARIA `tooltip` elements have accessible names',
/** Title of an accessibility audit that evaluates if tooltip HTML elements do not have accessible names. This title is descriptive of the failing state and is shown to users when there is a failure that needs to be addressed. */
failureTitle: 'ARIA `tooltip` elements do not have accessible names.',
/** Description of a Lighthouse audit that tells the user *why* they should have accessible names for HTML elements. This is displayed after a user expands the section to see more. No character length limits. 'Learn More' becomes link text to additional documentation. */
description: 'When an element doesn\'t have an accessible name, screen readers announce it with a generic name, making it unusable for users who rely on screen readers. [Learn how to name `tooltip` elements](https://dequeuniversity.com/rules/axe/4.4/aria-tooltip-name).',
/** Description of a Lighthouse audit that tells the user *why* they should have accessible names for HTML 'tooltip' elements. This is displayed after a user expands the section to see more. No character length limits. 'Learn how...' becomes link text to additional documentation. */
description: 'When a tooltip element doesn\'t have an accessible name, screen readers announce it with a generic name, making it unusable for users who rely on screen readers. [Learn how to name `tooltip` elements](https://dequeuniversity.com/rules/axe/4.4/aria-tooltip-name).',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we mark tooltip/meter with backticks?

Copy link
Member Author

@brendankenny brendankenny Oct 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took that from the axe docs (e.g. "Aria tooltip elements must have discernible text..."), though to be fair, their docs language selector has only one other language and that particular string stays in english :)

I kind of justified it in my head as it's an aria role, but also a description of the type of element, so it makes some sense to include a localized name. tooltip (and meter for aria-meter-name) are backticked later in the string 🤷

edit: I could make the second reference more explicitly the markup version, Learn how to name `role="tooltip" ` elements

};

const str_ = i18n.createIcuMessageFn(import.meta.url, UIStrings);
Expand Down
27 changes: 22 additions & 5 deletions core/lib/i18n/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,11 +12,11 @@ The collection and translation pipeline:
```
Source files: Locale files:
+---------------------------+ +----------------------------------------------
| ++ | shared/localization/locales/en-US.json |
| const UIStrings = { ... };|-+ +---> | shared/localization/locales/en-XL.json |
| ++ | shared/localization/locales/en-US.json |
| const UIStrings = { ... };|-+ +---> | shared/localization/locales/en-XL.json |
| |-| | +----------------------------------------------+
+-----------------------------| | | ||
+----------------------------| | | shared/localization/locales/*.json |-<+
+----------------------------| | | shared/localization/locales/*.json |-<+
+---------------------------+ | | || |
| | +----------------------------------------------| |
$ yarn | | +---------------------------------------------+ |
Expand Down Expand Up @@ -198,11 +198,11 @@ CTC is a name that is distinct and identifies this as the Chrome translation for
```json
{
"name": {
"message": "Message text, with optional placeholders.",
"message": "Message text, with optional placeholders, which can be $PLACEHOLDER_TEXT$",
"description": "Translator-aimed description of the message.",
"meaning": "Description given when a message is duplicated, in order to give context to the message. Lighthouse uses a copy of the description for this.",
"placeholders": {
"placeholder_name": {
"PLACEHOLDER_TEXT": {
"content": "A string to be placed within the message.",
"example": "Translator-aimed example of the placeholder string."
},
Expand All @@ -211,6 +211,23 @@ CTC is a name that is distinct and identifies this as the Chrome translation for
}
```

### Collisions
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

history has shown we pretty reliably forget this information over time, so please give any feedback on ambiguity so we can clear it up now :)

Collisions happen when two CTC messages have the same `message`. For Lighthouse, there are two relevant collision types in TC:
- Allowed: the CTC `message`, `description`, and `placeholders` are exactly the same. These collisions are deduped on the TC side and the translation cost is the same as for a single string.
- Disallowed: `message` is the same but one or more of the other properties differ.

When the `message` needs to be the same as another string but another property must differ, that disallowed collision can be fixed by adding a unique `meaning` property to each colliding CTC message. TC will then consider those as separate strings and not a collision.

In Lighthouse, this is done by having a different `description` for the strings, which is then copied to `meaning` in `resolveMessageCollisions()`. `meaning` cannot be manually set.

For instance, the string "Potential Savings" currently refers to both saved KiB and saved milliseconds in different audits. The string is defined twice, each with a different `description` describing the units being saved, in case some locales' translations will use a different word choice depending on the unit.

Internally, TC uses a message ID that's a hash of `message` and `meaning` to check for collisions. Somewhat confusingly, if two messages do have colliding IDs, then `message`, `meaning`, `description`, and `placeholders` are all required to match or an error is thrown. This is why all message properties could cause a collision but `meaning` is the only way to dedupe them.

We treat it as an error if `placeholders` differ between messages in a collision: if there is a need for placeholders to differ, then the strings aren't really the same, and at least the `description` should be changed to explain that context. Placeholders must match in user-controlled data (e.g. if a placeholder has an `@example`, it must be the same example in all instances) and in Lighthouse-controlled data (e.g. the token used to replace it in the CTC `message`, like `$PLACEHOLDER_TEXT$` in the example above).

Finally, identical messages made to not collide by Lighthouse with a `meaning` cost real money and shouldn't be confused with allowed collisions which cost nothing for each additional collision. Fixed collisions are checked against a known list to add a little friction and motivate keeping them few in number. An error is thrown if a collision is fixed that hasn't yet been added to that list.

# Appendix

## Appendix A: How runtime string replacement works
Expand Down
94 changes: 72 additions & 22 deletions core/scripts/i18n/collect-strings.js
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ import {expect} from 'expect';
import tsc from 'typescript';
import MessageParser from 'intl-messageformat-parser';
import esMain from 'es-main';
import isDeepEqual from 'lodash/isEqual.js';

import {Util} from '../../util.cjs';
import {collectAndBakeCtcStrings} from './bake-ctc-to-lhl.js';
Expand Down Expand Up @@ -613,43 +614,90 @@ function writeStringsToCtcFiles(locale, strings) {
}

/**
* This function does two things:
*
* - Add `meaning` property to ctc messages that have the same message but different descriptions so TC can disambiguate.
* - Throw if the known collisions has changed at all.
* TODO: replace by Array.prototype.groupToMap when available.
* @template {unknown} T
* @template {string} U
* @param {Array<T>} elements
* @param {(element: T) => U} callback
* @return {Map<U, Array<T>>}
*/
function arrayGroupToMap(elements, callback) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to do this twice (nested) and was getting kind of gross. It's a nice API (for Node 20 or whatever)

/** @type {Map<U, Array<T>>} */
const elementsByValue = new Map();

for (const element of elements) {
const key = callback(element);
const group = elementsByValue.get(key) || [];
group.push(element);
elementsByValue.set(key, group);
}

return elementsByValue;
}

/**
* Returns whether all the placeholders in the given CtcMessages match.
* @param {Array<{ctc: CtcMessage}>} strings
* @return {boolean}
*/
function doPlaceholdersMatch(strings) {
// Technically placeholder `content` is not required to match by TC, but since
// `example` must match and any auto-generated `example` is copied from `content`,
// it would be confusing to let it differ when `example` is explicit.
return strings.every(val => isDeepEqual(val.ctc.placeholders, strings[0].ctc.placeholders));
}

/**
* If two or more messages have the same `message`:
* - if `description`, and `placeholders` are also the same, TC treats them as a
* single message, the collision is allowed, and no change is made.
* - if `description` is unique for the collision members, the `description` is
* copied to `meaning` to differentiate them for TC. They must be added to the
* `collidingMessages` list for this case as a nudge to not be wasteful.
* - if `description` is the same but `placeholders` differ, an error is thrown.
* Either `placeholders` should be the same, or `message` or `description`
* should be changed to explain the need for different `placeholders`.
*
* Modifies `strings` in place to fix collisions.
* @see https://github.com/GoogleChrome/lighthouse/blob/main/core/lib/i18n/README.md#collisions
* @param {Record<string, CtcMessage>} strings
*/
function resolveMessageCollisions(strings) {
/** @type {Map<string, Array<CtcMessage>>} */
const stringsByMessage = new Map();

// Group all the strings by their message.
for (const ctc of Object.values(strings)) {
const collisions = stringsByMessage.get(ctc.message) || [];
collisions.push(ctc);
stringsByMessage.set(ctc.message, collisions);
}
const entries = Object.entries(strings).map(([key, ctc]) => ({key, ctc}));

/** @type {Array<CtcMessage>} */
const allCollisions = [];
const stringsByMessage = arrayGroupToMap(entries, (entry) => entry.ctc.message);
for (const messageGroup of stringsByMessage.values()) {
// If this message didn't collide with anything else, skip it.
if (messageGroup.length <= 1) continue;

// If group shares both message and description, they can be translated as if a single string.
const descriptions = new Set(messageGroup.map(ctc => ctc.description));
if (descriptions.size <= 1) continue;
// For each group of matching message+description, placeholders must also match.
const stringsByDescription = arrayGroupToMap(messageGroup, (entry) => entry.ctc.description);
for (const descriptionGroup of stringsByDescription.values()) {
if (!doPlaceholdersMatch(descriptionGroup)) {
throw new Error('string collision: `message` and `description` are the same but differ in `placeholders`:\n' +
descriptionGroup.map(entry => `key: ${entry.key}\n`).join('') +
'make `placeholders` match or update `message` or `description`s to explain why they don\'t match');
}
}

// If the entire message group has the same description and placeholders,
// they can be translated as if a single message, no meaning needed.
if (stringsByDescription.size <= 1) continue;

// We have duplicate messages with different descriptions. Disambiguate using `meaning` for TC.
for (const ctc of messageGroup) {
for (const {ctc} of messageGroup) {
ctc.meaning = ctc.description;
}
allCollisions.push(...messageGroup);
}
}

// Check that the known collisions match our known list.
const collidingMessages = allCollisions.map(collision => collision.message).sort();
/**
* Check that fixed collisions match our known list.
* @param {Record<string, CtcMessage>} strings
*/
function checkKnownFixedCollisions(strings) {
const fixedCollisions = Object.values(strings).filter(string => string.meaning);
const collidingMessages = fixedCollisions.map(collision => collision.message).sort();

try {
expect(collidingMessages).toEqual([
Expand Down Expand Up @@ -696,6 +744,7 @@ async function main() {
}

resolveMessageCollisions(strings);
checkKnownFixedCollisions(strings);

writeStringsToCtcFiles('en-US', strings);
console.log('Written to disk!', 'en-US.ctc.json');
Expand Down Expand Up @@ -734,4 +783,5 @@ export {
parseUIStrings,
createPsuedoLocaleStrings,
convertMessageToCtc,
resolveMessageCollisions,
};
Original file line number Diff line number Diff line change
Expand Up @@ -2532,7 +2532,7 @@
"aria-meter-name": {
"id": "aria-meter-name",
"title": "ARIA `meter` elements have accessible names",
"description": "When an element doesn't have an accessible name, screen readers announce it with a generic name, making it unusable for users who rely on screen readers. [Learn how to name `meter` elements](https://dequeuniversity.com/rules/axe/4.4/aria-meter-name).",
"description": "When a meter element doesn't have an accessible name, screen readers announce it with a generic name, making it unusable for users who rely on screen readers. [Learn how to name `meter` elements](https://dequeuniversity.com/rules/axe/4.4/aria-meter-name).",
"score": null,
"scoreDisplayMode": "notApplicable"
},
Expand Down Expand Up @@ -2591,7 +2591,7 @@
"aria-tooltip-name": {
"id": "aria-tooltip-name",
"title": "ARIA `tooltip` elements have accessible names",
"description": "When an element doesn't have an accessible name, screen readers announce it with a generic name, making it unusable for users who rely on screen readers. [Learn how to name `tooltip` elements](https://dequeuniversity.com/rules/axe/4.4/aria-tooltip-name).",
"description": "When a tooltip element doesn't have an accessible name, screen readers announce it with a generic name, making it unusable for users who rely on screen readers. [Learn how to name `tooltip` elements](https://dequeuniversity.com/rules/axe/4.4/aria-tooltip-name).",
"score": null,
"scoreDisplayMode": "notApplicable"
},
Expand Down Expand Up @@ -11172,7 +11172,7 @@
"aria-meter-name": {
"id": "aria-meter-name",
"title": "ARIA `meter` elements have accessible names",
"description": "When an element doesn't have an accessible name, screen readers announce it with a generic name, making it unusable for users who rely on screen readers. [Learn how to name `meter` elements](https://dequeuniversity.com/rules/axe/4.4/aria-meter-name).",
"description": "When a meter element doesn't have an accessible name, screen readers announce it with a generic name, making it unusable for users who rely on screen readers. [Learn how to name `meter` elements](https://dequeuniversity.com/rules/axe/4.4/aria-meter-name).",
"score": null,
"scoreDisplayMode": "notApplicable"
},
Expand Down Expand Up @@ -11231,7 +11231,7 @@
"aria-tooltip-name": {
"id": "aria-tooltip-name",
"title": "ARIA `tooltip` elements have accessible names",
"description": "When an element doesn't have an accessible name, screen readers announce it with a generic name, making it unusable for users who rely on screen readers. [Learn how to name `tooltip` elements](https://dequeuniversity.com/rules/axe/4.4/aria-tooltip-name).",
"description": "When a tooltip element doesn't have an accessible name, screen readers announce it with a generic name, making it unusable for users who rely on screen readers. [Learn how to name `tooltip` elements](https://dequeuniversity.com/rules/axe/4.4/aria-tooltip-name).",
"score": null,
"scoreDisplayMode": "notApplicable"
},
Expand Down Expand Up @@ -16505,7 +16505,7 @@
"aria-meter-name": {
"id": "aria-meter-name",
"title": "ARIA `meter` elements have accessible names",
"description": "When an element doesn't have an accessible name, screen readers announce it with a generic name, making it unusable for users who rely on screen readers. [Learn how to name `meter` elements](https://dequeuniversity.com/rules/axe/4.4/aria-meter-name).",
"description": "When a meter element doesn't have an accessible name, screen readers announce it with a generic name, making it unusable for users who rely on screen readers. [Learn how to name `meter` elements](https://dequeuniversity.com/rules/axe/4.4/aria-meter-name).",
"score": null,
"scoreDisplayMode": "notApplicable"
},
Expand Down Expand Up @@ -16564,7 +16564,7 @@
"aria-tooltip-name": {
"id": "aria-tooltip-name",
"title": "ARIA `tooltip` elements have accessible names",
"description": "When an element doesn't have an accessible name, screen readers announce it with a generic name, making it unusable for users who rely on screen readers. [Learn how to name `tooltip` elements](https://dequeuniversity.com/rules/axe/4.4/aria-tooltip-name).",
"description": "When a tooltip element doesn't have an accessible name, screen readers announce it with a generic name, making it unusable for users who rely on screen readers. [Learn how to name `tooltip` elements](https://dequeuniversity.com/rules/axe/4.4/aria-tooltip-name).",
"score": null,
"scoreDisplayMode": "notApplicable"
},
Expand Down
4 changes: 2 additions & 2 deletions core/test/results/sample_v2.json
Original file line number Diff line number Diff line change
Expand Up @@ -3442,7 +3442,7 @@
"aria-meter-name": {
"id": "aria-meter-name",
"title": "ARIA `meter` elements have accessible names",
"description": "When an element doesn't have an accessible name, screen readers announce it with a generic name, making it unusable for users who rely on screen readers. [Learn how to name `meter` elements](https://dequeuniversity.com/rules/axe/4.4/aria-meter-name).",
"description": "When a meter element doesn't have an accessible name, screen readers announce it with a generic name, making it unusable for users who rely on screen readers. [Learn how to name `meter` elements](https://dequeuniversity.com/rules/axe/4.4/aria-meter-name).",
"score": null,
"scoreDisplayMode": "notApplicable"
},
Expand Down Expand Up @@ -3501,7 +3501,7 @@
"aria-tooltip-name": {
"id": "aria-tooltip-name",
"title": "ARIA `tooltip` elements have accessible names",
"description": "When an element doesn't have an accessible name, screen readers announce it with a generic name, making it unusable for users who rely on screen readers. [Learn how to name `tooltip` elements](https://dequeuniversity.com/rules/axe/4.4/aria-tooltip-name).",
"description": "When a tooltip element doesn't have an accessible name, screen readers announce it with a generic name, making it unusable for users who rely on screen readers. [Learn how to name `tooltip` elements](https://dequeuniversity.com/rules/axe/4.4/aria-tooltip-name).",
"score": null,
"scoreDisplayMode": "notApplicable"
},
Expand Down
Loading