Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: correctly truncate unicode strings #14911

Merged
merged 12 commits into from
Mar 22, 2023
4 changes: 2 additions & 2 deletions cli/test/smokehouse/test-definitions/byte-efficiency.js
Original file line number Diff line number Diff line change
Expand Up @@ -192,12 +192,12 @@ const expectations = {
},
{
// /some-custom-url.js,
url: 'inline: \n function unusedFunction() {\n // Un...',
url: 'inline: \n function unusedFunction() {\n // U…',
wastedBytes: '6690 +/- 100',
wastedPercent: '99.6 +/- 0.1',
},
{
url: 'inline: \n // Used block #1\n // FILLER DATA JUS...',
url: 'inline: \n // Used block #1\n // FILLER DATA JU…',
wastedBytes: '6569 +/- 100',
wastedPercent: 100,
},
Expand Down
2 changes: 1 addition & 1 deletion core/audits/audit.js
Original file line number Diff line number Diff line change
Expand Up @@ -225,7 +225,7 @@ class Audit {
const lineNumber = lineIndex + 1;
/** @type LH.Audit.Details.SnippetValue['lines'][0] */
const lineDetail = {
content: line.slice(0, maxLineLength),
content: Util.truncate(line, maxLineLength),
lineNumber,
};
if (line.length > maxLineLength) {
Expand Down
3 changes: 2 additions & 1 deletion core/audits/byte-efficiency/unminified-javascript.js
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ import {ByteEfficiencyAudit} from './byte-efficiency-audit.js';
import * as i18n from '../../lib/i18n/i18n.js';
import {computeJSTokenLength as computeTokenLength} from '../../lib/minification-estimator.js';
import {getRequestForScript, isInline} from '../../lib/script-helpers.js';
import { Util } from '../../../shared/util.js';
connorjclark marked this conversation as resolved.
Show resolved Hide resolved

const UIStrings = {
/** Imperative title of a Lighthouse audit that tells the user to minify the page’s JS code to reduce file size. This is displayed in a list of audit titles that Lighthouse generates. */
Expand Down Expand Up @@ -84,7 +85,7 @@ class UnminifiedJavaScript extends ByteEfficiencyAudit {
const networkRecord = getRequestForScript(networkRecords, script);

const displayUrl = isInline(script) ?
`inline: ${script.content.substring(0, 40)}...` :
`inline: ${Util.truncate(script.content, 40)}` :
script.url;
try {
const result = UnminifiedJavaScript.computeWaste(script.content, displayUrl, networkRecord);
Expand Down
12 changes: 6 additions & 6 deletions core/computed/unused-css.js
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@
import {makeComputedArtifact} from './computed-artifact.js';
import {ByteEfficiencyAudit} from '../audits/byte-efficiency/byte-efficiency-audit.js';
import {NetworkRecords} from './network-records.js';
import {Util} from '../../shared/util.js';

const PREVIEW_LENGTH = 100;

Expand Down Expand Up @@ -87,8 +88,7 @@ class UnusedCSS {
* @return {string}
*/
static determineContentPreview(content) {
let preview = (content || '')
.slice(0, PREVIEW_LENGTH * 5)
let preview = Util.truncate(content || '', PREVIEW_LENGTH * 5, '')
.replace(/( {2,}|\t)+/g, ' ') // remove leading indentation if present
.replace(/\n\s+}/g, '\n}') // completely remove indentation of closing braces
.trim(); // trim the leading whitespace
Expand All @@ -101,16 +101,16 @@ class UnusedCSS {
firstRuleStart > firstRuleEnd ||
firstRuleStart > PREVIEW_LENGTH) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if PREVIEW_LENGTH is in graphemes, it won't be comparable to preview.length, firstRuleStart, firstRuleEnd, etc. Not sure how much we care, but definitely some wonkiness in here as a result (e.g. firstRuleStart could be greater than PREVIEW_LENGTH in code points but less than PREVIEW_LENGTH in graphemes), which makes the code harder to understand/alter in the future.

// We couldn't determine the first rule-set or it's not within the preview
preview = preview.slice(0, PREVIEW_LENGTH) + '...';
preview = Util.truncate(preview, PREVIEW_LENGTH);
} else if (firstRuleEnd < PREVIEW_LENGTH) {
// The entire first rule-set fits within the preview
preview = preview.slice(0, firstRuleEnd + 1) + ' ...';
preview = preview.slice(0, firstRuleEnd + 1) + ' ';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use Util.truncate(0, firstRuleEnd, ' …') here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because firstRuleEnd is a specific location in the string, we should not truncate by characters here.

} else {
// The first rule-set doesn't fit within the preview, just show as many as we can
const lastSemicolonIndex = preview.slice(0, PREVIEW_LENGTH).lastIndexOf(';');
preview = lastSemicolonIndex < firstRuleStart ?
preview.slice(0, PREVIEW_LENGTH) + '... } ...' :
preview.slice(0, lastSemicolonIndex + 1) + ' ... } ...';
preview.slice(0, PREVIEW_LENGTH) + ' } ' :
connorjclark marked this conversation as resolved.
Show resolved Hide resolved
preview.slice(0, lastSemicolonIndex + 1) + ' } ';
}
}

Expand Down
4 changes: 2 additions & 2 deletions core/lib/lh-error.js
Original file line number Diff line number Diff line change
Expand Up @@ -107,14 +107,14 @@ const str_ = i18n.createIcuMessageFn(import.meta.url, UIStrings);
const LHERROR_SENTINEL = '__LighthouseErrorSentinel';
const ERROR_SENTINEL = '__ErrorSentinel';
/**
* @typedef {{sentinel: '__LighthouseErrorSentinel', code: string, stack?: string, [p: string]: string|undefined}} SerializedLighthouseError
* @typedef {{sentinel: '__LighthouseErrorSentinel', code: string, stack?: string, [p: string]: unknown}} SerializedLighthouseError
* @typedef {{sentinel: '__ErrorSentinel', message: string, code?: string, stack?: string}} SerializedBaseError
*/

class LighthouseError extends Error {
/**
* @param {LighthouseErrorDefinition} errorDefinition
* @param {Record<string, string|undefined>=} properties
* @param {Record<string, unknown>=} properties
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this change necessary?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initially I needed this change when the base tsconfig target was updated to 2022, reverted in de6a394 . the base tsconfig didn't need the change anymore after I moved the truncate function to shared/util.js (it was in page-functions.js). Might as well keep this more correct type, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like a tsc bug that spreading unknown down on L125 doesn't result in an error, though. properties are ICU placeholder values and the constructor shouldn't allow anything but strings through if at all possible, so ideally we can hold off on using unknown here unless it's absolutely required.

*/
constructor(errorDefinition, properties) {
super(errorDefinition.code);
Expand Down
54 changes: 32 additions & 22 deletions core/lib/page-functions.js
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,8 @@
* Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
*/

import {Util} from '../../shared/util.js';

/**
* @fileoverview
* Helper functions that are passed by `toString()` by Driver to be evaluated in target page.
Expand Down Expand Up @@ -78,6 +80,14 @@ function getElementsInDocument(selector) {
return results;
}

/**
* @param {string} string
* @param {number} characterLimit
*/
function truncate(string, characterLimit) {
return Util.truncate(string, characterLimit);
connorjclark marked this conversation as resolved.
Show resolved Hide resolved
}

/**
* Gets the opening tag text of the given node.
* @param {Element|ShadowRoot} element
Expand Down Expand Up @@ -132,10 +142,9 @@ function getOuterHTMLSnippet(element, ignoreAttrs = [], snippetCharacterLimit =
}

// Elide attribute value if too long.
if (attributeValue.length > ATTRIBUTE_CHAR_LIMIT) {
attributeValue = attributeValue.slice(0, ATTRIBUTE_CHAR_LIMIT - 1) + '…';
dirty = true;
}
const truncatedString = truncate(attributeValue, ATTRIBUTE_CHAR_LIMIT);
dirty ||= truncatedString !== attributeValue;
connorjclark marked this conversation as resolved.
Show resolved Hide resolved
attributeValue = truncatedString;

if (dirty) {
// Style attributes can be blocked by the CSP if they are set via `setAttribute`.
Expand Down Expand Up @@ -372,22 +381,6 @@ function isPositionFixed(element) {
* @return {string | null}
*/
function getNodeLabel(element) {
// Inline so that audits that import getNodeLabel don't
// also need to import truncate
/**
* @param {string} str
* @param {number} maxLength
* @return {string}
*/
function truncate(str, maxLength) {
if (str.length <= maxLength) {
return str;
}
// Take advantage of string iterator multi-byte character awareness.
// Regular `.slice` will ignore unicode character boundaries and lead to malformed text.
return Array.from(str).slice(0, maxLength - 1).join('') + '…';
}

const tagName = element.tagName.toLowerCase();
// html and body content is too broad to be useful, since they contain all page content
if (tagName !== 'html' && tagName !== 'body') {
Expand Down Expand Up @@ -518,14 +511,30 @@ function getNodeDetails(element) {
return details;
}

/** @type {string} */
const getNodeLabelRawString = getNodeLabel.toString();
getNodeLabel.toString = () => `function getNodeLabel(element) {
${truncate};
return (${getNodeLabelRawString})(element);
}`;

/** @type {string} */
const getOuterHTMLSnippetRawString = getOuterHTMLSnippet.toString();
// eslint-disable-next-line max-len
getOuterHTMLSnippet.toString = () => `function getOuterHTMLSnippet(element, ignoreAttrs = [], snippetCharacterLimit = 500) {
${truncate};
return (${getOuterHTMLSnippetRawString})(element, ignoreAttrs, snippetCharacterLimit);
}`;

/** @type {string} */
const getNodeDetailsRawString = getNodeDetails.toString();
getNodeDetails.toString = () => `function getNodeDetails(element) {
${truncate}
${getNodePath};
${getNodeSelector};
${getBoundingClientRect};
${getOuterHTMLSnippet};
${getNodeLabel};
${getOuterHTMLSnippetRawString};
${getNodeLabelRawString};
return (${getNodeDetailsRawString})(element);
}`;

Expand All @@ -541,4 +550,5 @@ export const pageFunctions = {
isPositionFixed,
wrapRequestIdleCallback,
getBoundingClientRect,
truncate,
};
2 changes: 1 addition & 1 deletion core/lib/url-utils.js
Original file line number Diff line number Diff line change
Expand Up @@ -138,7 +138,7 @@ class UrlUtils {
static elideDataURI(url) {
try {
const parsed = new URL(url);
return parsed.protocol === 'data:' ? url.slice(0, 100) : url;
return parsed.protocol === 'data:' ? Util.truncate(url, 100) : url;
Copy link
Member

@brendankenny brendankenny Mar 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW data URLs can only contain ascii characters

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, but at least inclusion of the ellipse suffix is an improvement here.

} catch (e) {
return url;
}
Expand Down
10 changes: 6 additions & 4 deletions core/test/computed/unused-css-test.js
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,8 @@

import assert from 'assert/strict';

import {expect} from 'expect';

import {UnusedCSS} from '../../computed/unused-css.js';

describe('UnusedCSS computed artifact', () => {
Expand All @@ -22,7 +24,7 @@ describe('UnusedCSS computed artifact', () => {
describe('#determineContentPreview', () => {
function assertLinesContained(actual, expected) {
expected.split('\n').forEach(line => {
assert.ok(actual.includes(line.trim()), `${line} is found in preview`);
expect(actual).toContain(line.trim());
});
}

Expand Down Expand Up @@ -50,7 +52,7 @@ describe('UnusedCSS computed artifact', () => {
assertLinesContained(preview(longContent), `
body {
color: white;
} ...
}
`.trim());
});

Expand All @@ -66,7 +68,7 @@ describe('UnusedCSS computed artifact', () => {
assertLinesContained(preview(longContent), `
body {
color: white;
font-size: 20px; ... } ...
font-size: 20px; }
`.trim());
});

Expand All @@ -77,7 +79,7 @@ describe('UnusedCSS computed artifact', () => {
*/
`.trim();

assert.ok(/aaa\.\.\./.test(preview(longContent)));
assert.ok(/aaa/.test(preview(longContent)));
});
});

Expand Down
20 changes: 20 additions & 0 deletions shared/test/util-test.js
Original file line number Diff line number Diff line change
Expand Up @@ -211,4 +211,24 @@ describe('util helpers', () => {
]);
});
});

describe('truncate', () => {
it('truncates based on visual characters', () => {
expect(Util.truncate('aaa', 30)).toEqual('aaa');
expect(Util.truncate('aaa', 3)).toEqual('aaa');
expect(Util.truncate('aaa', 2)).toEqual('a…');
expect(Util.truncate('aaa🥳', 4)).toEqual('aaa🥳');
expect(Util.truncate('aaa🥳', 3)).toEqual('aa…');
expect(Util.truncate('देवनागरी', 5)).toEqual('देवनागरी');
connorjclark marked this conversation as resolved.
Show resolved Hide resolved
expect(Util.truncate('देवनागरी', 4)).toEqual('देवना…');

expect(Util.truncate('aaa', 3, '')).toEqual('aaa');
expect(Util.truncate('aaa', 2, '')).toEqual('aa');

expect(Util.truncate('aaaaa', 5, '...')).toEqual('aaaaa');
expect(Util.truncate('aaaaa', 4, '...')).toEqual('a...');
expect(Util.truncate('aaaaa', 3, '...')).toEqual('...');
expect(Util.truncate('aaaaa', 1, '...')).toEqual('...');
});
});
});
2 changes: 1 addition & 1 deletion shared/tsconfig.json
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
"extends": "../tsconfig-base.json",
"compilerOptions": {
// Limit defs to base JS.
"lib": ["es2020"],
"lib": ["es2020", "es2022.intl"],
// Only include `@types/node` from node_modules/.
"types": ["node"],
// "listFiles": true,
Expand Down
36 changes: 36 additions & 0 deletions shared/util.js
Original file line number Diff line number Diff line change
Expand Up @@ -153,6 +153,42 @@ class Util {
return segments;
}

/**
* @param {string} string
* @param {number} characterLimit
* @param {string} ellipseSuffix
*/
static truncate(string, characterLimit, ellipseSuffix = '…') {
if (string.length <= characterLimit) {
connorjclark marked this conversation as resolved.
Show resolved Hide resolved
return string;
}

const segmenter = new Intl.Segmenter(undefined, {granularity: 'grapheme'});
const iterator = segmenter.segment(string)[Symbol.iterator]();

let lastSegment;
for (let i = 0; i <= characterLimit - ellipseSuffix.length; i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically ellipseSuffix.length might not be grapheme length, but I guess we can assume we won't pass in suffixes without a 1:1 code point:grapheme ratio? :)

const result = iterator.next();
if (result.done) {
return string;
}

lastSegment = result.value;
}

if (!lastSegment) {
return ellipseSuffix;
}
connorjclark marked this conversation as resolved.
Show resolved Hide resolved

for (let i = 0; i < ellipseSuffix.length; i++) {
if (iterator.next().done) {
return string;
}
}

return string.slice(0, lastSegment.index) + ellipseSuffix;
}

/**
* @param {URL} parsedUrl
* @param {{numPathParts?: number, preserveQuery?: boolean, preserveHost?: boolean}=} options
Expand Down