core(jsonld): Structured data validation updates #8137

Merged
merged 18 commits into from
Apr 16, 2019
Changes from 2 commits
6 changes: 3 additions & 3 deletions lighthouse-core/lib/sd-validation/json-linter.js
@@ -9,21 +9,21 @@ const jsonlint = require('jsonlint-mod');

/**
* @param {string} input
* @returns {{message: string, lineNumber: string|null}|null}
* @returns {{message: string, lineNumber: number|null}|null}
*/
module.exports = function parseJSON(input) {
try {
jsonlint.parse(input);
} catch (error) {
/** @type {string|null} */
/** @type {number|null} */
let line = error.at;

// extract line number from message
if (!line) {
const regexLineResult = error.message.match(/Parse error on line (\d+)/);

if (regexLineResult) {
line = regexLineResult[1];
line = parseFloat(regexLineResult[1]);
}
}
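
For illustration, the line-number extraction this hunk touches can be sketched in isolation. `extractLineNumber` is a hypothetical name, not part of the PR, and the sketch uses `Number.parseInt` where the diff uses `parseFloat`; the two should agree here because the regex only captures digits.

```javascript
// Hypothetical helper (not from the PR): pulls a 1-based line number out of
// a jsonlint-style message such as "Parse error on line 2: ...".
function extractLineNumber(message) {
  const match = message.match(/Parse error on line (\d+)/);
  // The capture group is a string, so convert it explicitly to a number.
  // Return null when the message carries no line information.
  return match ? Number.parseInt(match[1], 10) : null;
}

console.log(extractLineNumber('Parse error on line 2:\n{ "a": 1,, }')); // 2
console.log(extractLineNumber('Unexpected token')); // null
```

Returning a number rather than the raw capture group is what lets the JSDoc type change from `string|null` to `number|null`.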

@@ -41,7 +41,7 @@ module.exports = function validateJsonLD(json) {
if (name.startsWith('@') && !VALID_KEYWORDS.has(name)) {
errors.push({
path: path.join('/'),
message: 'Unknown keyword',
message: 'Unknown keyword ' + name,
Collaborator

should we put this in quotes or after a colon or something? this also makes me wonder what our i18n story is here 😬

});
}
});
35 changes: 23 additions & 12 deletions lighthouse-core/lib/sd-validation/schema-validator.js
@@ -49,7 +49,7 @@ function findType(type) {
*
* @param {string|Array<string>} typeOrTypes
* @param {Array<string>} keys
* @returns {Array<string>}
* @returns {Array<{message: string, key?: string, invalidTypes?: Array<string>, path?: string}>}
*/
function validateObjectKeys(typeOrTypes, keys) {
/** @type {Array<string>} */
@@ -60,17 +60,22 @@ function validateObjectKeys(typeOrTypes, keys) {
} else if (Array.isArray(typeOrTypes)) {
types = typeOrTypes;
const invalidIndex = typeOrTypes.findIndex(s => typeof s !== 'string');
if (invalidIndex >= 0) return [`Unknown value type at index ${invalidIndex}`];
if (invalidIndex >= 0) {
return [{message: `Unknown value type at index ${invalidIndex}`}];
}
} else {
return ['Unknown value type'];
return [{message: 'Unknown value type'}];
}

const unknownTypes = types.filter(t => !findType(t));

if (unknownTypes.length) {
return unknownTypes
.filter(type => SCHEMA_ORG_URL_REGEX.test(type))
.map(type => `Unrecognized schema.org type ${type}`);
.map(type => ({
message: `Unrecognized schema.org type ${type}`,
key: '@type',
}));
}

/** @type {Set<string>} */
@@ -91,15 +96,19 @@ function validateObjectKeys(typeOrTypes, keys) {
// remove Schema.org input/output constraints http://schema.org/docs/actions.html#part-4
.map(key => key.replace(/-(input|output)$/, ''))
.filter(key => !allKnownProps.has(key))
.map(key => `Unexpected property "${key}"`);
.map(key => ({
message: `Unexpected property "${key}"`,
key,
invalidTypes: types,
}));
}

/**
* @param {LH.StructuredData.ExpandedSchemaRepresentation|null} expandedObj Valid JSON-LD object in expanded form
* @return {Array<{path: string, message: string}>}
* @return {Array<{path?: string, message: string, invalidTypes?: Array<string>}>}
*/
module.exports = function validateSchemaOrg(expandedObj) {
/** @type {Array<{path: string, message: string}>} */
/** @type {Array<{path?: string, message: string, invalidTypes?: Array<string>}>} */
const errors = [];

if (expandedObj === null) {
@@ -114,20 +123,22 @@ module.exports = function validateSchemaOrg(expandedObj) {

walkObject(expandedObj, (name, value, path, obj) => {
if (name === TYPE_KEYWORD) {
const keyErrorMessages = validateObjectKeys(value, Object.keys(obj));
const keyErrors = validateObjectKeys(value, Object.keys(obj));

keyErrorMessages.forEach(message =>
keyErrors.forEach(error => {
errors.push({
invalidTypes: error.invalidTypes,
message: error.message,
// get rid of the first chunk (/@type) as it's the same for all errors
path:
'/' +
path
.slice(0, -1)
Collaborator

wait a second, the path is backwards? I think we're missing some tests that have paths of length >2 then 🤔

Contributor Author

I can look into this more tomorrow, but what makes you think the path is backwards? Here's an example of the path:

[ 'http://schema.org/author', '0', 'http://schema.org/colleague', '0', '@type' ]

For this:

{
  "@context": "https://schema.org",
  "@type": "Article",
  "author": {
    "@type": "Person",
    "displayNaame": "Sally",
    "colleague": {
      "@type": "Person",
      "displayNaame": "Sally"
    }
  }
}

Collaborator

Oh the comment confused me. It says it's getting rid of the "first chunk" but it's removing the last chunk. That made me think it is backwards and all paths started with /@type, but I see now :)

maybe update the comment? a test for longer paths still couldn't hurt though sometime :)

Contributor Author

Oooh, yeah good catch, that's a very confusing comment! Added a test case for deeper nesting too.

.concat(error.key || [])
.map(cleanName)
.join('/'),
message,
})
);
});
});
}
});
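
To make the path discussion in this hunk concrete, here is a minimal sketch of how the reported path is assembled from the walker's path array. `buildErrorPath` is a hypothetical name, and `cleanName` is a stand-in written for this example (the real helper is not shown in this diff); it is assumed to keep only the last segment of an absolute IRI.

```javascript
// Assumed stand-in for the PR's cleanName helper, which this diff does not show:
// "http://schema.org/author" -> "author"; plain keys pass through unchanged.
function cleanName(name) {
  const parts = String(name).split('/');
  return parts[parts.length - 1];
}

// Hypothetical wrapper around the chain used in validateSchemaOrg.
function buildErrorPath(path, key) {
  return '/' + path
    .slice(0, -1) // drop the LAST chunk ('@type'), which is the same for all errors
    .concat(key || []) // append the offending key when there is one
    .map(cleanName)
    .join('/');
}

const examplePath = [
  'http://schema.org/author', '0',
  'http://schema.org/colleague', '0',
  '@type',
];
console.log(buildErrorPath(examplePath, 'displayNaame'));
// /author/0/colleague/0/displayNaame
```

The example uses the nested path quoted in the review thread, so it also covers the deeper nesting the reviewers asked to see tested.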

77 changes: 68 additions & 9 deletions lighthouse-core/lib/sd-validation/sd-validation.js
@@ -10,22 +10,20 @@ const validateJsonLD = require('./jsonld-keyword-validator.js');
const expandAsync = require('./json-expander.js');
const validateSchemaOrg = require('./schema-validator.js');

/** @typedef {'json'|'json-ld'|'json-ld-expand'|'schema-org'} ValidatorType */

/**
* Validates JSON-LD input. Returns array of error objects.
*
* @param {string} textInput
* @returns {Promise<Array<{path: ?string, validator: ValidatorType, message: string}>>}
* @returns {Promise<Array<LH.StructuredData.ValidationError>>}
*/
module.exports = async function validate(textInput) {
// STEP 1: VALIDATE JSON
const parseError = parseJSON(textInput);

if (parseError) {
return [{
validator: 'json',
path: parseError.lineNumber,
validator: /** @type {LH.StructuredData.ValidatorType} */ ('json'),
lineNumber: parseError.lineNumber,
message: parseError.message,
}];
}
@@ -38,9 +36,10 @@ module.exports = async function validate(textInput) {
if (jsonLdErrors.length) {
return jsonLdErrors.map(error => {
return {
validator: /** @type {ValidatorType} */ ('json-ld'),
validator: /** @type {LH.StructuredData.ValidatorType} */ ('json-ld'),
path: error.path,
message: error.message,
lineNumber: getLineNumberFromJsonLDPath(inputObject, error.path),
};
});
}
@@ -52,8 +51,7 @@ module.exports = async function validate(textInput) {
expandedObj = await expandAsync(inputObject);
} catch (error) {
return [{
validator: 'json-ld-expand',
path: null,
validator: /** @type {LH.StructuredData.ValidatorType} */ ('json-ld-expand'),
message: error.message,
}];
}
@@ -64,12 +62,73 @@ module.exports = async function validate(textInput) {
if (schemaOrgErrors.length) {
return schemaOrgErrors.map(error => {
return {
validator: /** @type {ValidatorType} */ ('schema-org'),
validator: /** @type {LH.StructuredData.ValidatorType} */ ('schema-org'),
path: error.path,
message: error.message,
lineNumber: error.path ? getLineNumberFromJsonLDPath(inputObject, error.path) : null,
invalidTypes: error.invalidTypes,
};
});
}

return [];
};
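
The `validate` function above runs its validators in order and returns as soon as one stage reports errors. A generic sketch of that short-circuiting shape, with every name (`runStages`, the `stages` array, the reduced keyword check) invented for illustration rather than taken from Lighthouse:

```javascript
// Hypothetical staged validator: each stage returns an array of partial errors.
// The first stage with a non-empty result wins and later stages never run.
async function runStages(input, stages) {
  for (const {validator, run} of stages) {
    const errors = await run(input);
    if (errors.length) {
      // Tag each error with the stage that produced it.
      return errors.map(error => ({validator, ...error}));
    }
  }
  return [];
}

// Two toy stages; the keyword check is far smaller than a real JSON-LD keyword list.
const stages = [
  {
    validator: 'json',
    run: text => {
      try {
        JSON.parse(text);
        return [];
      } catch (err) {
        return [{message: err.message}];
      }
    },
  },
  {
    validator: 'json-ld',
    run: text => Object.keys(JSON.parse(text))
      .filter(key => key.startsWith('@') && !['@context', '@type'].includes(key))
      .map(key => ({path: key, message: 'Unknown keyword ' + key})),
  },
];

runStages('{"@test": 1}', stages).then(errors => console.log(errors));
```

Because later stages only run on input that passed the earlier ones, the second stage can call `JSON.parse` without its own error handling.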

/**
* @param {*} obj
* @param {string} path
* @returns null | number - line number of the path value in the prettified JSON
*/
function getLineNumberFromJsonLDPath(obj, path) {
Member

these two are utility-y enough, maybe they're worth exposing and having explicit tests for them?

They're getting test coverage right now but I doubt it's much, and it would have the benefit of documenting them a bit more. e.g. these aren't called with literals anywhere, so it's not possible to know what the path string generally looks like without also looking at the code that generates paths.

Contributor Author

Yeah, adding some tests is a good idea 👍

// To avoid having an extra dependency on a JSON parser we set a unique key in the
Collaborator

has that ship sailed with jsonlint-mod? is there any way to leverage that?

Collaborator

also deja vu, I feel like I've reviewed this before so sorry if you've answered this in the previous PR

Contributor Author

has that ship sailed with jsonlint-mod? is there any way to leverage that?

I don't see a direct way to leverage jsonlint-mod here. It just uses Douglas Crockford's JSON parser.

I guess we could parse it ourselves or find a small parser that we can include?

also deja vu, I feel like I've reviewed this before so sorry if you've answered this in the previous PR

Yup, I sent you a pre-PR to get some initial feedback.

// object and then use that to identify the correct line
const searchKey = Math.random().toString();
obj = JSON.parse(JSON.stringify(obj));

setValueAtJsonLDPath(obj, path, searchKey);
const jsonLines = JSON.stringify(obj, null, 2).split('\n');
const lineIndex = jsonLines.findIndex(line => line.includes(searchKey));

return lineIndex === -1 ? null : lineIndex + 1;
}
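
The marker technique in `getLineNumberFromJsonLDPath` can be tried on its own with a simplified path format. In this sketch `findLineOfPath` is a hypothetical name and the path is assumed to be an array of exact keys, so the IRI matching done by `setValueAtJsonLDPath` is left out.

```javascript
// Simplified, hypothetical variant: write a unique marker at the path in a deep
// copy, pretty-print the copy, and report the line the marker lands on.
function findLineOfPath(obj, pathParts) {
  const marker = `__marker_${Math.random()}__`;
  const copy = JSON.parse(JSON.stringify(obj)); // deep clone, input stays untouched
  let current = copy;
  for (let i = 0; i < pathParts.length; i++) {
    const part = pathParts[i];
    if (current === null || typeof current !== 'object' || !(part in current)) {
      return null; // path does not exist in the object
    }
    if (i === pathParts.length - 1) {
      current[part] = marker;
    } else {
      current = current[part];
    }
  }
  const lines = JSON.stringify(copy, null, 2).split('\n');
  const index = lines.findIndex(line => line.includes(marker));
  return index === -1 ? null : index + 1; // 1-based line number
}

const doc = {'@type': 'Article', 'author': {'@type': 'Person', 'name': 'Cat'}};
console.log(findLineOfPath(doc, ['author', 'name'])); // 5
```

As the JSDoc above notes, the result is a line in the prettified JSON, so it only matches the author's source when that source is formatted the same way.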

/**
* @param {*} obj
* @param {string} path
* @param {*} value
*/
function setValueAtJsonLDPath(obj, path, value) {
const pathParts = path.split('/').filter(p => !!p);
let currentObj = obj;
pathParts.forEach((pathPart, i) => {
if (pathPart === '0' && !Array.isArray(currentObj)) {
// jsonld expansion turns single values into arrays
return;
}

const isLastPart = pathParts.length - 1 === i;
let keyFound = false;
for (const key of Object.keys(currentObj)) {
// The actual key in JSON might be an absolute IRI like "http://schema.org/author"
// but key provided by validator is "author"
const keyParts = key.split('/');
const relativeKey = keyParts[keyParts.length - 1];
if (relativeKey === pathPart && currentObj[key] !== undefined) {
Collaborator

seems like if we have currentObj[key] as undefined we could just return before this whole loop

Contributor Author

True, I don't see why we'd need it at all actually...

// If we've arrived at the end of the provided path set the value, otherwise
// continue iterating with the object at the key location
if (isLastPart) {
currentObj[key] = value;
} else {
currentObj = currentObj[key];
}
keyFound = true;
Collaborator

seems like keyFound is unnecessary, we could just return here and throw if we ever make it through the loop without returning

return;
}
}

if (!keyFound) {
// Couldn't find the key we got from validation in the original object
throw Error('Key not found: ' + pathPart);
}
});
}
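
The two review suggestions on this function (drop the `undefined` check, and replace `keyFound` with an early return plus a throw) could be combined roughly as below. This is an illustrative rewrite under those assumptions, not necessarily the version that was merged.

```javascript
// Sketch of setValueAtJsonLDPath without the keyFound flag.
function setValueAtJsonLDPath(obj, path, value) {
  const pathParts = path.split('/').filter(Boolean);
  let currentObj = obj;
  pathParts.forEach((pathPart, i) => {
    if (pathPart === '0' && !Array.isArray(currentObj)) {
      // jsonld expansion turns single values into arrays; skip the index
      return;
    }

    const isLastPart = i === pathParts.length - 1;
    for (const key of Object.keys(currentObj)) {
      // The key in the JSON may be an absolute IRI like "http://schema.org/author"
      // while the validator reports just "author".
      const relativeKey = key.split('/').pop();
      if (relativeKey !== pathPart) continue;

      if (isLastPart) {
        currentObj[key] = value;
      } else {
        currentObj = currentObj[key];
      }
      return; // key handled; forEach moves on to the next path part
    }

    // The loop finished without returning, so no key matched.
    throw new Error('Key not found: ' + pathPart);
  });
}

const doc = {
  '@type': 'Article',
  'http://schema.org/author': {'@type': 'Person', 'name': 'Cat'},
};
setValueAtJsonLDPath(doc, '/author/0/name', 'MARKER');
console.log(doc['http://schema.org/author'].name); // MARKER
```

The `return` inside the loop only exits the `forEach` callback for the current path part, which is why the throw can sit directly after the loop.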
40 changes: 37 additions & 3 deletions lighthouse-core/test/lib/sd-validation-test.js
@@ -17,7 +17,7 @@ describe('JSON validation', () => {
`);

assert.equal(errors.length, 1);
assert.equal(errors[0].path, 2);
Collaborator

we should probably use strict equal here then, these would have been failing before I guess

assert.equal(errors[0].lineNumber, 2);
assert.ok(errors[0].message.indexOf(`Expecting '}'`) === 0);
});

@@ -28,7 +28,7 @@ describe('JSON validation', () => {
}`);

assert.equal(errors.length, 1);
assert.equal(errors[0].path, 2);
assert.equal(errors[0].lineNumber, 2);
assert.ok(errors[0].message.indexOf(`Expecting 'EOF', '}', ':', ',', ']'`) === 0);
});

@@ -70,8 +70,9 @@ describe('JSON-LD validation', () => {
}`);

assert.equal(errors.length, 1);
assert.equal(errors[0].message, 'Unknown keyword');
assert.equal(errors[0].message, 'Unknown keyword @test');
assert.equal(errors[0].path, '@test');
assert.equal(errors[0].lineNumber, 4);
});

it('reports invalid context', async () => {
@@ -125,6 +126,7 @@ describe('schema.org validation', () => {

assert.equal(errors.length, 1);
assert.equal(errors[0].message, 'Unrecognized schema.org type http://schema.org/Cat');
assert.equal(errors[0].lineNumber, 3);
});

it('handles arrays of json schemas', async () => {
@@ -169,7 +171,9 @@ describe('schema.org validation', () => {
}`);

assert.equal(errors.length, 1);
assert.equal(errors[0].invalidTypes[0], 'http://schema.org/Article');
Collaborator

this name is a bit confusing to me for how it's used. the type isn't invalid, right? it's just the type of the entity that has some other invalid stuff going on?

Contributor Author

The entity is not a valid type instance because it has a property that doesn't exist on type.

I originally just called that array types, but then it wasn't clear that having that array means invalid. (And we use that array to generate the Invalid Event: unexpected property asdf message.)

Maybe we can call the array invalidForTypes?

Collaborator

oh gotcha, I actually like the direction you were headed with .types then. I think it's clear something is invalid by virtue of the fact that we are giving an error. Maybe typeOfInvalidEntity or something similar if we're going the more explicit route?

Contributor Author

I'm at validTypes right now, since we only include the types in the validation error if they are valid. (And we don't want to show invalid types in the UI, because in that case they'll be part of the error message.)

assert.equal(errors[0].message, 'Unexpected property "controversial"');
assert.equal(errors[0].lineNumber, 11);
});

it('passes if non-schema.org context', async () => {
@@ -206,4 +210,34 @@ describe('schema.org validation', () => {

assert.equal(errors.length, 0);
});

it('passes if valid json-ld uses absolute IRIs as keys', async () => {
const errors = await validateJSONLD(`{
"@type": "http://schema.org/Article",
"http://schema.org/author": {
"@type": "Person",
"http://schema.org/name": "Cat"
},
"http://schema.org/datePublished": "Oct 29th 2017",
"http://schema.org/dateModified": "Oct 29th 2017"
}`);

assert.equal(errors.length, 0);
});

it('fails if invalid json-ld uses absolute IRIs as keys', async () => {
const errors = await validateJSONLD(`{
"@type": "http://schema.org/Article",
"http://schema.org/author": {
"@type": "http://schema.org/Person",
"http://schema.org/invalidProperty": "",
"http://schema.org/name": "Cat"
},
"http://schema.org/datePublished": "Oct 29th 2017",
"http://schema.org/dateModified": "Oct 29th 2017"
}`);

assert.equal(errors.length, 1);
assert.equal(errors[0].lineNumber, 5);
});
});
15 changes: 15 additions & 0 deletions types/structured-data.d.ts
@@ -7,6 +7,21 @@
declare global {
module LH {
module StructuredData {

export type ValidatorType =
| "json"
| "json-ld"
| "json-ld-expand"
| "schema-org";
Member

ok, where is this leading | coming from :P Not even prettier does this, I don't think

Collaborator

my prettier definitely does this in other projects, not sure about here :)

Member

my prettier definitely does this in other projects, not sure about here :)

maybe it's a typescript prettier thing? I couldn't get it to repro in their playground


export interface ValidationError {
message: string;
path?: string;
validator: ValidatorType;
lineNumber?: number | null;
invalidTypes?: Array<string>;
}

export interface ExpandedSchemaRepresentationItem {
[schemaRef: string]: Array<
string |