-
Notifications
You must be signed in to change notification settings - Fork 9.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
misc: share markdown parsing in collect-strings and the report #9514
Conversation
(a bit easier to see at https://github.com/GoogleChrome/lighthouse/pull/9514/files?w=1, most added lines are the expanded tests) |
const parts = text.split(/\[([^\]]+?)\]\((https?:\/\/.*?)\)/g); | ||
while (parts.length) { | ||
// Pop off the same number of elements as there are capture groups. | ||
const [preambleText, linkText, linkHref] = parts.splice(0, 3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried rewriting this with i % 3
and have 0
and 1
cases, where the 1
case also pops off parts[i + 1]
, but it just got harder to follow, so stuck with this internally
/* global URL */ | ||
|
||
describe('util helpers', () => { | ||
let origConsoleWarn; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (message.match(/\[.*\] \(.*\)/)) { | ||
throw Error(`Bad Link spacing in message "${message}"`); | ||
} | ||
// * [](empty link text) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@exterkamp and I discussed this empty linkText case and decided while maybe there's some vague possibility of a need for invisible links at some point, it's almost certainly an accident today, so let's alert the author
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
||
const parts = text.split(/\[([^\]]+?)\]\((https?:\/\/.*?)\)/g); | ||
while (parts.length) { | ||
// Pop off the same number of elements as there are capture groups. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Pop off the same number of elements as there are capture groups. | |
// Shift off the same number of elements as there are capture groups. |
;)
largely a pedantic suggestion and pre-existing, so can reject if you'd like
// Pop off the same number of elements as there are capture groups. | ||
const [preambleText, linkText, linkHref] = parts.splice(0, 3); | ||
|
||
if (preambleText) { // Empty plain text is an artifact of splitting, not meaningful. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the empty explanation feels a bit weird here, maybe invert it or at least say "We can skip empty b/c ..."?
]); | ||
}); | ||
|
||
it('splits on backticked code at the emd of the string', () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it('splits on backticked code at the emd of the string', () => { | |
it('splits on backticked code at the end of the string', () => { |
}); | ||
}); | ||
|
||
describe('#splitMarkdownLink', () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
love these ❤️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Like the collection of concerns into 1 file and all the new tests!
* into segments that were enclosed in backticks (marked as `isCode === true`) | ||
* and those that outside the backticks (`isCode === false`). | ||
* @param {string} text | ||
* @return {Array<{isCode: true, codeText: string}|{isCode: false, plainText: string}>} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems weird to me to include a boolean flag and use different text fields. Makes more sense to me to have {isCode: boolean, text: string}
? Is this less canonical for js?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had a massive discussion about this very thing when discussing the shape of the proto and the hardened LHR API.
I 100% agree this feels anti-JS :)
OTOH, it's for internal use only, never goes over any wire, and we never really need to use just the text, so I don't have a strong reason to object yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haha everyone lives with "preambleText" (does code typically come with a preamble?) including preambleText that occurs after the content and may not have any text in it, but I try to make self describing property names and everyone loses their minds :P
I think there is value in API safety like this (in this case tsc requires the isCode
discriminator to be checked for the consuming code to be able to know if it can use plainText
or codeText
) but as you say it's internal only, for a minor feature, and I was really just playing around with the interface (not sure I really like it), so I can switch back to text
or whatever :)
* `isLink === false`), and segments with text content and a URL that did make | ||
* up a link (marked as `isLink === true`). | ||
* @param {string} text | ||
* @return {Array<{isLink: true, linkText: string, linkHref: string}|{isLink: false, plainText: string}>} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing about this. Seems like isLink
with text
would be sufficient.
]); | ||
}); | ||
|
||
it('handles text only within backticks', () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also curious for backticks separated by only a space if that would maintain the space e.g.
"`first code` `second code`"
...and that works as well. Feel free to add it as a test if that seems significant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to add it as a test if that seems significant.
sure, no problem
😆🤣😆🤣😆🤣 |
duplicated markdown parsing in
dom.js
for the report andcollect-strings.js
for adding placeholders for ctc files (which will eventually be reconstituted and then passed throughdom.js
methods) is a little risky, so the first commit here makes them share a splitting implementation inutil.js
.There are no changes to the collected locale strings or the existing
dom
andcollect-strings
unit tests....but then I couldn't resist making an interface that's easier to understand (no more mysterious empty string for
preambleText
butundefined
forlinkText
andlinkHref
). This is isolated to the second commit, so if it seems excessive I can drop it :)