Add basic site crawler to find 404s, etc. #321
@donjo any idea what to do about these "less than IE9"-specific JS files that don't seem to exist? Do we even support IE versions below 9 anymore? If not, I guess we can just remove the references...
Hmm, @donjo just mentioned on Slack that our accessibility documentation page says we only support IE9 and above.
So I'm finding that a number of 404s are actually coming from external sources, like
➕ to treating 404s from external sources as warnings. Agreed.
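A minimal sketch of the idea in the two comments above: report a 404 as a warning when the missing URL lives on a different origin than the locally served site, and as an error otherwise. The `classifyNotFound` helper and the `http://localhost:3000` origin are hypothetical; the actual crawler may track external links differently.

```javascript
// Hypothetical labels; the real crawl.js defines its own WARNING/ERROR values.
const WARNING = "WARNING";
const ERROR = "ERROR";

// Classify a 404 as an error if it is on our own site, or a warning if the
// missing resource belongs to an external origin we don't control.
function classifyNotFound(missingUrl, siteOrigin) {
  // Resolve relative paths against the local site so both compare cleanly.
  const target = new URL(missingUrl, siteOrigin);
  const site = new URL(siteOrigin);
  return target.origin === site.origin ? ERROR : WARNING;
}

// Example usage, assuming the site is served locally on port 3000.
console.log(classifyNotFound("/components/buttons/", "http://localhost:3000")); // ERROR
console.log(classifyNotFound("https://example.com/missing.js", "http://localhost:3000")); // WARNING
```

Comparing full origins (scheme, host, and port) rather than hostnames alone means a link to the production site would also count as external while crawling locally.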
Looks good to me, and runs well!
config/crawl.js
```
const isWarning = refs.every(path => WARNING_PAGES.includes(path));
const label = isWarning ? WARNING : ERROR;

console.log(`${label}: 404 for ${item.path}!`);
```
Leaving off the `!` would clean up the output a bit and make it easier to copy the path.
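A sketch of the labeling logic from the snippet above with the trailing `!` dropped, as suggested. `WARNING_PAGES`, `refs`, and `item.path` come from the diff; the `formatNotFound` wrapper and the sample page paths are hypothetical.

```javascript
const WARNING = "WARNING";
const ERROR = "ERROR";

// Hypothetical list of referring pages whose broken links only warrant a warning.
const WARNING_PAGES = ["/legacy-page/"];

// Label a 404 as a warning only if every page that links to it is a warning page.
function formatNotFound(item, refs) {
  const isWarning = refs.every(path => WARNING_PAGES.includes(path));
  const label = isWarning ? WARNING : ERROR;
  // No trailing "!", so the path can be copied straight from the output.
  return `${label}: 404 for ${item.path}`;
}

console.log(formatNotFound({ path: "/old.js" }, ["/legacy-page/"]));
// WARNING: 404 for /old.js
console.log(formatNotFound({ path: "/missing/" }, ["/legacy-page/", "/getting-started/"]));
// ERROR: 404 for /missing/
```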
config/crawl.js
```
const app = express();

app.use(express.static(`${__dirname}/../_site`));
```
Maybe check that `_site` has contents; otherwise the call to `refs.every(...)` later on (and potentially other things) will fail.
Here's the output from running `yarn crawl` without first running `yarn build`:
```
/Users/jamesseppi/CODE/web-design-standards-docs/config/crawl.js:67
const isWarning = refs.every(path => WARNING_PAGES.includes(path));
                      ^
TypeError: Cannot read property 'every' of undefined
    at notFound.forEach.item (/Users/jamesseppi/CODE/web-design-standards-docs/config/crawl.js:67:31)
```
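One way to act on this suggestion, sketched with Node's built-in `fs` and `path` modules: fail fast with a readable message when `_site` is missing or empty, and only call `.every` when `refs` is actually an array. The function names `assertSiteBuilt` and `isWarningOnly` are hypothetical, not part of the PR.

```javascript
const fs = require("fs");
const path = require("path");

// Throw a readable error if the build output directory is missing or empty,
// e.g. because `yarn crawl` was run before `yarn build`.
function assertSiteBuilt(siteDir) {
  if (!fs.existsSync(siteDir) || fs.readdirSync(siteDir).length === 0) {
    throw new Error(`${siteDir} is missing or empty; build the site before crawling.`);
  }
}

// Guard against refs being undefined. Note that [].every(...) is true, so an
// empty list would otherwise silently downgrade the 404 to a warning; here a
// missing or empty referrer list is reported as an error instead.
function isWarningOnly(refs, warningPages) {
  if (!Array.isArray(refs) || refs.length === 0) {
    return false;
  }
  return refs.every(ref => warningPages.includes(ref));
}

// Example: check the Jekyll output folder relative to config/crawl.js.
try {
  assertSiteBuilt(path.join(__dirname, "..", "_site"));
} catch (err) {
  console.error(err.message);
}
```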
lgtm 🚢
woooooooot!
This is a work-in-progress attempt to fix #269, or at least part of it.
Instructions
1. Generate the site using `jekyll build`. The `_site` folder should now contain the latest version of the site.
2. Run `npm run crawl`. It will let you know if it found any errors.
Notes
I decided to use `node-simplecrawler` because I've had experience using it in the past, and it seems reasonably fast and extensible.
Unlike 18F/content-guide#132, I decided not to go with `html-proofer` because it's Ruby-based, and I don't have a ton of experience with Ruby. I also heard rumors that we might migrate this site from Jekyll to Hugo; if that happens, we would likely remove Ruby from this project entirely, so I figured Node might be a safer long-term option. We can always switch, though!
To do
Some of these can be filed as separate issues and dealt with in separate PRs.
- All the JS scripts referenced by `ie-polyfill-scripts.html`.
- `html5shiv.js`, which is in an IE-only comment in `head.html`.
- In the developer's guide, there's a broken link to `CONTRIBUTING.md`.
- `npm test` (and therefore during CI).
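If the crawler ends up running during `npm test` and CI, it has to signal failure through its exit code. A minimal sketch, assuming results are collected as `{ label, path }` objects (a hypothetical shape, as is the `summarize` name): warnings are printed, but only errors make the process exit non-zero.

```javascript
// Hypothetical summary step: print every result, then return the exit code
// a CI job should see (1 if any error-level 404 was found, 0 otherwise).
function summarize(results) {
  for (const { label, path } of results) {
    console.log(`${label}: 404 for ${path}`);
  }
  return results.some(r => r.label === "ERROR") ? 1 : 0;
}

// Setting process.exitCode (instead of calling process.exit) lets pending
// output flush before Node exits.
process.exitCode = summarize([
  { label: "WARNING", path: "https://example.com/missing.js" },
]);
```

Keeping external-source 404s at warning level means a third-party outage would not fail the build on its own.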