WIP - Node TypeScript Multi-Purpose Headless-Chrome Scraper

djfm/ntsmphc-scraper

Node TypeScript Multi-Purpose Headless-Chrome Scraper

It's a web scraper/crawler written in TypeScript. It drives parallel headless Chrome instances through the Chrome DevTools API, so it scrapes websites with a real browser rather than an approximation of one, as many tools do.

This means it sees the fully rendered interface of pure JavaScript SPAs, just like any regular user would.
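To illustrate the "parallel instances" idea, here is a minimal sketch of a concurrency-limited task runner. It is not the project's actual code: `mapWithLimit` is a hypothetical helper, and the `task` callback stands in for whatever loads a page in one headless Chrome instance.

```typescript
// Run `task` over `items` with at most `limit` tasks in flight at once.
// Results are returned in the same order as `items`.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next item nobody has picked up yet

  // Each worker repeatedly takes the next free item until none are left.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i]);
    }
  }

  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With `limit` set to the number of browser instances, each worker would keep one instance busy while there are URLs left to visit.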

Currently, the main report this tool provides is a list of internal URLs that respond with error status codes. It's not a lot, but it's already quite useful for the websites I work on.
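As a rough sketch of what such a report boils down to, assuming each visited page is recorded as a URL plus its HTTP status code (the `PageResult` type and both function names are hypothetical, not taken from the project):

```typescript
// Hypothetical shape of one crawl result; not the project's actual type.
interface PageResult {
  url: string;
  status: number; // HTTP status code of the page's main response
}

// True when `url` is served from the same host as the crawl's start URL.
function isInternal(url: string, startURL: string): boolean {
  return new URL(url).hostname === new URL(startURL).hostname;
}

// Keep only internal pages whose status is a client or server error (>= 400).
function errorReport(results: PageResult[], startURL: string): PageResult[] {
  return results.filter((r) => isInternal(r.url, startURL) && r.status >= 400);
}
```

Treating "error" as any status of 400 or above is an assumption; a real report might also want to flag redirects or timeouts.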

Now that the base mechanism for discovering and visiting all of the URLs of a website, starting at a given domain, is in place, I plan to add more features as needed for my work.
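The discovery mechanism described above can be sketched as a breadth-first traversal. This is an illustration under assumptions, not the project's implementation: `crawl`, `Visit` and `VisitPage` are hypothetical names, and `visitPage` stands in for loading a URL in headless Chrome and collecting the links found on the rendered page.

```typescript
// What one page visit yields; `links` may be absolute or relative URLs.
interface Visit {
  status: number;
  links: string[];
}

// Stand-in for "open this URL in the browser and report what was found".
type VisitPage = (url: string) => Promise<Visit>;

// Resolve `link` against `base`, dropping the fragment; null if unparsable.
function resolveLink(link: string, base: string): URL | null {
  try {
    const resolved = new URL(link, base);
    resolved.hash = '';
    return resolved;
  } catch {
    return null;
  }
}

// Visit every same-origin URL reachable from `startURL`, each exactly once.
// Returns a map from visited URL to its status code.
async function crawl(startURL: string, visitPage: VisitPage): Promise<Map<string, number>> {
  const start = new URL(startURL);
  const statuses = new Map<string, number>();
  const queue: string[] = [start.href];
  const seen = new Set<string>(queue);

  while (queue.length > 0) {
    const url = queue.shift() as string;
    const { status, links } = await visitPage(url);
    statuses.set(url, status);

    for (const link of links) {
      const next = resolveLink(link, url);
      // Only follow links that stay on the starting origin and are new.
      if (next !== null && next.origin === start.origin && !seen.has(next.href)) {
        seen.add(next.href);
        queue.push(next.href);
      }
    }
  }
  return statuses;
}
```

This version visits pages one at a time to keep the traversal easy to follow; a parallel crawler would hand queued URLs to several browser instances at once.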

Oh, and the UI is web-based, written in React.
