Node TypeScript Multi-Purpose Headless-Chrome Scraper

This is a web scraper/crawler written in TypeScript. It drives parallel headless Chrome instances through the Chrome DevTools API, so it scrapes websites with a real browser rather than the approximation of one that many tools rely on.

This means it sees the full interface of pure-JavaScript SPAs, just as a regular user would.

Currently, the main report this tool provides is the list of internal URLs that return error status codes. It's not a lot, but it's already quite useful for the websites I work on.
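To illustrate what such a report boils down to, here is a minimal sketch, not the tool's actual code. The `PageResult` shape and the `isInternal` and `errorReport` helpers are hypothetical names invented for this example, and "internal" is assumed to mean "same hostname as the start URL".

```typescript
// Hypothetical shape of one crawl result; the tool's real data model may differ.
interface PageResult {
  url: string;
  status: number; // HTTP status code observed when loading the page
}

// Assumption: a URL counts as internal when it shares the start URL's hostname.
const isInternal = (url: string, startUrl: string): boolean => {
  try {
    return new URL(url).hostname === new URL(startUrl).hostname;
  } catch {
    return false; // unparseable URLs are treated as external
  }
};

// Keep only internal pages whose status is a client or server error (>= 400).
const errorReport = (results: PageResult[], startUrl: string): PageResult[] =>
  results.filter((r) => isInternal(r.url, startUrl) && r.status >= 400);
```

Given results for `https://example.com/` (200), `https://example.com/missing` (404) and a broken page on another domain, only the 404 entry would end up in the report.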

Now that the base mechanism of discovering and going through all of the URLs of a website starting at a given domain is in place, I plan to add more features as needed for my work.
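The discovery mechanism can be pictured as a breadth-first walk over internal links. The sketch below is an illustration under stated assumptions, not the tool's implementation: the `LinkGraph` map is a hypothetical stand-in for the links that headless Chrome would extract from each rendered page, and `normalize` and `discover` are names made up for this example.

```typescript
// Hypothetical stand-in for the browser: maps a page URL to the hrefs found on it.
type LinkGraph = Map<string, string[]>;

// Resolve an href against the page it was found on and drop the #fragment,
// so that the same page is not queued twice.
const normalize = (href: string, base: string): string | null => {
  try {
    const u = new URL(href, base);
    u.hash = '';
    return u.href;
  } catch {
    return null; // ignore hrefs that cannot be parsed
  }
};

// Breadth-first discovery of every internal URL reachable from startUrl.
const discover = (startUrl: string, graph: LinkGraph): string[] => {
  const start = normalize(startUrl, startUrl);
  if (start === null) return [];
  const host = new URL(start).hostname;
  const seen = new Set<string>([start]);
  const queue: string[] = [start];
  while (queue.length > 0) {
    const page = queue.shift() as string;
    for (const href of graph.get(page) ?? []) {
      const url = normalize(href, page);
      if (url === null || seen.has(url)) continue;
      if (new URL(url).hostname !== host) continue; // stay on the start domain
      seen.add(url);
      queue.push(url);
    }
  }
  return Array.from(seen); // URLs in the order they were discovered
};
```

In a real crawl the lookup in `graph` would be replaced by loading the page in a browser instance and reading its links, which is also where the parallelism would come in; the visited set and the queue would stay the same.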

Oh, and the UI is a web one, written in React.

djfm/ntsmphc-scraper
