coding-spider/web-scrapper

My Application

This project was generated with LoopBack.

  • scrapper.js recursively scrapes URLs
  • URLs are stored in the Link collection; a uniqueness constraint on url prevents duplicates when links are inserted
  • cheerio is used to parse the fetched HTML into a queryable DOM
  • A concurrency limit of 5 is maintained once the first URL has been scraped
  • CSV output is generated by streaming Link model data in batches
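
The crawl described above (recursive scraping, de-duplication by url, at most 5 requests in flight) could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in scrapper.js: the function name crawl and the fetchLinks callback are hypothetical, an in-memory Set stands in for the uniqueness constraint on the Link collection, and the HTTP request and cheerio parsing are left to whatever fetchLinks does.

```javascript
// Hypothetical sketch: crawl outward from startUrl, never running more than
// `limit` fetches at once. `fetchLinks(url)` is an assumed async callback that
// resolves to the array of URLs found on that page (in the real project this
// is where the page would be requested and parsed).
function crawl(startUrl, fetchLinks, limit = 5) {
  const seen = new Set([startUrl]); // stands in for the unique constraint on url
  const queue = [startUrl];         // URLs discovered but not yet scraped
  let active = 0;                   // fetches currently in flight

  return new Promise((resolve) => {
    const next = () => {
      // Nothing queued and nothing running: the crawl is complete.
      if (queue.length === 0 && active === 0) {
        resolve([...seen]);
        return;
      }
      // Start queued URLs until the concurrency limit is reached.
      while (active < limit && queue.length > 0) {
        const url = queue.shift();
        active += 1;
        Promise.resolve()
          .then(() => fetchLinks(url))
          .then((links) => {
            for (const link of links) {
              if (!seen.has(link)) { // skip URLs that were already recorded
                seen.add(link);
                queue.push(link);
              }
            }
          })
          .catch(() => {
            // A page that fails to load is skipped; the crawl continues.
          })
          .then(() => {
            active -= 1;
            next();
          });
      }
    };
    next();
  });
}
```

Because only the start URL is queued at first, the first page is scraped on its own and the limit of 5 only comes into play once its links have been discovered. Calling crawl('https://example.com/', fetchLinks) resolves to the list of every unique URL found, where fetchLinks is supplied by the caller.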

Steps to run the project

  • Clone the project
  • Inside the project directory, run npm install
  • Then run node server/scripts/scrapper.js