How can node-scrapy be used recursively to crawl a site? #23

Answered by stefanmaric
mariusa asked this question in Q&A

Hi @mariusa,

Even though we use node-scrapy for crawling at Eeshi, it is focused on the scraping part. It doesn't provide anything at the network layer, much less the crawling logic.
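
To be clear, extract only ever sees a string of HTML you already have in hand; it never makes a request itself. Here's a tiny self-contained sketch (the model shape, attribute in parentheses and an array to collect all matches, is just a sketch of the selector syntax; double-check it against the docs for the version you're on):

const { extract } = require('node-scrapy')

// no network involved: extract works on a plain HTML string
const html = '<ul><li><a href="/wiki/Etching">Etching</a></li><li><a href="/wiki/Woodcut">Woodcut</a></li></ul>'
const model = { links: ['li a (href)'] }

console.log(extract(html, model))
// expected shape: { links: ['/wiki/Etching', '/wiki/Woodcut'] }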

For HTTP fetching there's a plethora of options (request, got, axios, node-fetch, etc.). Here's a quick example I put together with node-fetch:

const fs = require('fs')
const path = require('path')

// need to be installed in the project
const fetch = require('node-fetch')
const { extract } = require('node-scrapy')

// resolve after a random delay (up to 10s) to throttle requests
const wait = () =>
  new Promise((resolve) => {
    setTimeout(resolve, Math.round(Math.random() * 10000))
  })

const LINKS_STORE = {} // visited URLs, doubles as a dedupe set
const START_URL = 'https://en.wikipedia.org/wiki/Printmaking'

// hypothetical model: collect internal article links from the page body
const model = { links: ['#bodyContent a[href^="/wiki/"] (href)'] }

// one minimal way to wire fetch + extract into a recursive crawl
const crawl = async (url, depth = 2) => {
  if (depth === 0 || LINKS_STORE[url]) return // already visited or deep enough
  LINKS_STORE[url] = true
  const html = await (await fetch(url)).text()
  const { links } = extract(html, model)
  for (const href of links || []) {
    await wait() // throttle between requests
    await crawl(new URL(href, url).href, depth - 1)
  }
}

crawl(START_URL)
  .then(() => fs.writeFileSync(path.join(__dirname, 'links.json'), JSON.stringify(LINKS_STORE, null, 2)))
  .catch(console.error)
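
A few notes on the sketch: the random wait keeps you from hammering the server, LINKS_STORE doubles as a visited set so pages that link to each other don't send the recursion into a loop, and the depth argument caps how far the crawl fans out. For anything serious you'd want a real queue with concurrency limits, retries, and error handling (p-queue works well), plus respect for robots.txt.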