go build -o bin/rcrawl cmd/rcrawl/main.go
./bin/rcrawl --url=https://webscraper.io/test-sites/e-commerce/allinone --max_depth=3 --req_timeout_sec=5
./bin/rcrawl --help
wget -r -np -l 3 -E -e robots=off https://webscraper.io/test-sites/e-commerce/allinone
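For reference, the flags above map naturally onto Go's standard `flag` package (which accepts both `-flag` and `--flag` forms). The sketch below is a guess at how `cmd/rcrawl/main.go` might declare them: the flag names come from the commands above, while the defaults and the assumed `--dest_dir` flag are illustrative only.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"time"
)

func main() {
	// Flag names match the commands shown above; defaults are illustrative guesses.
	startURL := flag.String("url", "", "starting URL to crawl")
	destDir := flag.String("dest_dir", "./data", "destination directory (assumed flag name)")
	maxDepth := flag.Int("max_depth", 3, "maximum recursion depth")
	reqTimeoutSec := flag.Int("req_timeout_sec", 5, "per-request timeout in seconds")
	flag.Parse()

	if *startURL == "" {
		fmt.Fprintln(os.Stderr, "a starting --url is required")
		os.Exit(2)
	}
	timeout := time.Duration(*reqTimeoutSec) * time.Second
	fmt.Printf("crawling %s into %s (depth %d, timeout %s)\n",
		*startURL, *destDir, *maxDepth, timeout)
}
```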
This project presents a simple solution to an interview task and showcases a common Go project structure together with basic crawling functionality. Additional features, such as a Dockerfile, a Makefile, tests, and support for replacing HTML links with their local alternatives, can be added upon request.
- implement a recursive web crawler for a given site.
- the crawler is a command-line tool that accepts a starting URL and a destination directory
- the crawler downloads the initial URL and recursively follows links found inside the original document
- the crawler does not follow links outside the initial URL (if the starting link is https://start.url/abc, it visits https://start.url/abc/123 and https://start.url/abc/456, but skips https://another.domain/ and https://start.url/def); see the scope-check sketch after this list
- the crawler should handle the Ctrl+C hotkey correctly (shutting down gracefully)
- the crawler should download pages in parallel; a combined cancellation/worker-pool sketch follows this list
- the crawler should resume loading if the destination directory already contains downloaded data (i.e. if the download is cancelled and then continued); see the resume sketch below
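The scope rule above can be read as "same host, and the path extends the starting path". Here is a minimal sketch under that assumption; `inScope` is a hypothetical helper for illustration, not taken from the project:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// inScope reports whether link falls under the starting URL: same host and a
// path that extends the starting path. A production version would also guard
// against bare prefix matches like /abcdef matching /abc.
func inScope(start, link *url.URL) bool {
	if link.Host != start.Host {
		return false // e.g. https://another.domain/ is out of scope
	}
	return strings.HasPrefix(link.Path, start.Path)
}

func main() {
	start, _ := url.Parse("https://start.url/abc")
	for _, raw := range []string{
		"https://start.url/abc/123", // in scope
		"https://start.url/def",     // same host, different subtree: skipped
		"https://another.domain/",   // different host: skipped
	} {
		link, _ := url.Parse(raw)
		fmt.Printf("%-28s in scope: %v\n", raw, inScope(start, link))
	}
}
```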
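Correct Ctrl+C handling and parallelism compose nicely when a cancellable context feeds a fixed pool of worker goroutines. Below is a minimal sketch under those assumptions; the queue wiring and the fetch itself are placeholders, not the project's actual code:

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"sync"
)

func main() {
	// Cancel the context when Ctrl+C (SIGINT) arrives so workers stop cleanly.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
	defer stop()

	jobs := make(chan string) // URLs to fetch, fed by the link extractor
	var wg sync.WaitGroup
	const workers = 8 // illustrative degree of parallelism
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				select {
				case <-ctx.Done():
					return // Ctrl+C pressed: stop taking new work
				case u, ok := <-jobs:
					if !ok {
						return // queue drained
					}
					fmt.Println("fetching", u) // placeholder for the real download
				}
			}
		}()
	}

	// Demo feed; a real crawler enqueues links discovered while parsing pages.
	for _, u := range []string{"https://start.url/abc", "https://start.url/abc/123"} {
		select {
		case jobs <- u:
		case <-ctx.Done():
		}
	}
	close(jobs)
	wg.Wait()
}
```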
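Resuming comes down to mapping each URL to its on-disk location and skipping the fetch when the file already exists. A sketch assuming a host/path directory layout (the project's actual layout may differ):

```go
package main

import (
	"fmt"
	"net/url"
	"os"
	"path/filepath"
)

// localPath maps a URL to a file under destDir. The host/path layout is an
// assumption for illustration; the project may store files differently.
func localPath(destDir string, u *url.URL) string {
	p := u.Path
	if p == "" || p == "/" {
		p = "/index.html"
	}
	return filepath.Join(destDir, u.Host, filepath.FromSlash(p))
}

// alreadyDownloaded reports whether a previous (possibly interrupted) run
// saved this page, letting the crawler skip the fetch on a continued run.
func alreadyDownloaded(destDir string, u *url.URL) bool {
	_, err := os.Stat(localPath(destDir, u))
	return err == nil
}

func main() {
	u, _ := url.Parse("https://start.url/abc/123")
	fmt.Println(localPath("data", u), "downloaded:", alreadyDownloaded("data", u))
}
```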