User perspective web crawler which uses the Chromium Debugging Protocol

aau-network-security/kraaler

Kraaler

This is a Go implementation of the design described in /Kraaler: A User Perspective Web Crawler/, presented at TMA 2019.

Building

Kraaler requires CGO_ENABLED=1 (C support in Go) because it uses SQLite. Compiling the binary therefore requires a set of C libraries. The official Golang Docker images come pre-bundled with these C dependencies, which makes them a convenient tool for compilation.

docker run \
	--rm \
	-v "$(pwd)":/go/src/github.com/aau-network-security/kraaler \
	-w /go/src/github.com/aau-network-security/kraaler/app/ \
	-e GO111MODULE=on \
	-e GOOS=linux \
	-e GOARCH=amd64 \
	-e CGO_ENABLED=1 \
	-e HOST_UID=`id -u` \
	golang:1.12.6 \
	bash build.sh

Remember to set GOOS and GOARCH according to your platform.

Running

# -n: number of workers
# --provider-file: provider for URLs
# --sampler: sampler used to prioritize URLs
# --filter-resp-bodies-ct: only text bodies
$ krl run -n 3 \
  --provider-file urls.txt \
  --sampler 'uni' \
  --filter-resp-bodies-ct '^text/'

Contributors
