Skip to content

Basic module for crawling a domain in search of missing links.

License

Notifications You must be signed in to change notification settings

troby/linkcheck

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

80 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Basic tool for crawling a domain in search of missing links.

Currently, this tool only saves 404 errors to a special list,
but enabling verbose will provide output of every response
status code. Linkcheck will verify external links, but it will
not scrape them for further links to be tested.

Dependencies:
python3.x
requests
BeautifulSoup4

Usage:

import linkcheck
check = linkcheck.SiteChecker('mysite.com')
check.start()
check.results()

The following defaults can be changed before invoking start():
check.verbose = False
check.delay = 3


Functional test:
The localhost site tested by functional_test.py should
have the following files:

download.html:
<html>
<body>
<a href="test.mp3">music file</a>
<a href="missing_2.html">missing</a>
</body>
</html>

index.html:
<html>
<body>
<a href="download.html">downloads</a>
<a href="missing_1.html">missing</a>
</body>
</html>

test.mp3:
binary file data

About

Basic module for crawling a domain in search of missing links.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages