Web-Structure-Mining

This project demonstrates the well-known PageRank and HITS algorithms.
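As a rough illustration of the first algorithm, here is a minimal PageRank sketch using power iteration. It is not the project's implementation. It assumes the link graph maps each source URL to its target URLs and link counts, in the same shape as the HashMap<WebURL, HashMap<WebURL, Integer>> described under Mechanism, with plain Strings standing in for the project's WebURL class. The class name PageRankSketch, the damping factor, the iteration count, and the choice to spread dangling-page rank evenly over all pages are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical PageRank sketch (power iteration); not the project's own code.
// URLs are plain Strings here instead of the project's WebURL type.
final class PageRankSketch {

    /**
     * @param graph      source URL -> (target URL -> number of links from source to target)
     * @param damping    damping factor d, commonly 0.85
     * @param iterations number of power-iteration steps
     */
    static Map<String, Double> compute(Map<String, Map<String, Integer>> graph,
                                       double damping, int iterations) {
        // Collect every node, including pages that only appear as link targets.
        Set<String> nodes = new HashSet<>(graph.keySet());
        for (Map<String, Integer> targets : graph.values()) {
            nodes.addAll(targets.keySet());
        }
        int n = nodes.size();
        Map<String, Double> rank = new HashMap<>();
        for (String node : nodes) {
            rank.put(node, 1.0 / n); // start from a uniform distribution
        }

        for (int it = 0; it < iterations; it++) {
            Map<String, Double> next = new HashMap<>();
            for (String node : nodes) {
                next.put(node, (1.0 - damping) / n); // random-jump share
            }
            double danglingMass = 0.0;
            for (String node : nodes) {
                Map<String, Integer> targets = graph.getOrDefault(node, Map.of());
                int outLinks = 0;
                for (int count : targets.values()) {
                    outLinks += count;
                }
                double r = rank.get(node);
                if (outLinks == 0) {
                    // Dangling page (assumption): its rank is spread evenly over all pages.
                    danglingMass += r;
                    continue;
                }
                for (Map.Entry<String, Integer> e : targets.entrySet()) {
                    // Each edge is weighted by how many times the link occurs on the page.
                    next.merge(e.getKey(), damping * r * e.getValue() / outLinks, Double::sum);
                }
            }
            double share = damping * danglingMass / n;
            for (String node : nodes) {
                next.merge(node, share, Double::sum);
            }
            rank = next;
        }
        return rank;
    }
}
```

Because the random-jump share and the dangling-page share are both redistributed, the scores keep summing to 1 after every iteration.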

Dependencies

The project depends on JSoup and GraphStream jars.

Mechanism

A bare-bones breadth-first search (BFS) crawler is implemented for crawling the Web.
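The following is a minimal sketch of a level-by-level BFS crawl. It is not the project's crawler and rests on several assumptions. Pages are fetched through a hypothetical fetchLinks function; in the real project this would involve downloading and parsing pages, but here an in-memory map stands in so the example runs offline. URLs are plain Strings rather than WebURL objects. Each iteration expands one whole BFS level on a fixed-size thread pool. Links are recorded in a ConcurrentHashMap of per-target link counts. The class name BfsCrawlerSketch and its methods are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical BFS crawler sketch; fetchLinks stands in for real page fetching.
final class BfsCrawlerSketch {

    // Link graph: source URL -> (target URL -> number of times the link occurs).
    final Map<String, Map<String, Integer>> graph = new ConcurrentHashMap<>();

    private final Function<String, List<String>> fetchLinks;
    private final Set<String> visited = ConcurrentHashMap.newKeySet();

    BfsCrawlerSketch(Function<String, List<String>> fetchLinks) {
        this.fetchLinks = fetchLinks;
    }

    /** Each iteration expands the entire current frontier (one BFS level) in parallel. */
    void crawl(List<String> seeds, int iterations, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<String> frontier = new ArrayList<>();
            for (String seed : seeds) {
                if (visited.add(seed)) frontier.add(seed);
            }
            for (int level = 0; level < iterations && !frontier.isEmpty(); level++) {
                List<Future<List<String>>> results = new ArrayList<>();
                for (String url : frontier) {
                    results.add(pool.submit(() -> expand(url)));
                }
                List<String> next = new ArrayList<>();
                for (Future<List<String>> f : results) {
                    try {
                        next.addAll(f.get()); // wait until this level is finished
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("crawl interrupted", e);
                    } catch (ExecutionException e) {
                        throw new IllegalStateException("page expansion failed", e.getCause());
                    }
                }
                frontier = next;
            }
        } finally {
            pool.shutdown();
        }
    }

    // Fetches one page, records its out-links, and returns URLs not seen before.
    private List<String> expand(String url) {
        Map<String, Integer> out = graph.computeIfAbsent(url, k -> new ConcurrentHashMap<>());
        List<String> discovered = new ArrayList<>();
        for (String target : fetchLinks.apply(url)) {
            out.merge(target, 1, Integer::sum);
            if (visited.add(target)) discovered.add(target); // atomic check-and-mark
        }
        return discovered;
    }
}
```

Usage: `new BfsCrawlerSketch(url -> web.getOrDefault(url, List.of())).crawl(List.of("a"), 10, 4)`, where `web` is a Map from each URL to its list of outgoing links. Because `visited` is a concurrent set, two threads that discover the same URL cannot both schedule it.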

You can:

assign seed URLs to start crawling from.
specify a .txt file of keywords that controls which URLs are added to the frontier.
specify the number of threads assigned to the crawlers and the number of crawling iterations.
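The keyword filter described above could look roughly like the sketch below. The file format is an assumption: one keyword per line, blank lines ignored, and a URL is accepted if it contains any keyword as a case-insensitive substring. Another assumption is that an empty keyword list accepts every URL. The class KeywordFilterSketch is hypothetical and not part of the project.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical frontier filter; the keyword-file format is an assumption.
final class KeywordFilterSketch {
    private final List<String> keywords;

    KeywordFilterSketch(List<String> rawKeywords) {
        // Normalize: trim, drop blank lines, compare case-insensitively.
        this.keywords = rawKeywords.stream()
                .map(String::trim)
                .filter(k -> !k.isEmpty())
                .map(k -> k.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    // Reads one keyword per line from a UTF-8 text file.
    static KeywordFilterSketch fromFile(Path file) throws IOException {
        return new KeywordFilterSketch(Files.readAllLines(file));
    }

    // True if the URL may enter the frontier.
    boolean accepts(String url) {
        if (keywords.isEmpty()) return true; // assumption: no keywords means no filtering
        String lower = url.toLowerCase(Locale.ROOT);
        return keywords.stream().anyMatch(lower::contains);
    }
}
```

In a crawler, the check would sit where newly discovered URLs are added to the next frontier, so rejected pages are neither fetched nor expanded.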

The Web graph (incoming and outgoing links) is stored internally as a HashMap<WebURL, HashMap<WebURL, Integer>>, in a concurrent fashion. The PageRank and HITS algorithms can be applied to this graph. Graphs can be exported and imported later. You can also visualize the small part of the Web that contributes the most by specifying how many of the top-ranked nodes to draw.
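For comparison with PageRank, here is a minimal sketch of the HITS hub/authority iteration over a graph of the same shape. It is again not the project's code. The following are assumptions: Strings stand in for WebURL, link counts are ignored (each edge counts once), scores are normalized to unit Euclidean length after every step, and the hub update uses the freshly computed authorities. The helper topN shows one way to pick the highest-scoring nodes for drawing. HitsSketch and topN are illustrative names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical HITS sketch; edge multiplicities are deliberately ignored.
final class HitsSketch {
    final Map<String, Double> hub = new HashMap<>();
    final Map<String, Double> authority = new HashMap<>();

    HitsSketch(Map<String, Map<String, Integer>> graph, int iterations) {
        Set<String> nodes = new HashSet<>(graph.keySet());
        for (Map<String, Integer> targets : graph.values()) nodes.addAll(targets.keySet());
        for (String n : nodes) {
            hub.put(n, 1.0);
            authority.put(n, 1.0);
        }
        for (int it = 0; it < iterations; it++) {
            // Authority update: a(p) = sum of h(q) over pages q that link to p.
            Map<String, Double> newAuth = new HashMap<>();
            for (String n : nodes) newAuth.put(n, 0.0);
            for (Map.Entry<String, Map<String, Integer>> e : graph.entrySet()) {
                double h = hub.get(e.getKey());
                for (String target : e.getValue().keySet()) newAuth.merge(target, h, Double::sum);
            }
            normalize(newAuth);
            // Hub update: h(p) = sum of a(q) over pages q that p links to.
            Map<String, Double> newHub = new HashMap<>();
            for (String n : nodes) {
                double sum = 0.0;
                for (String target : graph.getOrDefault(n, Map.of()).keySet()) {
                    sum += newAuth.get(target);
                }
                newHub.put(n, sum);
            }
            normalize(newHub);
            authority.clear();
            authority.putAll(newAuth);
            hub.clear();
            hub.putAll(newHub);
        }
    }

    // Scales scores to unit Euclidean length; an all-zero vector is left unchanged.
    private static void normalize(Map<String, Double> scores) {
        double norm = Math.sqrt(scores.values().stream().mapToDouble(v -> v * v).sum());
        if (norm > 0) scores.replaceAll((k, v) -> v / norm);
    }

    // Returns the n highest-scoring nodes, e.g. to choose which nodes to visualize.
    static List<String> topN(Map<String, Double> scores, int n) {
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Pages that nobody links to end with an authority score of 0, and pages without out-links end with a hub score of 0.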

azadyasar/Web-Structure-Mining

Folders and files

Latest commit

History

Repository files navigation

Web-Structure-Mining

Dependencies

Mechanism

You can

GUI

Main Screen

Graph

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages