azadyasar/Web-Structure-Mining

Web-Structure-Mining

This project demonstrates the well-known PageRank and HITS algorithms.

Dependencies

The project depends on the JSoup and GraphStream JARs.

Mechanism

A bare-bones breadth-first search (BFS) crawler is implemented for crawling the web.

You can:

assign seed URLs to crawl from.
specify a .txt file containing keywords that filter which URLs are added to the frontier.
specify the number of threads assigned to the crawlers and the number of crawl iterations.
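
The crawl options above can be sketched roughly as follows. This is a minimal, single-threaded illustration and not the project's actual crawler: it walks an in-memory link table instead of fetching pages with JSoup, it treats one "iteration" as one BFS level, and it matches keywords as case-insensitive substrings of the URL. All three are assumptions, as are the names `BfsCrawlSketch`, `matchesKeywords`, and `crawl`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

/** Illustrative BFS crawl over an in-memory link table (a stand-in for real HTTP fetching). */
class BfsCrawlSketch {

    /** True if the URL contains at least one keyword; an empty keyword list accepts every URL. */
    static boolean matchesKeywords(String url, List<String> keywords) {
        if (keywords.isEmpty()) return true;
        String lower = url.toLowerCase(Locale.ROOT);
        for (String keyword : keywords) {
            if (lower.contains(keyword.toLowerCase(Locale.ROOT))) return true;
        }
        return false;
    }

    /**
     * Visits pages level by level, starting from the seeds, for at most maxIterations levels.
     * The links map gives, for each URL, the URLs it points to (hypothetical input format).
     */
    static List<String> crawl(List<String> seeds, Map<String, List<String>> links,
                              List<String> keywords, int maxIterations) {
        Queue<String> frontier = new ArrayDeque<>(seeds);
        Set<String> seen = new HashSet<>(seeds);
        List<String> visited = new ArrayList<>();
        for (int level = 0; level < maxIterations && !frontier.isEmpty(); level++) {
            int levelSize = frontier.size();
            for (int i = 0; i < levelSize; i++) {
                String url = frontier.remove();
                visited.add(url);
                for (String out : links.getOrDefault(url, List.of())) {
                    // Only unseen URLs that pass the keyword filter enter the frontier.
                    if (matchesKeywords(out, keywords) && seen.add(out)) {
                        frontier.add(out);
                    }
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
            "http://a.example/", List.of("http://a.example/java", "http://a.example/cats"),
            "http://a.example/java", List.of("http://a.example/java/jsoup"));
        System.out.println(crawl(List.of("http://a.example/"), links, List.of("java"), 3));
    }
}
```

A multi-threaded variant would need a thread-safe frontier and visited set, for example `ConcurrentLinkedQueue` and `ConcurrentHashMap.newKeySet()`; that part is left out here to keep the sketch short.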

The Web graph (incoming and outgoing links) is stored internally as HashMap<WebURL, HashMap<WebUrls, Integer>>, in a concurrent fashion. The PageRank and HITS algorithms can be applied to this graph. Graphs can be exported and imported later. It is also possible to visualize the small part of the Web that contributes the most by specifying the number of nodes to draw, selected by rank.
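
To make the two algorithms concrete, here is a hedged sketch of PageRank and HITS over a nested-map graph of the same shape. It is not the project's code: it uses `String` keys in place of `WebURL`, treats the map as "source to targets", ignores the `Integer` value (the README does not say what it holds), and spreads the rank of pages without outgoing links uniformly. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative PageRank and HITS on a map of source -> (target -> integer). */
class LinkAnalysisSketch {

    /** Collects every node that appears as a source or as a target. */
    static Set<String> nodes(Map<String, Map<String, Integer>> graph) {
        Set<String> all = new HashSet<>(graph.keySet());
        for (Map<String, Integer> outs : graph.values()) all.addAll(outs.keySet());
        return all;
    }

    /** Power-iteration PageRank; pages with no outgoing links share their rank with all pages. */
    static Map<String, Double> pageRank(Map<String, Map<String, Integer>> graph,
                                        double damping, int iterations) {
        Set<String> all = nodes(graph);
        int n = all.size();
        Map<String, Double> rank = new HashMap<>();
        for (String v : all) rank.put(v, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double dangling = 0.0;
            for (String v : all) {
                if (graph.getOrDefault(v, Map.of()).isEmpty()) dangling += rank.get(v);
            }
            // Every page starts with the teleport share plus its share of dangling rank.
            double base = (1.0 - damping) / n + damping * dangling / n;
            Map<String, Double> next = new HashMap<>();
            for (String v : all) next.put(v, base);
            for (String v : all) {
                Map<String, Integer> outs = graph.getOrDefault(v, Map.of());
                for (String w : outs.keySet()) {
                    next.merge(w, damping * rank.get(v) / outs.size(), Double::sum);
                }
            }
            rank = next;
        }
        return rank;
    }

    /** HITS; each node maps to {hub, authority}, L2-normalized after every round. */
    static Map<String, double[]> hits(Map<String, Map<String, Integer>> graph, int iterations) {
        Set<String> all = nodes(graph);
        Map<String, double[]> score = new HashMap<>();
        for (String v : all) score.put(v, new double[] {1.0, 1.0});
        for (int it = 0; it < iterations; it++) {
            // Authority update: sum of the hub scores of the pages linking in.
            Map<String, Double> auth = new HashMap<>();
            for (String v : all) auth.put(v, 0.0);
            for (Map.Entry<String, Map<String, Integer>> e : graph.entrySet()) {
                for (String w : e.getValue().keySet()) {
                    auth.merge(w, score.get(e.getKey())[0], Double::sum);
                }
            }
            normalize(auth);
            // Hub update: sum of the new authority scores of the pages linked to.
            Map<String, Double> hub = new HashMap<>();
            for (String v : all) {
                double sum = 0.0;
                for (String w : graph.getOrDefault(v, Map.of()).keySet()) sum += auth.get(w);
                hub.put(v, sum);
            }
            normalize(hub);
            for (String v : all) score.put(v, new double[] {hub.get(v), auth.get(v)});
        }
        return score;
    }

    /** Scales the values in place to unit Euclidean length (no-op if all are zero). */
    static void normalize(Map<String, Double> m) {
        double sumSq = 0.0;
        for (double x : m.values()) sumSq += x * x;
        double norm = Math.sqrt(sumSq);
        if (norm > 0) m.replaceAll((k, x) -> x / norm);
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> graph = Map.of(
            "a", Map.of("b", 1, "c", 1), "b", Map.of("c", 1), "c", Map.of("a", 1));
        System.out.println(pageRank(graph, 0.85, 50));
        hits(graph, 50).forEach((v, s) -> System.out.println(v + " hub=" + s[0] + " auth=" + s[1]));
    }
}
```

In the small graph in `main`, page `c` is linked from both `a` and `b`, so it should end up with the highest authority score, while `a` links to both other pages and should be the strongest hub. The damping factor 0.85 and the 50 iterations are conventional illustrative values, not settings taken from the project.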

GUI

Main Screen

https://github.com/azadyasar/Web-Structure-Mining/blob/master/res/main_1.png

Graph

https://github.com/azadyasar/Web-Structure-Mining/blob/master/res/graph_1.png
