herbherbherb/Wiki_Crawler

Wiki Crawler with API

To install

It is recommended to create a virtual environment first:
  • To create and activate the virtual environment:
    virtualenv venv
    source venv/bin/activate
  • Then install the requirements:
    pip3 install -r requirements.txt

To Run:

There are 3 options:
  1. Run the scraper to generate JSON:
    python3 main.py scrap
  2. Use the cached JSON to generate and query the graph:
    python3 main.py cache
  3. Use the provided JSON to generate and query the graph:
    python3 main.py input
Graph Query:
  • Based on the prompt message, enter a valid input.
  • For example:
    1. Find how much the movie "The Boy Next Door (film)" has grossed:
      => 1 The Boy Next Door (film)
    2. List which movies the actor "Morgan Freeman" has worked in:
      => 2 Morgan Freeman
    3. List which actors worked in the movie "Dreamcatcher (2003 film)":
      => 3 Dreamcatcher (2003 film)
    4. List the top 50 actors with the highest total grossing value:
      => 4 50
    5. List the oldest 20 actors:
      => 5 20
    6. List all the movies in the year 2018:
      => 6 2018
    7. List all the actors in the year 2016:
      => 7 2016
    8. Show list of movies:
      => 8
    9. Show list of actors:
      => 9
    10. Total number of movies:
      => 10
    11. Total number of actors:
      => 11
    12. Show given movie info:
      => 12 The Verdict
    13. Show given actor info:
      => 13 Bruce Willis
    14. Identify 'hub' actors:
      => 14
    15. Calculate age and grossing value correlation and generate plot -> correlation.png:
      => 15
      • The correlation plot is saved as correlation.png.
      • You can also visualize the graph structure by opening graph_cache.svg in a browser.
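
Each query above follows the pattern `<option number> <argument>`, where the argument is omitted for options that need none. As a rough illustration of that input format, the sketch below splits a line into its two parts. `parse_query` is a hypothetical helper written for this README, not code taken from main.py, and it assumes the `=>` in the examples is the prompt marker rather than part of the typed input.

```python
from typing import Optional, Tuple


def parse_query(line: str) -> Tuple[int, Optional[str]]:
    """Split a query line such as '2 Morgan Freeman' into (option, argument).

    Hypothetical helper: main.py may parse its input differently.
    """
    # Split only on the first run of whitespace so titles with spaces survive.
    parts = line.strip().split(None, 1)
    if not parts:
        raise ValueError("empty query")
    option = int(parts[0])  # raises ValueError if the first token is not a number
    if not 1 <= option <= 15:
        raise ValueError(f"option must be between 1 and 15, got {option}")
    argument = parts[1].strip() if len(parts) > 1 else None
    return option, argument


if __name__ == "__main__":
    print(parse_query("1 The Boy Next Door (film)"))
    print(parse_query("4 50"))
    print(parse_query("14"))
```

Note that the argument is returned as a string even for numeric inputs such as `4 50`; converting it to a count or a year is left to whichever option handles it.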

To Configure:

  • Modify the parameters in config.py:
CLOSESPIDER_ITEMCOUNT = 30
wiki_start = "https://en.wikipedia.org/wiki/Morgan_Freeman"
PORT = 5001
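
The `wiki_start` URL points at a Wikipedia article, and its last path segment presumably corresponds to the kind of page title used in the graph queries (for example `Morgan Freeman`). A small sketch of that relationship, using only the standard library, can help when choosing a different starting page. `title_from_url` is a hypothetical helper and is not part of config.py.

```python
from urllib.parse import unquote, urlparse

# Value copied from config.py.
wiki_start = "https://en.wikipedia.org/wiki/Morgan_Freeman"


def title_from_url(url: str) -> str:
    """Return the human-readable article title encoded in a Wikipedia URL."""
    # The article name is the last path segment, e.g. 'Morgan_Freeman'.
    segment = urlparse(url).path.rsplit("/", 1)[-1]
    # Wikipedia uses underscores for spaces and percent-encodes other characters.
    return unquote(segment).replace("_", " ")


if __name__ == "__main__":
    print(title_from_url(wiki_start))
```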

To test:

Test Graph Structure

python3 graphy_test.py
  • There are a total of 16 test cases to run; in a correct run, all of them pass.

To Run API:

python3 app.py

Test API

python3 test_app.py
  • Before running the API tests in test_app.py, start the server:
python3 app.py
  • There are a total of 10 test cases to run; in a correct run, all of them pass.
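
Because the API tests need the server to be running, it can be handy to check that something is listening before launching them. The following is a minimal sketch, assuming the server runs on the local machine and listens on the `PORT` value from config.py (5001). `server_is_up` is a hypothetical helper and is not part of this repository.

```python
import socket


def server_is_up(host: str = "127.0.0.1", port: int = 5001, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if server_is_up():
        print("Server is reachable; run: python3 test_app.py")
    else:
        print("Server is not reachable; start it first with: python3 app.py")
```

This only confirms that the port accepts connections; it does not verify that the process behind it is app.py.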
