Twitter_Scraper

NOTE: currently under development

Scrapes tweets from twitter.com and inserts them into a SQL Server database.
Uses Celery, an asynchronous task queue, as its framework.
Tested on Ubuntu 14.04 with Python 3.4.

Install requirements

  • Python
  • Celery
    • pip install Celery
  • pymssql
    • sudo apt-get install freetds-dev freetds-bin
    • pip install pymssql
  • requests
  • lxml
    • sudo apt-get install python3-lxml
  • cssselect
    • pip install cssselect
  • RabbitMQ
    • sudo apt-get install rabbitmq-server
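
After installing, a quick way to confirm the Python packages are importable is a small check like the one below. This is only a sketch: the helper name `missing_modules` is hypothetical and not part of this project, and the module names are assumed to match the package names listed above. RabbitMQ is a system service rather than a Python module, so it is not covered here.

```python
import importlib.util

# Import names assumed to correspond to the packages listed above.
REQUIRED_MODULES = ["celery", "pymssql", "requests", "lxml", "cssselect"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("missing: " + ", ".join(missing))
    else:
        print("all Python dependencies found")
```

Run it with the same interpreter that will run the scraper (for example `python3`), since packages installed for a different Python version will be reported as missing.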

Create a keys.json file that contains the SQL Server connection parameters:

{
    "server":  "SERVER.database.windows.net",
    "user": "USER@SERVER",
    "password": "password",
    "database": "databasename"
}
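
For illustration, the file above could be read and passed to pymssql roughly as follows. This is a minimal sketch, not the project's actual loading code, which is not shown here: the function names `load_keys` and `connect` are hypothetical, and the sketch assumes keys.json sits in the working directory and uses exactly the four keys shown.

```python
import json

# The four parameters shown in the keys.json example above.
REQUIRED_KEYS = ("server", "user", "password", "database")


def load_keys(path="keys.json"):
    """Read the connection parameters and check that none are missing."""
    with open(path, encoding="utf-8") as fh:
        keys = json.load(fh)
    missing = [k for k in REQUIRED_KEYS if k not in keys]
    if missing:
        raise KeyError("keys.json is missing: " + ", ".join(missing))
    return keys


def connect(keys):
    """Open a SQL Server connection using the loaded parameters."""
    # Imported here so load_keys can be used without pymssql installed.
    import pymssql

    return pymssql.connect(
        server=keys["server"],
        user=keys["user"],
        password=keys["password"],
        database=keys["database"],
    )
```

Calling `connect(load_keys())` would then return a pymssql connection. Because keys.json holds a password, it is probably best kept out of version control.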

Note: use the --recursive option when cloning so that the submodule is cloned as well.