NOTE: currently under development
Scrapes tweets from twitter.com and inserts them into a SQL Server database.
Uses Celery, an asynchronous task queue, as its framework.
Tested on Ubuntu 14.04 with Python 3.4.
- Python
- Celery
pip install Celery
- pymssql
sudo apt-get install freetds-dev freetds-bin
pip install pymssql
- requests
- lxml
sudo apt-get install python3-lxml
- cssselect
pip install cssselect
- RabbitMQ
sudo apt-get install rabbitmq-server
Create a keys.json file containing the SQL Server connection parameters:
{
"server": "SERVER.database.windows.net",
"user": "USER@SERVER",
"password": "password",
"database": "databasename"
}
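
As a rough illustration of how these parameters might be consumed, the sketch below reads keys.json and passes the four values to pymssql.connect. The helper names load_keys and connect and the REQUIRED_KEYS tuple are hypothetical and not taken from this project; the actual code may load the file differently.

```python
import json

# The four fields shown in the keys.json example above.
REQUIRED_KEYS = ("server", "user", "password", "database")


def load_keys(path="keys.json"):
    """Read the connection parameters and check that none are missing."""
    with open(path, encoding="utf-8") as f:
        keys = json.load(f)
    missing = [k for k in REQUIRED_KEYS if k not in keys]
    if missing:
        raise KeyError("keys.json is missing: " + ", ".join(missing))
    return keys


def connect(keys):
    """Open a pymssql connection from the loaded parameters (hypothetical helper)."""
    # Imported here so load_keys can be used without the driver installed.
    import pymssql
    return pymssql.connect(
        server=keys["server"],
        user=keys["user"],
        password=keys["password"],
        database=keys["database"],
    )
```

Calling connect(load_keys()) from the directory that holds keys.json would then return an open connection, assuming the server is reachable and the credentials are valid.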
Note: use the --recursive option when cloning so that the submodule is cloned as well.