This project analyzes curated tweets of opinion-shapers and newsmakers, provided by the news sites nediyor.com and theplazz.com, to understand how elites respond to important events in the US and in Turkey.
For the news that made the headlines, we collected about two years of curated tweet data for the United States (154,684 tweets by 1,442 commentators on 7,376 news items between 01/14/2013 and 01/09/2015) and Turkey (190,180 tweets by 1,306 commentators on 10,044 news items between 01/14/2013 and 01/09/2015).
Filenames starting with scrape-:
- Selenium (through its Python API) is used to scrape the data from the main pages of the websites.
- Each main page is scrolled down 1,000 times to work around the sites' lazy loading.
- To get individual comments, ~17,000 HTML pages were downloaded from the links scraped from the main pages, using 10 parallel wget processes:
nohup sh -c "cat urls.txt | xargs -n 1 -P 10 wget" &
- The compressed files for nediyor (190 MB) and theplazz (107 MB) are on Dropbox.
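
The scrolling step above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual scrape- script: the helper name `scroll_to_bottom`, the pause length, the choice of Firefox, the `http://theplazz.com` URL form, and the output filename are all assumptions made for the example.

```python
import time


def scroll_to_bottom(driver, times=1000, pause=0.5):
    """Scroll to the bottom of the page `times` times so lazy-loaded items appear.

    `driver` is any object exposing Selenium's WebDriver.execute_script()
    method and page_source attribute.
    """
    for _ in range(times):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to fetch the next batch
    return driver.page_source


def main():
    # Imported here so the helper above has no hard dependency on Selenium.
    from selenium import webdriver

    driver = webdriver.Firefox()  # assumption: any installed WebDriver would do
    try:
        driver.get("http://theplazz.com")  # assumption: main page address
        html = scroll_to_bottom(driver)
        # Hypothetical output file holding the fully loaded main page.
        with open("main-page.html", "w", encoding="utf-8") as f:
            f.write(html)
    finally:
        driver.quit()
```

Calling `main()` would open a browser and save the fully scrolled page; the links to individual news pages could then be extracted from the saved HTML and written to urls.txt for the wget step.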
Aggregate-daily & container.js:
- Counts of comments on news are aggregated by day and visualized.
- Time series data is visualized using Highcharts JS.
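
As a rough sketch of the daily aggregation, the snippet below counts comments per calendar day and converts the result into the `[x, y]` pairs that a Highcharts datetime axis accepts, where `x` is milliseconds since the Unix epoch. The function names and the input format (a list of `datetime` objects already in one time zone) are assumptions for illustration, not taken from the repository.

```python
from collections import Counter
from datetime import datetime, timezone


def daily_counts(timestamps):
    """Count comments per calendar day from a list of datetime objects."""
    return Counter(ts.date() for ts in timestamps)


def to_highcharts_series(counts):
    """Convert {date: count} into sorted [milliseconds-since-epoch, count] pairs."""
    series = []
    for day in sorted(counts):
        # Represent each day by its midnight in UTC.
        midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
        series.append([int(midnight.timestamp() * 1000), counts[day]])
    return series


# Hypothetical comment timestamps, for illustration only.
example = [
    datetime(2015, 1, 8, 9, 30),
    datetime(2015, 1, 8, 17, 5),
    datetime(2015, 1, 9, 8, 0),
]
series = to_highcharts_series(daily_counts(example))
```

The resulting list can be serialized with `json.dumps(series)` and passed as the `data` of a Highcharts series.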
commentators-stats.py calculates and visualizes the following statistics:
- Comment counts by commentator
- Commentators grouped by profession
- Monthly commentator performance
- ...
- Daily comment count visualization is here
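
The first three commentator statistics listed above could be computed along these lines. This is only a sketch under stated assumptions: the record layout (commentator, profession, ISO date string), the sample data, and the function names are hypothetical, and "monthly commentator performance" is interpreted here simply as comments per commentator per month, which may differ from what commentators-stats.py actually measures.

```python
from collections import Counter, defaultdict

# Hypothetical records: (commentator, profession, "YYYY-MM-DD" comment date).
comments = [
    ("alice", "journalist", "2014-03-02"),
    ("alice", "journalist", "2014-03-15"),
    ("bob", "academic", "2014-03-20"),
    ("alice", "journalist", "2014-04-01"),
]


def counts_by_commentator(records):
    """Total number of comments per commentator."""
    return Counter(name for name, _, _ in records)


def commentators_by_profession(records):
    """Map each profession to the sorted list of its commentators."""
    groups = defaultdict(set)
    for name, profession, _ in records:
        groups[profession].add(name)
    return {p: sorted(names) for p, names in groups.items()}


def monthly_counts(records):
    """Comments per (commentator, "YYYY-MM") pair."""
    return Counter((name, day[:7]) for name, _, day in records)
```

`counts_by_commentator(comments).most_common()` gives a ranking that can be plotted directly as a bar chart.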