
# Architecture and workflow


We need:

- a script to download fresh data dumps from Wikipedia
- server-side database maintenance and pre-processing
- a visualization built in the browser

### A script to download fresh data dumps from Wikipedia

The dumps weigh in at about 100 MB per hour, so check your ISP contract and make sure your bandwidth is not being throttled.
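A minimal Python sketch of the fetching step, assuming the hourly pageview dumps published at https://dumps.wikimedia.org/other/pageviews/ (the URL layout and the local `dumps/` directory are assumptions, not part of this page):

```python
import os
import urllib.request

BASE_URL = "https://dumps.wikimedia.org/other/pageviews"
DEST_DIR = "dumps"  # hypothetical local directory for the hourly files


def download_hourly_dump(year, month, day, hour):
    """Fetch one hourly pageview dump (~100 MB compressed) if not already present."""
    fname = f"pageviews-{year:04d}{month:02d}{day:02d}-{hour:02d}0000.gz"
    url = f"{BASE_URL}/{year:04d}/{year:04d}-{month:02d}/{fname}"
    dest = os.path.join(DEST_DIR, fname)
    if os.path.exists(dest):
        return dest
    os.makedirs(DEST_DIR, exist_ok=True)
    urllib.request.urlretrieve(url, dest)
    return dest


if __name__ == "__main__":
    # Example: grab the dump for 2015-12-28, 14:00 UTC.
    print(download_hourly_dump(2015, 12, 28, 14))
```

In practice this would be run from a cron job once an hour, but the scheduling mechanism is left open here.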

### Server-side database maintenance and pre-processing

To a first approximation, the database has 30,765 records, one for each article in WikiProject Medicine. Each hour, for each article, we extract the number of pageviews that occurred in the last hour and append it as a new column in the database. The dump is then deleted.
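A minimal sketch of that hourly step, assuming a SQLite database with an `articles` table keyed by page title and the space-separated format of the hourly pageview dumps (`<project> <page_title> <views> <bytes>`); the table, column, and file names are illustrative, not taken from this page:

```python
import gzip
import sqlite3


def append_hourly_counts(db_path, dump_path, column):
    """Add one column of pageview counts for the ~30,765 tracked articles."""
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    cur.execute(f'ALTER TABLE articles ADD COLUMN "{column}" INTEGER DEFAULT 0')

    # Titles we track, stored with underscores, as they appear in the dump.
    tracked = {row[0] for row in cur.execute("SELECT title FROM articles")}

    with gzip.open(dump_path, "rt", encoding="utf-8", errors="replace") as dump:
        for line in dump:
            parts = line.split(" ")
            if len(parts) != 4 or parts[0] != "en":  # English Wikipedia, desktop
                continue
            title, views = parts[1], parts[2]
            if title in tracked:
                cur.execute(
                    f'UPDATE articles SET "{column}" = ? WHERE title = ?',
                    (int(views), title),
                )

    conn.commit()
    conn.close()


# Example: append the counts for 2015-12-28 14:00 UTC, then delete the dump file.
# append_hourly_counts("wikiproject_med.db",
#                      "dumps/pageviews-20151228-140000.gz",
#                      "pv_2015122814")
```

Adding one column per hour keeps the article list as rows and time as columns, which matches the description above; a long, normalized table (article, hour, views) would be an alternative layout.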

### Visualization

TODO
