Overview

Ivan Rudik edited this page May 7, 2019 · 1 revision

This wiki describes the workflow structure for joint projects. It largely follows manuals written by Gentzkow and Shapiro, Allcott, and Kellogg; in some places we simply link to their manuals or quote them directly. The workflow here is a stripped-down version, meant to be accessible to a broader range of collaborators who cannot pay the fixed costs of using the full system. For example, it does not use SCons, which requires additional coding and file maintenance that may not be necessary except on larger projects. A shorter (and now outdated) overview of the Gentzkow-Shapiro RA manual is the PDF Code and Data for the Social Sciences: A Practitioner's Guide.

There are three core principles:

  1. Dynamic production. Everything in the project, from loading the first chunks of raw data to compiling the paper PDF, can be run from one main script, so every step is fully automated. We should almost never have to hard-code a change in multiple places: for example, if we change how the data are processed, the paper should reflect the change automatically after we rerun the main script.
  2. Open science. All code and data that can be made public will be made public. We will also aim to use open source software (R, Python, Julia) when possible.
  3. Unambiguous processes. We have a uniform set of rules for how we set up our folder structure, analyze data, etc. Task management for code (via GitHub Issues) is kept up to date. We do not leave legacy or incomplete files in the project folder.
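
To make the first principle concrete, here is a minimal sketch of what a single main script could look like in Python. The step list, the file paths, and the use of `latexmk` to compile the paper are all illustrative assumptions, not part of this wiki's actual setup; a real project would substitute its own scripts and tools.

```python
import subprocess
import sys

# Hypothetical pipeline steps, listed in the order they must run.
# Every path below is a placeholder, not a file this wiki defines.
STEPS = [
    [sys.executable, "code/01_clean_data.py"],    # raw data -> cleaned data
    [sys.executable, "code/02_run_analysis.py"],  # cleaned data -> tables and figures
    ["latexmk", "-pdf", "paper/paper.tex"],       # tables and figures -> paper PDF
]


def run_all(steps, runner=subprocess.run):
    """Run each step in order and stop at the first failure."""
    completed = []
    for command in steps:
        # check=True raises CalledProcessError on a nonzero exit code,
        # so later steps never run on stale or partial output.
        runner(command, check=True)
        completed.append(command)
    return completed
```

Calling `run_all(STEPS)` from the project root would then rebuild everything from raw data to the compiled paper. The `runner` argument exists only so the sequencing can be checked without executing the real steps.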