code_and_data
Making your code, data, and figures easily accessible is central to reproducible research. To make this less painful, I have included three data structures: data_files.yaml, code.yaml, and figures.yaml. The entries in these files populate page02_code_data.md.
All of your code and "small" data should be stored in this repository or, if you are building the website in your research repository, on the gh-pages branch. This template provides two folders for them: software for the code scripts and datasets for the data files.
The code.yaml file is a simple YAML data structure that contains an arbitrary number of script fields, each with associated properties. An example of a script field is given below.
- script:
    name: script1.R # The file name of the script
    desc: >
      A one or two sentence description of what the code actually does.
The name field is the file name of the script, which is stored in the software/ subfolder of the root directory.
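As a minimal sketch of how a script entry maps to a file on disk, the Python snippet below checks that every name listed in code.yaml exists in software/. The helper names entry_props and missing_scripts are hypothetical and not part of the template. The entries are written inline as dictionaries, in the form a YAML parser would return them, so the example needs no YAML library. It also assumes the properties may sit either beneath the script key or beside it, since that depends on how the file is indented.

```python
from pathlib import Path
import tempfile


def entry_props(entry, key):
    """Return the properties of one entry.

    Assumption: the properties may sit beneath the `key` field or beside
    it, depending on how the YAML is indented; both layouts are accepted.
    """
    nested = entry.get(key)
    return nested if isinstance(nested, dict) else entry


def missing_scripts(entries, root):
    """List script names from code.yaml entries that are absent from software/."""
    software = Path(root) / "software"
    names = [entry_props(e, "script")["name"] for e in entries]
    return [n for n in names if not (software / n).is_file()]


# Entries as a YAML loader would return them (hypothetical contents).
entries = [
    {"script": {"name": "script1.R", "desc": "Fits the model."}},
    {"script": {"name": "script2.py", "desc": "Cleans the raw data."}},
]

# Build a throwaway repository root that contains only script1.R.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "software").mkdir()
    (Path(root) / "software" / "script1.R").write_text("# placeholder\n")
    result = missing_scripts(entries, root)
    print(result)
```

A check like this could run before publishing the site, so that a renamed or deleted script does not leave a dead entry on the page.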
The data_files.yaml file is similar to code.yaml in that it encodes dataset fields with associated properties. Because data sets can be very large, each entry must state whether the data is stored locally (in the repository) or hosted on a remote storage service such as DataDryad, CaltechDATA, or Zenodo. An example of a dataset field is shown below for a data set that is stored remotely.
- dataset:
    storage: "remote"
    name: "A short description of the data set."
    filetype: "The type of file (for example, csv, hdf5, or tiff)"
    filesize: 24 MB # An approximate data size
    link: "https://data.caltech.edu/" # Link to the remote storage OR the filename of the local file.
    DOI: "10.1.1/journal.0000" # DOI of the remotely stored data. If not remote, this field is ignored.
If the data set is stored remotely, the link field should be a URL to the precise data set. As these URLs can change over time, you should also provide a DOI. If the data set is stored locally, the DOI field will be ignored and the link field should contain the filename of the data set.
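The local and remote rules can be sketched in Python as follows. The function dataset_location is a hypothetical helper, not part of the template. It takes one parsed dataset entry and returns where the data can be found. Preferring the DOI over the direct URL is an assumption made here because URLs can change, and the datasets/ prefix assumes local files live in that folder relative to the repository root.

```python
def dataset_location(ds):
    """Return where a dataset entry from data_files.yaml can be found."""
    if ds["storage"] == "remote":
        doi = ds.get("DOI")
        # Assumption: prefer the DOI resolver, since the direct URL may change.
        return f"https://doi.org/{doi}" if doi else ds["link"]
    # Local storage: DOI is ignored and link holds the file name in datasets/.
    return f"datasets/{ds['link']}"


# Parsed entries as a YAML loader would return them (hypothetical contents).
remote = {
    "storage": "remote",
    "name": "A short description of the data set.",
    "link": "https://data.caltech.edu/",
    "DOI": "10.1.1/journal.0000",
}
local = {
    "storage": "local",
    "name": "A locally stored data set.",
    "link": "dataset1.csv",
    "DOI": "ignored for local data",
}

remote_location = dataset_location(remote)
local_location = dataset_location(local)
print(remote_location)
print(local_location)
```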
Every figure in the main text or supplementary information of your paper that presents data should be easily reproducible. If you are following the reproducible research repository, you will be generating a single script for every figure that is produced. The figures.yaml file encodes information about each figure, including the data sets and script needed to reproduce it. An example fig field is given below.
- fig:
    title: "A descriptive title of the figure, including the figure number."
    filename: fig1.py # The file name of the code used to generate the figure.
    desc: "A one or two sentence description of the figure."
    pic: "A thumbnail image of the figure to be reproduced."
    req: # Begins the set of required data sets
      - ds:
          storage: "local"
          title: "Title of the data set"
          link: dataset1.csv
      - ds:
          .
          .
          .
This is the most complicated data structure defined in this template and includes a few nested fields. The first field, fig, defines a figure to be presented as its own object on the website. The filename field is the filename of the code used to generate the figure. The desc field describes the figure in broad strokes and should be enough to jog someone's memory about what the figure was presenting. The pic field is the filename of a thumbnail image of the figure, stored in the assets/img/ subfolder of the root directory.
The req field lists the data sets required to reproduce the figure. Beneath req, each required data set is defined with a - ds: field, and beneath that field is information about that particular data set. The storage field defines whether the link points to a locally stored data set or to external storage. Finally, the link field is the filename of, or the link to, the required data set.
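To illustrate how the nested req field might be read, here is a short Python sketch. The function required_datasets is a hypothetical helper, and the figure is written inline as the dictionary a YAML parser would return, with made-up values for the thumbnail and the second data set. It splits a figure's required data sets into local file names and remote links, and it accepts properties written either beneath ds or beside it, since the exact indentation is an assumption.

```python
def required_datasets(fig):
    """Split a figure's required data sets into local files and remote links."""
    local, remote = [], []
    for item in fig.get("req") or []:
        ds = item.get("ds")
        if not isinstance(ds, dict):
            ds = item  # properties written beside `ds` rather than beneath it
        (local if ds["storage"] == "local" else remote).append(ds["link"])
    return local, remote


# One parsed fig entry (hypothetical values, mirroring the fields above).
fig = {
    "title": "Figure 1: hypothetical example",
    "filename": "fig1.py",
    "desc": "A one or two sentence description of the figure.",
    "pic": "fig1_thumb.png",
    "req": [
        {"ds": {"storage": "local", "title": "Title of the data set",
                "link": "dataset1.csv"}},
        {"ds": {"storage": "remote", "title": "A remote data set",
                "link": "https://data.caltech.edu/"}},
    ],
}

local_files, remote_links = required_datasets(fig)
print(local_files)
print(remote_links)
```

Separating the two lists makes it easy to check that local files exist in the repository while treating remote links as downloads.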