The goal of HSDS is to make all the data sets from the book “A Handbook of Small Data Sets” (1994) by David J. Hand available in R. These data sets are particularly useful for demonstrating functions or statistical tests, and for teaching statistics and R.
All data sets are already available individually in this repository: https://github.com/JedStephens/Handbook-of-Small-Data-Sets/tree/master. However, they are not immediately usable in R and are undocumented. This package aims to solve this issue by providing clean, documented data sets.
You can install the development version of HSDS like so:
```r
# Requires the devtools package: install.packages("devtools")
devtools::install_github("ABohynDOE/HSDS")
```
The book contains more than 500 data sets. For the moment, only some of them are available in the package. They are summarized in the table below, along with their names, titles, structure (rows × columns), and the types of variables present.
| Name | Title | Structure | Variables |
|---|---|---|---|
| | Germinating seeds | 48 × 3 | factor(2), numeric(1) |
| | Guessing lengths | 113 × 3 | character(1), numeric(2) |
| | Darwin’s cross-fertilized and self-fertilized plants | 30 × 3 | factor(1), integer(1), numeric(1) |
| | Intervals between cars on the M1 motorway | 41 × 2 | character(2) |
| | Tearing factor for paper | 20 × 2 | numeric(2) |
| | Abrasion loss | 30 × 3 | numeric(3) |
| | Mortality and water hardness | 61 × 5 | factor(1), numeric(4) |
| | Tensile strength of cement | 21 × 2 | numeric(2) |
| | Weight gain in rats | 40 × 3 | factor(2), numeric(1) |
| | Weight of chickens | 24 × 3 | factor(2), numeric(1) |
| | Flicker frequency | 27 × 4 | factor(3), numeric(1) |
| | Effect of ammonium chloride on yield | 32 × 5 | factor(4), numeric(1) |
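The Structure and Variables columns can be reproduced for any data frame with a few lines of base R. The sketch below is only an illustration: `describe_structure()` and `list_datasets()` are hypothetical helper names, not functions exported by HSDS, and the built-in `iris` data set is used so the example runs without the package installed.

```r
# Describe a data frame the way the table does: "rows × cols" plus a
# count of columns per class, e.g. "factor(1), numeric(4)".
# NOTE: hypothetical helper, not part of HSDS.
describe_structure <- function(df) {
  stopifnot(is.data.frame(df))
  # Use the first class of each column (an ordered factor has two classes)
  classes <- vapply(df, function(col) class(col)[1], character(1))
  counts <- table(classes)  # table() sorts the class names alphabetically
  list(
    structure = paste(nrow(df), "\u00d7", ncol(df)),
    variables = paste0(names(counts), "(", counts, ")", collapse = ", ")
  )
}

# List the names and titles of the data sets shipped with an installed package.
# NOTE: hypothetical helper, not part of HSDS.
list_datasets <- function(pkg) {
  items <- utils::data(package = pkg)$results
  data.frame(
    name = items[, "Item"],
    title = items[, "Title"],
    stringsAsFactors = FALSE
  )
}

# Works on any data frame; iris is used here as a stand-in
describe_structure(iris)

# Only runs if HSDS is installed
if (requireNamespace("HSDS", quietly = TRUE)) {
  print(list_datasets("HSDS"))
}
```

Once the package is installed, `list_datasets("HSDS")` should show the object name to pass to `describe_structure()` for each data set.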
This is a basic example which shows you how to use a data set to make a nice plot:
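```r
library(HSDS)
library(ggplot2)

# Box plots of germinated seed counts per water level, colored by box type
ggplot(germin, aes(x = water, y = seeds, color = box)) +
  geom_boxplot(na.rm = TRUE) +  # silently drop missing counts
  theme_bw()
```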
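Since the data sets are also meant for demonstrating statistical tests, the same data can feed a two-way analysis of variance. This is a minimal sketch, assuming `germin` has the columns used in the plot above (`water`, `box`, `seeds`); `fit_germination_anova()` is a hypothetical helper name and not part of HSDS.

```r
# Two-way ANOVA of germinated seed counts.
# ASSUMPTION: `data` has the columns used in the plot: water, box, seeds.
fit_germination_anova <- function(data) {
  stopifnot(all(c("water", "box", "seeds") %in% names(data)))
  # Treat both predictors as factors; rows with missing counts are dropped
  # by aov()'s default na.action (na.omit)
  data$water <- factor(data$water)
  data$box <- factor(data$box)
  stats::aov(seeds ~ water * box, data = data)
}

# Only runs if HSDS is installed
if (requireNamespace("HSDS", quietly = TRUE)) {
  fit <- fit_germination_anova(HSDS::germin)
  print(summary(fit))
}
```

The interaction term `water * box` tests whether the effect of the water level differs between box types, which is the comparison the colored box plots suggest visually.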
We are still far from the 500 data sets, so any help is welcome! If you want to contribute, all raw data sets are already present in the repo (at `data-raw/data-files`), so feel free to clean one or more. If you do so, please respect the following guidelines: