Working with ICU datasets in R, especially publicly available ones such as
those provided by PhysioNet, is facilitated by ricu, which provides data
access, a level of abstraction for encoding clinical concepts in a
data-source-agnostic way, and classes and utilities for working with the
resulting time series datasets.
To cite ricu, please use the following:
@article{bennett2023ricu,
title={ricu: R’s interface to intensive care data},
author={Bennett, Nicolas and Ple{\v{c}}ko, Drago and Ukor, Ida-Fong and Meinshausen, Nicolai and B{\"u}hlmann, Peter},
journal={GigaScience},
volume={12},
pages={giad041},
year={2023},
publisher={Oxford University Press}
}
Currently, installation is only possible directly from GitHub, using the
remotes package (if installed)
remotes::install_github("eth-mds/ricu")
or by sourcing the required installation code from GitHub by running
rem <- source(
  paste0("https://raw.githubusercontent.com/r-lib/remotes/main/",
         "install-github.R")
)
rem$value("eth-mds/ricu")
To make sure that some useful utility packages are installed as well,
consider also installing the packages marked as Suggests by running
remotes::install_github("eth-mds/ricu", dependencies = TRUE)
instead, or by installing some of the utility packages (relevant for downloading and preprocessing PhysioNet datasets)
install.packages("xml2")
and demo dataset packages
install.packages(c("mimic.demo", "eicu.demo"),
                 repos = "https://eth-mds.github.io/physionet-demo")
explicitly.
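Whether the optional packages are already present can be checked before installing anything. The following is a minimal sketch using only base R; the helper name `missing_pkgs` is hypothetical and not part of ricu, and the package names are the ones mentioned in the installation instructions above.

```r
# Hypothetical helper (not part of ricu): return those of the given
# packages that cannot be loaded from the current library paths.
missing_pkgs <- function(pkgs) {
  is_avail <- vapply(pkgs, requireNamespace, logical(1L), quietly = TRUE)
  pkgs[!is_avail]
}

# Package names taken from the installation instructions above
optional <- c("xml2", "mimic.demo", "eicu.demo")
to_install <- missing_pkgs(optional)

if (length(to_install) > 0L) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
} else {
  message("All optional packages are available.")
}
```

Note that the demo dataset packages are served from the repository given above, so anything reported as missing should be installed with the matching `repos` argument.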
Out of the box (provided the two data packages mimic.demo and eicu.demo
are available), ricu provides access to the demo datasets corresponding to
the PhysioNet Clinical Databases eICU and MIMIC-III.
Tables are available as
mimic_demo$admissions
#> # <mimic_tbl>: [129 ✖ 19]
#> # ID options: subject_id (patient) < hadm_id (hadm) < icustay_id (icustay)
#> # Defaults: `admission_type` (val)
#> # Time vars: `admittime`, `dischtime`, `deathtime`, `edregtime`, `edouttime`
#> row_id subject_id hadm_id admittime dischtime
#> <int> <int> <int> <dttm> <dttm>
#> 1 12258 10006 142345 2164-10-23 21:09:00 2164-11-01 17:15:00
#> 2 12263 10011 105331 2126-08-14 22:32:00 2126-08-28 18:59:00
#> 3 12265 10013 165520 2125-10-04 23:36:00 2125-10-07 15:13:00
#> 4 12269 10017 199207 2149-05-26 17:19:00 2149-06-03 18:42:00
#> 5 12270 10019 177759 2163-05-14 20:43:00 2163-05-15 12:00:00
#> …
#> 125 41055 44083 198330 2112-05-28 15:45:00 2112-06-07 16:50:00
#> 126 41070 44154 174245 2178-05-14 20:29:00 2178-05-15 09:45:00
#> 127 41087 44212 163189 2123-11-24 14:14:00 2123-12-30 14:31:00
#> 128 41090 44222 192189 2180-07-19 06:55:00 2180-07-20 13:00:00
#> 129 41092 44228 103379 2170-12-15 03:14:00 2170-12-24 18:00:00
#> # ℹ 124 more rows
#> # ℹ 14 more variables: deathtime <dttm>, admission_type <chr>,
#> # admission_location <chr>, discharge_location <chr>, insurance <chr>,
#> # language <chr>, religion <chr>, marital_status <chr>, ethnicity <chr>,
#> # edregtime <dttm>, edouttime <dttm>, diagnosis <chr>,
#> # hospital_expire_flag <int>, has_chartevents_data <int>
and data can be loaded into an R session, for example, using
load_ts("labevents", "mimic_demo", itemid == 50862L,
        cols = c("valuenum", "valueuom"))
#> # A `ts_tbl`: 299 ✖ 4
#> # Id var: `icustay_id`
#> # Index var: `charttime` (1 hours)
#> icustay_id charttime valuenum valueuom
#> <int> <drtn> <dbl> <chr>
#> 1 201006 0 hours 2.4 g/dL
#> 2 203766 -18 hours 2 g/dL
#> 3 203766 4 hours 1.7 g/dL
#> 4 204132 7 hours 3.6 g/dL
#> 5 204201 9 hours 2.3 g/dL
#> …
#> 295 298685 130 hours 1.9 g/dL
#> 296 298685 154 hours 2 g/dL
#> 297 298685 203 hours 2 g/dL
#> 298 298685 272 hours 2.2 g/dL
#> 299 298685 299 hours 2.5 g/dL
#> # ℹ 294 more rows
which returns time series data as a ts_tbl object.
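To illustrate the shape of such a result, the sketch below rebuilds the first five rows of the output above as a plain data.frame and summarises them with base R. This is an assumption-laden stand-in rather than ricu functionality: a real ts_tbl carries additional metadata (ID and index variables) that is not modeled here, and the object names `dat`, `post` and `res` are purely illustrative. The negative `charttime` in the output suggests that times are stored as offsets from a per-stay origin, which is why the example filters on that offset.

```r
# First five rows of the output shown above, rebuilt as a plain data.frame
# (an actual ts_tbl carries additional metadata not modeled here)
dat <- data.frame(
  icustay_id = c(201006L, 203766L, 203766L, 204132L, 204201L),
  charttime  = as.difftime(c(0, -18, 4, 7, 9), units = "hours"),
  valuenum   = c(2.4, 2, 1.7, 3.6, 2.3),
  valueuom   = "g/dL"
)

# Keep only measurements taken at or after the time origin
post <- dat[dat$charttime >= as.difftime(0, units = "hours"), ]

# Lowest measured value per ICU stay
res <- aggregate(valuenum ~ icustay_id, data = post, FUN = min)
print(res)
```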
This work was supported by grant #2017-110 of the Strategic Focal Area “Personalized Health and Related Technologies (PHRT)” of the ETH Domain for the SPHN/PHRT Driver Project “Personalized Swiss Sepsis Study”.