A "snake_case" filter system for R.
if (!requireNamespace("remotes")) {
  install.packages("remotes")
}
remotes::install_github(
  repo = "openpharma/filters",
  upgrade = "never"
)
library(filters)
library(magrittr)
library(random.cdisc.data)
library(rtables)
library(tern)
set.seed(1)
adsl <- radsl()
adae <- radae(adsl)
vads <- list(adsl = adsl, adae = adae)
{filters} comes with a built-in filter library. You can list its filters using list_all_filters().
list_all_filters()
# A tibble: 272 x 4
   id     title                      target condition
   <chr>  <chr>                      <chr>  <chr>
 1 COV    Confirmed/Suspected COVID… ADAE   ACOVFL == 'Y'
 2 COVAS  AEs Associated with COVID… ADAE   ACOVASFL == 'Y'
 3 CTC35  Grade 3-5 Adverse Events   ADAE   ATOXGR %in% c('3', '4', '5')
 4 DSC    Adverse Events Leading to… ADAE   AEACN == 'DRUG WITHDRAWN'
 5 DSM    Adverse Events Leading to… ADAE   AEACN %in% c('DOSE INCREASED',…
 6 FATAL  Fatal Adverse Events       ADAE   AESDTH == 'Y'
 7 NCOV   Excluding Confirmed/Suspe… ADAE   ACOVFL != 'Y'
 8 NCOVAS AEs not Associated with C… ADAE   ACOVASFL != 'Y'
 9 NFATAL Non-fatal Adverse Events   ADAE   AESDTH == 'N'
10 NREL   Adverse Events not Relate… ADAE   AREL == 'N'
# … with 262 more rows
To add a new filter, use add_filter(). The condition argument defines the expression used to filter the datasets later on. It is passed to subset() when you call apply_filter().
add_filter(
  id = "CTC34",
  title = "Grade 3-4 Adverse Events",
  target = "ADAE",
  condition = AETOXGR %in% c("4", "5")
)
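Because the condition ends up in subset(), it is written as an ordinary R expression over the columns of the target dataset. The base R sketch below shows what such a condition selects; toy_adae and its values are made up for illustration and are not real ADAE data.

```r
# Toy stand-in for an ADAE dataset (hypothetical values).
toy_adae <- data.frame(
  USUBJID = c("P1", "P2", "P3", "P4"),
  AETOXGR = c("1", "4", "5", "3"),
  stringsAsFactors = FALSE
)

# subset() evaluates the condition inside the data frame,
# so column names can be used directly.
high_grade <- subset(toy_adae, AETOXGR %in% c("4", "5"))
print(high_grade$USUBJID)
```

Only the rows whose AETOXGR value is "4" or "5" are kept.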
Alternatively, you can use load_filters() to load filter definitions from a YAML file. The file should be structured like this:
CTC4:
  title: Grade 4 Adverse Events
  target: ADAE
  condition: ATOXGR == "4"
TP53WT:
  title: TP53 Wild Type
  target: ADSL
  condition: TP53 == "WILD TYPE"
file_path <- system.file("filters_eg.yaml", package = "filters")
load_filters(file_path)
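If you want to try this with your own file, the following sketch writes a small YAML file in the structure shown above and loads it. The ID DEMOGR4 and its title are hypothetical and chosen only to avoid clashing with existing filters; the loading step runs only when {filters} is installed.

```r
# Write a small filter file to a temporary location.
# The ID "DEMOGR4" and its title are made up for this illustration.
yaml_path <- tempfile(fileext = ".yaml")
writeLines(
  c(
    "DEMOGR4:",
    "  title: Grade 4 Adverse Events (demo)",
    "  target: ADAE",
    "  condition: ATOXGR == \"4\""
  ),
  con = yaml_path
)

# Load the definitions only if {filters} is available.
if (requireNamespace("filters", quietly = TRUE)) {
  filters::load_filters(yaml_path)
  print(filters::get_filter("DEMOGR4"))
}
```

Note that each entry is indented under its ID, and that the condition is plain R code.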
You can confirm that filters have been successfully added by using get_filter().
get_filter("CTC34")
$title
[1] "Grade 3-4 Adverse Events"
$target
[1] "ADAE"
$condition
AETOXGR %in% c("4", "5")
If you ask for a filter that does not exist, get_filter() throws an error.
get_filter("GIDIS")
Error: Filter 'GIDIS' does not exist.
To overwrite an existing filter, you have to set overwrite = TRUE. Otherwise an error is thrown.
add_filter(
  id = "FATAL",
  title = "Fatal Adverse Events",
  target = "ADAE",
  condition = ATOXGR == "5"
)
Error: Filter 'FATAL' already exists. Set `overwrite = TRUE` to force overwriting the existing filter definition.
add_filter(
  id = "FATAL",
  title = "Fatal Adverse Events",
  target = "ADAE",
  condition = ATOXGR == "5",
  overwrite = TRUE
)
You can use apply_filter() to filter a single dataset or a list of multiple datasets. To apply several filters at once, join their IDs with underscores, as in "CTC34_SER".
adsl_se <- apply_filter(adsl, "SE")
Filter 'SE' matched target ADSL.
400/400 records matched the filter condition `SAFFL == 'Y'`.
adae_ctc34_ser <- apply_filter(adae, "CTC34_SER")
Filters 'CTC34', 'SER' matched target ADAE.
216/1967 records matched the filter condition `AETOXGR %in% c('4', '5') & AESER == 'Y'`.
filtered_datasets <- apply_filter(vads, "CTC34_SER_SE")
Filter 'SE' matched target ADSL.
400/400 records matched the filter condition `SAFFL == 'Y'`.
Filters 'CTC34', 'SER' matched target ADAE.
216/1967 records matched the filter condition `AETOXGR %in% c('4', '5') & AESER == 'Y'`.
As you can see, apply_filter() gives you feedback on which IDs matched the dataset. This matching is done by the name of the input dataset. It does not matter whether the dataset name is in upper case, lower case, or a mix of both.
ADSL <- adsl
adsl_it <- apply_filter(ADSL, "IT")
Filter 'IT' matched target ADSL.
400/400 records matched the filter condition `ITTFL == 'Y'`.
If your dataset is not named in a standard way, you can manually tell apply_filter() which dataset it is by setting the target argument.
sl <- adsl
sl_it1 <- apply_filter(sl, "IT")
No filter matched target SL.
sl_it2 <- apply_filter(sl, "IT", target = "ADSL")
Filter 'IT' matched target ADSL.
400/400 records matched the filter condition `ITTFL == 'Y'`.
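One way to picture this name matching is the base R idiom below. It is only a sketch of the idea, not the package's actual code: resolve_target() is a hypothetical helper that takes the target from the name of the variable passed in, unless a target is given explicitly, and compares in upper case.

```r
# Hypothetical helper, not part of {filters}.
resolve_target <- function(dataset, target = NULL) {
  if (is.null(target)) {
    # Capture the name of the variable the caller passed in.
    target <- deparse(substitute(dataset))
  }
  toupper(target)
}

adsl <- data.frame(USUBJID = "P1")
Adsl <- adsl
sl <- adsl

print(resolve_target(adsl))                 # name in lower case
print(resolve_target(Adsl))                 # name in mixed case
print(resolve_target(sl))                   # non-standard name
print(resolve_target(sl, target = "ADSL"))  # explicit override
```

Under this sketch, adsl and Adsl both resolve to "ADSL", while sl resolves to "SL" and therefore needs the explicit target.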
The {filters} package works well with the {rtables} and {tern} packages. The following function creates an adverse events table from a list of datasets:
t_ae <- function(datasets) {
  anl <- merge(
    x = datasets$adsl,
    y = datasets$adae,
    by = c("STUDYID", "USUBJID"),
    all = FALSE, # inner join
    suffixes = c("", "_ADAE")
  )
  split_fun <- drop_split_levels
  lyt <- basic_table(show_colcounts = TRUE) %>%
    split_cols_by(var = "ARM") %>%
    add_overall_col(label = "All Patients") %>%
    analyze_num_patients(
      vars = "USUBJID",
      .stats = c("unique", "nonunique"),
      .labels = c(
        unique = "Total number of patients with at least one adverse event",
        nonunique = "Overall total number of events"
      )
    ) %>%
    split_rows_by(
      "AEBODSYS",
      child_labels = "visible",
      nested = FALSE,
      split_fun = split_fun,
      label_pos = "topleft",
      split_label = obj_label(adae$AEBODSYS)
    ) %>%
    summarize_num_patients(
      var = "USUBJID",
      .stats = c("unique", "nonunique"),
      .labels = c(
        unique = "Total number of patients with at least one adverse event",
        nonunique = "Total number of events"
      )
    ) %>%
    count_occurrences(
      vars = "AEDECOD",
      .indent_mods = -1L
    ) %>%
    append_varlabels(adae, "AEDECOD", indent = 1L)
  result <- build_table(
    lyt,
    df = datasets$adae,
    alt_counts_df = datasets$adsl
  )
  return(result)
}
You can easily create multiple outputs with this function by applying the filters to the input datasets before passing them to t_ae().
vads %>% apply_filter("SE") %>% t_ae()
Filter 'SE' matched target ADSL.
400/400 records matched the filter condition `SAFFL == 'Y'`.
Body System or Organ Class A: Drug X B: Placebo C: Combination All Patients
Dictionary-Derived Term (N=133) (N=141) (N=126) (N=400)
——————————————————————————————————————————————————————————————————————————————————————————————————————————————————————
Total number of patients with at least one adverse event 111 (83.5%) 132 (93.6%) 119 (94.4%) 362 (90.5%)
Overall total number of events 636 755 655 2046
cl A.1
Total number of patients with at least one adverse event 63 (47.4%) 79 (56.0%) 71 (56.3%) 213 (53.2%)
Total number of events 123 144 133 400
dcd A.1.1.1.1 47 (35.3%) 63 (44.7%) 50 (39.7%) 160 (40.0%)
dcd A.1.1.1.2 42 (31.6%) 47 (33.3%) 44 (34.9%) 133 (33.2%)
cl B.1
Total number of patients with at least one adverse event 47 (35.3%) 49 (34.8%) 59 (46.8%) 155 (38.8%)
Total number of events 73 63 75 211
dcd B.1.1.1.1 47 (35.3%) 49 (34.8%) 59 (46.8%) 155 (38.8%)
cl B.2
Total number of patients with at least one adverse event 73 (54.9%) 88 (62.4%) 73 (57.9%) 234 (58.5%)
Total number of events 132 156 137 425
dcd B.2.1.2.1 44 (33.1%) 56 (39.7%) 50 (39.7%) 150 (37.5%)
dcd B.2.2.3.1 48 (36.1%) 59 (41.8%) 44 (34.9%) 151 (37.8%)
cl C.1
Total number of patients with at least one adverse event 50 (37.6%) 53 (37.6%) 42 (33.3%) 145 (36.2%)
Total number of events 62 75 62 199
dcd C.1.1.1.3 50 (37.6%) 53 (37.6%) 42 (33.3%) 145 (36.2%)
cl C.2
Total number of patients with at least one adverse event 50 (37.6%) 65 (46.1%) 50 (39.7%) 165 (41.2%)
Total number of events 67 87 63 217
dcd C.2.1.2.1 50 (37.6%) 65 (46.1%) 50 (39.7%) 165 (41.2%)
cl D.1
Total number of patients with at least one adverse event 74 (55.6%) 95 (67.4%) 72 (57.1%) 241 (60.2%)
Total number of events 120 158 112 390
dcd D.1.1.1.1 37 (27.8%) 59 (41.8%) 35 (27.8%) 131 (32.8%)
dcd D.1.1.4.2 54 (40.6%) 63 (44.7%) 48 (38.1%) 165 (41.2%)
cl D.2
Total number of patients with at least one adverse event 43 (32.3%) 54 (38.3%) 56 (44.4%) 153 (38.2%)
Total number of events 59 72 73 204
dcd D.2.1.5.3 43 (32.3%) 54 (38.3%) 56 (44.4%) 153 (38.2%)
vads %>% apply_filter("SER_SE") %>% t_ae()
Filter 'SE' matched target ADSL.
400/400 records matched the filter condition `SAFFL == 'Y'`.
Filter 'SER' matched target ADAE.
581/1967 records matched the filter condition `AESER == 'Y'`.
Body System or Organ Class A: Drug X B: Placebo C: Combination All Patients
Dictionary-Derived Term (N=133) (N=141) (N=126) (N=400)
—————————————————————————————————————————————————————————————————————————————————————————————————————————————————————
Total number of patients with at least one adverse event 93 (69.9%) 110 (78.0%) 98 (77.8%) 301 (75.2%)
Overall total number of events 248 280 246 774
cl A.1
Total number of patients with at least one adverse event 42 (31.6%) 47 (33.3%) 44 (34.9%) 133 (33.2%)
Total number of events 54 63 58 175
dcd A.1.1.1.2 42 (31.6%) 47 (33.3%) 44 (34.9%) 133 (33.2%)
cl B.1
Total number of patients with at least one adverse event 47 (35.3%) 49 (34.8%) 59 (46.8%) 155 (38.8%)
Total number of events 73 63 75 211
dcd B.1.1.1.1 47 (35.3%) 49 (34.8%) 59 (46.8%) 155 (38.8%)
cl B.2
Total number of patients with at least one adverse event 48 (36.1%) 59 (41.8%) 44 (34.9%) 151 (37.8%)
Total number of events 74 78 65 217
dcd B.2.2.3.1 48 (36.1%) 59 (41.8%) 44 (34.9%) 151 (37.8%)
cl D.1
Total number of patients with at least one adverse event 37 (27.8%) 59 (41.8%) 35 (27.8%) 131 (32.8%)
Total number of events 47 76 48 171
dcd D.1.1.1.1 37 (27.8%) 59 (41.8%) 35 (27.8%) 131 (32.8%)
The filters you create with add_filter() only persist for the duration of your R session, so you have to re-create them whenever you restart R. The simplest way to do so is to put all your filter definitions in a filters.yml file, as described above, and call load_filters("path/to/filters.yml") before creating outputs.
If you pass an existing filter that does not match your target dataset, no warning or error is thrown. Instead, apply_filter() only tells you which filters it actually used. Checking that only valid filters are passed to apply_filter() is therefore up to you.
add_filter(
  id = "INFCT",
  title = "Infections and Infestations",
  target = "ADAE",
  condition = AEBODSYS == "INFECTIONS AND INFESTATIONS"
)
adsl_filtered <- apply_filter(adsl, "INFCT_IT")
Filter 'IT' matched target ADSL.
400/400 records matched the filter condition `ITTFL == 'Y'`.
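Since this check is left to you, a small helper can flag IDs that would be silently skipped. The sketch below is hypothetical and not part of {filters}. It assumes that IDs are joined by underscores, as in the examples above, and that the filter table has id and target columns like the output of list_all_filters(). A toy table stands in for the real library here.

```r
# Hypothetical helper, not part of {filters}: return the IDs in `suffix`
# that are not defined for `target` in `filter_table`.
unmatched_filter_ids <- function(suffix, target, filter_table) {
  ids <- strsplit(suffix, "_", fixed = TRUE)[[1]]
  known <- filter_table$id[toupper(filter_table$target) == toupper(target)]
  setdiff(ids, known)
}

# Toy filter table with the same columns that list_all_filters() shows.
toy_table <- data.frame(
  id = c("IT", "SE", "INFCT"),
  target = c("ADSL", "ADSL", "ADAE"),
  stringsAsFactors = FALSE
)

skipped <- unmatched_filter_ids("INFCT_IT", target = "ADSL", filter_table = toy_table)
print(skipped)
```

With the package loaded, you could pass the result of list_all_filters() as filter_table and stop or warn when the returned vector is not empty.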
Internally, {filters} stores the filter definitions inside the .filters environment defined in R/zzz.R. When you add a filter with add_filter(), a new variable named after the ID is created inside this environment. This variable is a list that stores the title, the target, and the condition as a quoted expression. When you use apply_filter(), the function looks for variables in .filters matching the provided suffixes. It then maps the filters to their target datasets and finally builds a call to subset() with the dataset as the first argument and the combined filter condition as the second. This call is then evaluated using eval() and the result is returned.
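The mechanism described above can be imitated in a few lines of base R. This is a simplified sketch under stated assumptions, not the package's source: registry, register_filter() and apply_registered() are hypothetical names, conditions are combined with `&`, and the target is passed explicitly instead of being derived from the dataset name.

```r
# A toy registry standing in for the package's internal environment.
registry <- new.env()

# Store title, target and the unevaluated condition under the filter ID.
register_filter <- function(id, title, target, condition) {
  definition <- list(
    title = title,
    target = target,
    condition = substitute(condition)  # keep the expression, do not evaluate
  )
  assign(id, definition, envir = registry)
}

# Look up the IDs, keep those for `target`, and evaluate a subset() call.
apply_registered <- function(dataset, suffix, target) {
  ids <- strsplit(suffix, "_", fixed = TRUE)[[1]]
  ids <- ids[vapply(ids, exists, logical(1), envir = registry, inherits = FALSE)]
  defs <- mget(ids, envir = registry)
  defs <- Filter(function(def) def$target == target, defs)
  if (length(defs) == 0) {
    return(dataset)
  }
  conditions <- lapply(defs, `[[`, "condition")
  combined <- Reduce(function(lhs, rhs) call("&", lhs, rhs), conditions)
  eval(call("subset", dataset, combined))
}

register_filter("CTC45", "Grade 4-5 Adverse Events", "ADAE", AETOXGR %in% c("4", "5"))
register_filter("SER", "Serious Adverse Events", "ADAE", AESER == "Y")
register_filter("SE", "Safety Evaluable", "ADSL", SAFFL == "Y")

# Toy data (hypothetical values).
toy_adae <- data.frame(
  USUBJID = c("P1", "P2", "P3"),
  AETOXGR = c("4", "2", "5"),
  AESER = c("Y", "Y", "N"),
  stringsAsFactors = FALSE
)

filtered <- apply_registered(toy_adae, "CTC45_SER_SE", target = "ADAE")
print(filtered$USUBJID)
```

The SE filter targets ADSL and is therefore ignored for this dataset, so only the two ADAE conditions are combined and evaluated.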