759 document formats@main #795

Merged
merged 9 commits into from
Jan 16, 2023
278 changes: 278 additions & 0 deletions vignettes/tern_formats.Rmd
@@ -0,0 +1,278 @@
---
title: "Formatting Functions"
date: "2023-01-12"
output:
  rmarkdown::html_document:
    theme: "spacelab"
    highlight: "kate"
    toc: true
    toc_float: true
vignette: >
  %\VignetteIndexEntry{Formatting Functions}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
editor_options:
  markdown:
    wrap: 72
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## `tern` Formatting Functions Overview

The `tern` R package provides functions to create common analyses from clinical trials in `R`. These functions
have default formatting arguments that control how values are displayed in the output.

`tern` formatting differs from the formatting available in the `formatters` package in that `tern`
formats can apply conditional logic, allowing for more fine-tuning of the output displayed.
Depending on the type of value being displayed and on the value itself, the format of the output changes.
With a `formatters` format string, by contrast, the specified format is applied regardless of the value.

To see the formatting functions available in `tern`, see `?formatting_functions`.
To see the format strings available in `formatters`, see `formatters::list_valid_format_labels()`.
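
The contrast described above can be sketched in base R. The two helpers below, `fixed_format()` and
`value_dependent_format()`, are hypothetical stand-ins written for this illustration. They are not functions
from `formatters` or `tern`, and they only mimic the general idea behind each approach.

```{r}
# Hypothetical stand-in for a fixed format string such as "xx / xx":
# the same layout is used whatever the values are.
fixed_format <- function(x) {
  paste0(x[["num"]], " / ", x[["denom"]])
}

# Hypothetical stand-in for a value-dependent formatting function:
# the percentage is only appended when the numerator is non-zero.
value_dependent_format <- function(x) {
  if (x[["num"]] == 0) {
    paste0(x[["num"]], "/", x[["denom"]])
  } else {
    pct <- round(x[["num"]] / x[["denom"]] * 100, 1)
    paste0(x[["num"]], "/", x[["denom"]], " (", pct, "%)")
  }
}

zero_count <- c(num = 0, denom = 2)
nonzero_count <- c(num = 1, denom = 2)

fixed_format(zero_count)
fixed_format(nonzero_count)
value_dependent_format(zero_count)
value_dependent_format(nonzero_count)
```

The fixed layout treats both inputs identically, while the value-dependent function drops the percentage for
the zero numerator.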

## Comparing `tern` & `formatters` Formats

The packages used in this vignette are:

```{r, message=FALSE}
library(rtables)
library(formatters)
library(tern)
library(dplyr)
```

The example below demonstrates `tern` formatting in the `count_abnormal()` function. The "low" category
has a non-zero numerator, so both the fraction and the percentage are displayed. The "high" category has a
numerator of zero, so only the fraction is displayed and the redundant zero percentage is omitted.

```{r}
df2 <- data.frame(
  ID = as.character(c(1, 1, 2, 2)),
  RANGE = factor(c("NORMAL", "LOW", "HIGH", "LOW")),
  BL_RANGE = factor(c("NORMAL", "NORMAL", "HIGH", "HIGH")),
  ONTRTFL = c("", "Y", "", "Y"),
  stringsAsFactors = FALSE
)

df2 <- df2 %>%
  filter(ONTRTFL == "Y")

basic_table() %>%
  count_abnormal(
    var = "RANGE",
    abnormal = list(low = "LOW", high = "HIGH"),
    variables = list(id = "ID", baseline = "BL_RANGE"),
    exclude_base_abn = FALSE,
    .formats = list(fraction = format_fraction)
  ) %>%
  build_table(df2)
```

The following example uses `count_abnormal()` again. This time both the "low" and the "high" categories
have a non-zero numerator, so both show a percentage.

```{r}
df2 <- data.frame(
  ID = as.character(c(1, 1, 2, 2)),
  RANGE = factor(c("NORMAL", "LOW", "HIGH", "HIGH")),
  BL_RANGE = factor(c("NORMAL", "NORMAL", "HIGH", "HIGH")),
  ONTRTFL = c("", "Y", "", "Y"),
  stringsAsFactors = FALSE
)

df2 <- df2 %>%
  filter(ONTRTFL == "Y")

basic_table() %>%
  count_abnormal(
    var = "RANGE",
    abnormal = list(low = "LOW", high = "HIGH"),
    variables = list(id = "ID", baseline = "BL_RANGE"),
    exclude_base_abn = FALSE,
    .formats = list(fraction = format_fraction)
  ) %>%
  build_table(df2)
```

The following example shows the difference when a `formatters` format string is used instead. Here we use
`"xx / xx"` as the value format. The "high" category has a zero numerator and the "low" category has a
non-zero numerator, yet both are displayed in the same format.

```{r}
df2 <- data.frame(
  ID = as.character(c(1, 1, 2, 2)),
  RANGE = factor(c("NORMAL", "LOW", "HIGH", "LOW")),
  BL_RANGE = factor(c("NORMAL", "NORMAL", "HIGH", "HIGH")),
  ONTRTFL = c("", "Y", "", "Y"),
  stringsAsFactors = FALSE
)
df2 <- df2 %>%
  filter(ONTRTFL == "Y")

basic_table() %>%
  count_abnormal(
    var = "RANGE",
    abnormal = list(low = "LOW", high = "HIGH"),
    variables = list(id = "ID", baseline = "BL_RANGE"),
    exclude_base_abn = FALSE,
    .formats = list(fraction = "xx / xx")
  ) %>%
  build_table(df2)
```

The same applies to any of the available formats from the `formatters` package. The following example uses the `"xx.x / xx.x"` format instead, and again both categories are displayed in the same format regardless of their values. Use `formatters::list_valid_format_labels()` to see the full list of available formats in `formatters`.

```{r}
df2 <- data.frame(
  ID = as.character(c(1, 1, 2, 2)),
  RANGE = factor(c("NORMAL", "LOW", "HIGH", "LOW")),
  BL_RANGE = factor(c("NORMAL", "NORMAL", "HIGH", "HIGH")),
  ONTRTFL = c("", "Y", "", "Y"),
  stringsAsFactors = FALSE
)
df2 <- df2 %>%
  filter(ONTRTFL == "Y")

basic_table() %>%
  count_abnormal(
    var = "RANGE",
    abnormal = list(low = "LOW", high = "HIGH"),
    variables = list(id = "ID", baseline = "BL_RANGE"),
    exclude_base_abn = FALSE,
    .formats = list(fraction = "xx.x / xx.x")
  ) %>%
  build_table(df2)
```

## Formatting Function Basics

Current `tern` formatting functions consider some of the following aspects when setting custom behaviors:

* Missing values - a custom value or string can be displayed in place of `NA`.
* Zeros - if the numerator of a fraction is zero, `tern` fraction formatting functions omit the accompanying percentage.
* Number of decimal places to display - the number of decimal places can be fixed if needed.
* Value thresholds - a different format or value can be displayed depending on whether the value falls within a certain threshold.
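
As a minimal sketch of the first and third points, the base R function below substitutes a caller-chosen
string for missing values and otherwise fixes the number of decimal places. The name `format_with_na()` and
its arguments `digits` and `na_str` are hypothetical and chosen for this illustration; this is not a `tern`
function.

```{r}
# Hypothetical helper, not part of tern: formats a single numeric value
# and substitutes a caller-chosen string when the value is missing.
format_with_na <- function(x, digits = 1, na_str = "NE") {
  if (is.na(x)) {
    return(na_str)
  }
  # formatC() with format = "f" keeps exactly `digits` decimal places.
  formatC(x, format = "f", digits = digits)
}

format_with_na(12.345)
format_with_na(NA_real_)
format_with_na(NA_real_, na_str = "-")
```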

#### Number of Decimal Places to Display

Two functions fix the number of decimal places at one: `format_fraction_fixed_dp()` and `format_count_fraction_fixed_dp()`. Other formatting functions drop trailing zeros by default, but these two always display the percentage with one decimal place, even when that digit is a zero. See the following example:

```{r}
format_fraction_fixed_dp(x = c(num = 1L, denom = 3L))
format_fraction_fixed_dp(x = c(num = 1L, denom = 2L))

format_count_fraction_fixed_dp(x = c(2, 0.6667))
format_count_fraction_fixed_dp(x = c(2, 0.25))
```
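
The difference between dropping and keeping the trailing zero comes down to whether the percentage is pasted
as a number or as a pre-formatted string. The base R comparison below illustrates this with a hypothetical
value `pct`; it does not call any `tern` function.

```{r}
pct <- 1 / 2 * 100

# round() returns a number, so the trailing zero is lost when it is pasted.
paste0(round(pct, 1), "%")

# sprintf("%.1f") returns a string with exactly one decimal place.
paste0(sprintf("%.1f", pct), "%")
```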

#### Value Thresholds

Functions that set custom values according to a certain threshold include `format_extreme_values()`, `format_extreme_values_ci()`, and `format_fraction_threshold()`. The two extreme value formats work in a similar way: the user specifies the maximum number of digits to include, and very large or very small values are replaced by a special string. For example:

```{r}
extreme_format <- format_extreme_values(digits = 2)
extreme_format(0.235)
extreme_format(0.001)
extreme_format(Inf)
```

The `format_fraction_threshold()` function allows the user to specify a lower percentage threshold, below which values are instead assigned a special string value. For example:

```{r}
fraction_format <- format_fraction_threshold(0.05)
fraction_format(x = c(20, 0.1))
fraction_format(x = c(2, 0.01))
```
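
Both calls above first create a formatting function and then apply it. The general pattern, a function that
returns another function with the threshold stored in its enclosing environment, can be sketched in base R as
follows. `make_threshold_format()` is a hypothetical name used for this illustration, and the sketch is an
assumption about the pattern rather than the `tern` implementation.

```{r}
# Hypothetical function factory, written for illustration only.
# `threshold` is a proportion between 0 and 1.
make_threshold_format <- function(threshold) {
  stopifnot(is.numeric(threshold), length(threshold) == 1)
  stopifnot(threshold >= 0, threshold <= 1)

  # The returned function remembers `threshold` through its enclosing environment.
  # `x` is assumed to be c(count, proportion), as in the calls above.
  function(x, ...) {
    if (x[[2]] < threshold) {
      paste0("<", round(threshold * 100))
    } else {
      as.character(round(x[[2]] * 100))
    }
  }
}

my_format <- make_threshold_format(0.05)
my_format(c(20, 0.1))
my_format(c(2, 0.01))
```

Because the threshold is captured when the function is created, the same factory can produce several formats
with different cut-offs without passing the threshold on every call.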

See the documentation on each function for specific details on their behavior and how to customize them.

## Creating Custom Formatting Functions

If your table requires customized output that cannot be produced with one of the existing `tern` formatting functions, you may want to consider creating a new formatting function. When creating your own formatting function, it is important to consider the aspects listed in the Formatting Function Basics section above.

In this section we will create a custom formatting function derived from the `format_fraction_fixed_dp()` function. First we will look at this function in detail, and then we will customize it.

```{r}
# First we will see how the format_fraction_fixed_dp code works and displays the outputs
format_fraction_fixed_dp <- function(x, ...) {
  attr(x, "label") <- NULL
  checkmate::assert_vector(x)
  checkmate::assert_count(x["num"])
  checkmate::assert_count(x["denom"])

  result <- if (x["num"] == 0) {
    paste0(x["num"], "/", x["denom"])
  } else {
    paste0(
      x["num"], "/", x["denom"],
      " (", sprintf("%.1f", round(x["num"] / x["denom"] * 100, 1)), "%)"
    )
  }
  return(result)
}
```

Here we see that if the numerator is greater than 0, both the fraction and the percentage are displayed. If the numerator is 0, only the fraction is shown. Percent values always display 1 decimal place. Below we create a dummy dataset and then observe how the output values behave when this formatting function is applied.

```{r}
df2 <- data.frame(
  ID = as.character(c(1, 1, 2, 2)),
  RANGE = factor(c("NORMAL", "LOW", "HIGH", "LOW")),
  BL_RANGE = factor(c("NORMAL", "NORMAL", "HIGH", "HIGH")),
  ONTRTFL = c("", "Y", "", "Y"),
  stringsAsFactors = FALSE
) %>%
  filter(ONTRTFL == "Y")

basic_table() %>%
  count_abnormal(
    var = "RANGE",
    abnormal = list(low = "LOW", high = "HIGH"),
    variables = list(id = "ID", baseline = "BL_RANGE"),
    exclude_base_abn = FALSE,
    .formats = list(fraction = format_fraction_fixed_dp)
  ) %>%
  build_table(df2)
```

Now we will modify this function to make our custom formatting function, `custom_format`. We want to display 3 decimal places in the percent value, and if the numerator value is 0 we only want to display a 0 value (without the denominator).

```{r}
custom_format <- function(x, ...) {
  attr(x, "label") <- NULL
  checkmate::assert_vector(x)
  checkmate::assert_count(x["num"])
  checkmate::assert_count(x["denom"])

  result <- if (x["num"] == 0) {
    paste0(x["num"]) # We remove the denominator on this line so that only a 0 is displayed
  } else {
    # We round to 3 decimal places and print them with %.3f.
    # Rounding to 1 place here would make the last two printed digits always 0.
    paste0(
      x["num"], "/", x["denom"],
      " (", sprintf("%.3f", round(x["num"] / x["denom"] * 100, 3)), "%)"
    )
  }
  return(result)
}

basic_table() %>%
  count_abnormal(
    var = "RANGE",
    abnormal = list(low = "LOW", high = "HIGH"),
    variables = list(id = "ID", baseline = "BL_RANGE"),
    exclude_base_abn = FALSE,
    .formats = list(fraction = custom_format) # Here we implement our new custom_format function
  ) %>%
  build_table(df2)
```
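
Before using a custom formatting function in a table, it can help to call it directly on a few
`c(num, denom)` inputs. The sketch below uses `custom_format_check()`, a hypothetical base R variant written
for this illustration: the `checkmate` assertions are replaced by `stopifnot()` so that it runs without
additional packages, and the percentage is rounded to 3 decimal places to match the `%.3f` format.

```{r}
# Hypothetical base R variant of a custom fraction format, for quick checks.
custom_format_check <- function(x, ...) {
  stopifnot(is.numeric(x), all(c("num", "denom") %in% names(x)))
  num <- x[["num"]]
  denom <- x[["denom"]]

  if (num == 0) {
    # Zero numerator: display only the 0, without the denominator.
    paste0(num)
  } else {
    pct <- sprintf("%.3f", round(num / denom * 100, 3))
    paste0(num, "/", denom, " (", pct, "%)")
  }
}

custom_format_check(c(num = 0L, denom = 3L))
custom_format_check(c(num = 1L, denom = 3L))
custom_format_check(c(num = 2L, denom = 2L))
```

An input such as 1/3 is a useful test case because its percentage has non-zero digits in every displayed
decimal place, which makes rounding mistakes visible.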

## Summary

Each `tern` analysis function has pre-specified default formats that are applied when generating output. Some of these are format strings from the `formatters` package and some are custom formatting functions defined in `tern`. The `tern` functions differ from the `formatters` format strings in that they can use conditional logic to apply value-dependent formats. If you would like to create your own custom formatting function to use with `tern`, be sure to consider carefully which rules you want to implement to handle different input values.