Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request]: a_summary.logical formatting not dealing as a_summary.factor when denom = 0 #649

Closed
3 tasks done
imazubi opened this issue Jul 21, 2022 · 7 comments · Fixed by #1079
Closed
3 tasks done
Assignees

Comments

@imazubi
Copy link
Contributor

imazubi commented Jul 21, 2022

Feature description

Hi @shajoezhu ,

When using the summarize_vars function, I realized that when the denominator is 0, summarize_vars() returns NA for logical variables while it returns 0 for factor variables for .stats = count_fraction. I feel both should act in the same way. I am proposing the solution.

See example:

df <- data.frame(var1 = c(TRUE, TRUE, FALSE, FALSE),
                 var2 = c("d", "d", "e", "e"),
                 var3 = c("f", "f", "f", "f"))

df <- df %>% dplyr::mutate_if(is.character,as.factor) %>%
  mutate(var3 = factor(var3, levels = c("f", "g")))

my_table <- basic_table() %>%
  split_cols_by("var3") %>%
  summarize_vars(
    vars = c("var1", "var2")
    ) %>%
  build_table(df = df)

Look at the red and green circles.

image

The following is the reason why this is happening:

  • In s_summary.logical
    y$count_fraction <- c(count, ifelse(dn > 0, count / dn, NA))
  • In s_summary.factor
y$count_fraction <- lapply(
   y$count,
   function(x) {
     c(x, ifelse(dn > 0, x / dn, 0))
   }
 )

In the logical part we are assigning the NA value if denom is 0, In the factor case we are assigning the 0 value for the same case. Then, if we go to format_count_fraction() which is the function used for formatting, if x[1] or x[2] is NA (which is the case for the logical scenario above), then the output will be NA.

So the solution would be:

  • To act in the same in both cases. I would change s_summary.logical to be aligned with s_summary.factor:
    y$count_fraction <- c(count, ifelse(dn > 0, count / dn, NA))

  • Don't you think that when showing counts and percentages (xx (xx%)), would be even better the output to be i.e 0/0%, instead of just 0 when denom is 0?

See the new proposal on format_count_fraction() code:

...
result <- if (x[1] == 0 && x[2] != 0) {
  "0"
} else if (x[2] == 0) {
  "0/0%"
} else {
    paste0(x[1], " (", round(x[2] * 100, 1), "%)")
  }...

Code of Conduct

  • I agree to follow this project's Code of Conduct.

Contribution Guidelines

  • I agree to follow this project's Contribution Guidelines.

Security Policy

  • I agree to follow this project's Security Policy.
@Melkiades
Copy link
Contributor

I think this is a fast fix. Should we use just 0 or NA? @edelarua @ayogasekaram @shajoezhu

@shajoezhu
Copy link
Contributor

NA

@Melkiades
Copy link
Contributor

NA

This changes a lot of things. Keeping it in todo. If we want to unify this behavior, we need to be sure that NA is an universal standard for / 0. Can we assume so? @anajens @barnett11 ?

@barnett11
Copy link

The actual standards guidance (which was written pre-R onboarding I should caveat) is below,

3.1. Presentation of counts and percentages
3.1.1. When a treatment group has 0 subjects, report “0” (no %) for sample size and all values of categorical variables
3.1.2. When the level of a categorical variable has zero subjects but the denominator is non-zero, report only “0” (no %) for that category.

3.2. Presentation of Summary Statistics
3.2.1. When a treatment group has 0 subjects, report “0” for sample size and report “NE” (not estimable) for all summary statistics
3.2.2. When a treatment group has insufficient number of subjects to compute a summary statistic (e.g. SD based on a single subject), report that statistic as “NE”

@shajoezhu
Copy link
Contributor

@Melkiades sorry for the very brief reply, was meant to get back to this thread, and I was hoping for dig up the rationale we did the way we have now. but got side tracked, and many thanks to @barnett11 , this is exactly the context i was looking for. When I was thinking of NA, I was thinking to specify use the NA_str to modify the "reporting" here.

pernally, i hope we can seperate the actual value we compute here and the values that we reporting here.
it is confusing, and we wont be able to distinguish the case of 0/0 vs 0/10,

back in the days when we first wrote these summary statistics function, rtables did not have the option for na_str display. I think that is why we mixed these computation together.

I think another good examples can be found at, #543 (comment)
and https://github.com/insightsengineering/tlg-catalog/pull/104/files

I was thinking, maybe we can do something more explicit to the reporting option, the NA is caused by "dividing a treatment group of 0", the computation should be mathematical, letting 0/0 be computed as 0 is less ideal I think. but if we are more explicit with what "NA" is displayed as, to users, this option would be more flexible? what do you think? @Melkiades @barnett11 @anajens @khatril

@Melkiades
Copy link
Contributor

Melkiades commented Sep 12, 2023

If I understood correctly @barnett11 comment, it seems that any summary statistic derived from a 0 population should be NE, while counts should just be 0 (rightfully so). I would put NA for tern to stay general, while having NE in tlg-catalog through na_str display. Does this make sense @barnett11 @anajens?

@shajoezhu
Copy link
Contributor

checked with @barnett11 ,

specify na_str = "NE" for this could work

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants