Improve numerical precision of several p-value calculations #1210

remlapmot · 2024-07-22T10:34:53Z

Hi there,

This is relatively minor, but while looking at the source code for an unrelated reason I noticed that several places use a sub-optimal approach to calculating p-values.

You have coded 1 - ... instead of using the lower.tail = FALSE argument of the distribution functions. The 1 - ... calculation loses numerical precision.
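A minimal sketch of why the subtraction loses precision, using z = 9 as an arbitrary illustrative value (it is not taken from the package code). Doubles just below 1 are spaced .Machine$double.neg.eps apart, so an upper-tail probability smaller than about half that spacing is rounded away when pnorm() returns the lower tail.

```r
z <- 9  # arbitrary large statistic, chosen only for illustration

# Upper tail computed directly: tiny, but representable as a double.
upper_tail <- stats::pnorm(z, lower.tail = FALSE)

# Lower tail: the true value is 1 - upper_tail, which rounds to exactly 1.
lower_tail <- stats::pnorm(z)

# The upper tail is below half the spacing of doubles just under 1 ...
upper_tail < .Machine$double.neg.eps / 2

# ... so the lower tail is stored as exactly 1, and 1 - lower_tail is 0.
lower_tail == 1
1 - lower_tail == 0
```

All three comparisons should evaluate to TRUE, while upper_tail itself stays strictly positive.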

Below are some examples which show this. Admittedly, you only see the difference with very large values of a z/t/X^2/F statistic, but it is probably worth using the more precise calculation. For example, for a z-test the 1 - ... calculation reports 0 for approximately abs(Z) > 8.3, whereas the lower.tail = FALSE calculation still gives a non-zero p-value up to approximately abs(Z) = 37.5 (which is approximately abs(qnorm(.Machine$double.xmin / 2))).

statistic <- -7.5
2 * (1 - stats::pnorm(abs(statistic)))
#> [1] 6.37268e-14
2 * (stats::pnorm(abs(statistic), lower.tail = FALSE))
#> [1] 6.381783e-14

statistic <- -8.3
2 * (1 - stats::pnorm(abs(statistic)))
#> [1] 0
2 * (stats::pnorm(abs(statistic), lower.tail = FALSE))
#> [1] 1.041114e-16

statistic <- -37.5 # approx qnorm(.Machine$double.xmin / 2)
2 * (1 - stats::pnorm(abs(statistic)))
#> [1] 0
2 * (stats::pnorm(abs(statistic), lower.tail = FALSE))
#> [1] 9.210706e-308

Created on 2024-07-22 with reprex v2.1.1
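The same pattern should hold for the t, chi-squared and F cases mentioned above. The following is only a sketch: the statistics and degrees of freedom are hypothetical values picked to be large enough to trigger the problem, and are not taken from any tidier in the package.

```r
# Hypothetical statistics and degrees of freedom, for illustration only.
t_stat <- 40
t_df <- 50
t_naive <- 2 * (1 - stats::pt(abs(t_stat), df = t_df))
t_precise <- 2 * stats::pt(abs(t_stat), df = t_df, lower.tail = FALSE)

chisq_stat <- 100
chisq_naive <- 1 - stats::pchisq(chisq_stat, df = 1)
chisq_precise <- stats::pchisq(chisq_stat, df = 1, lower.tail = FALSE)

f_stat <- 500
f_naive <- 1 - stats::pf(f_stat, df1 = 2, df2 = 100)
f_precise <- stats::pf(f_stat, df1 = 2, df2 = 100, lower.tail = FALSE)

# The 1 - ... versions collapse to 0; the lower.tail = FALSE versions do not.
c(t_naive, chisq_naive, f_naive)
c(t_precise, chisq_precise, f_precise)
```

With these inputs the first vector should be all zeros, while the second contains very small but strictly positive p-values.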

Also:

- In the lfe-tidiers.R commit, I added what I think is a missing abs() around the statistic.
- I added a missing stats:: qualifier in a few places.
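To illustrate why the abs() matters for a two-sided p-value, here is a small sketch. It assumes a normal reference distribution and a made-up statistic purely for illustration; the actual calculation in lfe-tidiers.R may use a different distribution.

```r
statistic <- -2.5  # hypothetical negative test statistic

# Without abs(): the upper tail of a negative statistic is close to 1,
# so doubling it gives a "p-value" greater than 1.
p_without_abs <- 2 * stats::pnorm(statistic, lower.tail = FALSE)

# With abs(): a valid two-sided p-value, identical for +2.5 and -2.5.
p_with_abs <- 2 * stats::pnorm(abs(statistic), lower.tail = FALSE)

p_without_abs
p_with_abs
```

For a positive statistic the two expressions agree, which is why an omission like this can go unnoticed until a negative estimate comes along.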

Tom

remlapmot added 2 commits July 18, 2024 17:20

Add stats qualification

62b3f09

Amend p-values calculations to use lower.tail=FALSE

20677bf
