Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A why it works maths vignette #37

Closed
seabbs opened this issue Sep 4, 2024 · 7 comments · Fixed by #68
Closed

A why it works maths vignette #37

seabbs opened this issue Sep 4, 2024 · 7 comments · Fixed by #68
Labels
documentation Improvements or additions to documentation enhancement New feature or request help wanted Extra attention is needed

Comments

@seabbs
Copy link
Contributor

seabbs commented Sep 4, 2024

We have some mathematical justification in the getting started vignette but I think we could do with more in another maths only (i.e no code) vignette. What do you think @SamuelBrand1?

@seabbs seabbs added documentation Improvements or additions to documentation enhancement New feature or request help wanted Extra attention is needed labels Sep 4, 2024
@SamuelBrand1
Copy link
Collaborator

Well I'm a fan of this idea.

@SamuelBrand1
Copy link
Collaborator

SamuelBrand1 commented Sep 6, 2024

Following on from f2f discussion with @seabbs I'm not sure https://primarycensoreddist.epinowcast.org/dev/articles/primarycensoreddist.html#compute-the-primary-event-censored-cumulative-distribution-function-cdf-for-delays-with-pprimarycensoreddist is quite correct.

We can characterise the delay distribution with primary event censoring as the sum of two random variables:

$$T= X + P.$$

$T$ is the random secondary event time relative to the left-point of the censoring window for the censored primary time, which is the typical way to measure (other measures can be easily rewritten in this form).

Where $X \sim f_\theta$ is a delay from some parametric positive distribution, which might be right-truncated with truncation parameter $D$, and $P \sim g_r$ is the primary time distribution within the primary censoring window, which might also be parametric (e.g. tilted by exp growth rate #6 ).

Assuming independence, the cumulative distribution function for $T$ is:

$$F_T(t) = \mathbb{E}_{P,r} \Big[ F_{X,\theta}(t - P)\Big] = \int_0^{pwindow} F_{X,\theta}(t - p) q_r(p) dp.$$

Using standard probability identity and arrives at first equation of the linked section.

For the right-truncated version $X_{\tau}$ of some untruncated delay random variable $X$, with truncation parameter $D$, then the CDF of $X_\tau$: $F_{X_\tau}$ relates to the cdf $F_X$

$$F_{X_\tau,\theta, D}(t) = {F_{X,\theta}(t \wedge D) \over F_{X,\theta}(D)}.$$

The cumulative dist. identity for $T = X + P$ is not affected by the truncation, or otherwise, of $X$ therefore inserting the truncation definition gives:

$$F_T(t) = \int_0^{pwindow} {F_{X,\theta}((t-p) \wedge D) \over F_{X,\theta}(D)} q_r(p) dp = { \int_0^{pwindow} F_{X,\theta}((t-p) \wedge D) q_r(p) dp \over F_{X,\theta}(D)}.$$

This differs from the second equation in the linked section, which I think might be incorrect.

If we wish to right-truncate $T$, rather than $X$, at $D$ then the correct cum. dist. function is:

$$F_{T_\tau}(t) = {F_T(t\wedge D) \over F_T(D)} = {\int_0^{pwindow} F_{X,\theta}((t\wedge D) - p) q_r(p) dp \over \int_0^{pwindow} F_{X,\theta}(D - p) q_r(p) dp}.$$

Which is again different to the second equation since its the ratio of two integrals rather than the integral of a ratio.

@seabbs
Copy link
Contributor Author

seabbs commented Sep 6, 2024

If we wish to right-truncate 𝑇, rather than 𝑋

This is what we want I think and what is implemented in the currently vectorised code. See the similarity to equation 21 in Park et al. 2024 (i.e double extra boring) I have also been looking at this in the numerical stability branch (changing this again makes no impact on numerical stability but does localise the issue the D). We should do some numerical testing by expanding these tests to include testing about random sampled emprical versions: https://github.com/epinowcast/primarycensoreddist/blob/2d8f339821aa3c975cfa667c8c738107d178153d/tests/testthat/test-stan-rpd-primarycensoreddist.R#L183

@seabbs
Copy link
Contributor Author

seabbs commented Sep 6, 2024

Also the difference between intregrals must be very small if the analytical test here is correct: https://github.com/epinowcast/primarycensoreddist/blob/74c2c8c83ff230edbaf5643d1da7351a9714ee7e/tests/testthat/test-rpd-primarycensoreddist.R#L100

@SamuelBrand1
Copy link
Collaborator

I think we're straying on this issue away from the maths vignette (which has a small error in the equation) towards the stability of sampling?

@seabbs seabbs mentioned this issue Sep 6, 2024
9 tasks
@seabbs
Copy link
Contributor Author

seabbs commented Sep 6, 2024

Yes agree. I do think this makes clear a nice verbose maths vignette would be helpful sooner rather than later

@seabbs
Copy link
Contributor Author

seabbs commented Sep 11, 2024

@SamuelBrand1 just a light bump that this would be a great addition

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants