Distinguish NA (missing) from NA (accumulated) #547

sbfnk · 2024-02-14T14:56:33Z

Enabling this would, I think, require some sort of variable that marks dates explicitly as missing vs. NA.

I think this would be my preferred option as it would be more general but I also think it can be addressed in its own review as it would be a superset of this PR.

My thought on how that would work is to have a new variable (accumulate) that indicates which days should be summed.

Originally posted by @seabbs in #534 (review)

sbfnk · 2024-02-14T15:13:20Z

> My thought on how that would work is to have a new variable (accumulate) that indicates which days should be summed.

Another option would be to distinguish between explicit NAs (the date exists in the data and the value is NA, meaning missing) and implicit NAs (the date doesn't exist in the data, meaning accumulate). This might make preprocessing easier, though it is potentially also easier to get wrong inadvertently.
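To make the explicit/implicit distinction concrete, here is a minimal sketch. It is written in Python rather than R purely for illustration and is not EpiNow2 code; the function name `classify_dates` and the three labels are hypothetical.

```python
from datetime import date, timedelta


def classify_dates(records):
    """Label every calendar day between the first and last date in `records`.

    `records` is a list of (date, value) pairs; a value of None stands in
    for an explicit NA. Labels (hypothetical): "observed", "missing"
    (explicit NA) and "accumulate" (date absent from the data).
    """
    by_date = dict(records)
    day, end = min(by_date), max(by_date)
    labels = {}
    while day <= end:
        if day not in by_date:
            labels[day] = "accumulate"  # implicit NA: no row for this date
        elif by_date[day] is None:
            labels[day] = "missing"     # explicit NA: row exists, value is NA
        else:
            labels[day] = "observed"
        day += timedelta(days=1)
    return labels


records = [
    (date(2024, 1, 1), 10),
    (date(2024, 1, 2), None),  # explicit NA
    (date(2024, 1, 5), 42),    # 3 and 4 January are absent: implicit NAs
]
labels = classify_dates(records)
```

The risk mentioned above shows up directly: dropping a row by accident during preprocessing silently turns a date into "accumulate".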

seabbs · 2024-02-19T22:59:18Z

Yeah, potentially, but I also think that could be a bit dangerous. I would instead suggest making a helper function that maps from that structure to the less dangerous explicit version, for those who clearly want that.
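A rough sketch of what such a helper could look like, again in Python for illustration only and not an existing EpiNow2 function. The name `make_explicit` and the convention that an absent date gets `accumulate = True` are assumptions made for this example.

```python
from datetime import date, timedelta


def make_explicit(records):
    """Expand (date, value) pairs to one row per day with an `accumulate` flag.

    Assumption: a date absent from `records` has its count rolled into the
    next reported date, so it gets accumulate=True and value=None. Dates
    present in `records` keep their value, and a None there remains an
    ordinary missing value (accumulate=False).
    """
    by_date = dict(records)
    day, end = min(by_date), max(by_date)
    rows = []
    while day <= end:
        rows.append({
            "date": day,
            "value": by_date.get(day),
            "accumulate": day not in by_date,
        })
        day += timedelta(days=1)
    return rows


# Weekly reports: the six days between them are absent from the input.
weekly = [(date(2024, 1, 7), 70), (date(2024, 1, 14), 84)]
explicit = make_explicit(weekly)
```

The output has one row per day, so the interpretation of each gap is visible in the data rather than implied by what is absent.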

seabbs · 2024-02-19T22:59:33Z

and if going that way I'd suggest that becomes a dependent issue

sbfnk · 2024-09-16T08:42:22Z

With appropriate warning messages as suggested in #771, I think this is the best option, as it can take all information from a 2-column data frame as before:

> distinguish between explicit (date exists in the data, value is NA meaning missing) vs. implicit (date doesn't exist in the data meaning accumulate) NAs

seabbs · 2024-09-16T09:49:40Z

I don't think I agree. I think there should be one way of handling missing data (as missing) and it can throw a warning if creating missing dates saying what it is doing.

I think overloading NAs like we have done for accumulation is confusing and dangerous and would much prefer a separate feature describing this.

Something I think we want to be aware of is non-standard schemes. These could be (1) non-constant reporting and (2) repeated reporting (some counts are reported twice as aggregates of different dates).

I haven't really seen the latter and it's quite an edge case, so it's unclear to me if we really want to support it or not.

sbfnk · 2024-09-16T14:47:36Z

I'm open to suggestions and acknowledge there are dangers in overloading interpretations. My ideal would be one in which it's fairly straightforward (and safe) to handle the common cases of daily/weekly data on incidence/prevalence and missingness that could correspond to zeroes or missed reports.

jamesmbaazam · 2024-09-19T15:40:04Z

> With appropriate warning messages as suggested in #771 then I think this is the best option as it can take all information from a 2-column data frame as before:
>
> > distinguish between explicit (date exists in the data, value is NA meaning missing) vs. implicit (date doesn't exist in the data meaning accumulate) NAs

Once #774 is merged, this can be checked with the test_data_complete() function it introduces.

seabbs · 2024-09-23T14:35:02Z

> My ideal would be one in which it's fairly straightforward (and safe) to handle the common cases of daily/weekly data on incidence/prevalence and missingness that could correspond to zeroes or missed reports

Do you not think this is possible without overloading using the kind of approach I have suggested with helper utilities?

sbfnk · 2024-10-22T12:30:56Z

> > My ideal would be one in which it's fairly straightforward (and safe) to handle the common cases of daily/weekly data on incidence/prevalence and missingness that could correspond to zeroes or missed reports
>
> Do you not think this is possible without overloading using the kind of approach I have suggested with helper utilities?

It probably is. It could relate to supporting option (1) as suggested in #346 (comment), with an additional fill_missing(na = "accumulate") function (or one with a better name), and then failing with an error if any data with NAs is passed to estimate_infections(). As mentioned there, it would be quite a breaking change, but also one which probably helps avoid the kind of confusion and potential for error that we have seen.

seabbs · 2024-10-23T11:15:56Z

I'm not sure I totally follow this now. Does the suggestion support a mixture of accumulation and missing data?

sbfnk · 2024-10-23T13:09:40Z

estimate_infections() would expect an accumulation column if one wanted to accumulate, but a user could create this column from some interpretation of NAs using a separate function.
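As a sketch of how an accumulation column might be consumed (Python for illustration only; this is not how estimate_infections() is implemented, and `accumulate_expected` is a hypothetical name): modelled daily values are summed over each run of flagged days and compared with data only on the day the total is reported.

```python
def accumulate_expected(expected, accumulate):
    """Sum modelled daily values over runs flagged for accumulation.

    `expected[i]` is the modelled value for day i. `accumulate[i]` is True
    when day i is not reported on its own and should be added to the next
    reported day (an assumed convention). Returns the running total on
    reported days and None on accumulated days.
    """
    if len(expected) != len(accumulate):
        raise ValueError("expected and accumulate must have the same length")
    out, running = [], 0.0
    for value, carry in zip(expected, accumulate):
        running += value
        if carry:
            out.append(None)     # nothing to compare against on this day
        else:
            out.append(running)  # reported day: total since the last report
            running = 0.0
    return out


expected = [1.0, 2.0, 3.0, 4.0, 5.0]
accumulate = [True, True, False, False, True]
totals = accumulate_expected(expected, accumulate)
# totals == [None, None, 6.0, 4.0, None]
```

Under this convention a genuinely missing observation is simply a reported day (accumulate is False) with no data to compare against, so missingness and accumulation stay separate.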

seabbs · 2024-10-23T13:15:35Z

And NAs would go back to working only with missing data, and you wouldn't support repeated reporting?

seabbs · 2024-10-23T13:15:44Z

If yes and no, sounds good.

sbfnk added this to EpiNow2 v2.0.0 Feb 20, 2024

github-project-automation bot moved this to Todo in EpiNow2 v2.0.0 Feb 20, 2024

sbfnk mentioned this issue Apr 30, 2024

reported_cases_opts() #346

Open

sbfnk removed this from EpiNow2 v2.0.0 May 1, 2024

sbfnk added this to the CRAN v1.6 release milestone May 1, 2024

sbfnk modified the milestones: CRAN v1.6 release, CRAN v1.7 release Sep 20, 2024

seabbs mentioned this issue Oct 8, 2024

Add vignette on modelling temporally aggregated data (i.e., obs_opts(na = "accumulate") ) #772

Open

This was referenced Oct 24, 2024

Proposed interface for accumulation / missingness #839

Merged

Add support for time-varying ascertainment #792

Open

sbfnk linked a pull request Nov 11, 2024 that will close this issue

updated interface for accumulation #851

Open

7 tasks
