Use pre-processing from `wwinference` not `wweval` #172

kaitejohnson · 2024-09-25T09:17:18Z

This isn't functional yet because I (stupidly) created functions with the same name in wweval and wwinference that do slightly different things and expect slightly different arguments.

In order to get this to work as is, I would have to tweak a lot of the wweval functions that will eventually become unnecessary.

Looking at where we are now, I think it is time to rewrite the wweval pre-processing to follow the wwinference pre-processing very closely (i.e., get an NWSS dataset to look like the ww_data), then use all of those functions and delete the ones with duplicate names in wweval. This is a bit of a bear of a task, and I am probably best suited to doing it since it's redundancy that I created...
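Here is a minimal sketch of what "get an NWSS dataset to look like the ww_data" means for the columns this PR touches in get_input_data.R. The NWSS fields `wwtp_jurisdiction`, `wwtp_name`, `lab_id`, and `pcr_target_avg_conc` become `location`, `site`, `lab`, and `log_genome_copies_per_ml`. The helper name `format_nwss_as_ww_data()`, the `log_offset` default, and the example rows are illustrative assumptions. The full input that wwinference expects has more columns than shown here.

```r
# Hypothetical helper: maps NWSS-style columns onto the ww_data-style
# names used in this PR's get_input_data.R diff. Only these four columns
# are handled; the real wwinference input needs additional columns.
format_nwss_as_ww_data <- function(nwss, log_offset = 1e-8) {
  data.frame(
    location = toupper(nwss$wwtp_jurisdiction), # e.g. "ca" -> "CA"
    site = nwss$wwtp_name,
    lab = nwss$lab_id,
    # Natural-log scale; the additive offset keeps exact zeros finite
    log_genome_copies_per_ml = log(nwss$pcr_target_avg_conc + log_offset)
  )
}

# Usage with made-up rows
nwss <- data.frame(
  wwtp_jurisdiction = c("ca", "ca"),
  wwtp_name = c(101, 102),
  lab_id = c(1, 1),
  pcr_target_avg_conc = c(0, 2500)
)
ww_data <- format_nwss_as_ww_data(nwss)
```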

Tasks:

- Rewrite eval_fit_ww() and eval_fit_hosp() to use the wwinference package (currently based on ww-inference-model#158, "Restructure hierarchical estimation based on reference subpopulation"), because we need the swappable hosp-only functionality. The parameters need updating here too, but for now I am using the wwinference package parameters.
- Rewrite eval_fit_postprocess() so that its outputs are compatible with the postprocessing infrastructure in wweval.

Additional tasks that we should do, but that don't prevent @gvegayon from merging less-cfaforecastrenewalww-2:

- Delete the many unnecessary wweval pre-processing functions (#175, "Delete unnecessary pre-processing functions in wweval within lesscfaforecastrenewalww branch").
- Use the wwinference package to get the draws object instead of wweval::get_model_draws_w_data() (#176, "Use wwinference::draws() to get the draws joined with data instead of wweval::get_draws_w_data() in lesscfaforecastrenewalww").

I have made separate issues for these and, to keep these PRs manageable, suggest we do them in separate PRs.

@dylanhmorris tagged you for awareness, but I think @gvegayon can review this. Basically, I tested and confirmed that, using the wwinference package, we can now generate the same outputs up until the end of eval_fit_postprocess(). It's still a bit clunky, but I think we can clean it up as we continue to work on this.

Checklist of next steps (as suggested by @gvegayon):

- Use wwinference::get_draws() instead of wweval::get_draws_w_data() (#176, "Use wwinference::draws() to get the draws joined with data instead of wweval::get_draws_w_data() in lesscfaforecastrenewalww").
- Delete unnecessary pre-processing functions (#175, "Delete unnecessary pre-processing functions in wweval within lesscfaforecastrenewalww branch").
- Consider merging eval_fit_ww() and eval_fit_hosp() into a single eval_fit() with an argument indicating whether to fit the model with or without wastewater.
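The last checklist item could look roughly like the following. This is a hypothetical sketch. The arguments of `eval_fit()`, the `fit_fn` stand-in for the actual wwinference fitting call, the returned list, and the `daily_hosp_admits` column are all assumptions. They are chosen only to show a single entry point that switches on an `include_ww` flag, named after the `include_ww = FALSE` option used in this PR's commits.

```r
# Hypothetical single entry point replacing eval_fit_ww() and
# eval_fit_hosp(). `fit_fn` stands in for the real model-fitting call;
# its signature here is an assumption, not the wwinference API.
eval_fit <- function(input_hosp_data,
                     input_ww_data = NULL,
                     include_ww = TRUE,
                     fit_fn) {
  if (include_ww && is.null(input_ww_data)) {
    stop("`input_ww_data` is required when `include_ww = TRUE`.")
  }
  # Hosp-only fits ignore any wastewater data that was passed in
  ww_data <- if (include_ww) input_ww_data else NULL
  fit <- fit_fn(
    hosp_data = input_hosp_data,
    ww_data = ww_data,
    include_ww = include_ww
  )
  list(fit = fit, model_type = if (include_ww) "ww" else "hosp")
}

# Usage with a stub in place of the real fitting call and made-up data
stub_fit <- function(hosp_data, ww_data, include_ww) {
  list(
    n_hosp = nrow(hosp_data),
    n_ww = if (is.null(ww_data)) 0L else nrow(ww_data)
  )
}
hosp <- data.frame(
  date = as.Date("2024-09-20") + 0:2,
  daily_hosp_admits = c(5, 7, 6)
)
ww <- data.frame(date = as.Date("2024-09-20"), log_genome_copies_per_ml = 3)

fit_ww <- eval_fit(hosp, ww, include_ww = TRUE, fit_fn = stub_fit)
fit_hosp <- eval_fit(hosp, ww, include_ww = FALSE, fit_fn = stub_fit)
```

Passing the fitting function in keeps the sketch testable without Stan. A real version would presumably call the wwinference fit directly.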

kaitejohnson · 2024-09-27T21:30:26Z

Ok, this is still a WIP. I got to the point where we fit the model with wwinference, and almost everything passes until the PMF validation (see CDCgov/ww-inference-model#191 for the issue).

gvegayon

I don't feel I know enough to approve this, but here is my review!

gvegayon · 2024-10-03T00:16:41Z

wweval/R/eval_fit.R

-  )
+  ) |>
+    dplyr::mutate(
+      forecast_date = forecast_date


I'm guessing this is not a typo and you are just adding a new column with the same name?

Seems like this should happen inside get_input_hosp_data and should set the forecast_date column equal to the value of forecast_date_i, no?

Sure, I can make that change!

happy for this to be in a separate PR, but it should happen.
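One detail is worth keeping in mind once the column is added inside the function. In `dplyr::mutate()`, `forecast_date = forecast_date` refers to an existing `forecast_date` column if the data already has one, not to the function argument. Injecting the value with `!!` (as the PR does for `location` in `eval_fit.R`) or using the `.env` pronoun avoids that ambiguity. The helper below is a hypothetical sketch, not the actual `get_input_hosp_data()`, and its name and example data are made up.

```r
# Requires the dplyr package.
# Hypothetical helper that get_input_hosp_data() could call before
# returning, so callers no longer add these columns themselves.
add_forecast_metadata <- function(df, forecast_date, location) {
  df |>
    dplyr::mutate(
      # `.env$` reads the function arguments, even if `df` already has
      # columns named `forecast_date` or `location`
      forecast_date = .env$forecast_date,
      location = .env$location
    )
}

# Usage: a stale forecast_date column is overwritten by the argument
hosp <- data.frame(
  date = as.Date("2024-09-20") + 0:2,
  forecast_date = as.Date("2024-01-01")
)
hosp_out <- add_forecast_metadata(hosp, as.Date("2024-09-25"), "CA")
```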

gvegayon · 2024-10-03T00:25:04Z

wweval/R/get_input_data.R

    ) |>
    mutate(
      location = toupper(wwtp_jurisdiction),
      site = wwtp_name,
-      lab = lab_id
+      lab = lab_id,
+      log_genome_copies_per_ml = log(pcr_target_avg_conc + 1e-8),


Sort of a random question: why not use something smaller than that, like 1e-20 or 1e-10? I don't see where this was done previously. There's also .Machine$double.min.exp (and others) for you to consider.

Yeah, or I think ideally we would want to pass this as a function arg and set it as a default value. Happy to set that value lower.

I think this wasn't done previously bc the other model took in natural scale concentration values.

Create an issue to discuss how to handle 0s in preprocessing?

Given that we have a censoring model, it shouldn't ever matter. I think my instinct would be something like

log_genome_copies_per_ml = ifelse(pcr_target_avg_conc > lod_sewage, log(pcr_target_avg_conc), log(lod_sewage / 2))

or similar
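With an additive offset, the log value assigned to an exact zero is set entirely by the offset: log(1e-8) is about -18.4 and log(1e-20) is about -46.1. A smaller offset is therefore not automatically better; it just pushes zeros further out. On the constants: `.Machine$double.xmin` (about 2.2e-308) is the smallest positive normalized double, while `.Machine$double.min.exp` is its base-2 exponent (-1022), so only the former could serve as an offset. The sketch below compares the offset with the LOD-based substitution suggested above. The vectors are made up, and `lod` stands in for `lod_sewage`.

```r
# Made-up concentrations and a constant LOD, for illustration only
conc <- c(0, 50, 1000)
lod <- c(100, 100, 100)

# Additive offset: an exact zero becomes log(offset)
zero_at_1e8 <- log(conc + 1e-8)[1] # about -18.4
zero_at_1e20 <- log(conc + 1e-20)[1] # about -46.1

# LOD-based substitution: values at or below the LOD become log(LOD / 2).
# Per the review comment, a censoring model treats these as censored,
# so the exact substituted value should not drive the fit.
log_conc_lod <- ifelse(conc > lod, log(conc), log(lod / 2))
```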

damonbayer · 2024-10-03T00:57:26Z

wweval/R/get_table_sufficient_ww.R

@@ -80,9 +80,9 @@ get_ww_data_flags <- function(input_ww_data,
    dplyr::summarize(
      last_date = max(date),
      n_dps = dplyr::n(),
-      prop_below_lod = sum(below_LOD == 1) / dplyr::n(),
+      prop_below_lod = sum(below_lod == 1) / dplyr::n(),


Suggested change:

-      prop_below_lod = sum(below_lod == 1) / dplyr::n(),
+      prop_below_lod = mean(below_lod == 1),
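The two forms agree when `below_lod` has no missing values. `mean()` of a logical vector is the number of `TRUE`s divided by its length, which is what `sum(...) / dplyr::n()` computes per group. They diverge once `NA`s appear and `na.rm = TRUE` is used, because `mean(..., na.rm = TRUE)` also drops the `NA`s from the denominator. Here is a small base-R check with made-up flags, using `length()` in place of `dplyr::n()`:

```r
# Made-up below-LOD flags coded 0/1
below_lod <- c(1, 0, 0, 1, 1)

prop_sum_n <- sum(below_lod == 1) / length(below_lod) # length() plays the role of dplyr::n()
prop_mean <- mean(below_lod == 1) # suggested form

# With a missing flag, na.rm = TRUE changes the denominators differently
flags_na <- c(below_lod, NA)
prop_sum_n_na <- sum(flags_na == 1, na.rm = TRUE) / length(flags_na) # 3 / 6
prop_mean_na <- mean(flags_na == 1, na.rm = TRUE) # 3 / 5
```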

dylanhmorris

LGTM!

dylanhmorris · 2024-10-10T16:46:24Z

wweval/R/eval_fit.R

@@ -36,46 +36,18 @@ eval_fit_ww <- function(config_index,
  ) |> paste0(".rds")


Not for this PR, but it might be more readable to construct this with glue::glue.
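For reference, here is a `glue::glue()` version of a file-name construction. The diff context doesn't show which pieces are pasted together, so `model_type`, `location`, and `forecast_date` below are hypothetical stand-ins. The point is only that the template reads like the resulting file name.

```r
# Requires the glue package
model_type <- "ww" # hypothetical pieces of the file name
location <- "CA"
forecast_date <- "2024-09-25"

# paste()/paste0() style, similar in shape to the current code
fn_paste <- paste(model_type, location, forecast_date, sep = "_") |> paste0(".rds")

# glue style: placeholders in braces are filled from the calling environment
fn_glue <- glue::glue("{model_type}_{location}_{forecast_date}.rds")
```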

dylanhmorris · 2024-10-10T16:57:03Z

wweval/R/eval_fit.R

+    dplyr::mutate(
+      "location" = !!location,
+      forecast_date = forecast_date
+    )


Similar to above. Incorporate into the get_input_ww_data function in a separate PR.

I did this in this PR!

dylanhmorris

@kaitejohnson will let you resolve the remaining convos and then merge

dylanhmorris · 2024-10-10T16:57:50Z

wweval/R/get_input_data.R

Create an issue to discuss how to handle 0s in preprocessing?

dylanhmorris · 2024-10-10T16:59:02Z

wweval/R/get_input_data.R

Given that we have a censoring model, it shouldn't ever matter. I think my instinct would be something like

log_genome_copies_per_ml = ifelse(pcr_target_avg_conc > lod_sewage, log(pcr_target_avg_conc), log(lod_sewage / 2))

or similar

kaitejohnson added 2 commits September 25, 2024 09:08

add additional dplyr deps

0f07e6d

add some fixes to make compatible with wwinference and wweval functions when necessary

276050e

kaitejohnson changed the title from "Tweaks to less-cfaforecastrenewalww2" to "Use pre-processing from `wwinference` not `wweval`" Sep 25, 2024

kaitejohnson added 2 commits September 27, 2024 20:18

rewrite get input hosp and ww data so that it formats ready to be passed to wwinference

80c51b3

set up fit up ww fit, pending fix of wwinference pmf test tolerance

663f3d0

kaitejohnson added 2 commits September 29, 2024 19:22

fix fit hosp to also use package with include_ww =FALSE

cbdd0b5

very stuck on how to access objects previously in cmdstan object but seeming not to be in one we saved in wwinference...

0388f64

kaitejohnson commented Sep 29, 2024

wweval/R/eval_post_process.R (resolved)

kaitejohnson commented Sep 29, 2024

wweval/R/eval_post_process.R (outdated, resolved)

kaitejohnson added the "help wanted" label Sep 29, 2024

add deduplication into preprocessing

5e5a5c0

kaitejohnson commented Oct 1, 2024

wweval/R/get_input_data.R (resolved)

damonbayer reviewed Oct 1, 2024

wweval/R/get_input_data.R (outdated, resolved)

kaitejohnson added 6 commits October 2, 2024 12:23

use summarize across

a3f4583

truncate eval hosp data

a86b715

modify to match sample model in wweval

b31253a

add location and forecast date cols where needed in pipeline

2acbf32

modify postprocessing to expect columns from package

758fd06

modify colnames and scales so that things look like they did for downstream postprocess in wweval

b67a158

kaitejohnson removed the "help wanted" label Oct 2, 2024

kaitejohnson marked this pull request as ready for review October 2, 2024 20:29

kaitejohnson requested review from gvegayon and dylanhmorris October 2, 2024 20:29

gvegayon reviewed Oct 3, 2024

damonbayer reviewed Oct 3, 2024

kaitejohnson added 4 commits October 10, 2024 07:41

include forecast date and location in get_input_data functions

5239a89

add new parameters from wwinference package as inputs, adjust code

607b068

add log offset as default function arg

e03471c

add hosp post processing changes needed using wwinference

9c24ef4

dylanhmorris approved these changes Oct 10, 2024

dylanhmorris reviewed Oct 10, 2024

kaitejohnson mentioned this pull request Oct 10, 2024

Discuss how to handle 0s in concentrations in pre-processing #178 (open)

kaitejohnson merged commit aa9d4fb into less-cfaforecastrenewalww-2 Oct 10, 2024
5 checks passed

kaitejohnson deleted the kj-tweaks-less-cfafrww2 branch October 10, 2024 17:19
