Use pre-processing from wwinference not wweval #172
less-cfaforecastrenewalww2
wwinference
not wweval
OK, this is still a WIP. I got up to the point where we fit the model with wwinference, and almost everything passes up until the pmfs validation (see CDCgov/ww-inference-model#191 for the issue).
…seeming not to be in one we saved in wwinference...
…stream postprocess in wweval
I don't feel I know enough to approve this, but here is my review!
wweval/R/eval_fit.R
Outdated
)
) |>
dplyr::mutate(
  forecast_date = forecast_date
I'm guessing this is not a typo and you are just adding a new column with the same name?
Seems like this should happen inside get_input_hosp_data and should set the forecast_date column equal to the value of forecast_date_i, no?
Sure I can make that change!
happy for this to be in a separate PR, but it should happen.
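As a rough illustration of the suggestion in this thread, the column could be attached inside the data-loading step rather than by the caller. The helper below is a hypothetical stand-in, not the actual get_input_hosp_data(); its name, its arguments, and the toy columns are all assumptions made for the sketch.

```r
# Hypothetical stand-in for the tail end of get_input_hosp_data():
# stamps every row with the forecast date used to build the dataset.
add_forecast_date <- function(hosp_data, forecast_date_i) {
  stopifnot(is.data.frame(hosp_data), length(forecast_date_i) == 1)
  hosp_data$forecast_date <- rep(as.Date(forecast_date_i), nrow(hosp_data))
  hosp_data
}

# Toy input (made-up values, for illustration only)
toy_hosp <- data.frame(
  date = as.Date("2024-01-01") + 0:2,
  daily_hosp_admits = c(3, 5, 4)
)
toy_out <- add_forecast_date(toy_hosp, "2024-01-08")
```

Doing this inside the loader means downstream callers no longer need the self-assigning mutate() that prompted the question.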
wweval/R/get_input_data.R
Outdated
  ) |>
  mutate(
  location = toupper(wwtp_jurisdiction),
  site = wwtp_name,
- lab = lab_id
+ lab = lab_id,
+ log_genome_copies_per_ml = log(pcr_target_avg_conc + 1e-8),
Sort of a random question: why not use something smaller than that? Why not 1e-20, or 1e-10? I don't see the place where this was done previously. There's also .Machine$double.min.exp (and others) for you to consider.
Yeah, or I think ideally we would want to pass this as a function arg and set it as a default value. Happy to set that value lower.
I think this wasn't done previously bc the other model took in natural scale concentration values.
Create an issue to discuss how to handle 0s in preprocessing?
Given that we have a censoring model it shouldn't ever matter. I think my instinct would be something like
log_genome_copies_per_ml = ifelse(pcr_target_avg_conc > lod_sewage, log(pcr_target_avg_conc), log(lod_sewage / 2))
or similar
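A minimal base-R sketch of that idea, combined with the earlier suggestion to expose the choice as a function argument with a default. The function name and the lod_divisor argument are assumptions for illustration; only the column names pcr_target_avg_conc and lod_sewage come from the thread, and the input values are made up.

```r
# Sketch: log-transform concentrations, substituting a fraction of the LOD
# for observations at or below the limit of detection.
# `log_conc_with_lod` and `lod_divisor` are hypothetical names.
log_conc_with_lod <- function(pcr_target_avg_conc, lod_sewage,
                              lod_divisor = 2) {
  stopifnot(
    length(pcr_target_avg_conc) == length(lod_sewage),
    lod_divisor > 0
  )
  above_lod <- pcr_target_avg_conc > lod_sewage
  # ifelse() evaluates both branches; log(0) = -Inf in the first branch is
  # discarded for censored rows because the second branch is selected.
  ifelse(
    above_lod,
    log(pcr_target_avg_conc),
    log(lod_sewage / lod_divisor)
  )
}

# Toy values (made up): two observations below the LOD, one above
conc <- c(0, 12.5, 400)
lod <- c(20, 20, 20)
log_conc <- log_conc_with_lod(conc, lod)
```

Unlike adding a small constant such as 1e-8, this keeps every log value finite and tied to the reported LOD, which may matter less if the censoring model handles those observations anyway.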
wweval/R/get_table_sufficient_ww.R
Outdated
@@ -80,9 +80,9 @@ get_ww_data_flags <- function(input_ww_data,
  dplyr::summarize(
  last_date = max(date),
  n_dps = dplyr::n(),
- prop_below_lod = sum(below_LOD == 1) / dplyr::n(),
+ prop_below_lod = sum(below_lod == 1) / dplyr::n(),
Suggested change:
- prop_below_lod = sum(below_lod == 1) / dplyr::n(),
+ prop_below_lod = mean(below_lod == 1),
LGTM!
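For reference, a quick check of why the suggested rewrite is equivalent: the mean of a logical vector is the share of TRUE values, which is the same as the count of TRUEs divided by the number of rows. The vector below is made up for illustration.

```r
# Toy indicator column (made-up values): 1 = below the limit of detection
below_lod <- c(1, 0, 0, 1, 0)

# Original form: count of flagged rows over the number of rows
prop_sum <- sum(below_lod == 1) / length(below_lod)

# Suggested form: mean of the logical vector
prop_mean <- mean(below_lod == 1)
```

Both forms return NA if below_lod contains missing values, so the change does not alter NA handling.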
@@ -36,46 +36,18 @@ eval_fit_ww <- function(config_index,
) |> paste0(".rds")
Not for this PR but might be more readable to construct this with glue::glue
wweval/R/eval_fit.R
Outdated
dplyr::mutate(
  "location" = !!location,
  forecast_date = forecast_date
)
Similar to above. Incorporate into the get_input_ww_data function in a separate PR.
I did this in this PR!
@kaitejohnson will let you resolve the remaining convos and then merge
This isn't functional yet because I (stupidly) created functions with the same name in wweval and wwinference that do slightly different things and expect slightly different arguments.

In order to get this to work as is, I would have to tweak a lot of the wweval functions that will eventually become unnecessary.

I am seeing where we are now and think it is time to rewrite the pre-processing of wweval to follow very closely the pre-processing in wwinference (so get a NWSS dataset to look like the ww_data), and then use all of those functions and delete the ones with duplicate names in wweval. This is a bit of a bear of a task, and I am probably most suited to doing it since it's redundancy that I created...

Tasks:
- eval_fit_ww() and eval_fit_hosp() to use the wwinference package (currently based on "Restructure hierarchical estimation based on reference subpopulation", ww-inference-model#158) because we need the swappable hosp only functionality... need to update the parameters here too, but for now am using wwinference package parameters
- eval_fit_postprocess() so that the outputs are compatible with the infrastructure in wweval postprocessing.

Additional tasks that we should do but don't prevent @gvegayon from merging less-cfaforecastrenewalww-2 are:
- wweval within lesscfaforecastrenewalww branch #175
- wwinference() package to get the draws object instead of the wweval::get_model_draws_w_data() (Use wwinference::draws() to get the draws joined with data instead of wweval::get_draws_w_data() in lesscfaforecastrenewalww #176)

Have made separate issues for these, but in the spirit of making these PRs more manageable, going to suggest we do those in separate PRs.

@dylanhmorris tagged you for awareness, but I think @gvegayon can review this. Basically I just tested and made sure that we can generate the same outputs up until the end of eval_fit_postprocess(), now using the wwinference package. Still a bit clunky, but I think we can clean up as we continue to work on this.

Checklist of next steps (as suggested by @gvegayon):
- wwinference::get_draws() instead of wweval::get_draws_w_data() (Use wwinference::draws() to get the draws joined with data instead of wweval::get_draws_w_data() in lesscfaforecastrenewalww #176)
- wweval within lesscfaforecastrenewalww branch #175
- eval_fit_ww and eval_fit_hosp to just eval_fit with an arg indicating whether to fit model with or without wastewater
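
One possible shape for the last checklist item, sketched under heavy assumptions: a single eval_fit() with a flag that decides whether wastewater data reaches the fitting step. Nothing here reflects the real wweval or wwinference signatures; include_ww, fit_fn, and the toy data frames are hypothetical placeholders, and the default fit_fn is a stub that only reports what it received.

```r
# Hypothetical sketch of a single entry point; not the wweval implementation.
# `fit_fn` stands in for whatever function performs the actual model fit.
eval_fit <- function(hosp_data, ww_data = NULL, include_ww = TRUE,
                     fit_fn = function(hosp_data, ww_data) {
                       list(
                         n_hosp = nrow(hosp_data),
                         used_ww = !is.null(ww_data)
                       )
                     }) {
  if (include_ww && is.null(ww_data)) {
    stop("include_ww = TRUE requires ww_data")
  }
  # Withhold the wastewater data for the hospital-admissions-only model
  fit_fn(hosp_data, if (include_ww) ww_data else NULL)
}

# Toy inputs (made-up values, for illustration only)
toy_hosp <- data.frame(date = as.Date("2024-01-01") + 0:1, admits = c(2, 3))
toy_ww <- data.frame(date = as.Date("2024-01-01"), log_conc = 1.5)

fit_with_ww <- eval_fit(toy_hosp, toy_ww, include_ww = TRUE)
fit_hosp_only <- eval_fit(toy_hosp, toy_ww, include_ww = FALSE)
```

Keeping the switch in one function would avoid maintaining two near-identical wrappers, at the cost of one more argument for callers to set.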