Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use pre-processing from wwinference not wweval #172

Merged
merged 17 commits into from
Oct 10, 2024

Conversation

kaitejohnson
Copy link
Collaborator

@kaitejohnson kaitejohnson commented Sep 25, 2024

This isn't functional yet because I (stupidly) created functions with the same name in wweval and wwinference that do slightly different things and expect slightly different arguments.

In order to get this to work as is, I would have to tweak a lot of the wweval functions that will eventually become unnecessary,

I am seeing where we are now and think it is time to rewrite the pre-processing of wweval to follow very closely the pre-processing in wwinference (so get a NWSS dataset to look like the ww_data), and then use all of those functions and delete the ones with duplicate names in wweval. This is a bit of a bear of a task and I probably am most suited to doing it since its redundancy that I created...

Tasks:

Additional tasks that we should do but don't prevent @gvegayon from merging less-cfaforecastrenewalww-2 are:

Have made separate issues for these, but in the spirit of making these PRs more manageable going to suggest we do those in separate PRs.

@dylanhmorris tagged you for awareness but I think @gvegayon can review this. Basically I just tested and made sure that we can generate the same outputs up until the end of eval_fit_postprocess() now using the wwinference package. Still a bit clunky but I think we can clean up as we continue to work on this.

Checklist of next steps (as suggested by @gvegayon):

@kaitejohnson kaitejohnson changed the title Tweaks to less-cfaforecastrenewalww2 Use pre-processing from wwinference not wweval Sep 25, 2024
@kaitejohnson
Copy link
Collaborator Author

Ok this is still a WIP, I got up until the point where we are fitting the model with wwinference and almost everything is passing up until the pmfs validation (see here for issue) CDCgov/ww-inference-model#191

@kaitejohnson kaitejohnson added the help wanted Extra attention is needed label Sep 29, 2024
@kaitejohnson kaitejohnson removed the help wanted Extra attention is needed label Oct 2, 2024
@kaitejohnson kaitejohnson marked this pull request as ready for review October 2, 2024 20:29
Copy link
Member

@gvegayon gvegayon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel I know enough to approve this, but here is my review!

wweval/R/eval_fit.R Outdated Show resolved Hide resolved
)
) |>
dplyr::mutate(
forecast_date = forecast_date
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this is not a typo and you are just adding a new column with the same name?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this should happen inside get_input_hosp_data and should set the forecast_date column equal to the value of forecast_date_i, no?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I can make that change!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

happy for this to be in a separate PR, but it should happen.

wweval/R/eval_fit.R Outdated Show resolved Hide resolved
wweval/R/eval_fit.R Outdated Show resolved Hide resolved
wweval/R/eval_post_process.R Outdated Show resolved Hide resolved
wweval/R/eval_post_process.R Show resolved Hide resolved
) |>
mutate(
location = toupper(wwtp_jurisdiction),
site = wwtp_name,
lab = lab_id
lab = lab_id,
log_genome_copies_per_ml = log(pcr_target_avg_conc + 1e-8),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sort of random question, why not using something smaller than that? Why not 1e-20? or why not 1e-10? I don't see the place where this was done previously. There's also .Machine$double.min.exp (and others) for you to consider.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, or I think ideally we would want to pass this as a function arg and set it as a default value. Happy to set that value lower.

I think this wasn't done previously bc the other model took in natural scale concentration values.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create an issue to discuss how to handle 0s in preprocessing?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we have a censoring model it shouldn't ever matter. I think my instinct would be something like

log_genome_copies_per_ml = ifelse(pcr_target_avg_conf > lod_sewage, log(pcr_target_avg_conc), log(log_sewage / 2)

or similar

@@ -80,9 +80,9 @@ get_ww_data_flags <- function(input_ww_data,
dplyr::summarize(
last_date = max(date),
n_dps = dplyr::n(),
prop_below_lod = sum(below_LOD == 1) / dplyr::n(),
prop_below_lod = sum(below_lod == 1) / dplyr::n(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
prop_below_lod = sum(below_lod == 1) / dplyr::n(),
prop_below_lod = mean(below_lod == 1),

Copy link
Collaborator

@dylanhmorris dylanhmorris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@@ -36,46 +36,18 @@ eval_fit_ww <- function(config_index,
) |> paste0(".rds")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR but might be more readable to construct this with glue::glue

dplyr::mutate(
"location" = !!location,
forecast_date = forecast_date
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to above. Incorporate into the get_input_ww_data function in a separate PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this in this PR!

Copy link
Collaborator

@dylanhmorris dylanhmorris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kaitejohnson will let you resolve the remaining convos and then merge

) |>
mutate(
location = toupper(wwtp_jurisdiction),
site = wwtp_name,
lab = lab_id
lab = lab_id,
log_genome_copies_per_ml = log(pcr_target_avg_conc + 1e-8),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create an issue to discuss how to handle 0s in preprocessing?

) |>
mutate(
location = toupper(wwtp_jurisdiction),
site = wwtp_name,
lab = lab_id
lab = lab_id,
log_genome_copies_per_ml = log(pcr_target_avg_conc + 1e-8),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we have a censoring model it shouldn't ever matter. I think my instinct would be something like

log_genome_copies_per_ml = ifelse(pcr_target_avg_conf > lod_sewage, log(pcr_target_avg_conc), log(log_sewage / 2)

or similar

@kaitejohnson kaitejohnson merged commit aa9d4fb into less-cfaforecastrenewalww-2 Oct 10, 2024
5 checks passed
@kaitejohnson kaitejohnson deleted the kj-tweaks-less-cfafrww2 branch October 10, 2024 17:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants