-
Notifications
You must be signed in to change notification settings - Fork 592
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Removed mapping error rate from estimate of denoised copy ratios output by gCNV and updated sklearn. #7261
Conversation
…ut by gCNV and updated sklearn.
17cadfa
to
cb2ab69
Compare
@fleharty want to get this in before release? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, though I would prefer to have more tests in the python code.
@@ -786,8 +786,7 @@ def __init__(self, | |||
# the expected number of erroneously mapped reads | |||
mean_mapping_error_correction_s = eps_mapping * read_depth_s * shared_workspace.average_ploidy_s | |||
|
|||
denoised_copy_ratio_st = ((shared_workspace.n_st - mean_mapping_error_correction_s.dimshuffle(0, 'x')) | |||
/ ((1.0 - eps_mapping) * read_depth_s.dimshuffle(0, 'x') * bias_st)) | |||
denoised_copy_ratio_st = shared_workspace.n_st / (read_depth_s.dimshuffle(0, 'x') * bias_st) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this change should have broken a test, but it looks like the only tests test the Python environment. Should we add tests (maybe not for this PR)?
Yup, as you might recall from some discussions with @bhanugandham and @mwalker174, getting automated pipeline-level CNV evaluations up and running was highest on my list before I handed over the role and went on paternity leave. I think these tests would be more useful than unit/integration tests for correctness, but they would almost certainly have to run on CARROT. That said, the current level of unit/integration test coverage is a bit different from that for the somatic tools, because 1) it's difficult to run gCNV in any useful way on Travis infrastructure, and 2) we hadn't decided on a framework/convention for python unit tests at the time the production code went in (although I think @ldgauthier has added some python unit tests by now). So we currently only have plumbing WDL/integration tests on very small data for gCNV---and these only test that the tools run, not for correctness. For somatic CNV, we have unit tests for correctness on small simulated data (e.g., for things like segmentation and modeling classes), but integration tests don't cover correctness (and it would be pretty redundant to use the same simulated data for integration, so I'd rather put effort towards pipeline-level tests on real data). It might be good for you and @mwalker174 to review the current level of testing coverage and understand where things need to be shored up---happy to discuss more. |
…ut by gCNV and updated sklearn. (#7261)
* Added a new suite of tools for variant filtering based on site-level annotations. (#7954) * Adds wdl that tests joint VCF filtering tools (#7932) * adding filtering wdl * renaming pipeline * addressing comments * added bash * renaming json * adding glob to extract for extra files * changing dollar signs * small comments * Added changes for specifying model backend and other tweaks to WDLs and environment. * Added classes for representing a collection of labeled variant annotations. * Added interfaces for modeling and scoring backends. * Added a new suite of tools for variant filtering based on site-level annotations. * Added integration tests. * Added test resources and expected results. * Miscellaneous changes. * Removed non-ASCII characters. * Added documentation for TrainVariantAnnotationsModel and addressed review comments. Co-authored-by: meganshand <[email protected]> * Added toggle for selecting resource-matching strategies and miscellaneous minor fixes to new annotation-based filtering tools. (#8049) * Adding use_allele_specific_annotation arg and fixing task with empty input in JointVcfFiltering WDL (#8027) * Small changes to JointVCFFiltering WDL * making default for use_allele_specific_annotations * addressing comments * first stab * wire through WDL changes * fixed typo * set model_backend input value * add gatk_override to JointVcfFiltering call * typo in indel_annotations * make model_backend optional * tabs and spaces * make all model_backends optional * use gatk 4.3.0 * no point in changing the table names as this is a POC * adding new branch to dockstore * adding in branching logic for classic VQSR vs VQSR-Lite * implementing the separate schemas for the VQSR vs VQSR-Lite branches, including Java changes necessary to produce the different tsv files * passing classic flag to indel run of CreateFilteringFiles * Update GvsCreateFilterSet.wdl cleaning up verbiage * Removed mapping error rate from estimate of denoised copy ratios output by gCNV and updated sklearn. (#7261) * cleanup up sloppy comment --------- Co-authored-by: samuelklee <[email protected]> Co-authored-by: meganshand <[email protected]> Co-authored-by: Rebecca Asch <[email protected]>
@fleharty this is a rebased version of the 17cadfa branch used to generate the Pf7 CNV call set. There are two minor changes: a) one to remove spurious negative dCR estimates reported by gCNV, which were negatively affecting genotyping of HRP2/3 deletions, and b) updating sklearn to the version used for clustering, so that we can reproduce everything exactly using just the GATK Docker. The latter change probably isn't absolutely necessary, but it doesn't seem to break anything so I'm going to go ahead with it. We might want to update to an even more recent version later on (especially if we make any breaking/non-refactoring improvements to the malaria genotyping code after the initial PR), but unfortunately this slightly changes the clustering assignment for a few samples.
@mwalker174 @asmirnov239 we discussed the first change some time ago, but just a heads up.