M2 doesn't use very short stubs of clipped reads for genotyping #5057

Merged
merged 3 commits into from
Jul 30, 2018

Contributor

@davidbenjamin davidbenjamin commented Jul 27, 2018

Closes #5060.

@meganshand This fixes your bug. Do you have time to review before Monday's release?

@takutosato @LeeTL1220 It improves sensitivity and specificity.

@ldgauthier This probably affects HaplotypeCaller as well.

@@ -60,6 +60,9 @@
public static final int MAX_NORMAL_QUAL_SUM = 100;
public static final int MIN_PALINDROME_SIZE = 5;

// after trimming to fit the assembly window, throw away read stubs shorter than this length
Contributor

Could you add why you do this and cite the issue in github?

Contributor Author

done
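
The idea in the code comment under review (trim reads to the assembly window, then discard stubs shorter than a minimum length) can be illustrated with a minimal sketch. This is not the code from this PR: the class `ReadStubFilterSketch`, the `Read` stand-in, the method names, and the threshold of 30 are all hypothetical, and the sketch treats reads as simple reference intervals rather than handling CIGARs or clipping as GATK does.

```java
import java.util.ArrayList;
import java.util.List;

public class ReadStubFilterSketch {
    // Hypothetical threshold for illustration only; the value chosen in the PR
    // is not shown in this thread.
    static final int MIN_STUB_LENGTH = 30;

    /** Minimal stand-in for an aligned read: closed interval [start, end] on the reference. */
    static final class Read {
        final String name;
        final int start;
        final int end;

        Read(String name, int start, int end) {
            this.name = name;
            this.start = start;
            this.end = end;
        }

        int length() {
            return end - start + 1;
        }
    }

    /** Clips a read to [windowStart, windowEnd]; returns null when there is no overlap. */
    static Read clipToWindow(Read read, int windowStart, int windowEnd) {
        int s = Math.max(read.start, windowStart);
        int e = Math.min(read.end, windowEnd);
        return s <= e ? new Read(read.name, s, e) : null;
    }

    /** Clips every read to the window, then keeps only those at least minLength long. */
    static List<Read> trimAndFilter(List<Read> reads, int windowStart, int windowEnd, int minLength) {
        List<Read> kept = new ArrayList<>();
        for (Read r : reads) {
            Read clipped = clipToWindow(r, windowStart, windowEnd);
            if (clipped != null && clipped.length() >= minLength) {
                kept.add(clipped);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Read> reads = new ArrayList<>();
        reads.add(new Read("spanning", 90, 240)); // clipped to 100-200, length 101: kept
        reads.add(new Read("stub", 50, 104));     // clipped to 100-104, length 5: dropped
        reads.add(new Read("outside", 300, 400)); // no overlap with the window: dropped
        for (Read r : trimAndFilter(reads, 100, 200, MIN_STUB_LENGTH)) {
            System.out.println(r.name + " " + r.start + "-" + r.end); // prints "spanning 100-200"
        }
    }
}
```

In this toy example the 5-base stub left over after trimming is excluded before genotyping, while the read that still spans the window is retained.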

@meganshand
Contributor

@davidbenjamin This filter looks fine to me, but could you add a test? You should have the data from the mitochondria bug already, right? Also, if this potentially affects HaplotypeCaller, should this fix live somewhere deeper so that both tools use the filter, or would we want a different fix there?

@davidbenjamin
Contributor Author

It probably makes sense to fix HaplotypeCaller as well eventually, but the time scale for changes to HC is much slower and M2 can't afford to wait because, as it turns out, the bug you found doesn't just appear in mitochondria and is hurting sensitivity in the evaluation for our paper.

I will write a test.

@codecov-io

codecov-io commented Jul 27, 2018

Codecov Report

Merging #5057 into master will increase coverage by 0.052%.
The diff coverage is 100%.

@@              Coverage Diff               @@
##             master     #5057       +/-   ##
==============================================
+ Coverage     86.35%   86.402%   +0.052%     
- Complexity    28824     28895       +71     
==============================================
  Files          1791      1791               
  Lines        133601    133789      +188     
  Branches      14920     14942       +22     
==============================================
+ Hits         115364    115596      +232     
+ Misses        12834     12794       -40     
+ Partials       5403      5399        -4
| Impacted Files | Coverage Δ | Complexity Δ | |
| --- | --- | --- | --- |
| ...hellbender/tools/walkers/mutect/Mutect2Engine.java | 91.391% <100%> (+0.175%) | 55 <2> (+2) | ⬆️ |
| ...r/tools/walkers/mutect/Mutect2IntegrationTest.java | 91.166% <100%> (+0.796%) | 56 <2> (+3) | ⬆️ |
| ...utils/smithwaterman/SmithWatermanIntelAligner.java | 50% <0%> (-30%) | 1% <0%> (-2%) | |
| ...ithwaterman/SmithWatermanIntelAlignerUnitTest.java | 60% <0%> (ø) | 2% <0%> (ø) | ⬇️ |
| ...roadinstitute/hellbender/utils/read/ReadUtils.java | 80.516% <0%> (+0.469%) | 204% <0%> (ø) | ⬇️ |
| ...ols/walkers/haplotypecaller/AssemblyResultSet.java | 75.449% <0%> (+1.198%) | 44% <0%> (+1%) | ⬆️ |
| ...walkers/bqsr/AnalyzeCovariatesIntegrationTest.java | 93.902% <0%> (+1.839%) | 40% <0%> (+18%) | ⬆️ |
| ...ls/genomicsdb/GenomicsDBImportIntegrationTest.java | 93.273% <0%> (+2.341%) | 108% <0%> (+35%) | ⬆️ |
| ...pecaller/readthreading/ReadThreadingAssembler.java | 68.498% <0%> (+2.564%) | 52% <0%> (+1%) | ⬆️ |

... and 9 more

@davidbenjamin
Contributor Author

Back to @meganshand. I put in a simple mitochondrial integration test. Given that our MC3 validation already covers this particular bug, I don't think it strictly needs a new test for mitochondria. Also, for later, are any of your spike-in bams public (or rather, public + public)? I noticed that the NA12878 truth doesn't have very low AFs.

@meganshand
Contributor

@davidbenjamin Thanks for the test! Unfortunately none of the spike-in bams I have are public, but I will ask Sarah Calvo if she knows of any samples that would work as a spike-in and are public. Maybe we can track one down.

Contributor

@meganshand meganshand left a comment

👍

@davidbenjamin davidbenjamin merged commit b6a630a into master Jul 30, 2018
@davidbenjamin davidbenjamin deleted the db_read_stubs branch July 30, 2018 13:46
Development

Successfully merging this pull request may close these issues.

M2 getting thrown off by clipped read stubs
4 participants