Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating splice site logic. #5106

Merged
merged 3 commits into from
Aug 15, 2018

Conversation

jonn-smith
Copy link
Collaborator

Now ignores leading indel bases when checking if variants are within the
splice site boundaries (i.e. if a leading base in an indel, which is
preserved between the reference and alternate alleles, is within the
splice site boundary but the bases that have been changed are NOT,
then the variant is now correctly labeled as NOT a splice site).

Fixes #5050

Now ignores leading indel bases when checking if variants are within the
splice site boundaries (i.e. if a leading base in an indel, which is
    preserved between the reference and alternate alleles, is within the
    splice site boundary but the bases that have been changed are NOT,
    then the variant is now correctly labeled as NOT a splice site).

Fixes #5050
@@ -1320,9 +1347,16 @@ private static GencodeGtfExonFeature getExonWithinSpliceSiteWindow( final Varian
final int spliceSiteVariantWindowBases ) {
GencodeGtfExonFeature spliceSiteExon = null;

final int varStart = FuncotatorUtils.getIndelAdjustedAlleleChangeStartPosition(variant);
final int varEnd = variant.getEnd();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you repeat the logic above to set the end here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're absolutely right - I need to update it.

I missed it because I was not using intervals to do the boundary checks.

@codecov-io
Copy link

codecov-io commented Aug 14, 2018

Codecov Report

Merging #5106 into master will decrease coverage by <.001%.
The diff coverage is 98.611%.

@@              Coverage Diff              @@
##             master    #5106       +/-   ##
=============================================
- Coverage     86.49%   86.49%   -<.001%     
- Complexity    29203    29262       +59     
=============================================
  Files          1814     1814               
  Lines        135364   135628      +264     
  Branches      15042    15068       +26     
=============================================
+ Hits         117077   117305      +228     
- Misses        12826    12854       +28     
- Partials       5461     5469        +8
Impacted Files Coverage Δ Complexity Δ
...nder/tools/funcotator/FuncotatorUtilsUnitTest.java 90.014% <100%> (+0.313%) 63 <2> (+2) ⬆️
...Sources/gencode/DataProviderForMuc16IndelData.java 94.118% <100%> (+14.118%) 1 <0> (ø) ⬇️
...Sources/gencode/DataProviderForPik3caTestData.java 98.684% <100%> (ø) 3 <0> (ø) ⬇️
...dataSources/gencode/GencodeFuncotationFactory.java 83.871% <100%> (+0.487%) 166 <6> (+5) ⬆️
...e/hellbender/tools/funcotator/FuncotatorUtils.java 80.485% <90%> (+0.539%) 175 <6> (+7) ⬆️
...ats/collections/SimpleCountCollectionUnitTest.java 83.784% <0%> (-5.105%) 4% <0%> (+1%)
...stitute/hellbender/tools/spark/ApplyBQSRSpark.java 100% <0%> (ø) 6% <0%> (+3%) ⬆️
...ber/formats/collections/SimpleCountCollection.java 100% <0%> (ø) 12% <0%> (+1%) ⬆️
...institute/hellbender/utils/io/IOUtilsUnitTest.java 86.458% <0%> (+0.127%) 32% <0%> (+8%) ⬆️
... and 1 more

if ((Math.abs(exon.getStart() - variant.getStart()) <= spliceSiteVariantWindowBases) ||
(Math.abs(exon.getEnd() - variant.getStart()) <= spliceSiteVariantWindowBases)) {
// Check the start and end of the variant to see if it overlaps with either end of the exon:
if ((Math.abs(exon.getStart() - varStart) <= spliceSiteVariantWindowBases) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use something like SimpleInterval.overlapsWithMargin() here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Fixed!

Added more test cases and adjusted algorithm to account for inserted
bases when dealing with insertions.
Copy link
Contributor

@LeeTL1220 LeeTL1220 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonn-smith One comment that might be no work. Feel free to merge once addressed.

// NOTE: because there could be degenerate VCF files that have more than one leading base overlapping, we need
// to detect how many leading bases there are that overlap, rather than assuming there is only one.
final int varStart;
if ( GATKProtectedVariantContextUtils.typeOfVariant(variant.getReference(), altAllele).equals(VariantContext.Type.INDEL) ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should also check that it is not a complex indel (GATKProtectedVariantContextUtils.isComplexIndel(...)), unless that is addressed upstream...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. Sounds good. Fixed!

@jonn-smith jonn-smith merged commit f397a0b into master Aug 15, 2018
@jonn-smith jonn-smith deleted the jts_funcotator_splice_site_indel_boundary_fix_5050 branch August 15, 2018 19:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants