Updating splice site logic. #5106
The splice-site check now ignores leading indel bases when testing whether a variant falls within the splice site boundaries. If a leading base of an indel (one preserved between the reference and alternate alleles) lies within the splice site boundary but the changed bases do NOT, the variant is now correctly labeled as NOT a splice site. Fixes #5050
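To make the described behavior concrete, here is a minimal sketch of how a start position could be shifted past the bases an indel shares between its reference and alternate alleles. The class and method names (`IndelStartSketch`, `countLeadingSharedBases`, `adjustedStart`) are hypothetical and are not the PR's code; alleles are modeled as plain strings and positions as 1-based integers, which is an assumption made for illustration.

```java
/** Hypothetical helper for illustration only; not the GATK implementation. */
final class IndelStartSketch {

    private IndelStartSketch() {}

    /** Counts how many leading bases the reference and alternate alleles have in common. */
    static int countLeadingSharedBases(final String ref, final String alt) {
        final int limit = Math.min(ref.length(), alt.length());
        int shared = 0;
        while (shared < limit && ref.charAt(shared) == alt.charAt(shared)) {
            shared++;
        }
        return shared;
    }

    /**
     * Returns the first 1-based position at which the alleles actually differ.
     * Alleles of equal length (SNP/MNP) are assumed to need no adjustment.
     */
    static int adjustedStart(final int recordStart, final String ref, final String alt) {
        if (ref.length() == alt.length()) {
            return recordStart;
        }
        // Count the shared prefix rather than assuming exactly one leading base.
        return recordStart + countLeadingSharedBases(ref, alt);
    }
}
```

Under these assumptions, a deletion recorded at position 100 with ref `ACT` and alt `A` would be checked against the splice window starting from position 101, because only the `CT` bases changed.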
@@ -1320,9 +1347,16 @@ private static GencodeGtfExonFeature getExonWithinSpliceSiteWindow( final Varian
final int spliceSiteVariantWindowBases ) {
GencodeGtfExonFeature spliceSiteExon = null;

final int varStart = FuncotatorUtils.getIndelAdjustedAlleleChangeStartPosition(variant);
final int varEnd = variant.getEnd();
Why don't you repeat the logic above to set the end here?
You're absolutely right - I need to update it.
I missed it because I was not using intervals to do the boundary checks.
Codecov Report
@@ Coverage Diff @@
## master #5106 +/- ##
=============================================
- Coverage 86.49% 86.49% -<.001%
- Complexity 29203 29262 +59
=============================================
Files 1814 1814
Lines 135364 135628 +264
Branches 15042 15068 +26
=============================================
+ Hits 117077 117305 +228
- Misses 12826 12854 +28
- Partials 5461 5469 +8
if ((Math.abs(exon.getStart() - variant.getStart()) <= spliceSiteVariantWindowBases) ||
(Math.abs(exon.getEnd() - variant.getStart()) <= spliceSiteVariantWindowBases)) {
// Check the start and end of the variant to see if it overlaps with either end of the exon:
if ((Math.abs(exon.getStart() - varStart) <= spliceSiteVariantWindowBases) ||
Can you use something like SimpleInterval.overlapsWithMargin() here?
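For readers unfamiliar with the suggestion, the following is a rough, self-contained sketch of an interval-with-margin boundary check. It does not use the GATK `SimpleInterval` class; `Span`, `MarginOverlapSketch`, and `isNearExonBoundary` are hypothetical names, the contig comparison is omitted, and intervals are assumed to be closed and 1-based.

```java
/** Illustrative stand-in for an interval-based splice-window check; not GATK code. */
final class MarginOverlapSketch {

    private MarginOverlapSketch() {}

    /** Closed, 1-based interval on a single (implicit) contig. */
    static final class Span {
        final int start;
        final int end;

        Span(final int start, final int end) {
            if (start > end) {
                throw new IllegalArgumentException("start must be <= end");
            }
            this.start = start;
            this.end = end;
        }

        /** True if this span comes within {@code margin} bases of {@code other}. */
        boolean overlapsWithMargin(final Span other, final int margin) {
            return this.start <= other.end + margin && other.start - margin <= this.end;
        }
    }

    /** True if the variant lies within {@code window} bases of either end of the exon. */
    static boolean isNearExonBoundary(final Span variant, final Span exon, final int window) {
        final Span exonStart = new Span(exon.start, exon.start);
        final Span exonEnd = new Span(exon.end, exon.end);
        return variant.overlapsWithMargin(exonStart, window)
                || variant.overlapsWithMargin(exonEnd, window);
    }
}
```

One difference from comparing only the start and end coordinates with `Math.abs`: in this sketch a long variant that spans an exon boundary is still reported, even when both of its endpoints are farther than the window from that boundary.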
Yes. Fixed!
Added more test cases and adjusted algorithm to account for inserted bases when dealing with insertions.
@jonn-smith One comment that might be no work. Feel free to merge once addressed.
// NOTE: because there could be degenerate VCF files that have more than one leading base overlapping, we need
// to detect how many leading bases there are that overlap, rather than assuming there is only one.
final int varStart;
if ( GATKProtectedVariantContextUtils.typeOfVariant(variant.getReference(), altAllele).equals(VariantContext.Type.INDEL) ) {
I think you should also check that it is not a complex indel (GATKProtectedVariantContextUtils.isComplexIndel(...)), unless that is addressed upstream...
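As a hedged illustration of the distinction being raised, the sketch below separates simple indels from complex ones. It assumes a working definition in which an indel is "complex" when the alleles differ in length and the shorter allele is not a prefix of the longer one; the actual `isComplexIndel` logic in GATK may differ. `ComplexIndelSketch` and its methods are hypothetical names, and alleles are modeled as plain strings.

```java
/** Illustrative only; the definition of "complex" used here is an assumption. */
final class ComplexIndelSketch {

    private ComplexIndelSketch() {}

    /** Treats any pair of alleles with different lengths as an indel. */
    static boolean isIndel(final String ref, final String alt) {
        return ref.length() != alt.length();
    }

    /**
     * Assumed definition: an indel is complex when the shorter allele is not
     * a prefix of the longer one, so bases are substituted as well as
     * inserted or deleted.
     */
    static boolean isComplexIndel(final String ref, final String alt) {
        if (!isIndel(ref, alt)) {
            return false;
        }
        final String shorter = ref.length() < alt.length() ? ref : alt;
        final String longer = ref.length() < alt.length() ? alt : ref;
        return !longer.startsWith(shorter);
    }
}
```

With this definition, ref `ACT` and alt `A` form a simple deletion, while ref `ACT` and alt `AG` would be treated as complex, so skipping leading shared bases alone would not describe the change.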
Yup. Sounds good. Fixed!