Added GVCF mode for VariantContext type determination #1544
- Usually, NON_REF alleles are considered SYMBOLIC. Therefore, if a VariantContext contains the alleles `A*,C,<NON_REF>` (where `A*` marks the reference allele), the resulting type would be MIXED. For GVCF files, however, it would be helpful for this to be considered a SNP.
- Default behavior does not change; NON_REF alleles are ignored only if `true` is passed for the optional `ignoreNonRef` argument to `getType()`
- Added unit tests
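To make the MIXED-versus-SNP example concrete, here is a minimal Java sketch of type determination with an `ignoreNonRef` switch. It is not htsjdk's implementation: alleles are plain strings, the `GvcfType` enum and `GvcfTypeSketch` class are hypothetical, only the `<ID>` form is treated as symbolic, and per-allele typing is reduced to a length comparison.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for htsjdk's VariantContext.Type.
enum GvcfType { NO_VARIATION, SNP, MNP, INDEL, SYMBOLIC, MIXED }

// Hypothetical sketch of type determination with an ignoreNonRef switch.
// Alleles are plain strings; this is not htsjdk's implementation.
final class GvcfTypeSketch {
    static final String NON_REF = "<NON_REF>";

    // Only the "<ID>" form counts as symbolic here (htsjdk also handles breakends).
    static boolean isSymbolic(String allele) {
        return allele.length() > 1 && allele.startsWith("<") && allele.endsWith(">");
    }

    // Type of one ref/alt pair, decided by symbolic check and allele length.
    static GvcfType pairType(String ref, String alt) {
        if (isSymbolic(alt)) {
            return GvcfType.SYMBOLIC;
        }
        if (ref.length() == alt.length()) {
            return ref.length() == 1 ? GvcfType.SNP : GvcfType.MNP;
        }
        return GvcfType.INDEL;
    }

    // Site type: NO_VARIATION without alts, the shared type if all alts agree,
    // MIXED otherwise. With ignoreNonRef, <NON_REF> is dropped before typing.
    static GvcfType determineType(String ref, List<String> alts, boolean ignoreNonRef) {
        List<String> considered = new ArrayList<>();
        for (String alt : alts) {
            if (ignoreNonRef && NON_REF.equals(alt)) {
                continue;
            }
            considered.add(alt);
        }
        GvcfType siteType = GvcfType.NO_VARIATION; // pairType never returns this value
        for (String alt : considered) {
            GvcfType t = pairType(ref, alt);
            if (siteType == GvcfType.NO_VARIATION) {
                siteType = t;
            } else if (siteType != t) {
                return GvcfType.MIXED;
            }
        }
        return siteType;
    }
}
```

With reference `A` and alternates `C` and `<NON_REF>`, `determineType` returns MIXED when `ignoreNonRef` is false and SNP when it is true. Filtering before typing leaves the rest of the logic untouched, which is why the default path is unaffected.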
@ldgauthier Regarding broadinstitute/gatk#7111, I implemented the option to ignore NON_REF for type determination here, by passing an argument.
The * allele isn't specific to GVCFs, so I have to think a little more about the name. My first choice for implementation would be to add Allele::wouldBeStarAllele as another OR clause in Allele::wouldBeSymbolicAllele, but that would probably be a BREAKING CHANGE! We could try a GATK branch with the change and see what happens to the integration tests.
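The proposed OR can be illustrated with a small Java sketch over raw allele bytes. The predicates below are hypothetical re-implementations named after `Allele::wouldBeSymbolicAllele` and `Allele::wouldBeStarAllele`; they are simplified (breakend forms are ignored) and are not htsjdk's code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical re-implementations of the two predicates; not htsjdk's code.
final class StarAlleleSketch {
    // Symbolic alleles as "<ID>" only (simplified; htsjdk also checks breakends).
    static boolean wouldBeSymbolic(byte[] bases) {
        return bases.length > 1 && bases[0] == '<' && bases[bases.length - 1] == '>';
    }

    // The spanning-deletion allele is exactly "*".
    static boolean wouldBeStar(byte[] bases) {
        return bases.length == 1 && bases[0] == '*';
    }

    // Proposed variant: treat "*" as symbolic too by OR-ing the predicates.
    static boolean wouldBeSymbolicIncludingStar(byte[] bases) {
        return wouldBeSymbolic(bases) || wouldBeStar(bases);
    }

    // Helper to turn an allele string into ASCII bytes.
    static byte[] bytes(String allele) {
        return allele.getBytes(StandardCharsets.US_ASCII);
    }
}
```

Because `*` flips from non-symbolic to symbolic under the combined predicate, any caller that branches on the symbolic flag would see sites containing `*` classified differently, which is why the change could break downstream behavior.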
Right, thanks for your input. If I understand your suggestion correctly, this would be independent of this PR, though: with the change you proposed, we would still have to treat NON_REF differently from other symbolic alleles for correct GVCF variant type determination. So I would suggest merging this PR either way (it is a non-breaking change) and dealing with the * allele separately, unless anyone disagrees.
- This was necessary because the type caching needs to distinguish between `ignoreNonRef` being true or false
- Changed the return type of `determineType` and `determinePolymorphicType` from `void` to `VariantContext.Type`, because otherwise multiple code branches would be necessary depending on which caching variable to set
- Added a unit test to check that the cache separation works
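A hypothetical Java sketch of the cache separation: one cached field per `ignoreNonRef` mode, filled by a `determineType` that returns the type rather than assigning a field. Class, field, and enum names are illustrative, not htsjdk's; typing is reduced to single-base alternates versus `<...>` symbolic alleles, and the `computations` counter exists only so the cache behavior can be observed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, reduced type enum for this sketch.
enum CachedType { NO_VARIATION, SNP, SYMBOLIC, MIXED }

// Hypothetical sketch: one cache field per ignoreNonRef mode.
final class CachedTypeContext {
    private static final String NON_REF = "<NON_REF>";

    private final List<String> alts;
    private CachedType type;               // cached result for ignoreNonRef == false
    private CachedType typeIgnoringNonRef; // cached result for ignoreNonRef == true
    int computations = 0;                  // number of cache misses, for illustration

    CachedTypeContext(List<String> alts) {
        this.alts = new ArrayList<>(alts);
    }

    // The no-argument form keeps the default behavior.
    CachedType getType() {
        return getType(false);
    }

    CachedType getType(boolean ignoreNonRef) {
        if (ignoreNonRef) {
            if (typeIgnoringNonRef == null) {
                typeIgnoringNonRef = determineType(true);
            }
            return typeIgnoringNonRef;
        }
        if (type == null) {
            type = determineType(false);
        }
        return type;
    }

    // Returns the type instead of assigning a field, so the caller picks the cache
    // and this method needs no branch per cache field. Assumes single-base alts.
    private CachedType determineType(boolean ignoreNonRef) {
        computations++;
        CachedType result = CachedType.NO_VARIATION;
        for (String alt : alts) {
            if (ignoreNonRef && NON_REF.equals(alt)) {
                continue;
            }
            CachedType t = alt.startsWith("<") ? CachedType.SYMBOLIC : CachedType.SNP;
            if (result == CachedType.NO_VARIATION) {
                result = t;
            } else if (result != t) {
                return CachedType.MIXED;
            }
        }
        return result;
    }
}
```

Calling `getType()`, then `getType(true)`, then `getType()` again computes the type twice, once per mode. A single shared cache field would instead return the first result for both modes, which is the kind of mix-up a cache-separation test is meant to catch.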
So is your proposal to keep the name
My proposal would be to keep
The argument to GATK SelectVariants would be called
@michaelgatzen This seems sane; I'm not sure whether the boolean overload is better or worse than a new named method.
I also have a question about the return value when only NON_REF alleles are present.
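The overload-versus-named-method question can be seen side by side in a hypothetical Java sketch. `ApiShapeSketch`, `ShapeType`, and `getTypeIgnoringNonRef` are illustrative names, not htsjdk API, and the two type values are passed in precomputed so that only the method shapes differ.

```java
// Hypothetical, reduced type enum for this sketch.
enum ShapeType { SNP, MIXED }

// Hypothetical sketch contrasting the two API shapes discussed in review.
final class ApiShapeSketch {
    private final ShapeType typeWithNonRef;
    private final ShapeType typeIgnoringNonRef;

    ApiShapeSketch(ShapeType typeWithNonRef, ShapeType typeIgnoringNonRef) {
        this.typeWithNonRef = typeWithNonRef;
        this.typeIgnoringNonRef = typeIgnoringNonRef;
    }

    // Option 1: boolean overload; existing getType() callers are unaffected.
    ShapeType getType() {
        return getType(false);
    }

    ShapeType getType(boolean ignoreNonRef) {
        return ignoreNonRef ? typeIgnoringNonRef : typeWithNonRef;
    }

    // Option 2: a separately named method; call sites read as
    // ctx.getTypeIgnoringNonRef() rather than ctx.getType(true).
    ShapeType getTypeIgnoringNonRef() {
        return getType(true);
    }
}
```

The boolean overload keeps a single entry point, but a bare `true` at the call site does not say what it toggles; the named method is self-describing at the cost of one more public method. As sketched, the named method can simply delegate to the overload, so offering both is cheap.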
src/test/java/htsjdk/variant/variantcontext/VariantContextUnitTest.java
- Fixed a bug where type determination would output SYMBOLIC when it should output NO_VARIATION
- Moved the new tests that ignore the NON_REF allele to a separate method
- Added tests checking that expectedType(alleles) == expectedTypeIgnoringNonRef(alleles + nonRef)
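The invariant in the last bullet can be checked table-style. This hypothetical Java sketch uses simplified string-based typing (not htsjdk's) and stores its cases as `Object[][]` rows, the shape a TestNG `@DataProvider` method returns. The first row is the reference-only case, where typing with `<NON_REF>` ignored should presumably come out as NO_VARIATION rather than SYMBOLIC, as in the bug-fix bullet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for htsjdk's VariantContext.Type.
enum InvType { NO_VARIATION, SNP, MNP, INDEL, SYMBOLIC, MIXED }

// Hypothetical sketch of the invariant test; not htsjdk's test code.
final class NonRefInvariantSketch {
    static final String NON_REF = "<NON_REF>";

    // Simplified typing: "<ID>" alts are SYMBOLIC, equal length is SNP/MNP,
    // different length is INDEL, disagreeing alts give MIXED.
    static InvType type(String ref, List<String> alts, boolean ignoreNonRef) {
        InvType site = InvType.NO_VARIATION;
        for (String alt : alts) {
            if (ignoreNonRef && NON_REF.equals(alt)) {
                continue;
            }
            InvType t;
            if (alt.length() > 1 && alt.startsWith("<") && alt.endsWith(">")) {
                t = InvType.SYMBOLIC;
            } else if (alt.length() == ref.length()) {
                t = ref.length() == 1 ? InvType.SNP : InvType.MNP;
            } else {
                t = InvType.INDEL;
            }
            if (site == InvType.NO_VARIATION) {
                site = t;
            } else if (site != t) {
                return InvType.MIXED;
            }
        }
        return site;
    }

    // Rows shaped like a TestNG @DataProvider result: {ref, alts, expected type}.
    static Object[][] cases() {
        return new Object[][] {
            {"A", new String[] {}, InvType.NO_VARIATION},
            {"A", new String[] {"C"}, InvType.SNP},
            {"AC", new String[] {"GT"}, InvType.MNP},
            {"A", new String[] {"AT"}, InvType.INDEL},
            {"A", new String[] {"<DEL>"}, InvType.SYMBOLIC},
            {"A", new String[] {"C", "AT"}, InvType.MIXED},
        };
    }

    // For every row: type(alleles) == type(alleles + <NON_REF>, ignoreNonRef = true).
    static void checkInvariant() {
        for (Object[] row : cases()) {
            String ref = (String) row[0];
            List<String> alts = Arrays.asList((String[]) row[1]);
            InvType expected = (InvType) row[2];
            List<String> withNonRef = new ArrayList<>(alts);
            withNonRef.add(NON_REF);
            if (type(ref, alts, false) != expected) {
                throw new AssertionError("plain typing failed for " + alts);
            }
            if (type(ref, withNonRef, true) != expected) {
                throw new AssertionError("typing ignoring NON_REF failed for " + alts);
            }
        }
    }
}
```

In TestNG, `cases()` would carry `@DataProvider(name = ...)` and a test method annotated with `@Test(dataProvider = ...)` would receive each row as `(String ref, String[] alts, InvType expected)`.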
Codecov Report
Coverage diff, master vs. #1544:

| Metric | master | #1544 | +/- |
|---|---|---|---|
| Coverage | 69.417% | 69.842% | +0.425% |
| Complexity | 8939 | 9638 | +699 |
| Files | 604 | 702 | +98 |
| Lines | 35631 | 37619 | +1988 |
| Branches | 5921 | 6111 | +190 |
| Hits | 24734 | 26274 | +1540 |
| Misses | 8549 | 8897 | +348 |
| Partials | 2348 | 2448 | +100 |
@@ -108,109 +108,190 @@ public void testDetermineTypes() {

// test REF
List<Allele> alleles = Arrays.asList(Tref);
Oh, I figured this could be done programmatically with a data provider instead of needing quite so much boilerplate. Sorry for so much typing!
Yeah, I wasn't sure whether we use data providers in htsjdk, so I just went ahead and did this, but it wasn't much effort anyway. If you want me to change it, I can.
Ah, yeah, we do. This test didn't for some reason, so I can see why it's confusing.
Let me know if there are any more comments; otherwise, from my side this is ready to be merged.
👍
Related to broadinstitute/gatk#7111

Usually, NON_REF alleles are considered SYMBOLIC. Therefore, if a VariantContext contains the alleles `A*,C,<NON_REF>`, the resulting type would be MIXED. For GVCF files, however, it would be helpful for this to be considered a SNP. When `true` is passed as the optional argument `ignoreNonRef` to `VariantContext.getType()`, NON_REF alleles are ignored for type determination. If only NON_REF alleles are seen at a given site, the type remains SYMBOLIC.

- Default behavior does not change; NON_REF alleles are ignored only if `true` is passed for the optional `ignoreNonRef` argument to `VariantContext.getType()`
- Changed the return type of `determineType` and `determinePolymorphicType` from `void` to `VariantContext.Type`, because otherwise multiple code branches would be necessary depending on which caching variable to set