Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ReferenceConfidenceVariantContextMerger fixes for spanning deletions, attribute merging. #4680

Merged
merged 1 commit into from
Apr 23, 2018

Conversation

cmnbroad
Copy link
Collaborator

Fixes for #4525 and #4633. Longer term we should do a more significant rewrite/refactoring.

return Integer.parseInt(stringValue);
// Use the VCF header's declared type for the given attribute to ensure that all the values for that attribute
// across all the VCs being merged have the same boxed representation. Some VCs have a serialized value of "0"
// for FLOAT attributes, with no embedded decimal point, but we still need to box those into Doubles, or the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of all the terrible things that happen to annotations in htsjdk, at least it guarantees some decimal precision (VCFEncoder.formatVCFDouble()). I'm guessing this was an issue related to GenomicsDB output (which uses htslib)? I mentioned this to Karthik a while ago (#4047 (comment)) It should have been addressed by #4261, but I haven't tested that explicitly. Not that I object to the belt and suspenders approach.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ldgauthier Good to know - thanks. At least two users have reported this, one of them seemed to implicate bcftools, but that may be a red herring. I do think htsjdk can produce such output though, since it appears to rely on the in-memory representation of the value provided by the caller, rather than the header, to determine the format. I’d love to tighten that up in the next round of htsjdk updates.

Copy link
Collaborator

@jamesemery jamesemery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests look good and I have really only one sticking issue that probably should be fixed.

@@ -174,6 +180,9 @@ public VariantContext merge(final List<VariantContext> vcs, final Locatable loc,
for (final Allele a : vc.getAlternateAlleles()) {
if (a.isSymbolic()) {
result.add(a);
} else if ( a == Allele.SPAN_DEL ) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really gross but it looks like the best we can do. It really seems like a bug to me that an Allele.SPAN_DEL is not symbolic but i guess thats an HTSJDK issue isn't it.

Copy link
Collaborator Author

@cmnbroad cmnbroad Apr 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it really does seem like it should be symbolic. And there is a proposal/discussion about that here. Not sure how disruptive that change would be, but for now, this matches GATK3.

private Comparable<?> parseNumericInfoAttributeValue(final VCFHeader vcfHeader, final String key, final String stringValue) {
final VCFInfoHeaderLine infoLine = vcfHeader.getInfoHeaderLine(key);
if (infoLine == null) {
warning.warn(String.format("No header line for INFO attribute %s was found in the input header", key));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This warning seems problematic. If it gets triggered it will probably get spit out for EVERY read in a VCF which would result in a torrential span. Perhaps us a OneShotLogger here to save peoples poor poor bash histories...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the code leading up to this I don't expect this warning will be a problem for any non-reducible annotations except i'm not sure i completely trust our VCFStandardHeaderLines to have the correct types. We have a number of issues with the header lines either here or in HTSJDK being wrong. This is still better I would say.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent point - but warning is a OneShotLogger. Also, I did keep the fallback code for the old behavior if the header line isn't found.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, good point about warning. The next point is that currently its just a dumb implementation that only emits once. If there are multiple error states in a file its probably best to open multiple loggers so if both are triggered they will each be written out, warning is currently used for a different case to do with merging and numerical parsing. maybe split them into 2 loggers?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Split out into a second logger, with a slightly more generic message.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I rebased this on top of my master to pick up my last CombineGVCFs branch/fix.

@codecov-io
Copy link

codecov-io commented Apr 20, 2018

Codecov Report

Merging #4680 into master will increase coverage by 0.004%.
The diff coverage is 100%.

@@               Coverage Diff               @@
##              master     #4680       +/-   ##
===============================================
+ Coverage     79.842%   79.847%   +0.004%     
- Complexity     17332     17334        +2     
===============================================
  Files           1074      1074               
  Lines          62924     62938       +14     
  Branches       10182     10186        +4     
===============================================
+ Hits           50240     50254       +14     
  Misses          8704      8704               
  Partials        3980      3980
Impacted Files Coverage Δ Complexity Δ
...lkers/ReferenceConfidenceVariantContextMerger.java 95.257% <100%> (+0.278%) 71 <3> (+2) ⬆️
...titute/hellbender/tools/walkers/GenotypeGVCFs.java 90.083% <100%> (ø) 47 <0> (ø) ⬇️
...stitute/hellbender/tools/walkers/CombineGVCFs.java 93.75% <100%> (ø) 60 <0> (ø) ⬇️

@cmnbroad
Copy link
Collaborator Author

@jamesemery I think this is ready - not sure if you want to take one more quick look at it.

Copy link
Collaborator

@jamesemery jamesemery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One typo you should probably fix and a variable that should have a better name but after that you can merge 👍

private Comparable<?> parseNumericInfoAttributeValue(final VCFHeader vcfHeader, final String key, final String stringValue) {
final VCFInfoHeaderLine infoLine = vcfHeader.getInfoHeaderLine(key);
if (infoLine == null) {
oneShotHeaderLineLogger.warn(String.format("At least one attribute was found (%s) for which there no corresponding header line", key));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there no -> there is no

protected final VariantAnnotatorEngine annotatorEngine;
protected final OneShotLogger warning = new OneShotLogger(this.getClass());
protected final OneShotLogger oneShotHeaderLineLogger = new OneShotLogger(this.getClass());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you're at it, could you rename warning to something like nonNumericalAttributeWarning or something to that end so its clearer in the future.

@jamesemery jamesemery assigned cmnbroad and unassigned jamesemery Apr 23, 2018
@cmnbroad cmnbroad force-pushed the cn_reference_confidence_merger branch from 3db1e4c to 7ba0a9f Compare April 23, 2018 16:53
@cmnbroad cmnbroad merged commit e53241b into master Apr 23, 2018
@cmnbroad cmnbroad deleted the cn_reference_confidence_merger branch April 23, 2018 20:12
cwhelan pushed a commit to cwhelan/gatk-linked-reads that referenced this pull request May 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants