Bug Report
Affected tool(s) or class(es)
GenotypeGVCFs with --keep-combined-raw-annotations
Affected version(s)
Latest public release version [version?]
Latest master branch as of (not tested)
Description
@ldgauthier was kind enough to introduce the --keep-combined-raw-annotations option for us after the discussion in issue #5698, and we've been using it extensively. However, we recently noticed a problem that affects a small fraction of variants.
We're noticing this with AS_SB_TABLE, but it probably applies to all per-allele and per-alt-allele annotations. The problem is that GenotypeGVCFs may choose to output only a subset of the alleles present in the gVCF, and when it does, it does not appear to remove the values for the dropped alleles from these annotations. The result is annotations with more values than there are alleles, and no safe or predictable way to interpret them, since the resulting VCF records neither the original allele order nor which alleles were removed. In my case this happens primarily at homopolymer sites and occasionally at STRs with larger repeat units.
I've attached a zip file - AS_SB_TABLE_bug.zip - which contains a one-record gVCF, the command to generate the VCF and the resulting VCF, which should be sufficient to demonstrate the problem and reproduce it.
Here's what an offending variant looks like:
chr1 100366446 . GTT G 562.64 . AC=1;AF=0.500;AN=2;AS_SB_TABLE=19,6|16,6|4,0|2,2|1,1;...;REF_BASES=ATGTTTTTTTGTTTTTTTTTT;RPA=13,11;RU=T;ReadPosRankSum=-1.296e+00;SOR=0.534;STR GT:AD:DP:F1R2:F2R1:GQ:PL 0/1:25,22:57:19,16:4,4:99:570,0,819
The record has two alleles (REF GTT and ALT G), yet AS_SB_TABLE contains five groups of strand counts.
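To find other affected records in an output VCF, one quick check is to compare the number of "|"-separated AS_SB_TABLE groups with the number of alleles (REF plus each ALT). The Python sketch below assumes plain, uncompressed, tab-delimited VCF data lines and integer counts in every group; the function names are hypothetical helpers, not part of GATK.

```python
def parse_as_sb_table(info: str) -> list[list[int]]:
    """Return AS_SB_TABLE from a VCF INFO column as per-allele [forward, reverse] counts."""
    for field in info.split(";"):
        key, _, value = field.partition("=")  # flag fields such as STR have no '='
        if key == "AS_SB_TABLE":
            return [[int(n) for n in group.split(",")] for group in value.split("|")]
    raise KeyError("AS_SB_TABLE not found in INFO")


def has_allele_count_mismatch(vcf_line: str) -> bool:
    """True if AS_SB_TABLE has a different number of groups than REF + ALT alleles."""
    columns = vcf_line.rstrip("\n").split("\t")
    alt, info = columns[4], columns[7]  # standard VCF columns: ALT is 5th, INFO is 8th
    n_alleles = 1 + len(alt.split(","))  # REF plus each comma-separated ALT
    return len(parse_as_sb_table(info)) != n_alleles


if __name__ == "__main__":
    # The record from this report, with INFO trimmed to a few fields.
    record = "\t".join([
        "chr1", "100366446", ".", "GTT", "G", "562.64", ".",
        "AC=1;AF=0.500;AN=2;AS_SB_TABLE=19,6|16,6|4,0|2,2|1,1;RU=T;STR",
        "GT:AD:DP", "0/1:25,22:57",
    ])
    print(has_allele_count_mismatch(record))  # True: 5 groups for 2 alleles
```

Running this over every data line of a VCF (skipping lines that start with "#") would list the records where the annotation was not subsetted. A missing value such as "." in a group would make `int` fail, so real files may need extra handling.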
Steps to reproduce
See attached zip file.
Expected behavior
All per-allele and per-alt-allele annotations should be subsetted to only the values for the alleles that are output in the resulting VCF.
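The expected behavior amounts to keeping, for each per-allele annotation, only the entries of the alleles that survive. Because the output VCF does not record which input alleles were dropped, this can only be done where the original allele list is still known, i.e. at the point where GenotypeGVCFs subsets the alleles. The Python sketch below shows only the index bookkeeping; the function name, its parameters, and the assumption that the caller already knows the retained allele indices are hypothetical, and this is not how GATK implements allele subsetting.

```python
def subset_allele_values(value: str, n_alleles: int, kept: list[int],
                         includes_ref: bool = True, sep: str = "|") -> str:
    """Return only the values that belong to the retained alleles.

    n_alleles: number of alleles (REF + ALTs) in the ORIGINAL record.
    kept: indices into the original allele list (0 = REF), in output order.
    includes_ref: True for per-allele fields (one value per allele, REF first),
                  False for per-alt-allele fields (one value per ALT only).
    """
    values = value.split(sep)
    expected = n_alleles if includes_ref else n_alleles - 1
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    if not kept or kept[0] != 0:
        raise ValueError("REF (index 0) must be kept and listed first")
    if includes_ref:
        selected = [values[i] for i in kept]
    else:
        # Per-alt fields have no REF entry, so ALT index i sits at position i - 1.
        selected = [values[i - 1] for i in kept[1:]]
    return sep.join(selected)


if __name__ == "__main__":
    # Synthetic example: original alleles REF, ALT1, ALT2, ALT3; only REF and ALT3 are output.
    print(subset_allele_values("10,2|3,1|0,0|5,4", 4, [0, 3]))  # 10,2|5,4
```

Working with indices rather than allele strings is deliberate: when alleles are dropped, the remaining REF and ALT may be trimmed to a shorter representation, so matching by sequence could fail.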
Actual behavior
Values for all of the input alleles are output, including those for alleles that were dropped from the record.