Introduced symbolic allele "<*>" in gVCF
pd3 committed Sep 23, 2014
1 parent af36fb1 commit 4a91745
Showing 1 changed file with 10 additions and 11 deletions.
21 changes: 10 additions & 11 deletions VCFv4.3.tex
@@ -1057,22 +1057,21 @@ \subsection{Representing unspecified alleles and REF-only blocks (gVCF)}
using the END INFO tag, an idea originally introduced by the gVCF file format\footnote{\url{https://support.basespace.illumina.com/knowledgebase/articles/147078-gvcf-file}}.
The convention adopted here is to represent reference evidence as likelihoods against an
unknown alternate allele. Think of this as the likelihood for reference as compared to any other possible alternate
-allele (both SNP, indel, or otherwise). A symbolic alternate allele ($<$X$>$ in
-samtools and $<$NON\_REF$>$ in GATK implementation) is used to represent this
-unspecified alternate allele.
+allele (both SNP, indel, or otherwise). A symbolic alternate allele $<$*$>$
+is used to represent this unspecified alternate allele.

Example records are given below:
\scriptsize
\begin{flushleft}
\begin{tabular}{ l l l l l l l l l l }
\#CHROM & POS & ID & REF & ALT & QUAL & FILTER & INFO & FORMAT & Sample \\
-1 & 4370 & . & G & $<$X$>$ & . & . & END=4383 & GT:DP:GQ:MIN\_DP:PL & 0/0:25:60:23:0,60,900 \\
-1 & 4384 & . & C & $<$X$>$ & . & . & END=4388 & GT:DP:GQ:MIN\_DP:PL & 0/0:25:45:25:0,42,630 \\
-1 & 4389 & . & T & TC,$<$X$>$ & 213.73 & . & . & GT:DP:GQ:PL & 0/1:23:99:51,0,36,93,92,86 \\
-1 & 4390 & . & C & $<$X$>$ & . & . & END=4390 & GT:DP:GQ:MIN\_DP:PL & 0/0:26:0:26:0,0,315 \\
-1 & 4391 & . & C & $<$X$>$ & . & . & END=4395 & GT:DP:GQ:MIN\_DP:PL & 0/0:27:63:27:0,63,945 \\
-1 & 4396 & . & G & C,$<$X$>$ & 0 & . & . & GT:DP:GQ:P & 0/0:24:52:0,52,95,66,95,97 \\
-1 & 4397 & . & T & $<$X$>$ & . & . & END=4416 & GT:DP:GQ:MIN\_DP:PL & 0/0:22:14:22:0,15,593 \\
+1 & 4370 & . & G & $<$*$>$ & . & . & END=4383 & GT:DP:GQ:MIN\_DP:PL & 0/0:25:60:23:0,60,900 \\
+1 & 4384 & . & C & $<$*$>$ & . & . & END=4388 & GT:DP:GQ:MIN\_DP:PL & 0/0:25:45:25:0,42,630 \\
+1 & 4389 & . & T & TC,$<$*$>$ & 213.73 & . & . & GT:DP:GQ:PL & 0/1:23:99:51,0,36,93,92,86 \\
+1 & 4390 & . & C & $<$*$>$ & . & . & END=4390 & GT:DP:GQ:MIN\_DP:PL & 0/0:26:0:26:0,0,315 \\
+1 & 4391 & . & C & $<$*$>$ & . & . & END=4395 & GT:DP:GQ:MIN\_DP:PL & 0/0:27:63:27:0,63,945 \\
+1 & 4396 & . & G & C,$<$*$>$ & 0 & . & . & GT:DP:GQ:P & 0/0:24:52:0,52,95,66,95,97 \\
+1 & 4397 & . & T & $<$*$>$ & . & . & END=4416 & GT:DP:GQ:MIN\_DP:PL & 0/0:22:14:22:0,15,593 \\
\end{tabular}
\end{flushleft}
\normalsize
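As a rough illustration of how a reader of such a file might interpret the example records above, the following Python sketch parses the first eight columns of a record, derives the last position covered (from the END INFO tag when present), and checks whether the record is a REF-only block whose sole ALT is the symbolic allele <*>. The names `Record`, `parse_record`, and `UNSPECIFIED` are hypothetical and not part of any library; splitting on arbitrary whitespace is an assumption made for readability, since real VCF lines are tab-delimited.

```python
from dataclasses import dataclass

UNSPECIFIED = "<*>"  # symbolic allele for the unspecified alternate allele


@dataclass
class Record:
    chrom: str
    pos: int
    ref: str
    alts: list
    info: dict

    @property
    def end(self):
        # END gives the last position of a block; without it, assume
        # the record covers only the REF allele itself.
        if "END" in self.info:
            return int(self.info["END"])
        return self.pos + len(self.ref) - 1

    @property
    def is_ref_block(self):
        # A REF-only block lists the unspecified allele as its only ALT.
        return self.alts == [UNSPECIFIED]


def parse_record(line):
    """Parse the fixed columns of one data line (illustrative only)."""
    chrom, pos, _id, ref, alt, _qual, _filter, info = line.split()[:8]
    info_dict = {}
    if info != ".":
        for item in info.split(";"):
            key, _, value = item.partition("=")
            info_dict[key] = value
    return Record(chrom, int(pos), ref, alt.split(","), info_dict)


block = parse_record(
    "1 4370 . G <*> . . END=4383 GT:DP:GQ:MIN_DP:PL 0/0:25:60:23:0,60,900")
variant = parse_record(
    "1 4389 . T TC,<*> 213.73 . . GT:DP:GQ:PL 0/1:23:99:51,0,36,93,92,86")
```

Here `block` spans positions 4370 through 4383 and is a REF-only block, while `variant` covers the single position 4389 and carries <*> alongside the called TC allele.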
@@ -1688,7 +1687,7 @@ \subsection{Changes between VCFv4.2 and VCFv4.3}
\item In order for VCF and BCF to have the same expressive power, we state explicitly that Integers and Floats are 32-bit numbers. Integers are signed.
\item We state explicitly that zero length strings are not allowed, this includes the CHROM and ID column, INFO IDs, FILTER IDs and FORMAT IDs. Meta-information lines can be in any order, with the exception of \#\#fileformat which must come first. INFO, FILTER and FORMAT IDs must be unique within that type.
\item We state explicitly that duplicate IDs, FILTER, INFO or FORMAT keys are not valid.
-\item A section about gVCF was added
+\item A section about gVCF was added, introduced the $<$*$>$ symbolic allele.
\item A section about tag naming conventions was added
\end{itemize}
