VCFv4.3 - first batch of changes #88

Merged: 35 commits, Oct 10, 2015
Commits
af41ed9
Added VCFv4.3 draft, changes marked in red color
pd3 Jul 28, 2014
2d3f041
Adding subsection about gVCF files.
Aug 1, 2014
4ca3424
Updated gVCF section, added list of changes
pd3 Aug 11, 2014
efed15f
Added a gVCF section plus minor edits
pd3 Sep 8, 2014
a09e56f
Forbid duplicate fields, UTF8 the only encoding.
pd3 Sep 19, 2014
af36fb1
Removed forgotten copy and paste error, resolves #41
pd3 Sep 23, 2014
4a91745
Introduced symbolic allele "<*>" in gVCF
pd3 Sep 23, 2014
7789a1b
Add ##META definitions
pd3 Oct 1, 2014
250a88c
Include META header lines in the list of changes
pd3 Oct 2, 2014
1080e8a
Clarification: no negative values in GT array except the end-of-vecto…
pd3 Oct 15, 2014
66d45d5
Clarified GL ordering. Resolves #58
pd3 Jan 5, 2015
ce805dd
Merge branch 'master' into VCFv4.3
pd3 Jan 26, 2015
aabe553
Fix #82
pd3 Apr 30, 2015
61d0f01
Fix #82
pd3 Apr 30, 2015
672b512
Fix SVLEN example, resolves #84
pd3 May 19, 2015
73482c0
Fix SVLEN example in VCFv4.3, resolves #84
pd3 May 19, 2015
8f6b7cf
Update minor version number, resolves #63
pd3 May 19, 2015
f247cee
List of changes:
pd3 Jun 2, 2015
d6b46df
GL ordering made clearer as discussed in #83
pd3 Jun 3, 2015
08506d8
Be definitive in the ##fileformat section
jmarshall Jun 11, 2015
1c5e278
Removed ill-defined GLE tag, resolves #90
pd3 Jun 22, 2015
db372bc
Data Types section, describe valid float format, NaN and +/-Inf are a…
pd3 Jun 22, 2015
c128680
Extended bits about character encoding
pd3 Jun 29, 2015
c50589b
Chromosome names cannot use reserved symbolic alleles and contain cha…
pd3 Jun 29, 2015
4f49af0
Reference SV INFO and FORMAT section from the main text
pd3 Jun 29, 2015
6cc5604
IUPAC ambiguity codes, as per #54
pd3 Jun 29, 2015
4b381fa
Fixed the wording of trailing tabs and IUPAC codes from "should" to "…
pd3 Jun 30, 2015
31bd4cc
Remove repeated text about INFO fields for imprecise structural varia…
bjpop Jul 14, 2015
51b4cb6
PEDIGREE VCF header lines now require ID tag, resolves #96
pd3 Jul 29, 2015
56e8bae
Clarifications made as requested by https://github.com/samtools/hts-s…
pd3 Aug 24, 2015
187745f
Fix inconsistent description of PASS in BCF.
pd3 Sep 9, 2015
d3a28e2
Removed red coloring from the text, VCFv4.3 is ready to be merged
pd3 Sep 24, 2015
861e235
Modified example, coordinates must be ordered
pd3 Oct 10, 2015
ad14194
Move sentences which apply to all data lines from 1.6.1 to 1.6
pd3 Oct 10, 2015
fe0222c
Stronger wording about disallowed INT32 values
pd3 Oct 10, 2015

4 changes: 3 additions & 1 deletion Makefile
@@ -7,14 +7,16 @@ PDFS = BCFv1_qref.pdf \
SAMv1.pdf \
tabix.pdf \
VCFv4.1.pdf \
-VCFv4.2.pdf
+VCFv4.2.pdf \
+VCFv4.3.pdf

pdf: $(PDFS)

CRAMv2.1.pdf: CRAMv2.1.tex CRAMv2.1.ver
SAMv1.pdf: SAMv1.tex SAMv1.ver
VCFv4.1.pdf: VCFv4.1.tex VCFv4.1.ver
VCFv4.2.pdf: VCFv4.2.tex VCFv4.2.ver
+VCFv4.3.pdf: VCFv4.3.tex VCFv4.3.ver


.SUFFIXES: .tex .pdf .ver
6 changes: 3 additions & 3 deletions VCFv4.1.tex
@@ -231,7 +231,7 @@ \section{Understanding the VCF format and the haplotype representation}
\section{INFO keys used for structural variants}
When the INFO keys reserved for encoding structural variants are used for imprecise variants, the values should be best estimates. When a key reflects a property of a single alt allele (e.g. SVLEN), then when there are multiple alt alleles there will be multiple values for the key corresponding to each alelle (e.g. SVLEN=-100,-110 for a deletion with two distinct alt alleles).

-The following INFO keys are reserved for encoding structural variants. In general, when these keys are used by imprecise variants, the values should be best estimates. When a key reflects a property of a single alt allele (e.g. SVLEN), then when there are multiple alt alleles there will be multiple values for the key corresponding to each alelle (e.g. SVLEN=-100,-110 for a deletion with two distinct alt alleles).
+The following INFO keys are reserved for encoding structural variants.
\footnotesize
\begin{verbatim}
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise structural variation">
@@ -261,7 +261,7 @@ \section{INFO keys used for structural variants}
##INFO=<ID=BKPTID,Number=.,Type=String,Description="ID of the assembled alternate allele in the assembly file">
\end{verbatim}
\normalsize
-For precise variants, the consensus sequence the alternate allele assembly is derivable from the REF and ALT fields. However, the alternate allele assembly file may contain additional information about the characteristics of the alt allele contigs.
+For precise variants, the consensus sequence of the alternate allele assembly is derivable from the REF and ALT fields. However, the alternate allele assembly file may contain additional information about the characteristics of the alt allele contigs.
\footnotesize
\begin{verbatim}
##INFO=<ID=MEINFO,Number=4,Type=String,Description="Mobile element info of the form NAME,START,END,POLARITY">
@@ -498,7 +498,7 @@ \subsection{Encoding Structural Variants}
##FORMAT=<ID=CN,Number=1,Type=Integer,Description="Copy number genotype for imprecise events">
##FORMAT=<ID=CNQ,Number=1,Type=Float,Description="Copy number genotype quality for imprecise events">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001
-1 2827694 rs2376870 CGTGGATGCGGGGAC C . PASS SVTYPE=DEL;END=2827762;HOMLEN=1;HOMSEQ=G;SVLEN=-68 GT:GQ 1/1:13.9
+1 2827694 rs2376870 CGTGGATGCGGGGAC C . PASS SVTYPE=DEL;END=2827708;HOMLEN=1;HOMSEQ=G;SVLEN=-14 GT:GQ 1/1:13.9
2 321682 . T <DEL> 6 PASS SVTYPE=DEL;END=321887;SVLEN=-205;CIPOS=-56,20;CIEND=-10,62 GT:GQ 0/1:12
2 14477084 . C <DEL:ME:ALU> 12 PASS SVTYPE=DEL;END=14477381;SVLEN=-297;CIPOS=-22,18;CIEND=-12,32 GT:GQ 0/1:12
3 9425916 . C <INS:ME:L1> 23 PASS SVTYPE=INS;END=9425916;SVLEN=6027;CIPOS=-16,22 GT:GQ 1/1:15
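
The corrected values in the first record can be checked by hand. The sketch below assumes the usual convention for a precise variant, namely END = POS + length of REF - 1 and SVLEN = length of ALT - length of REF. The helper name `precise_variant_fields` is hypothetical and appears in neither the specification nor this pull request.

```python
from typing import Tuple


def precise_variant_fields(pos: int, ref: str, alt: str) -> Tuple[int, int]:
    """Return (END, SVLEN) for a precise variant.

    Assumed convention: END is the last reference base covered by REF,
    and SVLEN is the length difference between ALT and REF.
    """
    end = pos + len(ref) - 1
    svlen = len(alt) - len(ref)
    return end, svlen


# First record of the example: POS=2827694, REF=CGTGGATGCGGGGAC, ALT=C
end, svlen = precise_variant_fields(2827694, "CGTGGATGCGGGGAC", "C")
print(end, svlen)  # 2827708 -14
```

Under that assumption the 15-base REF and 1-base ALT give END=2827708 and SVLEN=-14, which matches the new line rather than the old END=2827762 and SVLEN=-68.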
4 changes: 2 additions & 2 deletions VCFv4.2.tex
@@ -515,7 +515,7 @@ \subsection{Encoding Structural Variants}
##FORMAT=<ID=CN,Number=1,Type=Integer,Description="Copy number genotype for imprecise events">
##FORMAT=<ID=CNQ,Number=1,Type=Float,Description="Copy number genotype quality for imprecise events">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001
-1 2827694 rs2376870 CGTGGATGCGGGGAC C . PASS SVTYPE=DEL;END=2827762;HOMLEN=1;HOMSEQ=G;SVLEN=-68 GT:GQ 1/1:13.9
+1 2827694 rs2376870 CGTGGATGCGGGGAC C . PASS SVTYPE=DEL;END=2827708;HOMLEN=1;HOMSEQ=G;SVLEN=-14 GT:GQ 1/1:13.9
2 321682 . T <DEL> 6 PASS SVTYPE=DEL;END=321887;SVLEN=-205;CIPOS=-56,20;CIEND=-10,62 GT:GQ 0/1:12
2 14477084 . C <DEL:ME:ALU> 12 PASS SVTYPE=DEL;END=14477381;SVLEN=-297;CIPOS=-22,18;CIEND=-12,32 GT:GQ 0/1:12
3 9425916 . C <INS:ME:L1> 23 PASS SVTYPE=INS;END=9425916;SVLEN=6027;CIPOS=-16,22 GT:GQ 1/1:15
@@ -529,7 +529,7 @@ \subsection{Encoding Structural Variants}
The example shows in order:
\begin{enumerate}
\item A precise deletion with known breakpoint, a one base micro-homology, and a sample that is homozygous for the deletion.
-\item An imprecise deletion of approximately 105 bp.
+\item An imprecise deletion of approximately 205 bp.
\item An imprecise deletion of an ALU element relative to the reference.
\item An imprecise insertion of an L1 element relative to the reference.
\item An imprecise duplication of approximately 21Kb. The sample genotype is copy number 3 (one extra copy of the duplicated sequence).
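
The change from 105 bp to 205 bp can be cross-checked against the second record of the example (POS=321682, END=321887, SVLEN=-205). This is a minimal sketch, assuming that for a symbolic `<DEL>` allele POS is the base preceding the deletion, so that END - POS equals the deleted length. The `parse_info` helper is hypothetical and is not a complete INFO parser.

```python
from typing import Dict


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO string into a key -> value mapping.

    Flag-style keys without '=' map to an empty string.
    """
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


# Second record of the example: an imprecise <DEL> at POS=321682
pos = 321682
info = parse_info("SVTYPE=DEL;END=321887;SVLEN=-205;CIPOS=-56,20;CIEND=-10,62")

span = int(info["END"]) - pos
print(span, int(info["SVLEN"]))  # 205 -205
```

Both END - POS and the magnitude of SVLEN come out at 205, consistent with the corrected list item.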