Output files and formats
Brian Haas edited this page Oct 11, 2023 · 14 revisions
The primary output files generated by the pipeline include the following:
- ${sample_name}.vcf : the initially predicted variants
- ${sample_name}.filtered.vcf : variants after applying hard cutoffs to remove likely false positives. The hard cutoffs applied via 'GATK VariantFiltration' are: " -window 35 -cluster 3 -filter FS > 30 -filter QD < 2.0 -filter SPLICEADJ < 3 "
- ${sample_name}.boosting.${method}.vcf: if a boosting method is set, the boosted variants are annotated as BOOSTselect=${method} in the vcf. Boosting is provided as an alternative to applying the hard cutoffs above.
- cancer.vcf : the subset of variants considered most relevant to cancer biology. Variants are selected from their annotations by requiring gnomad AF < 0.01 and at least one of the following: chasmplus_pval < 0.05 or vest_pval < 0.05; FATHMM in ["CANCER", "PATHOGENIC"]; or clinvar_sig =~ /pathogenic/i
- igvjs_viewer.html : a self-contained web application for interactively navigating the cancer variants.
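The cancer.vcf selection criteria listed above can be sketched as a single predicate over a variant's annotations. This is a minimal illustration, not the pipeline's own code. The function name `is_cancer_relevant` and the dict-based input are hypothetical. The sketch also assumes that gnomad_AF holds a single numeric value, that a variant without a gnomad AF counts as rare, that the 0.05 threshold applies to both p-values, and that the FATHMM comparison ignores case.

```python
import re


def is_cancer_relevant(ann):
    """Return True if a variant's annotations meet the cancer.vcf criteria.

    `ann` is a hypothetical dict mapping annotation name -> value
    (missing keys or None mean the annotation is absent).
    """

    def as_float(value):
        # Convert to float; return None when the value is absent or non-numeric.
        try:
            return float(value)
        except (TypeError, ValueError):
            return None

    af = as_float(ann.get("gnomad_AF"))
    # Assumption: a variant with no gnomad AF is treated as rare.
    if af is not None and af >= 0.01:
        return False

    # Either p-value below 0.05 is enough (assumed reading of the criteria).
    pvals = (as_float(ann.get("chasmplus_pval")), as_float(ann.get("vest_pval")))
    low_pval = any(p is not None and p < 0.05 for p in pvals)

    # Assumption: the FATHMM label is compared case-insensitively.
    fathmm_hit = str(ann.get("FATHMM") or "").upper() in {"CANCER", "PATHOGENIC"}

    # Equivalent of clinvar_sig =~ /pathogenic/i
    clinvar_hit = re.search(
        r"pathogenic", str(ann.get("clinvar_sig") or ""), re.IGNORECASE
    ) is not None

    return low_pval or fathmm_hit or clinvar_hit
```

Note that the case-insensitive `pathogenic` match also accepts values such as `Likely_pathogenic`, as the regular expression in the criteria implies.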
The variant annotations and descriptions include:
Column | Description |
---|---|
CHROM | Chromosome |
POS | The 1-based position of the variation on the given sequence. |
REF | Base(s) at position in the reference genome (hg38) |
ALT | Alternate base(s) |
GENE | The name of the gene(s) in the genomic region of the SNP, as annotated by SnpEff |
QUAL | A quality score associated with the inference of the given alleles. |
MQ | RMS mapping quality |
RNAEDIT | A known or predicted RNA-editing site (from REDIportal) |
RPT | Repeat family from UCSC Genome Browser RepeatMasker annotations |
DJ | Variant is within a specified distance of a reference exon splice boundary |
FATHMM | FATHMM (Functional Analysis through Hidden Markov Models). 'Pathogenic': cancer or damaging; 'Neutral': passenger or tolerated. |
chasm_pval | Empirical p-value (probability that a passenger variant is misclassified as a driver), from OpenCRAVAT |
vest_pval | Empirical p-value (probability that a benign variant is misclassified as pathogenic), from OpenCRAVAT |
mupit_link | MuPIT 3D structure variant link |
RS | dbSNP ID (i.e. rs number) |
gnomad_RS | gnomad variant identifier |
gnomad_AF | Allele Frequency for each ALT allele in the same order as listed |
ANN | SnpEff annotations |
Homopolymer | Variant is located in or near a homopolymer sequence |
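To read these annotations programmatically, one option is to split each VCF data line into its fixed columns and its INFO entries. The sketch below assumes the annotations that are not fixed VCF columns (for example MQ, GENE, gnomad_AF) are stored as `key=value` pairs in the INFO column, which this page does not state explicitly. The function name `parse_vcf_record` is hypothetical and the example record uses made-up values.

```python
def parse_vcf_record(line):
    """Split one tab-delimited VCF data line into fixed columns and an INFO dict."""
    fields = line.rstrip("\n").split("\t")
    # The first eight VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO
    chrom, pos, _id, ref, alt, qual, _filter, info = fields[:8]

    annotations = {}
    for entry in info.split(";"):
        if not entry or entry == ".":
            continue
        key, sep, value = entry.partition("=")
        # Flag-style INFO keys carry no value; record them as True.
        annotations[key] = value if sep else True

    return {
        "CHROM": chrom,
        "POS": int(pos),  # 1-based position
        "REF": ref,
        "ALT": alt,
        "QUAL": qual,
        "INFO": annotations,
    }


# Illustrative record with made-up values, not real pipeline output.
example = "chr1\t12345\t.\tA\tG\t50.0\tPASS\tMQ=60.0;GENE=GENE1;gnomad_AF=0.0001;RNAEDIT"
record = parse_vcf_record(example)
print(record["POS"], record["INFO"]["GENE"], record["INFO"]["gnomad_AF"])
```

For production use, an established VCF parser is likely a safer choice than hand-splitting lines, since it also handles header metadata and value escaping.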