Output files and formats

Brian Haas edited this page Oct 11, 2023 · 14 revisions

CTAT-mutations Output Files and Formats

The primary output files generated by the pipeline include the following:

  • ${sample_name}.vcf : the initially predicted variants
  • ${sample_name}.filtered.vcf : the variants that remain after hard cutoffs are applied to remove likely false positives. The hard cutoffs, applied via 'GATK VariantFiltration', are: " -window 35 -cluster 3 -filter FS > 30 -filter QD < 2.0 -filter SPLICEADJ < 3 "
  • ${sample_name}.boosting.${method}.vcf : written only if a boosting method is set; the boosted variants are annotated as BOOSTselect=${method} in the VCF. Boosting is provided as an alternative to applying the hard cutoffs above.
  • cancer.vcf : the subset of variants considered most relevant to cancer biology. These are selected from the variant annotations by requiring gnomad AF < 0.01 and at least one of the following: chasmplus_pval < 0.05, vest_pval < 0.05, FATHMM in ["CANCER", "PATHOGENIC"], or clinvar_sig =~ /pathogenic/i
  • igvjs_viewer.html : a self-contained web application for interactively navigating the cancer variants.
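
The selection rule for cancer.vcf described above can be sketched in Python as follows. This is a minimal illustration, not the pipeline's own code. It assumes the annotations for one variant have already been parsed into a dictionary keyed by annotation name (`gnomad_AF`, `chasmplus_pval`, `vest_pval`, `FATHMM`, `clinvar_sig`), and it assumes that a variant with no gnomAD allele frequency counts as rare. The function name `is_cancer_relevant` and the helper `_as_float` are hypothetical.

```python
import re

# FATHMM labels treated as cancer-relevant, as listed in the selection rule.
FATHMM_CANCER_LABELS = {"CANCER", "PATHOGENIC"}


def _as_float(value):
    """Return value as a float, or None if it is missing or non-numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def is_cancer_relevant(info, max_af=0.01, max_pval=0.05):
    """Apply the documented rule: rare in gnomAD AND at least one
    line of evidence (CHASMplus, VEST, FATHMM, or ClinVar)."""
    af = _as_float(info.get("gnomad_AF"))
    # Assumption: a variant absent from gnomAD is treated as rare.
    if af is not None and af >= max_af:
        return False

    pvals = [_as_float(info.get(key)) for key in ("chasmplus_pval", "vest_pval")]
    low_pval = any(p is not None and p < max_pval for p in pvals)

    fathmm_hit = str(info.get("FATHMM", "")).upper() in FATHMM_CANCER_LABELS

    # Case-insensitive substring match, mirroring clinvar_sig =~ /pathogenic/i
    clinvar_sig = str(info.get("clinvar_sig", ""))
    clinvar_hit = re.search("pathogenic", clinvar_sig, re.IGNORECASE) is not None

    return low_pval or fathmm_hit or clinvar_hit
```

Note that the case-insensitive match on "pathogenic" also accepts values such as "Likely_pathogenic", because the rule is a substring match rather than an exact comparison.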

The variant annotations and descriptions include:

Column Description
CHROM Chromosome
POS The 1-based position of the variation on the given sequence.
REF Base(s) at position in the reference genome (hg38)
ALT Alternate base(s)
GENE The name of the gene(s) in the genomic region of the SNP, as annotated by SnpEff
QUAL A quality score associated with the inference of the given alleles.
MQ RMS mapping quality
RNAEDIT A known or predicted RNA-editing site (from REDIportal)
RPT Repeat family from the UCSC Genome Browser RepeatMasker annotations
DJ Variant is within specified distance of a reference exon splice boundary
FATHMM FATHMM (Functional Analysis through Hidden Markov Models) prediction. 'Pathogenic': cancer or damaging; 'Neutral': passenger or tolerated.
chasm_pval Empirical p-value (probability that a passenger variant is misclassified as a driver), from OpenCRAVAT
vest_pval Empirical p-value (probability that a benign variant is misclassified as pathogenic), from OpenCRAVAT
mupit_link MuPIT 3D structure variant link
RS dbSNP ID (i.e. rs number)
gnomad_RS gnomAD variant identifier
gnomad_AF Allele Frequency for each ALT allele in the same order as listed
ANN SnpEff annotations
Homopolymer Variant is located in or near a homopolymer sequence
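
To show how these annotations might be read programmatically, here is a minimal sketch that parses one VCF data line with the Python standard library only. It assumes the annotations are stored as `key=value` entries in the standard VCF INFO column, with flag-style entries carrying no value. The function names `parse_info` and `parse_vcf_line` are hypothetical, and the example record in the usage note is invented for illustration.

```python
def parse_info(info_field):
    """Split a VCF INFO string into a dict.

    key=value entries map to their string value; flag entries
    (no '=') map to True.
    """
    info = {}
    if info_field in ("", "."):
        return info
    for entry in info_field.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def parse_vcf_line(line):
    """Parse the eight fixed VCF columns of a single data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, qual, _filter, info = fields[:8]
    return {
        "CHROM": chrom,
        "POS": int(pos),  # 1-based position
        "REF": ref,
        "ALT": alt,
        "QUAL": qual,
        "INFO": parse_info(info),
    }
```

For example, calling `parse_vcf_line` on the made-up record `"chr1\t12345\t.\tA\tG\t50.0\tPASS\tGENE=TP53;gnomad_AF=0.0001;RNAEDIT"` yields a dictionary whose `INFO` entry holds `GENE`, `gnomad_AF`, and `RNAEDIT` as keys. Header lines beginning with `#` are not handled and should be skipped before calling it.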