This document illustrates the individual steps of the Computational Protocol for Assembly and Analysis of SARS-nCoV-2 Genomes. Please see the README document for installation instructions.
fastqc <fastq file>
If the quality of the reads are not satisfactory, use Trimmomatic to remove bad quality reads and adapters. For unpaired reads:
trimmomatic SE <input.fastq> <output.fastq> ILLUMINACLIP:~anaconda3/envs/covid19-genepi/share/trimmomatic/adapters/TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:30 MINLEN:100
For paired reads:
trimmomatic PE <input_fwd.fastq> <input_rev.fastq> <output_fwd_paired.fastq> <output_fwd_unpaired.fastq> <output_rev_paired.fastq> <output_rev_unpaired.fastq> ILLUMINACLIP:~anaconda3/envs/covid19-genepi/share/trimmomatic/adapters/TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:30 MINLEN:100
For unpaired reads:
kraken2 --db minikraken_8GB_20200312 --threads 16 --report <file.kreport> <trimmed.fastq> > <trimmed.kraken>
ktImportTaxonomy -s 3 -t 4 -o <visualization_output>.html <trimmed.kraken>
For paired reads:
ktImportTaxonomy -s 3 -t 4 -o <visualization_output>.html <trimmed.kraken>
For unpaired reads,
hisat2 -x <Humangenomeprefix> -U <unpaired_input.fastq> -S <human.sam> -p 16 --dta-cufflinks --summary-file <humanUnpaired.log>
For paired reads,
hisat2 -x <Humangenomeprefix> -1 <file1.fastq> -2 <file2.fastq> -S <human.sam> -p 16 --dta-cufflinks --summary-file <file_human_paired.log>
samtools sort -@ 8 <human.sam> -o <human.bam> -O BAM
samtools view -u -f 4 -F264 <human.bam> > temp1.bam
samtools view -u -f 8 -F 260 <human.bam> > temp2.bam
samtools view -u -f 12 -F 256 <human.bam> > temp3.bam
samtools merge -u - temp[123].bam | samtools sort -n -o <unmapped.bam>
For unpaired reads,
bamToFastq -i <unmapped.bam> -fq <unmapped.fq>
For paired reads,
bamToFastq -i <unmapped.bam> -fq <unmapped_1.fq> -fq2 <unmapped_2.fq>
For unpaired reads,
hisat2 -x <NC_045512.2_viral_reference_genome> -U <unmapped.fq> -S <covid.sam> -p 16 --dta-cufflinks --summary-file <unmapped.log>
For paired reads,
hisat2 -x <NC_045512.2_viral_reference_genome> -1 <unmapped_1.fq> -2 <unmapped_2.fq> -S <covid.sam> -p 16 --dta-cufflinks --summary-file <unmapped.log>
samtools flagstat <covid.sam>
samtools sort -@ 8 <covid.sam> -o <covid.bam> -O BAM
picard CollectMultipleMetrics I=<covid.bam> O=<covid> R=<NC_045512.2_viral_reference_genome>
samtools faidx <covid_reference_genome>
samtools mpileup -uf <covid_reference_genome> <covid.bam> | bcftools call -c | vcfutils.pl vcf2fq > <consensus.fq>
seqtk seq -aQ64 -q20 -n N <consensus.fq> > <consensus.fasta>
samtools mpileup -uf <covid_reference_genome> <covid.bam> | bcftools call -cv -Ob > <variant.bcf>
bcftools view <variant.bcf> > <variant.vcf>
samtools mpileup -f <covid_reference_genome> <covid.bam> > <covid.pileup>
varscan mpileup2cns <covid.pileup> --output-vcf 1 --variants > <covid.vcf>
megahit -r <unmapped.fq> -o <denovo_covid_genome>
spades.py -t 8 -s <unmapped.fq> -o <spades_output_folder>
quast <contigs.fasta> -r <NC_045512.2.fna covid reference genome> -o <quast_output>
mafft --thread 4 <gisaid_cov2020_sequences.fasta> > <gisaid_cov2020_sequences_MSA.fasta>
megacc -a <infer_NJ_nucleotide.mao> -d <gisaid_cov2020_sequences_MSA.fasta> -o <gisaid_cov2020_tree>