This pipeline generates consensus SARS-CoV-2 genomes from fastq files. We are using it on the following types of sequencing data:
- Metagenomic sequencing with spiked primer enrichment (MSSPE) for SARS-CoV-2 reads (protocols.io).
- Amplicon-based short-read sequencing (using the ARTIC v3 protocol).
For generating consensus genomes from reads:
nextflow run czbiohub/sc2-illumina-pipeline -profile artic,docker \
--reads '[s3://]path/to/reads/*_R{1,2}_001.fastq.gz' \
--kraken2_db '[s3://]path/to/kraken2db' \
--outdir '[s3://]path/to/outdir'
The kraken2db can be downloaded from https://genexa.ch/sars2-bioinformatics-resources/.
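The --reads glob pairs files by the shared prefix before _R1_001.fastq.gz and _R2_001.fastq.gz. To catch unpaired files before launching a run, a small pre-flight check like the one below may help. It is a minimal sketch, assuming local input paths with exactly this naming; the function name check_read_pairs is hypothetical and is not part of the pipeline.

```shell
# Hypothetical helper (not part of the pipeline): verify that every R1 file in a
# local directory has a matching R2 file and vice versa, using the same
# *_R1_001.fastq.gz / *_R2_001.fastq.gz naming as the --reads glob.
# It only inspects local paths; s3:// inputs are not handled.
check_read_pairs() {
  local dir="$1" status=0 found=0 f mate
  for f in "$dir"/*_R1_001.fastq.gz "$dir"/*_R2_001.fastq.gz; do
    # An unmatched glob stays as the literal pattern, so skip paths that don't exist.
    [ -e "$f" ] || continue
    found=$((found + 1))
    # Derive the expected mate by swapping the R1/R2 suffix.
    case "$f" in
      *_R1_001.fastq.gz) mate="${f%_R1_001.fastq.gz}_R2_001.fastq.gz" ;;
      *) mate="${f%_R2_001.fastq.gz}_R1_001.fastq.gz" ;;
    esac
    if [ ! -e "$mate" ]; then
      echo "missing mate for: $f" >&2
      status=1
    fi
  done
  # Treat a directory with no matching FASTQ files as an error too.
  if [ "$found" -eq 0 ]; then
    echo "no *_R1_001.fastq.gz or *_R2_001.fastq.gz files in: $dir" >&2
    return 1
  fi
  return "$status"
}
```

For example, check_read_pairs path/to/reads && nextflow run czbiohub/sc2-illumina-pipeline ... only starts the pipeline when every sample has both mates.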
Replace -profile artic with -profile msspe if using MSSPE sequencing. See the documentation for more details.
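If you launch runs from a script, the profile swap can be parameterised by protocol. The sketch below only prints the command so it can be reviewed before running; it assumes the artic and msspe profile names and the options shown in the consensus command, while the function name sc2_consensus_cmd and its argument order are hypothetical.

```shell
# Hypothetical helper (not part of the pipeline): print the consensus command for
# a given protocol. Only the 'artic' and 'msspe' profile names and the options
# come from this README; arguments are wrapped in single quotes, so paths that
# themselves contain single quotes are not supported.
sc2_consensus_cmd() {
  local protocol="$1" reads="$2" kraken2_db="$3" outdir="$4"
  case "$protocol" in
    artic|msspe) ;;
    *) echo "unknown protocol: $protocol (expected artic or msspe)" >&2; return 1 ;;
  esac
  # Print rather than execute, so the command can be checked first.
  echo "nextflow run czbiohub/sc2-illumina-pipeline -profile ${protocol},docker" \
       "--reads '${reads}' --kraken2_db '${kraken2_db}' --outdir '${outdir}'"
}
```

For example, sc2_consensus_cmd msspe 'path/to/reads/*_R{1,2}_001.fastq.gz' path/to/kraken2db path/to/outdir prints the MSSPE variant of the command above; quote the reads pattern so the shell does not expand it.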
Simple test to make sure things aren't broken:
nextflow run czbiohub/sc2-illumina-pipeline -profile docker,test
Simple benchmark (of mapping accuracy, not speed). Run it after algorithm changes to see how accuracy might be affected. Results are written to benchmark/call_consensus-stats/combined.stats.tsv
nextflow run czbiohub/sc2-illumina-pipeline -profile docker,benchmark
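Since the benchmark is meant to show how accuracy shifts after an algorithm change, one simple approach is to keep a copy of the previous combined.stats.tsv and diff it against the new one. This is a minimal sketch, assuming the output path given above; the function name compare_benchmark is hypothetical, and it compares files line by line without interpreting any particular column.

```shell
# Hypothetical helper (not part of the pipeline): diff a saved baseline copy of
# the benchmark stats against the current output.
compare_benchmark() {
  local baseline="$1"
  local current="${2:-benchmark/call_consensus-stats/combined.stats.tsv}"
  local f
  for f in "$baseline" "$current"; do
    [ -f "$f" ] || { echo "not found: $f" >&2; return 2; }
  done
  # diff exits 0 when the files are identical and 1 when they differ.
  diff -u "$baseline" "$current"
}
```

For example, copy combined.stats.tsv to baseline.stats.tsv before changing the code, rerun the benchmark, then run compare_benchmark baseline.stats.tsv to see which rows changed.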
The czbiohub/sc2-illumina-pipeline pipeline comes with documentation in the docs/ directory.
The initial version of this pipeline was based on https://github.com/connor-lab/ncov2019-artic-nf