Computational Protocol for Assembly and Analysis of SARS-nCoV-2 Genomes

banijolly/Genepi


About

The Computational Protocol for Assembly and Analysis of SARS-nCoV-2 Genomes has been compiled by VS-Lab at the CSIR-Institute of Genomics and Integrative Biology to aid the analysis and interpretation of SARS-CoV-2 sequencing data with easy-to-use open-source utilities, using both reference-guided and de novo strategies.

This README describes the prerequisites and installation steps required to run the pipeline for paired-end samples. For a detailed description of the steps involved in this protocol, along with the commands used, please read the Analysis Steps document.

More information about the lab and our work on COVID-19 can be found at the lab website.

Quickstart

Installation

To use conda, download and install the latest version of Anaconda.

Create and activate the covid19-genepi conda environment:

conda env create -f covid19-environment.yml
conda activate covid19-genepi

Update Krona Taxonomy

The Krona taxonomy databases must be updated manually before Krona can generate a taxonomic report. The following commands assume Anaconda is installed in the home directory; adjust the path to match your installation.

bash ~/anaconda3/envs/covid19-genepi/opt/krona/updateTaxonomy.sh 
bash ~/anaconda3/envs/covid19-genepi/opt/krona/updateAccessions.sh
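
If Anaconda is not installed under `~/anaconda3`, the environment location can usually be read from the `CONDA_PREFIX` variable, which conda sets when an environment is activated. The following is a minimal sketch under that assumption. The helper name `krona_dir` is hypothetical, and the `opt/krona` sub-path is taken from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Krona script directory for a conda
# environment prefix. Falls back to $CONDA_PREFIX, which conda exports
# after `conda activate`.
krona_dir() {
    local prefix="${1:-${CONDA_PREFIX:-}}"
    if [ -z "$prefix" ]; then
        echo "No environment prefix given and CONDA_PREFIX is unset" >&2
        return 1
    fi
    # Strip one trailing slash so the result has no doubled separator.
    printf '%s/opt/krona\n' "${prefix%/}"
}

# Usage (after `conda activate covid19-genepi`):
#   bash "$(krona_dir)/updateTaxonomy.sh"
#   bash "$(krona_dir)/updateAccessions.sh"
```

Passing the prefix explicitly, for example `krona_dir /path/to/envs/covid19-genepi`, works without activating the environment first.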

Set up Minikraken database

The Minikraken database, which contains the complete bacterial, archaeal, and viral genomes in RefSeq, is available for download from the Kraken website.

wget ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/minikraken_8GB_202003.tgz
tar -xvf minikraken_8GB_202003.tgz
export KRAKEN2_DB_PATH="<path/to/folder/containing/minikraken/database>"
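
Before running classification, it can help to confirm that the extracted archive actually contains a usable Kraken 2 database. A Kraken 2 database directory holds three files: `hash.k2d`, `opts.k2d`, and `taxo.k2d`. The sketch below checks for them. The function name `check_kraken2_db` is hypothetical, and because the name of the directory created by the archive is not stated here, it is passed in as an argument.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if the given directory contains the three
# non-empty files that make up a Kraken 2 database, 1 otherwise.
check_kraken2_db() {
    local db_dir="$1"
    local missing=0
    local f
    for f in hash.k2d opts.k2d taxo.k2d; do
        if [ ! -s "$db_dir/$f" ]; then
            echo "missing or empty: $db_dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (replace the placeholder with the extracted database directory):
#   check_kraken2_db "<path/to/minikraken/database>" && echo "database looks complete"
```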

Install MEGAX

Install the MEGAX Command-Line Interface to analyze molecular evolution and generate phylogenetic trees:

wget https://www.megasoftware.net/do_force_download/megacc_10.1.1_amd64_beta.tar.gz
tar -zxvf megacc_10.1.1_amd64_beta.tar.gz

Download Reference Genomes

The latest version of the human genome can be downloaded from GENCODE. The SARS-CoV-2 genome can be downloaded from NCBI under accession number NC_045512.2.
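
The SARS-CoV-2 reference can also be fetched from the command line. The sketch below builds an NCBI E-utilities `efetch` URL for a nucleotide accession. This is one possible route, not necessarily the one the authors used. The helper name `efetch_fasta_url` and the output file name are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the NCBI E-utilities efetch URL that returns
# a nucleotide record in FASTA format for the given accession.
efetch_fasta_url() {
    local accession="$1"
    printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=%s&rettype=fasta&retmode=text\n' "$accession"
}

# Usage (requires network access; the output file name is arbitrary):
#   wget -O NC_045512.2.fasta "$(efetch_fasta_url NC_045512.2)"
```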

Download the test dataset

The RNA sequencing data of 14 patients infected with SARS-CoV-2, sequenced by the University of Washington, can be downloaded from the SRA repository.

Download Reference Dataset for Phylogenetic Analysis

The Global Initiative on Sharing All Influenza Data (GISAID) gives public access to the most complete repository of sequencing data for SARS-CoV-2. The sequences for phylogenetic analysis can be downloaded from the EpiCoV portal after creating an account and signing in.
