
VanishingGlacierMAGs generation and analysis


This pipeline generates metagenome-assembled genomes (MAGs) from individual assemblies and their respective reads for the VanishingGlaciers project. It is based on the Snakemake workflow management system and is designed to run on a high-performance computing cluster.

Pipeline description

  • This pipeline starts with different individual assemblies (fasta files) and their respective reads (mg.r{1,2}.preprocessed.fq files).
  • To reduce computational time, the reads are subsampled to 10% of the reads per sample, and contigs shorter than 1.5 kbp are removed.
  • The subsampled reads are then mapped against the assemblies using BWA.
  • The mapped reads are then used to bin the contigs using MetaBAT2, CONCOCT and MetaBinner.
  • The bins are then refined using DAS_Tool.
  • CheckM2 is used to estimate the quality of the bins, and only bins that are at least 50% complete are kept.
  • MDMCleaner is used to reduce contamination in those bins.
  • Next, the bins are dereplicated with dRep to form MAGs; only bins with >70% completeness and <10% contamination are kept.
  • Read mapping against all the MAGs is done using BWA.
  • GTDB-Tk is used for the taxonomic classification.
  • MGThermometer is used to estimate the optimal growth temperature (OGT) based on the relative abundance of the FIVYWREL amino acids (see the illustrative commands after this list).
    • The optimal growth temperature is computed as follows:
$$OGT = 937 \cdot F_{IVYWREL} - 335$$
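
For orientation, the sketch below shows roughly what the subsampling, read-mapping and OGT steps could look like on the command line. The subsampling tool (seqtk), file names, thread count and seed are illustrative assumptions; the actual invocations are defined in the Snakemake rules of this repository.

# subsample both read files to 10% (seqtk is an assumed choice; a fixed seed keeps the pairs in sync)
seqtk sample -s42 sample.mg.r1.preprocessed.fq 0.1 > sample.mg.r1.sub.fq
seqtk sample -s42 sample.mg.r2.preprocessed.fq 0.1 > sample.mg.r2.sub.fq

# index the assembly and map the subsampled reads with BWA, then sort with samtools
bwa index sample.fa
bwa mem -t 8 sample.fa sample.mg.r1.sub.fq sample.mg.r2.sub.fq | samtools sort -o sample.sorted.bam

# apply the OGT formula to a made-up FIVYWREL fraction of 0.40
awk -v f=0.40 'BEGIN{printf "OGT = %.1f C\n", 937*f - 335}'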

Setup

Conda

Conda user guide

# install miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod u+x Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh # follow the instructions

Getting the repository including sub-modules

git clone --recurse-submodules git@github.com:michoug/SnakemakeBinning.git
cd SnakemakeBinning
git checkout busi

Create the main snakemake environment

# create the conda environment
conda env create -f requirements.yaml -n "snakemake"
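
The environment then needs to be activated before running the workflow (standard conda usage):

# activate the environment created above
conda activate snakemake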

Run Setup

  • Place your preprocessed/trimmed reads (e.g. sample_r1.fastq.gz and sample_r2.fastq.gz files) in a reads folder
  • Place the individual assemblies (e.g. sample.fa) into an assembly folder
  • Modify the config/config.yaml file to change the different paths and, if needed, the different options
  • Modify the config/all_samples.txt file to include your samples (see the illustrative layout below)
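
Purely as an illustration, and assuming the folder names from the steps above, the inputs for two hypothetical samples (sampleA and sampleB) could be laid out as follows; the real paths and file-name patterns are whatever you set in config/config.yaml.

reads/
  sampleA_r1.fastq.gz
  sampleA_r2.fastq.gz
  sampleB_r1.fastq.gz
  sampleB_r2.fastq.gz
assembly/
  sampleA.fa
  sampleB.fa

# config/all_samples.txt — assumed to list one sample name per line
sampleA
sampleB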

Without Slurm

snakemake -s workflow/Snakefile --configfile config/config.yaml --cores 28 --use-conda -rp
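
Before launching the full run, a standard Snakemake dry run (-n, with -p to print the shell commands) can be used to check which jobs would be executed:

# dry run: list the jobs and shell commands without executing anything
snakemake -s workflow/Snakefile --configfile config/config.yaml --cores 28 --use-conda -np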

With Slurm

This part was mainly taken from @susheelbhanu's nomis_pipeline

  • Modify the slurm.yaml file by checking the partition, qos and account values, which heavily depend on your system
  • Modify the sbatch.sh file by checking the #SBATCH -p, #SBATCH --qos= and #SBATCH -A options, which heavily depend on your system (placeholder examples below)
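
For reference, these are the kinds of header lines to adjust in config/sbatch.sh; the values shown are placeholders for whatever partition, QOS and account your cluster provides.

#SBATCH -p batch          # partition name (placeholder)
#SBATCH --qos=normal      # QOS (placeholder)
#SBATCH -A myproject      # account / project to charge (placeholder)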

sbatch config/sbatch.sh