batch MLST of species

Purpose: to get allele (nucleotide) sequences according to multi-locus sequence types (STs)

Input

a tab-delimited file (despite its extension enigmatically being TXT) from PubMLST, labeled as a 'profile' for a particular species database
all multi-FastA formatted files of alleles for the specified profile in the cwd; default file extension is TFA as PubMLST uses, however any extension can be specified with the --ext opt

Output

multi-FastA format file of each allele for a ST (ST-##.fas)
a FastA file of the concatenated alleles for a ST (concat_ST-##.fas)

Usage

python get_allele_seqs_from_PubMLST_db.py -p profile.txt

Example Install

cd $HOME
pip install biopython numpy
git clone https://github.com/chrisgulvik/batch_MLST_of_species.git
echo 'export PATH="$PATH:$HOME/batch_MLST_of_species"' >> $HOME/.bash_profile
chmod u+x ~/batch_MLST_of_species/get_allele_seqs_from_PubMLST_db.py

Example Usage

Step 1 - Download profile and corresponding alleles:

mkdir ~/Cdiff_MLST && cd ~/Cdiff_MLST
for f in {profiles/cdifficile.txt,alleles/cdifficile/adk.tfa,alleles/cdifficile/atpA.tfa,alleles/cdifficile/dxr.tfa,alleles/cdifficile/glyA.tfa,alleles/cdifficile/recA.tfa,alleles/cdifficile/sodA.tfa,alleles/cdifficile/tpi.tfa};
    do wget http://pubmlst.org/data/"$f";
done

Step 2 - Get all allele sequences for each ST

get_allele_seqs_from_PubMLST_db.py -p cdifficile.txt
ls concat_ST-*.fas | xargs cat > 351_Cdiff_concat_MLSTs.fa
rm *.fas

Step 3 - Align and infer phylogeny

muscle -in 351_Cdiff_concat_MLSTs.fa -out 351_Cdiff_concat_MLSTs.fa.aln
python ~/scripts/fasta2phylip.py 351_Cdiff_concat_MLSTs.fa.aln 351_Cdiff_concat_MLSTs.aln.phy
# collapse invariant sites
raxmlHPC-PTHREADS -T 30 -f a -s 351_Cdiff_concat_MLSTs.aln.phy -n Cdiff_concat_MLSTs -p 654321 -x 54321 -N 100 -m GTRGAMMA

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.gitignore		.gitignore
README.md		README.md
get_allele_seqs_from_PubMLST_db.py		get_allele_seqs_from_PubMLST_db.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

batch MLST of species

Input

Output

Usage

Example Install

Example Usage

About

Releases

Packages

Languages

chrisgulvik/batch_MLST_of_species

Folders and files

Latest commit

History

Repository files navigation

batch MLST of species

Input

Output

Usage

Example Install

Example Usage

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages