
Using an alignment file, get the conserved region and the statistics of your mutations, and plot them nicely.


AhmedElsherbini/Mutations_stats-and-plot


Mutations_stats-and-plot

If you find this repo useful for your work, kindly cite and star it.

This simple Python 3 script analyzes your protein (or gene) alignment file, extracts the longest conserved region, computes the statistics of your mutations, and plots them nicely in a way similar to the Manhattan GWAS plot.

Usage

Do you have a reference protein (or gene) in your alignment file?

Well, just put your reference as the top sequence.

python mut_stats_plot.py -i test.afa -f fasta -r
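To illustrate what "reference as the top sequence" means in practice, here is a minimal sketch of how mutations could be called against the first row of an alignment. It is not taken from mut_stats_plot.py: the function name `call_mutations`, the toy sequences, and the label format (reference residue, 1-based column, observed residue) are all assumptions made for illustration.

```python
def call_mutations(reference, sequence):
    """Return substitutions of `sequence` relative to `reference`.

    Both strings must come from the same alignment (equal length).
    Labels look like "K2R" (ref residue, 1-based column, observed residue);
    this format is an assumption, not necessarily what the script writes.
    Gap characters are compared like any other character.
    """
    if len(reference) != len(sequence):
        raise ValueError("aligned sequences must have equal length")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, sequence), start=1)
        if ref != alt
    ]


# Toy alignment: the first row plays the role of the reference.
alignment = ["MKTAYIA", "MKTAYIA", "MRTAYLA", "MKTGYIA"]
reference = alignment[0]
per_sequence = [call_mutations(reference, seq) for seq in alignment[1:]]
print(per_sequence)
```

A per-sequence list like this corresponds in spirit to the "mutation per each sequence" output described below.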

No reference?

No problem

Then simply omit the -r argument, and the script will consider the most common base (or amino acid) as the reference.

python mut_stats_plot.py -i test.afa -f fasta 
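The idea of using the most common base or amino acid as the reference can be sketched as a column-wise majority vote. This is only an illustration under assumptions: the helper name `consensus_reference` and the toy sequences are hypothetical, and how the actual script breaks ties or treats gaps is not documented here.

```python
from collections import Counter


def consensus_reference(sequences):
    """Build a reference from the most common character in each column.

    Assumption: ties are resolved in favor of the character seen first
    in the column; the real script may behave differently.
    """
    if len({len(seq) for seq in sequences}) != 1:
        raise ValueError("need a non-empty set of equal-length aligned sequences")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*sequences)
    )


# Toy nucleotide alignment (hypothetical data).
alignment = ["ACGTAC", "ACGTTC", "ACCTAC", "ACGTAC"]
consensus = consensus_reference(alignment)
print(consensus)
```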

Dependencies?

You need Biopython, adjustText, pandas and numpy (get them via pip3 or conda). argparse is also used, but it ships with the Python 3 standard library.

What do you get?

1-FASTA file with the longest conserved region in your alignment, which can be important for domain or conserved-pocket analysis of your favorite protein, or to design a PCR for your gene of interest.

2-CSV file with the frequency (%) of the mutations in your alignment file.

3-CSV file with the frequency (count) of each mutation combination pattern, to answer a question like: which mutation comes with which mutation?

4-CSV file with the mutations of each sequence.

5-A graph in PDF format, similar in concept to the Manhattan GWAS plot (the dashed line marks 10%). PS: the Y axis is %, NOT -log(p).
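The outputs listed above can be approximated with a few lines of standard-library Python. The sketch below is not the script's code: the function names, the toy alignment, the mutation label format, and the choice to divide by the total number of sequences (reference included) are all assumptions for illustration.

```python
from collections import Counter


def longest_conserved_region(sequences):
    """Return (start, end, residues) of the longest run of identical,
    non-gap columns; start is 0-based and end is exclusive."""
    best = (0, 0)
    run_start = None
    for i, column in enumerate(zip(*sequences)):
        if len(set(column)) == 1 and column[0] != "-":
            if run_start is None:
                run_start = i
            if i + 1 - run_start > best[1] - best[0]:
                best = (run_start, i + 1)
        else:
            run_start = None
    start, end = best
    return start, end, sequences[0][start:end]


def mutations_of(reference, sequence):
    """Substitutions as labels such as 'K2R' (assumed format)."""
    return tuple(
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, sequence), start=1)
        if ref != alt
    )


# Toy protein alignment; the first row is treated as the reference.
alignment = ["MKTAYIAKQR", "MRTAYIAKQR", "MRTAYIAKQL", "MKTAYIAKQR", "MRTAYIAKQL"]
reference = alignment[0]

region = longest_conserved_region(alignment)

# Frequency (%) of each mutation across all sequences in the alignment.
counts = Counter(m for seq in alignment for m in mutations_of(reference, seq))
freqs = {m: 100.0 * n / len(alignment) for m, n in counts.items()}

# Count of each combination pattern (mutations found together in one sequence).
patterns = Counter(
    muts for muts in (mutations_of(reference, seq) for seq in alignment) if muts
)

print(region)
print(freqs)
print(patterns)
```

In this toy example the pattern count shows that R10L only ever appears together with K2R, which is the kind of question the combination-pattern CSV is meant to answer.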


Contributing

Everything should be CRYSTAL clear, but if anything is not, contact us here or directly via email: [email protected]