Prot-Search

Computational Genomics Final Project - Shreyas Aiyar, Debanik Purkayastha, Ishita Tripathi

Directory Structure

.
├── api
    ├── matching
        ├── index_assisted_dp.py                        # Index Assisted DP algorithm implementation
        ├── neighbourhood.py                            # Neighbourhood search algorithm implementation
        ├── pigeonhole_principle_approximate_match.py   # L-mer filtration algorithm implementation
        ├── six_frame_translation.py                    # Six frame translation implementation
        ├── blast.py                                    # Trial implementation of blast.py
    ├── api.py
├── client                                              # Web app files
├── tests
    ├── data_generation.py                              # Script to generate test reads
    ├── kmer_dict_tests.py                              # Test script to compare kmer dictionary sizes and creation runtime for different k's and number of proteins. 
    ├── test_suite.py                                   # Test script to run and benchmark algorithms
├── scripts
    ├── kmers.py                                        # Script to generate L-mer dictionary file
    ├── proteins.py                                     # Script to generate protein id dictionary file
├── data                                                # Stores supporting dictionary files in .pickle format
├── contributions.txt
├── recording                                           # Screen recording of the web app
└── README.md

Requirements

To run the web app specifically, make sure your machine has Node.js installed. After installing Node.js, make sure that yarn is accessible as a package manager.

Node.js - https://nodejs.org/en/download/
yarn - https://classic.yarnpkg.com/en/docs/install/#mac-stable

Also, to generate some supporting files, you would require the Uniprot FASTA file which can be downloaded from:
Uniprot FASTA File - ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz

Place the Uniprot File in /data directory once downloaded

Web-App

To start server

From root directory

cd api && pip install -r requirements.txt && flask run

To start client

From root directory

cd client && yarn install && yarn start

Testing

Generating Synthetic Reads

cd tests
python3 data_generation.py --infile ../data/uniprot_sprot.fasta --seed 1 --num_errors 4 --num_reads 5 --read_length 60 --no-gaps
cd ..

Benchmarking/Profiling Algorithms

Note: Generate synthetic reads prior to this
Smaller values of L would take a while to run

cd tests
python3 test_suite.py --infile test_reads.txt --mm 4 --l 4
cd ..

Testing L-Mer Dictionary

cd tests
python3 kmer_dict_tests.py
cd ..

Misc

Generating L-mer Dictionary File

cd scripts
python3 kmers.py --infile ../data/protein_dict_num_prots_100.pickle -k 5 -n 100
cd ..

Generating Protein Dictionary File

cd scripts
python3 proteins.py --infile ../data/uniprot_sprot.fasta --num_proteins 100
cd ..

Neighborhood search for a sequence [PROT_SEQ]

cd api/matching
python3 neighborhood.py -i [PROT_SEQ] -m 1 -t 17
cd ../../

Note: The query files have sample usage given as comments at the top of the file. They can be run independently as well.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Prot-Search

Table of Contents

Directory Structure

Requirements

Web-App

To start server

To start client

Testing

Generating Synthetic Reads

Benchmarking/Profiling Algorithms

Testing L-Mer Dictionary

Misc

Generating L-mer Dictionary File

Generating Protein Dictionary File

Neighborhood search for a sequence [PROT_SEQ]

About

Releases

Packages

Contributors 3

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 95 Commits
api		api
client		client
data		data
recording		recording
scripts		scripts
tests		tests
.gitignore		.gitignore
README.md		README.md
contributions.txt		contributions.txt

debanik1997/Prot-Search

Folders and files

Latest commit

History

Repository files navigation

Prot-Search

Table of Contents

Directory Structure

Requirements

Web-App

To start server

To start client

Testing

Generating Synthetic Reads

Benchmarking/Profiling Algorithms

Testing L-Mer Dictionary

Misc

Generating L-mer Dictionary File

Generating Protein Dictionary File

Neighborhood search for a sequence [PROT_SEQ]

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages