Skip to content

Perl Module that uses consensus sequences to find Exon-Intron Boundaries

License

Notifications You must be signed in to change notification settings

elmoren/boundary-finder-lib

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

NAME

BoundaryFinder - Exon/Intron Boundary Finder Module

SYNOPSIS

use BoundaryFinder;

$seq = "aaactgcaaggt";
my $hits = BoundaryFinder::score_seq(-seq => $seq, -cons_seq => $cons_seq);

DESCRIPTION

The BoundaryFinder uses consensus sequences to find and score boundaries in a sequence. This was developed for finding exon boundaries, but it can be used with any consensus sequence.

The boundary finder works best when you finding boundaries in small sequences. This is because the number of matches is often large, especially if there are no "100%" positions. (e.g. 100% of nucleotides have to be an "A" at position x, or it does not match)

OVERVIEW

The general procedure of using the library to find exon boundaries is as follows:

1. Use the BlastManager to blast your query sequence agaist the subject database.

This will give a BlastResults oject, or a set of blast hits.

2. Determine what BlastResults or hits you want use.

Sometimes blast returns hits that are not relevant to your query. These could be weak matches that blast returned that do not make sense with your query. You can trim these out by changing the blast parameters (such as a minimum "e" value) or iterating through the BlastResults and removing the unwanted hits.

3. Use BoundaryFinder to find the boundaries of the blast hits

Each query is a pair of sequences representing the 5` and 3` ends of a blast hit.

This can be automatically done by passing a BoundaryResults object, or done manually. It will create a query with a 50 extra nucleotides from each end of the edges of the blast hit. You can easily specify a buffer size other than 50.

***************************************************************
E.g. 

This hastily done ascii example shows how the BoundaryFinder will
create queries automatically from a hit. Given the blast result below,
a sequence is created by using 50 nucleotides upstream (5') to the end
of the blast hit (3' end). This is also done for the 3' end, with 
          50 NT after the end of the hit.

Query   ATGATGATGATGATG
        | |||||||||||    ... and so on
Sub     ACGATGATGATGACC
       
|< -50  ^  ------------- To the end of the blast hit --------------->|

^ = Points to the 5' edge of the blast result.

***************************************************************

This can be done manually too, and it takes only a little more work. You will just have to figure out the buffers for each hit, and add them with the BoundaryFinder::addQuery method.

4. Run the BoundaryFinder to get a BoundaryResults object.

The BoundaryResults object contains all the non-zero scores, and their positions in the blast database for easy retrieval. It is simply a wrapper of two hashes that contain the position as keys, and score as the value.

USAGE

# First run blasts
my $query = "protein.faa";
my $db = "blast_databases/dmel"; # Name of the database to query

# Init the BlastManager
my $bm = BlastManager->new(-blast_db => $db);
# set the expect value (optional
$bm->evalue(-e => 1e-1);       

# Run the tblastn 
my $br = $bm->tblastn(-query => $query);

# Load the 5' and 3' consensus sequences for Drosophila
$seqIO = new ConsensusSeqIO(-path => "t/drosophila.cons");
my $cons_seq_5;
my $cons_seq_3;

$cons_seq_5 = $seqIO->next_seq;
$cons_seq_3 = $seqIO->next_seq;


# Run the blast
#
# The 5' and 3' are switched because that is the way the consensus sequences
# are from the literature
#
my $bf_results = BoundaryFinder::score_hit(-hit => $hit, 
                                    -db => $db,
                                    -cons_seq_5 => $cons_seq_3,
                                    -cons_seq_3 => $cons_seq_5  );

AUTHOR

Nathan Elmore
[email protected]
http://www.elmoren.com

SEE ALSO

Blast+ 2.2.28

About

Perl Module that uses consensus sequences to find Exon-Intron Boundaries

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages