Skip to content
/ GAA Public

GAA is designed to take advantages of different assembly algorithms or sequencing platforms to improve the quality of next-generation sequence (NGS) assemblies.

Notifications You must be signed in to change notification settings

ghyao/GAA

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 

Repository files navigation

NAME
====

GAA - Graph Accordance of Assemly

VERSION
=======

Version 1.0

LICENCE
=======

Copyright 2011 - Guohui Yao - Washington University in St. Louis.
	
gyao at wustl dot edu

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
    
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.
        
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
MA 02110-1301, USA.


INTRODUCTION
============

No individual assembly algorithm addresses all the known limitations 
of assembling short length sequences. Overall reduced sequence contig 
length is the major problem that challenges the usage of these assemblies. 
GAA is designed to take advantages of different assembly algorithms or 
sequencing platforms to improve the quality of next-generation sequence 
(NGS) assemblies.


PREREQUISITES
=============

Perl => 5.8.9
Sequence aligners: BLAT GSMapper


INSTALLATION
====

The program is tested under LINUX systems. Below command could install 
the GAA package.
export PERL5LIB=$PERL5LIB:path_to_the_gaa_package


COMMAND LINE
============
	
Run gaa.pl without any parameter, it will print below usage on the screen.
	
-.pl	--target|-t 		<target.fa>		 : target consensus file, the contig names should be in the format of >Contig1.1
    	--query|-q 		<query.fa> 		 : query  consensus file, the contig names should be in the format of >Contig1.1
    	
The following options are optional.
    	--target-qual|-T 	<target.qual>	 	 : target quality file
    	--query-qual|-Q 	<query.qual> 	 	 : query quality file
	--target_gap|-g         <target gap file>	 : e.g. gap.txt for Newbler
	--query_gap|-G          <query gap file>	 : e.g. gap.txt for Newbler
    	--match|-m 		<alignment.psl>          : blat target.fa query.fa -fastMap query_vs_target.psl
    	
	--pair_status_target|r	<454PairStatus.txt>	 : generated from GSMapper
	--pair_status_query|R	<454PairStatus.txt>	 : generated from GSMapper
    	--ce                    <ce.psl>          	 : ce status psl file

    	--output-dir|-o 	<output_dir=pwd> 	 : default is merged_assembly under current directory
    	--tip-size|-p 		<tip_size=90>		 : tip tolerance 
    	--score-cutoff-lower|-l	<score_cutoff_lower=30>	 : confident unique contig, if score < 30
    	--score-cutoff-upper|-u	<score-cutoff_upper=100> : confident match,         if score > 100
    	--contig-size-cutoff|-s	<contig_size_cutoff=100> : keep unique contigs of length > 100
	Using -m option would shortcut the alignment step if alignment.psl alread exists.


EXAMPLES
========

$ ./gaa.pl -t newbler.fa -q cabog.fa
Merge newbler.fa and cabog.fa, and output newbler.fa_n_cabog.fa as the merged assembly to:

$ ./gaa.pl -t newbler.fa -q cabog.fa -T newbler.qual -Q cabog.qual
Also generate a quality file (newbler.qual_n_cabog.qual) for the merged assembly.

$ ./gaa.pl -t newbler.fa -q cabog.fa -t newbler.gap
Generate the gap file (g.gap) for merged assembly.

$ head target.fa
>Contig0.1
AAGgAAGCAAATtAtAAAaCtAaaGCATTAAAGATATACACATAAAAATGAAGTACTAAA
TTTATCAAGACAAAAaGCCCTGTCCAAATTGTACTATTGTCCAACTTGTACCACTTTACC
>Contig0.2
CCCTAAATTGAGAAAAaTAACTTTAGGGAATATATCAATGTACAGTGCCCGCTTCGTTAT
TCGGGAGAGAAATTCGAAAAAGTTAAAATAACATTCCGGAAATGTTAATTTTACCCTGCA
>Contig0.3
GTATTGATCCAAAATCGGTGTAAATATTACCCTTTTTAGGTGTATTAGGTGTTAAATGTA
TCCTTTTTCATGTTAATTTTACCCTTAATAAGGTGTAAAATTAACATTAAAAAATGTTGA
TATATTTTTACACCTAAAAAGTGTTAAATTTCTGAGGAAAAATGTTAATCGCACCCTCTT

$ head target.qual
>Contig0.1
64 64 64 26 64 64 64 64 64 64 50 64 26 64 16 64 64 64 22 64 39 64 
39 39 64 64 55 64 64 64 64 64 64 64 64 64 64 64 64 48 64 64 64 64 
64 64 64 49 45 64 64 64 64 64 64 64 64 64 64 32 64 64 64 64 64 64 

$ head gap.txt
Contig0.1 20
Contig0.2 811
Contig0.3 932
Contig0.4 876
Contig0.5 1927
Contig0.6 1730
Contig0.7 20
Contig1.1 171
Contig1.2 402
Contig1.3 1382

$ head 454PairStatus.txt
Template	Status	Distance	Left Accno	Left Pos	Left Dir	Right Accno	Right Pos	Right Dir	Left Distance	Right Distance
FC3FKPP01BUVA8	TruePair	3967	Contig18.13	24188	-	Contig18.13	20221	+		
FC3FKPP01BTJNG	TruePair	3820	Contig23.11	44530	+	Contig23.11	48350	-		
FC3FKPP01AFDGW	TruePair	2498	Contig41.19	5103	-	Contig41.19	2605	+		
FC3FKPP01CEJ64	TruePair	2498	Contig41.19	5103	-	Contig41.19	2605	+		
FC3FKPP01E3K7I	TruePair	3735	Contig117.3	98721	-	Contig117.3	94986	+		
FC3FKPP01A7AS8	FalsePair	-	Contig129.5	2438	+	Contig129.6	1986	-	276	1986
FC3FKPP01BJX6Q	FalsePair	-	Contig133.3	28972	+	Contig140.4	1428	-	5803	1428
FC3FKPP01AWH9A	TruePair	2636	Contig40.10	122575	+	Contig40.10	125211	-		
FC3FKPP01DCED4	MultiplyMapped	-	Contig18.9	1244	-	Repeat		


CONTACT
=======

Guohui Yao <gyao at wustl dot edu>

About

GAA is designed to take advantages of different assembly algorithms or sequencing platforms to improve the quality of next-generation sequence (NGS) assemblies.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published