Overview
Sprout is an approach to identifying pairs of spatially proximal binding events for a protein from ChIA-PET data.
How To Run
Applying Sprout to ChIA-PET data requires several steps. Sprout is implemented to run on a cluster of machines through the Sun Grid Engine (SGE) queuing system. It is assumed that the ChIA-PET sequence data have already been processed to remove chimeric read pairs and that linker sequences have been removed from the remaining reads. The remaining non-chimeric genomic read pairs should then be aligned to the appropriate reference genome. The two reads of each pair should be aligned independently, because no assumptions should be made about where the reads in a pair lie relative to each other. Input files are assumed to contain all read pairs for which both reads align to a unique location in the reference genome. Input files are expected to be tab-delimited, with each line holding the pair of genomic locations (chromosome:position:strand) to which one read pair aligned. For example:
11:22793448:+	13:56522051:-
2:75705251:+	5:53998331:-
11:106929428:-	11:106929538:+
15:99392393:-	15:99393022:+
12:104434000:-	3:96247693:-
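The input format described above can be checked before running Sprout with a small parser. The following is only a sketch under assumptions: the class and method names (ReadPairParser, Location, parseLine) are hypothetical and are not part of sproutseed.jar, and it assumes each location has exactly the form chromosome:position:strand with a strand of + or -.

```java
/** Hypothetical helper (not part of Sprout) for validating read-pair input lines. */
final class ReadPairParser {

    /** One aligned read end: chromosome name, coordinate, and strand. */
    static final class Location {
        final String chrom;
        final int position;
        final char strand;

        Location(String chrom, int position, char strand) {
            this.chrom = chrom;
            this.position = position;
            this.strand = strand;
        }
    }

    /** Parses a single field such as "11:22793448:+". */
    static Location parseLocation(String field) {
        String[] parts = field.trim().split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected chrom:position:strand, got: " + field);
        }
        if (!parts[2].equals("+") && !parts[2].equals("-")) {
            throw new IllegalArgumentException("Strand must be + or -, got: " + parts[2]);
        }
        // Integer.parseInt throws NumberFormatException on a non-numeric position.
        return new Location(parts[0], Integer.parseInt(parts[1]), parts[2].charAt(0));
    }

    /** Parses one tab-delimited line into its two read locations. */
    static Location[] parseLine(String line) {
        String[] fields = line.split("\t");
        if (fields.length != 2) {
            throw new IllegalArgumentException("Expected two tab-delimited locations, got: " + line);
        }
        return new Location[] { parseLocation(fields[0]), parseLocation(fields[1]) };
    }

    public static void main(String[] args) {
        Location[] pair = parseLine("11:22793448:+\t13:56522051:-");
        System.out.println(pair[0].chrom + " " + pair[0].position + " " + pair[0].strand);
        System.out.println(pair[1].chrom + " " + pair[1].position + " " + pair[1].strand);
    }
}
```

Lines that are separated by spaces rather than tabs, or that carry extra columns, are rejected with an IllegalArgumentException, which makes formatting problems visible before any cluster jobs are submitted.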
All of the following commands should be run with sproutseed.jar on the classpath, which should provide all necessary dependencies. The first stage of Sprout identifies the locations of binding events. A set of initial binding event locations is generated by running MuTauFileGenerator.java:
edu.mit.csail.cgs.reeder.sproutseed.MuTauFileGenerator --species "Mus musculus;mm9" --spacing 500 --buffer 2000 --readfile --outfile
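As a rough illustration of what running a command with sproutseed.jar on the classpath can look like, the sketch below assembles a java -cp invocation as an argument list. The SproutCommand class is hypothetical, and the file names reads.txt and mutau.txt are placeholder values chosen for the example, not names that Sprout requires.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Sprout) that builds a java -cp command line. */
final class SproutCommand {

    /** Returns: java -cp <jarPath> <mainClass> <toolArgs...> as separate arguments. */
    static List<String> build(String jarPath, String mainClass, String... toolArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(jarPath);
        cmd.add(mainClass);
        cmd.addAll(Arrays.asList(toolArgs));
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build(
                "sproutseed.jar",
                "edu.mit.csail.cgs.reeder.sproutseed.MuTauFileGenerator",
                "--species", "Mus musculus;mm9",
                "--spacing", "500",
                "--buffer", "2000",
                "--readfile", "reads.txt",   // placeholder input path
                "--outfile", "mutau.txt");   // placeholder output path
        System.out.println(String.join(" ", cmd));
        // To launch the process instead of printing it:
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Keeping each argument as its own list element means the species value, which contains a space and a semicolon, reaches the program as a single argument. When typing the same command into a shell, that value needs to stay quoted as shown in the commands in this document.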
BreakUpMuTauFile.java splits the file of initial binding event locations into a number of smaller files to make event location detection more efficient:
edu.mit.csail.cgs.reeder.sproutseed.BreakUpMuTauFile --species "Mus musculus;mm9" --buffer 4000 --numregions 100 --mutaufile --outbase
SubmitSeedMuFile.java generates sets of commands that each submit a job to SGE. The following is an example of a set of arguments for SubmitSeedMuFile.java. The effects of the parameter settings are described in the Sprout manuscript.
edu.mit.csail.cgs.reeder.sproutseed.SubmitSeedMuFile --species "Mus musculus;mm9" --genome "mm9_1.txt" --rho 0.7 --alpha 5 --beta 1 --a 1 --b 1 --readfile --dumpfile --outfile --directory --stage 1 --maxiters 2000 --mutaubase --mutaunum --wd --submitfile
The basic Sprout workflow skips stage 2 and continues with what is called stage 3 in the code. First, the results from stage 1 must be consolidated by chromosome so that interactions can be identified between regions that were split apart to make binding event identification more efficient:
edu.mit.csail.cgs.reeder.sproutseed.ConsolidateMuFileStage1Results --species "Mus musculus;mm9" --filebase <prefix for the files that contain results from stage 1> --outbase --numfiles <number of files containing results from stage 1> --readfile --numreads
SubmitSeedMuFile3.java generates another set of commands that submit jobs to SGE:
edu.mit.csail.cgs.reeder.sproutseed.SubmitSeedMuFile3 --species "Mus musculus;mm9" --genome "mm9_1.txt" --rho 0.7 --alpha 5 --beta 1 --a 1 --b 1 --readfile --dumpfile --outfile --directory --stage 3 --maxiters 1000 --stage2file <prefix for the files containing the results from the previous stage, in this case stage 1> --eventout --interactionout --wd --submitfile
Contact reeder.c at gmail