A collection of scripts to get started with running the REPET pipeline on a cluster with the SLURM resource manager and a module system installed.
- FASTA Format
  - Header
    - Recommended format: ">XX_i" (XX = letters, i = numbers)
    - Avoid spaces and symbols like "=;:|"
  - 60 bp (or fewer) per line for sequences
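
The FASTA recommendations above can be checked before starting a run. The following is a minimal sketch, not part of these scripts: `check_fasta` is a hypothetical helper name, and the header pattern is one literal reading of ">XX_i" (letters, an underscore, then digits).

```shell
# Hypothetical helper: flag headers that do not look like ">XX_i" and
# sequence lines longer than 60 characters.
# Returns non-zero if any problem was found.
check_fasta() {
  awk '
    /^>/ {
      # Header line: expect letters, an underscore, then digits.
      if ($0 !~ /^>[A-Za-z]+_[0-9]+$/) {
        printf "line %d: non-recommended header: %s\n", NR, $0
        bad = 1
      }
      next
    }
    # Sequence line: expect at most 60 characters.
    length($0) > 60 {
      printf "line %d: sequence line has %d characters\n", NR, length($0)
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}
```

Calling `check_fasta genome.fa` (with `genome.fa` as a placeholder name) prints one line per problem and returns a non-zero status if any were found.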
- Host genome (FASTA format)
- REPET-specific Pfam HMM File
- rDNA (FASTA format) of host genome
- RepBase Amino Acid Database
- RepBase Nucleotide Database
- cDNA of host genome (FASTA format)
A RepeatScout bank can also be provided, but it requires additional pre-processing before it can be used in the pipeline. See the TEdenovo tutorial webpage or the text file included with REPET. These scripts currently do NOT perform these pre-processing steps.
- Host genome (FASTA format)
- TE library (FASTA format)
  - from TEdenovo or another source
- RepBase Amino Acid Database
- RepBase Nucleotide Database
- Clone the repository and copy the default configuration.
$ git clone https://github.com/stajichlab/REPET-slurm
$ cd REPET-slurm/TEdenovo
$ cp /path/to/REPET/config/TEdenovo.cfg .
- Change the settings in `TEdenovo.cfg` and `TEdenovo_AllSteps.sh` to match your environment/project.
- Copy/link the prerequisite files into the TEdenovo folder.
- Run the pipeline with `sh TEdenovo_AllSteps.sh` or `sbatch TEdenovo_AllSteps.sh`.
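
Before launching the job, it can help to confirm that every prerequisite file is present in the working folder. This is a minimal sketch: `check_prereqs` is a hypothetical helper, and the file names in the example call are placeholders rather than names required by REPET.

```shell
# Hypothetical helper: report any listed file that is missing or empty.
# Symbolic links are followed, so linked files are checked as well.
# Returns non-zero if at least one file fails the check.
check_prereqs() {
  status=0
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# Example call with placeholder names; submit only if all files exist:
# check_prereqs genome.fa rDNA.fa cDNA.fa && sbatch TEdenovo_AllSteps.sh
```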
If you already ran TEdenovo, then skip step 1.
- Clone the repository and copy the default configuration.
$ git clone https://github.com/stajichlab/REPET-slurm
$ cd REPET-slurm/TEannot
$ cp /path/to/REPET/config/TEannot.cfg .
- Change the settings in `TEannot.cfg` and `TEannot_AllSteps.sh` to match your environment/project.
- Copy/link the prerequisite files into the TEannot folder.
  - TE library has a required naming format: `<project_name>_refTEs.fa`
- Run the pipeline with `sh TEannot_AllSteps.sh` or `sbatch TEannot_AllSteps.sh`.
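
The TE library naming requirement can be met by linking an existing library under the expected name. This is a minimal sketch: `link_te_library` is a hypothetical helper, and the project name is assumed to match the one set in `TEannot.cfg`.

```shell
# Hypothetical helper: link an existing TE library into the current folder
# under the name <project_name>_refTEs.fa.
# Pass an absolute path for the library so the symbolic link resolves
# regardless of where it is read from.
link_te_library() {
  project=$1
  library=$2
  if [ ! -s "$library" ]; then
    echo "TE library not found or empty: $library" >&2
    return 1
  fi
  ln -s "$library" "${project}_refTEs.fa"
}

# Example with placeholder values, run from inside the TEannot folder:
# link_te_library MyProject /path/to/TE_library.fa
```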