Home

Welcome to the RNA_Editing_Detection_Pipeline wiki!

Usage:

Download Reference Data

Create a tab-delimited file containing the URLs of all required reference data. Keep the first column (the data label) identical to the example.

Example reference_data.txt:

genome  ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_30/GRCh37_mapping/GRCh37.primary_assembly.genome.fa.gz
genome_annotation       ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_30/GRCh37_mapping/gencode.v30lift37.annotation.gtf.gz
strand_detection        https://sourceforge.net/projects/rseqc/files/BED/Human_Homo_sapiens/hg19_RefSeq.bed.gz
rmsk    http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/rmsk.txt.gz
dbSNP   http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/snp151.txt.gz
rediportal_db   http://srv00.recas.ba.infn.it/webshare/rediportalDownload/table1_full.txt.gz
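Before running the download script, it can help to check the file locally. The Python sketch below is hypothetical and not taken from get_ref_data_annotation.py. It parses a reference_data.txt-style file into a label-to-URL mapping, rejects labels that differ from the example, and derives a local file name from each URL.

```python
import os
from urllib.parse import urlparse

# Labels copied from the example reference_data.txt above.
EXPECTED_LABELS = {
    "genome", "genome_annotation", "strand_detection",
    "rmsk", "dbSNP", "rediportal_db",
}


def parse_reference_file(lines):
    """Return {label: url}; raise ValueError on unknown or missing labels."""
    refs = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        # The file should be tab-delimited; splitting on any whitespace run
        # also accepts a copy of the example where tabs became spaces.
        fields = line.split(None, 1)
        if len(fields) != 2:
            raise ValueError(f"expected '<label> <url>', got: {line!r}")
        label, url = fields
        if label not in EXPECTED_LABELS:
            raise ValueError(f"unknown label: {label}")
        refs[label] = url.strip()
    missing = EXPECTED_LABELS - set(refs)
    if missing:
        raise ValueError(f"missing labels: {sorted(missing)}")
    return refs


def local_filename(url):
    """Use the last component of the URL path as the download name."""
    return os.path.basename(urlparse(url).path)
```

For example, local_filename applied to the dbSNP URL gives snp151.txt.gz.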

Run get_ref_data_annotation.py to download all required data into the specified directory and generate the annotation files.

Parameters:

-i or --input: path to the tab-delimited file containing the data URLs
-o or --output: path to output directory

Example:

nohup python3 get_ref_data_annotation.py -i reference_data.txt -o output_path &

Some reference data may need to be reformatted, following the instructions in box 7 of Lo Giudice et al. This only needs to be done once per genome release. Formatted reference data is provided in the test dataset.

Index Genome for STAR

Run index_genome_STAR.py to index the genome for STAR.

Parameters:

-f or --fasta: path to genome fasta file
-a or --gtf_annotation: path to genome gtf annotation
-o or --output: path to output directory

Example:

nohup python3 index_genome_STAR.py -f genome.fa -a annotation.gtf -o index_output/ &
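For reference, the sketch below assembles a standard STAR genome-generation command from the same three inputs. It is an assumption about what a wrapper such as index_genome_STAR.py might run, not its actual code: the function name star_index_cmd and the threads default are hypothetical, while the options themselves are documented STAR options.

```python
def star_index_cmd(fasta, gtf, out_dir, threads=4, sjdb_overhang=None):
    """Build (but do not run) a STAR genomeGenerate command line."""
    cmd = [
        "STAR", "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", out_dir,        # where the index is written
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,          # annotated splice junctions
    ]
    if sjdb_overhang is not None:
        # The STAR manual recommends max read length minus 1 (default 100).
        cmd += ["--sjdbOverhang", str(sjdb_overhang)]
    return cmd


if __name__ == "__main__":
    import shlex
    print(shlex.join(star_index_cmd("genome.fa", "annotation.gtf", "index_output/")))
```

Passing the returned list to subprocess.run(cmd, check=True) would execute it.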

Retrieve Fastq Files from SRA

Create a text file containing a list of SRA accession numbers.
Run get_SRA_data.py to download the data.

Parameters:

-a or --acc_list: path to file containing list of SRA accession numbers
-o or --output: path to output directory

Example:

nohup python3 get_SRA_data.py -a acc.txt -o output_path &
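The accession file can be checked before a long download. The sketch below assumes one accession per line (the wiki does not specify the layout) and builds an SRA Toolkit fasterq-dump command per run. Whether get_SRA_data.py uses fasterq-dump, prefetch or another tool is not stated, so treat the command and the helper names as illustrative.

```python
import re

# Run accessions from SRA, ENA and DDBJ start with SRR, ERR and DRR.
RUN_ACC_RE = re.compile(r"^[SED]RR\d+$")


def read_accessions(lines):
    """Return run accessions in file order, skipping blank lines."""
    accs = []
    for raw in lines:
        acc = raw.strip()
        if not acc:
            continue
        if not RUN_ACC_RE.match(acc):
            raise ValueError(f"not a run accession: {acc!r}")
        accs.append(acc)
    return accs


def fasterq_dump_cmd(acc, out_dir, threads=4):
    """fasterq-dump command that writes <acc>_1/_2.fastq for paired runs."""
    return ["fasterq-dump", acc, "--split-files",
            "--outdir", out_dir, "--threads", str(threads)]
```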

Seq Reads Quality Check

Run fastqc.py to quality-check the sequencing reads.

Parameters:

-se or --single_end: include at the beginning of the parameters if the data are single-end
-f or --fastq_dir: path to fastq directory
-o or --output: path to output directory

Example:

PE data

nohup python3 fastqc.py -f fastq_dir -o output_dir &

SE data

nohup python3 fastqc.py -se -f fastq_dir -o output_dir &
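FastQC analyzes each FASTQ file on its own, so a quality-check step mostly amounts to collecting the files and passing them to fastqc. The sketch below shows that idea; the suffix list and the helper names list_fastqs and fastqc_cmd are assumptions, not the internals of fastqc.py.

```python
from pathlib import Path

# Assumed FASTQ naming; extend if your files use other extensions.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def list_fastqs(fastq_dir):
    """Sorted paths of FASTQ files directly inside fastq_dir."""
    return sorted(
        str(p) for p in Path(fastq_dir).iterdir()
        if p.is_file() and p.name.endswith(FASTQ_SUFFIXES)
    )


def fastqc_cmd(fastq_files, out_dir, threads=2):
    """FastQC command; the output directory must already exist."""
    return ["fastqc", "--threads", str(threads), "--outdir", out_dir, *fastq_files]
```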

Trim RNAseq Reads

Run fastp.py to trim RNAseq reads.

Parameters:

-se or --single_end: include at the beginning of the parameters if the data are single-end
-f or --fastq_dir: path to fastq directory
-o or --output: path to output directory

Example:

PE data

nohup python3 fastp.py -f fastq_dir -o output_dir &

SE data

nohup python3 fastp.py -se -f fastq_dir -o output_dir &
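For paired-end data, the two mates of each sample have to be trimmed together. The sketch below assumes mates are named <sample>_1 and <sample>_2 (the fasterq-dump --split-files convention); the helper names and output file names are hypothetical, while -i/-I, -o/-O, -h and -j are standard fastp options.

```python
import re
from pathlib import Path

# Assumption: mates are <sample>_1.<ext> and <sample>_2.<ext>.
MATE_RE = re.compile(r"^(?P<sample>.+)_(?P<mate>[12])\.(?:fastq|fq)(?:\.gz)?$")


def pair_fastqs(names):
    """Return {sample: (read1, read2)} for samples with both mates present."""
    found = {}
    for name in names:
        m = MATE_RE.match(name)
        if m:
            found.setdefault(m["sample"], {})[m["mate"]] = name
    return {s: (d["1"], d["2"]) for s, d in sorted(found.items())
            if "1" in d and "2" in d}


def fastp_pe_cmd(read1, read2, out_dir, sample):
    """fastp command that trims both mates and writes HTML/JSON reports."""
    out = Path(out_dir)
    return [
        "fastp", "-i", read1, "-I", read2,
        "-o", str(out / f"{sample}_1.trimmed.fastq.gz"),
        "-O", str(out / f"{sample}_2.trimmed.fastq.gz"),
        "-h", str(out / f"{sample}.fastp.html"),
        "-j", str(out / f"{sample}.fastp.json"),
    ]
```

For single-end data, only -i and -o (plus the report options) are needed.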

Align RNAseq Reads

Make sure the genome has been indexed for STAR.
Run align_STAR.py to align paired-end data to the genome.

Parameters:

-f or --fastq_dir: path to directory containing fastq files
-g or --genome_idx: path to STAR genome index
-o or --output: path to output directory

Example:

nohup python3 align_STAR.py -f fastq_dir -g genome_index -o output_dir &
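The sketch below shows a typical STAR alignment command for one sample, assuming gzip-compressed input and coordinate-sorted BAM output. It is not the command used inside align_STAR.py; the helper name and defaults are hypothetical.

```python
def star_align_cmd(genome_idx, read1, read2=None, out_prefix="sample_", threads=4):
    """Build a STAR alignment command for single- or paired-end reads."""
    reads = [read1] if read2 is None else [read1, read2]
    cmd = [
        "STAR", "--runThreadN", str(threads),
        "--genomeDir", genome_idx,
        "--readFilesIn", *reads,
        # With this setting STAR writes <prefix>Aligned.sortedByCoord.out.bam
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", out_prefix,
    ]
    if read1.endswith(".gz"):
        # Decompress gzipped FASTQ on the fly
        cmd += ["--readFilesCommand", "zcat"]
    return cmd
```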

Detection of the strand orientation of RNAseq reads

Run infer_strand_direction.py to detect the strand orientation of the aligned RNAseq reads.

Parameters:

-d or --bam_dir: path to the directory containing the BAM files
-r or --ref_seq_bed: path to the RefSeq BED file (strand_detection in reference_data.txt)

Example:

nohup python3 infer_strand_direction.py -d bam_dir -r ref_seq_bed &
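The strand_detection BED in reference_data.txt comes from the RSeQC project, so infer_strand_direction.py plausibly wraps RSeQC's infer_experiment.py; the wiki does not confirm this. Assuming that report format, the sketch below turns the two "Fraction of reads explained by" lines into a strandedness call. The 0.8 cut-off, the labels and the function name are assumptions.

```python
import re

FRACTION_RE = re.compile(r'Fraction of reads explained by "([^"]+)":\s*([0-9.]+)')
# Rules meaning read 1 (or the single read) lies on the gene's strand.
SAME_STRAND_RULES = {"1++,1--,2+-,2-+", "++,--"}


def classify_strandedness(report, threshold=0.8):
    """Return 'forward', 'reverse' or 'unstranded' from an infer_experiment.py report."""
    fractions = {rule: float(value) for rule, value in FRACTION_RE.findall(report)}
    if len(fractions) != 2:
        raise ValueError("expected two 'Fraction of reads explained by' lines")
    same = sum(v for r, v in fractions.items() if r in SAME_STRAND_RULES)
    opposite = sum(v for r, v in fractions.items() if r not in SAME_STRAND_RULES)
    if same >= threshold:
        return "forward"
    if opposite >= threshold:
        return "reverse"
    return "unstranded"
```

Under this labeling, dUTP-based libraries, where read 1 maps to the strand opposite the gene, come out as "reverse".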

Download Fastq Files of WGS from SRA

Create a text file containing a list of ERR accession numbers.
Run get_WGS_data.py to download the data.

Parameters:

-a or --acc_list: path to file containing a list of ERR accession numbers
-o or --output: path to output directory

Example:

nohup python3 get_WGS_data.py -a acc.txt -o output_path &
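After get_WGS_data.py finishes, checking that both mates of every run arrived can prevent a failed alignment later. The sketch assumes the fasterq-dump --split-files naming (<acc>_1 and <acc>_2, optionally gzipped); the file names actually written by get_WGS_data.py are not documented here, and missing_mates is a hypothetical helper.

```python
from pathlib import Path


def missing_mates(accessions, out_dir, exts=(".fastq", ".fastq.gz")):
    """Return '<acc>_<mate>' for every expected mate file not found in out_dir."""
    out = Path(out_dir)
    missing = []
    for acc in accessions:
        for mate in ("1", "2"):
            if not any((out / f"{acc}_{mate}{ext}").exists() for ext in exts):
                missing.append(f"{acc}_{mate}")
    return missing
```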

Index Genome for BWA

Run index_genome_bwa.py to index the genome for BWA.

Parameters:

-f or --fasta_dir: path to genome fasta file

Example:

nohup python3 index_genome_bwa.py -f fasta_dir &
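bwa index writes five files named after the FASTA (.amb, .ann, .bwt, .pac and .sa appended to the FASTA file name), and indexing a human-sized genome is slow, so it is worth checking whether they already exist before rerunning. A minimal sketch; the helper names are hypothetical and index_genome_bwa.py may not perform this check.

```python
from pathlib import Path

# Files bwa index creates, named <fasta><suffix> by default.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def bwa_index_complete(fasta_path):
    """True if every bwa index file for fasta_path already exists."""
    return all(Path(f"{fasta_path}{s}").exists() for s in BWA_INDEX_SUFFIXES)


def bwa_index_cmd(fasta_path):
    """Command that builds the bwa index next to the FASTA file."""
    return ["bwa", "index", str(fasta_path)]
```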

Align DNAseq Reads

Run align_bwa.py to align paired-end data to the genome.

Parameters:

-fq or --fastq_dir: path to directory containing fastq files
-fa or --fasta_dir: path to directory containing genome fasta file

Example:

nohup python3 align_bwa.py -fq fastq_dir -fa fasta_dir &
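bwa mem prints SAM records to standard output, so a wrapper has to redirect that stream into a file. The sketch below shows one way to do this from Python; the function names are hypothetical and align_bwa.py may build its command differently.

```python
import subprocess


def bwa_mem_cmd(fasta, read1, read2, threads=4):
    """bwa mem command for one paired-end sample (index prefix = FASTA path)."""
    return ["bwa", "mem", "-t", str(threads), fasta, read1, read2]


def run_to_file(cmd, out_path):
    """Run cmd and write its standard output (here, SAM) to out_path."""
    with open(out_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)
```

For example, run_to_file(bwa_mem_cmd("genome.fa", "s_1.fastq.gz", "s_2.fastq.gz"), "s.sam") would align one sample once the index exists.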

Select and map reads to a chromosome

Run select_map_chr.py to select and map reads to a specific chromosome

Parameters:

-g or --genome_dir: path to the directory containing the genome .fai file
-f or --fastq_dir: path to the directory containing the WGS fastq file and the SAM file
-o or --output_dir: path to the directory in which to store the output files
-chr or --chrNum: select the chromosome number as 'chr[Int]' (e.g. -chr chr21)

Example:

nohup python3 select_map_chr.py -g genome_dir -f fastq_dir -o output_dir -chr chrNum &
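The .fai index written by samtools faidx is a tab-separated table whose first two columns are the sequence name and its length, which makes it easy to check the -chr value before any reads are processed. The sketch below is hypothetical and not taken from select_map_chr.py; it enforces the 'chr[Int]' format described above and looks the name up in the .fai.

```python
import re


def read_fai(lines):
    """Map sequence name -> length from the lines of a .fai file."""
    lengths = {}
    for line in lines:
        if line.strip():
            name, length = line.rstrip("\n").split("\t")[:2]
            lengths[name] = int(length)
    return lengths


def check_chromosome(chrom, lengths):
    """Return the length of chrom, or raise ValueError if it is unusable."""
    if not re.fullmatch(r"chr\d+", chrom):
        raise ValueError(f"expected 'chr[Int]' (e.g. chr21), got {chrom!r}")
    if chrom not in lengths:
        raise ValueError(f"{chrom} is not in the .fai index")
    return lengths[chrom]
```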
