Jessica Mattick edited this page Apr 30, 2020 · 26 revisions

Welcome to the RNA_Editing_Detection_Pipeline wiki!

Usage:

Download Reference Data

  1. Create a tab-delimited file containing the URLs of all required reference data, keeping the first column identical to the example.

Example reference_data.txt:

genome  ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_30/GRCh37_mapping/GRCh37.primary_assembly.genome.fa.gz
genome_annotation       ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_30/GRCh37_mapping/gencode.v30lift37.annotation.gtf.gz
strand_detection        https://sourceforge.net/projects/rseqc/files/BED/Human_Homo_sapiens/hg19_RefSeq.bed.gz
rmsk    http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/rmsk.txt.gz
dbSNP   http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/snp151.txt.gz
rediportal_db   http://srv00.recas.ba.infn.it/webshare/rediportalDownload/table1_full.txt.gz
  2. Run get_ref_data_annotation.py to download all required data into the specified directory and generate annotation files.

Parameters:

  • -i or --input: path to tab-delimited file containing data URLs
  • -o or --output: path to output directory

Example:

nohup python3 get_ref_data_annotation.py -i reference_data.txt -o output_path &

Some reference data may need to be reformatted, following the instructions in Box 7 of Lo Giudice et al. This only needs to be done once per genome release. Formatted reference data is provided in the test dataset.
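Because the download step depends on the first column of reference_data.txt matching the example, it can help to check the file before starting a long download. The following is a minimal sketch, not part of the pipeline: the function name `check_reference_table` is hypothetical, the expected keys are copied from the example above, and the example.org URL in the demo is a placeholder.

```python
import csv
import io

# Keys copied from the first column of the example reference_data.txt.
EXPECTED_KEYS = [
    "genome",
    "genome_annotation",
    "strand_detection",
    "rmsk",
    "dbSNP",
    "rediportal_db",
]


def check_reference_table(handle):
    """Return (entries, problems) for a tab-delimited key/URL table."""
    entries = {}
    problems = []
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) != 2:
            problems.append(
                f"line {line_no}: expected 2 tab-separated fields, found {len(row)}"
            )
            continue
        key, url = row[0].strip(), row[1].strip()
        if key not in EXPECTED_KEYS:
            problems.append(f"line {line_no}: unexpected key {key!r}")
        if "://" not in url:
            problems.append(f"line {line_no}: {url!r} does not look like a URL")
        entries[key] = url
    for key in EXPECTED_KEYS:
        if key not in entries:
            problems.append(f"missing key {key!r}")
    return entries, problems


# Demo on an in-memory table with a placeholder URL; a real check would pass
# open("reference_data.txt") instead.
sample = io.StringIO("genome\tftp://example.org/genome.fa.gz\n")
entries, problems = check_reference_table(sample)
print(len(problems), "problem(s) found")
```

A file that uses spaces instead of tabs is reported as having the wrong number of fields, which is a common cause of a failed download step.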

Index Genome for STAR

  1. Run index_genome_STAR.py to index the genome for STAR.

Parameters:

  • -f or --fasta: path to genome fasta file
  • -a or --gtf_annotation: path to genome gtf annotation
  • -o or --output: path to output directory

Example:

nohup python3 index_genome_STAR.py -f genome.fa -a annotation.gtf -o index_output/ &
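Indexing with an annotation only works well when the GTF uses the same sequence names as the FASTA (for example `chr1` in both rather than `chr1` in one and `1` in the other). The sketch below compares the two files before indexing. It is an illustration only; the helper names are hypothetical and it assumes plain or gzip-compressed text files.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    path = str(path)
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def fasta_names(path):
    """Collect sequence names (first word of each '>' header line)."""
    names = set()
    with open_text(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def gtf_names(path):
    """Collect the sequence names used in column 1 of a GTF file."""
    names = set()
    with open_text(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # header comments and blank lines
            names.add(line.split("\t", 1)[0])
    return names


def names_missing_from_fasta(fasta_path, gtf_path):
    """Return GTF sequence names that have no matching FASTA record."""
    return sorted(gtf_names(gtf_path) - fasta_names(fasta_path))
```

Calling `names_missing_from_fasta("genome.fa", "annotation.gtf")` returns an empty list when every annotated sequence is present in the genome.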

Retrieve Fastq Files from SRA

  1. Create a text file containing a list of SRA accession numbers.
  2. Run get_SRA_data.py to download the data.

Parameters:

  • -a or --acc_list: path to file containing list of SRA accession numbers
  • -o or --output: path to output directory

Example:

nohup python3 get_SRA_data.py -a acc.txt -o output_path &
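A malformed accession list is easy to catch before a long download starts. The sketch below assumes one accession per line and assumes that the list holds run accessions (SRR, ERR or DRR followed by digits); whether get_SRA_data.py accepts other accession types is not stated here, so the pattern may need loosening. The function name is hypothetical.

```python
import re

# Run accessions issued by SRA, ENA and DDBJ start with SRR, ERR or DRR.
# Assumption: the accession list contains run accessions only.
RUN_ACCESSION = re.compile(r"^[SED]RR\d+$")


def read_accessions(lines):
    """Return (valid, rejected) accessions from an iterable of text lines."""
    valid, rejected, seen = [], [], set()
    for raw in lines:
        acc = raw.strip()
        if not acc or acc in seen:
            continue  # skip blank lines and duplicates
        seen.add(acc)
        if RUN_ACCESSION.match(acc):
            valid.append(acc)
        else:
            rejected.append(acc)
    return valid, rejected
```

With a real file this could be called as `read_accessions(open("acc.txt"))`; any entries in `rejected` are worth inspecting by hand.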

Seq Reads Quality Check

  1. Run fastqc.py to quality check the sequencing reads

Parameters:

  • -se or --single_end: include at beginning of parameters if data is single end
  • -f or --fastq_dir: path to fastq directory
  • -o or --output: path to output directory

Example:

PE data

nohup python3 fastqc.py -f fastq_dir -o output_dir &

SE data

nohup python3 fastqc.py -se -f fastq_dir -o output_dir &

Trim RNAseq Reads

  1. Run fastp.py to trim RNAseq Reads

Parameters:

  • -se or --single_end: include at beginning of parameters if data is single end
  • -f or --fastq_dir: path to fastq directory
  • -o or --output: path to output directory

Example:

PE data

nohup python3 fastp.py -f fastq_dir -o output_dir &

SE data

nohup python3 fastp.py -se -f fastq_dir -o output_dir &
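Both fastqc.py and fastp.py need -se for single-end data, so it is worth confirming the layout of a fastq directory first. The sketch below guesses the layout from file names alone. It assumes mates are named `<sample>_1.fastq[.gz]` and `<sample>_2.fastq[.gz]` and single-end files `<sample>.fastq[.gz]`; this naming is an assumption and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed naming: <sample>_1.fastq[.gz] / <sample>_2.fastq[.gz] for mates,
# <sample>.fastq[.gz] for single-end reads (.fq is accepted as well).
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+?)(?:_(?P<mate>[12]))?\.(?:fastq|fq)(?:\.gz)?$"
)


def group_fastq(filenames):
    """Map sample name -> sorted list of mate labels ('1', '2' or 'single')."""
    groups = defaultdict(list)
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match:
            groups[match["sample"]].append(match["mate"] or "single")
    return {sample: sorted(mates) for sample, mates in groups.items()}


def layout(filenames):
    """Return 'PE', 'SE', or 'unclear' for a list of fastq file names."""
    kinds = set()
    for mates in group_fastq(filenames).values():
        if mates == ["1", "2"]:
            kinds.add("PE")
        elif mates == ["single"]:
            kinds.add("SE")
        else:
            kinds.add("unclear")  # e.g. a mate file without its partner
    return kinds.pop() if len(kinds) == 1 else "unclear"
```

For a directory on disk, `layout(os.listdir("fastq_dir"))` gives a quick answer; an `unclear` result suggests a missing mate or a mix of layouts that should be separated before running the scripts.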

Align RNAseq Reads

  1. Make sure genome has been indexed for STAR
  2. Run align_STAR.py to align paired-end data to the genome

Parameters:

  • -f or --fastq_dir: path to directory containing fastq files
  • -g or --genome_idx: path to STAR genome index
  • -o or --output: path to output directory

Example:

nohup python3 align_STAR.py -f fastq_dir -g genome_index -o output_dir &
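Step 1 can be checked programmatically. The sketch below looks for a few of the files that STAR's genomeGenerate mode writes into the index directory. The list is deliberately partial and the function name is hypothetical; a passing check suggests, but does not prove, that indexing completed.

```python
from pathlib import Path

# A subset of the files STAR writes when generating a genome index.
STAR_INDEX_FILES = ("Genome", "SA", "SAindex", "genomeParameters.txt")


def missing_star_index_files(index_dir):
    """Return the expected index files that are absent from index_dir."""
    index_dir = Path(index_dir)
    return [name for name in STAR_INDEX_FILES if not (index_dir / name).is_file()]
```

An empty list from `missing_star_index_files("genome_index")` means the directory looks like a finished STAR index.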

Detection of the strand orientation of RNAseq reads

  1. Run infer_strand_direction.py

Parameters:

  • -d or --bam_dir: path to directory containing bams
  • -r or --ref_seq_bed: path to refseq bed file

Example:

nohup python3 infer_strand_direction.py -d bam_dir -r ref_seq_bed &
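Strand-inference tools such as RSeQC's infer_experiment.py report the fraction of reads whose orientation agrees with the annotated gene strand and the fraction that disagrees. How infer_strand_direction.py reports its result is not described here, so the sketch below only illustrates one way to turn two such fractions into a call. The function name and the 0.8 cutoff are assumptions, not pipeline defaults.

```python
def classify_strandedness(frac_sense, frac_antisense, threshold=0.8):
    """Classify a library from the fractions of sense and antisense reads.

    frac_sense: fraction of reads on the same strand as the annotated gene.
    frac_antisense: fraction of reads on the opposite strand.
    threshold: assumed cutoff above which one orientation is taken to dominate.
    """
    for value in (frac_sense, frac_antisense):
        if not 0.0 <= value <= 1.0:
            raise ValueError("fractions must lie between 0 and 1")
    if frac_sense + frac_antisense > 1.0 + 1e-6:
        raise ValueError("fractions must not sum to more than 1")
    if frac_sense >= threshold:
        return "forward"
    if frac_antisense >= threshold:
        return "reverse"
    return "unstranded"
```

Roughly equal fractions, for example `classify_strandedness(0.49, 0.49)`, fall through to `unstranded`; values near the cutoff deserve a manual look rather than an automatic call.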

Download Fastq Files of WGS from SRA

  1. Create a text file containing a list of ERR accession numbers.
  2. Run get_WGS_data.py to download data

Parameters:

  • -a or --acc_list: path to file containing a list of ERR accession numbers
  • -o or --output: path to output directory

Example:

nohup python3 get_WGS_data.py -a acc.txt -o output_path &

Index Genome for BWA

  1. Run index_genome_bwa.py to index the genome for BWA.

Parameters:

  • -f or --fasta_dir: path to genome fasta file

Example:

nohup python3 index_genome_bwa.py -f fasta_dir &

Align DNAseq Reads

  • Run align_bwa.py to align paired-end data to the genome

Parameters:

  • -fq or --fastq_dir: path to directory containing fastq files
  • -fa or --fasta_dir: path to directory containing genome fasta file

Example:

nohup python3 align_bwa.py -fq fastq_dir -fa fasta_dir &

Select and map reads to a chromosome

  • Run select_map_chr.py to select and map reads to a specific chromosome

Parameters:

  • -g or --genome_dir: path to directory containing the genome .fai file
  • -f or --fastq_dir: path to directory containing the WGS fastq file and the SAM file
  • -o or --output_dir: path to directory in which to store the output files
  • -chr or --chrNum: chromosome to select, written as 'chr[Int]' (e.g. -chr chr21)

Example:

nohup python3 select_map_chr.py -g genome_dir -f fastq_dir -o output_dir -chr chrNum &
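Since the chromosome name passed with -chr has to exist in the genome, it can be checked against the .fai index first. A .fai file is tab-delimited with the sequence name in column 1 and its length in column 2. The sketch below is illustrative only; the function names are hypothetical and the lengths in the demo are made up.

```python
def read_fai(lines):
    """Map sequence name -> length from the lines of a .fai index."""
    lengths = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[0]:
            lengths[fields[0]] = int(fields[1])
    return lengths


def check_chromosome(chrom, lengths):
    """Return the length of chrom, or raise ValueError if it is not indexed."""
    if chrom not in lengths:
        known = ", ".join(sorted(lengths)) or "none"
        raise ValueError(f"{chrom!r} not found in index (known: {known})")
    return lengths[chrom]


# Demo with made-up lengths; a real check would pass open("genome.fa.fai").
demo_index = ["chr20\t2000\t7\t60\t61\n", "chr21\t1000\t2050\t60\t61\n"]
print(check_chromosome("chr21", read_fai(demo_index)))
```

Passing `21` instead of `chr21` to a genome indexed with `chr` prefixes raises an error that lists the known names, which makes the mismatch obvious before any mapping starts.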