Scripts

This is a place where all my scripts are saved.

R scripts:
- merge_results_fin.R

Bash scripts:
- pipeline.sh
- lengther.sh
- geth_loop.sh
- hlfinder.sh

R Scripts

merge_results_fin.R

merge_results_fin.R takes a list of samples and a raw table of gene homologues as input.

The script performs the following steps:

- Cleans the raw gene_table, removing redundant information to leave a tidy table to work with.
- Runs ballgown on each of the given samples and merges each result with the gene_table, producing one new table per sample.
- Merges the per-sample tables into a single final_table.
- Adds further information to the final_table and sorts it to make it easier to read.

For this task it uses for loops and if conditionals to iterate over the data, generating the results automatically.
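The merge step can be illustrated outside R with a small shell sketch. It is not the code in merge_results_fin.R: the function name `merge_tables` is hypothetical, and it assumes every table is tab-separated, has no header line, and carries the gene ID in its first column. It uses a for loop and an if conditional in the same spirit as the script, joining each per-sample table onto the gene table in turn.

```shell
# Hypothetical helper: merge_tables GENE_TABLE SAMPLE_TABLE...
# Assumes tab-separated files, no header, gene ID in column 1.
merge_tables() (
    tab="$(printf '\t')"
    work="$(mktemp -d)" || exit 1
    # join needs its inputs sorted on the join field
    LC_ALL=C sort -t "$tab" -k1,1 "$1" > "$work/merged"
    shift
    for sample in "$@"; do
        if [ ! -s "$sample" ]; then
            echo "skipping missing or empty table: $sample" >&2
            continue
        fi
        LC_ALL=C sort -t "$tab" -k1,1 "$sample" > "$work/sorted"
        # append this sample's columns to the rows that share a gene ID
        LC_ALL=C join -t "$tab" "$work/merged" "$work/sorted" > "$work/next"
        mv "$work/next" "$work/merged"
    done
    cat "$work/merged"
    rm -rf "$work"
)
```

Note that `join` keeps only the genes present in every table, so a gene missing from one sample is dropped from the result. Whether the R script behaves the same way depends on the merge options it uses.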

Warning: The script does not include the upstream RNA-seq analysis steps needed to generate the .ctab files it requires. These are:

- Quality analysis: using FastQC.
- Trimming: using Trimmomatic.
- Alignment: using HISAT2 or STAR.
- Post-processing:
  - Marking and removing duplicates: using Preseq and dupRadar.
  - Generating FPKM tables: using featureCounts and StringTie (whose -B option writes the .ctab tables that ballgown reads).

Possible downstream analyses: differential expression analysis (DESeq2), plotting (edgeR or ggplot2) and MultiQC for quality analysis.

Required libraries: tidyverse, ballgown, reshape, stringr

Bash scripts

pipeline.sh

A pipeline that runs Trimmomatic and Trinity on all the paired-end samples (two .fastq files each) found in a given directory or its subdirectories. Single-end and stranded samples are not supported; to use those kinds of samples you may need to change the Trinity parameters. Example:

bash ./pipeline.sh -p home/Analysis/RNAseq_samples/ -t home/share/trimmomatic/trimmomatic.jar -a trimmomatic_adapters.file -y home/share/trinity/Trinity
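How the paired files are located is not described here, so the following is only a sketch of one way to do it. It assumes mates are named `<sample>_1.fastq` and `<sample>_2.fastq`, which may not match the convention pipeline.sh expects, and the function name `list_pairs` is hypothetical.

```shell
# Hypothetical helper: list_pairs DIR
# Prints one "R1<TAB>R2" line per paired-end sample found under DIR.
# Assumes mates are named <sample>_1.fastq and <sample>_2.fastq.
list_pairs() {
    find "$1" -type f -name '*_1.fastq' | LC_ALL=C sort | while IFS= read -r r1; do
        # derive the mate's path by swapping the _1 suffix for _2
        r2="${r1%_1.fastq}_2.fastq"
        if [ -f "$r2" ]; then
            printf '%s\t%s\n' "$r1" "$r2"
        else
            echo "no mate found for $r1, skipping" >&2
        fi
    done
}
```

Each output line could then be fed to the trimming and assembly steps; files without a mate are reported on stderr and skipped, which mirrors the note that single-end samples are not supported.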

Warning: The Trimmomatic and Trinity parameters are predefined to work for a wide variety of species from different kingdoms. You can change them directly in the script if you wish. Also keep in mind that Trinity's de novo assembly is resource-intensive: it can take about 1 hour per million reads.
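To get a rough idea of the waiting time before launching a run, the rule of thumb above can be turned into a quick calculation. This is a sketch, not part of pipeline.sh: both function names are hypothetical, the FASTQ file is assumed to use the usual four lines per read, and the result is only as good as the 1 hour per million reads estimate.

```shell
# Hypothetical helper: count the reads in an uncompressed FASTQ file.
# Assumes four lines per read (unwrapped records).
count_reads() {
    lines="$(wc -l < "$1")"
    echo $((lines / 4))
}

# Hypothetical helper: hours_for_reads N
# Applies the 1 hour per million reads rule of thumb, rounded up.
hours_for_reads() {
    echo $(( ($1 + 999999) / 1000000 ))
}
```

For example, `hours_for_reads "$(count_reads sample_1.fastq)"` prints the estimated number of hours for a hypothetical sample_1.fastq.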

lengther.sh

A script that selects the number of genes given with -n from a gene_name table, finds their sequences by searching the fasta files in a given directory, and prints the length of each sequence and the total number of amino acids across the selected sequences. It's useful for checking whether a request to other software, such as SecretomeP, would exceed the permitted limit.

bash ./lengther.sh -f path/to/gene_table.file -p path/to/fastas_directory/ -n num -m path/to/pangenome_matrix.file

Warning: the paths used in the script are not absolute, so the script might not work if you run it from a different location.
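The length calculation itself can be sketched with awk. This is not the code in lengther.sh: the function name `sum_lengths` is hypothetical, the gene IDs are assumed to be listed one per line in a plain text file, and each ID is assumed to be the first word of a fasta header.

```shell
# Hypothetical helper: sum_lengths IDS_FILE FASTA...
# Prints "id<TAB>length" for each selected gene, then "total<TAB>sum".
# Assumes one gene ID per line in IDS_FILE and that the ID is the
# first word after ">" in the fasta header.
sum_lengths() {
    ids="$1"
    shift
    awk '
        FILENAME == ARGV[1] { wanted[$1] = 1; next }   # first file: IDs to keep
        /^>/ {
            id = substr($1, 2)
            keep = (id in wanted)
            if (keep && !seen[id]++) order[++n] = id
            next
        }
        # sequence line: strip whitespace and "*" stop symbols before counting
        keep { gsub(/[ \t\r*]/, ""); len[id] += length($0) }
        END {
            for (i = 1; i <= n; i++) {
                printf "%s\t%d\n", order[i], len[order[i]]
                total += len[order[i]]
            }
            printf "total\t%d\n", total
        }
    ' "$ids" "$@"
}
```

The last line of the output is the figure to compare against the limit of the target software.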

geth_loop.sh

A script that runs get_homologues in a loop for multiple samples against a reference. Starting from a reference folder containing all the fasta files the user wants to use for get_homologues, it creates a whole folder environment in the given directory, then places a copy of the reference fasta files and a copy of one sample in each folder, according to the sample's sample_name. After that, it runs get_homologues in each of those folders, in the background, by temporarily changing the working directory.

Advantages of this script: it can be executed from anywhere on the system. It creates everything in the directory of the script executable, regardless of the current working directory.
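Both behaviours, working relative to the script's own location and running a command inside each folder in the background, follow a common shell pattern. The sketch below shows that pattern under stated assumptions rather than the code in geth_loop.sh: the function names `script_dir` and `run_in_each` are hypothetical, and the command string stands in for the real get_homologues call, whose options are not shown here.

```shell
# Hypothetical helper: script_dir PATH_TO_SCRIPT
# Prints the absolute directory that contains the script.
script_dir() (
    # cd runs inside this subshell, so the caller's working directory is untouched
    cd "$(dirname "$1")" && pwd -P
)

# Hypothetical helper: run_in_each COMMAND DIR...
# Runs COMMAND inside every DIR as a background job, then waits for all of them.
run_in_each() {
    cmd="$1"
    shift
    for dir in "$@"; do
        # the subshell changes directory only for this one job
        ( cd "$dir" && sh -c "$cmd" ) &
    done
    wait
}
```

Inside a script, `base="$(script_dir "$0")"` gives a fixed starting point for every path, so the result no longer depends on where the script was called from.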

hlfinder.sh

A script to run once get_homologues has finished. It removes genes that have homologues shared between the reference species and other nematodes (or similar species), which could lead to false positives in the serological test for Trichinella. To do so, it takes a given reference table containing the reference homologues plus each sample's FPKM values for every gene. It then looks up the genes listed in a given reference_ID.txt file and creates a new table without the matching genes found across the .faa files generated by get_homologues.

The advantage of this script is that compare_clusters.pl from the get_homologues package does not need to be run, because the script automatically finds the results folders for each algorithm used in the get_homologues run.

Update: it now also adds a row with the average FPKM value of each column at the bottom of the table.
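The filtering and the average row can be sketched with awk. This is an illustration under assumptions, not the code in hlfinder.sh: the function name `filter_and_average` is hypothetical, the list of gene IDs to exclude is assumed to be already known (one per line), and the table is assumed to be tab-separated with a header line, the gene ID in the first column and numeric FPKM values in the remaining columns.

```shell
# Hypothetical helper: filter_and_average EXCLUDE_IDS TABLE
# Drops the rows whose gene ID is listed in EXCLUDE_IDS, then appends
# an "average" row with the mean of every numeric column.
# Assumes a tab-separated TABLE with a header and the gene ID in column 1.
filter_and_average() {
    awk -F '\t' -v OFS='\t' '
        FILENAME == ARGV[1] { drop[$1] = 1; next }   # first file: IDs to exclude
        FNR == 1 { print; ncol = NF; next }          # table header passes through
        $1 in drop { next }
        {
            print
            rows++
            for (c = 2; c <= NF; c++) sum[c] += $c
        }
        END {
            if (rows == 0) exit
            line = "average"
            for (c = 2; c <= ncol; c++) line = line OFS sprintf("%.2f", sum[c] / rows)
            print line
        }
    ' "$1" "$2"
}
```

The averages are computed after filtering, so the excluded genes do not contribute to them.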
