Better setup.py with installation of scripts #48

Open: wants to merge 14 commits into base: master
61 changes: 45 additions & 16 deletions README.md
@@ -11,6 +11,9 @@ Read our [short SUPPA tutorial on an example dataset](https://github.com/comprna

* [Overview](#overview)
* [Installation](#installation)
* [From the tar.gz archive](#from-the-targz-archive)
* [Using pip](#using-pip)
* [Using bioconda](#using-bioconda)
* [Command and subcommand structure](#command-and-subcommand-structure)
* [Generation of transcript events and local alternative splicing events](#generation-of-transcript-events-and-local-alternative-splicing-events)
* [Input files](#input-files)
@@ -72,10 +75,34 @@ SUPPA has been developed in Python 3.4.

If necessary, to install Python 3 we recommend downloading the version for your OS from the official site: https://www.python.org/downloads/

## From the tar.gz archive

Uncompress the archive using:
```
tar xvzf SUPPA-2.3.tar.gz
cd SUPPA
```

As root/administrator, you can install with:
```
python3 setup.py install
```
or, as a non-privileged user, for a user-specific install:
```
python3 setup.py install --user
```
In this case, on a Unix-like environment, the executables are installed by default in `~/.local/bin`. You can add this path to the shell search path with (Bash shell):

```
if ! grep -q 'PATH.*~/\.local/bin' ~/.bashrc ; then echo 'export PATH=$PATH:~/.local/bin' >>~/.bashrc; fi
```
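
To check where a `--user` install will place the executables on a given machine, Python's own `site` module can be queried. The sketch below is a minimal illustration, assuming a Unix-like layout in which scripts live in the `bin` subdirectory of the user base. The helper names `user_scripts_dir` and `is_on_path` are hypothetical and not part of SUPPA.

```python
import os
import site


def user_scripts_dir():
    """Return the directory where a `--user` install is expected to place
    scripts on a Unix-like system (assumption: <user base>/bin)."""
    return os.path.join(site.getuserbase(), "bin")


def is_on_path(directory, path_value=None):
    """Check whether `directory` appears in a PATH-style string.

    When `path_value` is None, the current PATH environment variable is used.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    wanted = os.path.normpath(os.path.expanduser(directory))
    entries = [os.path.normpath(os.path.expanduser(p))
               for p in path_value.split(os.pathsep) if p]
    return wanted in entries


if __name__ == "__main__":
    scripts = user_scripts_dir()
    print(scripts, "on PATH:", is_on_path(scripts))
```

If the script reports that the directory is not on the search path, the Bash snippet above is one way to add it.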

## Using pip

An installation using pip is available with the following command:

```
pip install SUPPA==2.2.1
pip install SUPPA==2.3
```
By default SUPPA is installed into the Python package library directory. The following command can be executed to obtain the directory location:

@@ -85,6 +112,8 @@ pip show SUPPA

SUPPA is ready to use. Once downloaded, it can be used directly from the command line by specifying the absolute path to the SUPPA executable (suppa.py).

## Using bioconda

Another option is via bioconda (thanks to Devon Ryan)

```
@@ -98,7 +127,7 @@ conda install -c bioconda suppa
SUPPA works with a command/subcommand structure:

```
python3.4 suppa.py subcommand options
python3 suppa2.py subcommand options

```
where the subcommand can be one of these five:
@@ -165,7 +194,7 @@ The *generateEvents* operation uses the lines where the feature (column 3) is "e
To generate the events from the GTF file one has to run the following command:

```
python3.4 suppa.py generateEvents [options]
python3 suppa2.py generateEvents [options]
```
List of options available:

@@ -202,13 +231,13 @@ List of options available:
The command line to generate local AS events will be of the form:

```
python3.4 suppa.py generateEvents -i <input-file.gtf> -o <output-file> -f ioe -e <list-of-events>
python3 suppa2.py generateEvents -i <input-file.gtf> -o <output-file> -f ioe -e <list-of-events>
```

The command to generate the transcript "events" would be of the form:

```
python3.4 suppa.py generateEvents -i <input-file.gtf> -o <output-file> -f ioi
python3 suppa2.py generateEvents -i <input-file.gtf> -o <output-file> -f ioi
```

## Output files
@@ -377,7 +406,7 @@ transcript3 <expression> <expression> <expression> <expression>
At the moment the PSI per transcript isoform is calculated in the following way:

```
python3.4 suppa.py psiPerIsoform [options]
python3 suppa2.py psiPerIsoform [options]
```
List of options available:

@@ -394,15 +423,15 @@ List of options available:
An example of the usage of the program is:

```
python3.4 suppa.py psiPerIsoform -g <gtf-file> -e <expression-file> -o <output-file>
python3 suppa2.py psiPerIsoform -g <gtf-file> -e <expression-file> -o <output-file>
```

### **PSI per local event** ###

To calculate the PSI value for each event from the *ioe* and the *transcript expression file* one has to run the following command:

```
python3.4 suppa.py psiPerEvent [options]
python3 suppa2.py psiPerEvent [options]

```
List of options available:
@@ -422,7 +451,7 @@ List of options available:
An example of the usage of the program is:

```
python3.4 suppa.py psiPerEvent --ioe-file <ioe-file> --expression-file <expression-file> -o <output-file>
python3 suppa2.py psiPerEvent --ioe-file <ioe-file> --expression-file <expression-file> -o <output-file>
```
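
As a rough illustration of what this subcommand computes, the sketch below derives the PSI of one event as the summed expression of the transcripts that contribute to the inclusion form, divided by the summed expression of all transcripts that define the event. This is an assumption about the calculation, not code taken from SUPPA. The function name `psi_for_event` and the toy TPM values are hypothetical.

```python
import math


def psi_for_event(expression, alternative_ids, total_ids):
    """Return the PSI of one event in one sample.

    expression      -- dict mapping transcript id -> TPM in a single sample
    alternative_ids -- transcripts that contribute to the inclusion form
    total_ids       -- all transcripts that define the event
    Returns NaN when the total expression is zero (PSI is undefined).
    """
    inclusion = sum(expression.get(t, 0.0) for t in alternative_ids)
    total = sum(expression.get(t, 0.0) for t in total_ids)
    if total == 0:
        return math.nan
    return inclusion / total


# Toy values, for illustration only
tpm = {"tx1": 6.0, "tx2": 2.0, "tx3": 2.0}
print(psi_for_event(tpm, ["tx1"], ["tx1", "tx2", "tx3"]))
```

Returning NaN rather than zero for unexpressed events keeps "not measurable" distinct from "fully excluded", which is a design choice of this sketch.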

### **Output files** ###
@@ -456,7 +485,7 @@ ENSG00000000419.12;SE:chr20:50940933-50941105:50941209-50942031:- 0.023022
Transcript expression files used with SUPPA typically come from calculations with multiple samples. To facilitate the generation of a single file with the transcript expression values for all samples, the SUPPA distribution includes a program to combine multiple simple transcript expression files into a single file:

```
python3.4 suppa.py joinFiles [options]
python3 suppa2.py joinFiles [options]
```


@@ -474,7 +503,7 @@ where the options are:
We show below an example of the usage of the program for reading multiple output files from Sailfish to join together the 3rd column, given that all files have in the first column the transcript ids (which are kept for the output):

```
python3.4 suppa.py joinFiles -f tpm -i sample1.tpm sample2.tpm sample3.tpm -o all_samples_tpms
python3 suppa2.py joinFiles -f tpm -i sample1.tpm sample2.tpm sample3.tpm -o all_samples_tpms
```

The output will look like an expression file with multiple files as described above.
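
The joining step can be pictured as collecting one column per sample, keyed by the transcript id in the first column. The following is a minimal sketch under stated assumptions, not SUPPA's implementation: it works on already-parsed rows rather than files, it keeps only the transcript ids present in every sample, and `join_expression_columns` is a hypothetical name.

```python
def join_expression_columns(samples, value_column=2):
    """Merge one column from several per-sample tables.

    samples      -- dict mapping sample name -> list of rows, where each row
                    is a list of strings and row[0] is the transcript id
    value_column -- zero-based index of the column to keep (2 = third column)
    Returns (header, rows): the sample names, and one row per transcript id
    found in every sample (restricting to shared ids is an assumption).
    """
    names = list(samples)
    per_sample = {
        name: {row[0]: row[value_column] for row in rows}
        for name, rows in samples.items()
    }
    shared = set.intersection(*(set(d) for d in per_sample.values()))
    rows = [[tid] + [per_sample[n][tid] for n in names]
            for tid in sorted(shared)]
    return names, rows


# Toy input, for illustration only
header, table = join_expression_columns({
    "sample1": [["tx1", "100", "1.5"], ["tx2", "200", "0.0"]],
    "sample2": [["tx1", "100", "2.5"], ["tx2", "200", "4.0"]],
})
print("\t".join(header))
for row in table:
    print("\t".join(row))
```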
@@ -521,7 +550,7 @@ where the expression values are given in TPM units.
### **Command and options** ###
To calculate the dpsi from the *ioe*, *psi* and the *expression file* one has to run the following command:
```
python3.4 suppa.py diffSplice [options]
python3 suppa2.py diffSplice [options]
```

List of options available:
@@ -564,15 +593,15 @@ List of options available:
An example of the usage of the program with transcripts, indicating that replicates are paired (-pa), applying a multiple testing correction (-gc) and performing pairwise comparisons between all conditions (-c), is:

```
python3.4 suppa.py diffSplice --method <empirical> --input <ioi-file> --psi <Cond1.psi> <Cond2.psi> --expression-file <Cond1_expression-file> <Cond2_expression-file> --area <1000> --lower-bound <0.05> -pa -gc -c -o <output-file>
python3 suppa2.py diffSplice --method <empirical> --input <ioi-file> --psi <Cond1.psi> <Cond2.psi> --expression-file <Cond1_expression-file> <Cond2_expression-file> --area <1000> --lower-bound <0.05> -pa -gc -c -o <output-file>
```

### **Differential splicing with local events** ###

An example of the usage of the program with local events, applying a multiple testing correction (-gc):

```
python3.4 suppa.py diffSplice --method <empirical> --input <ioe-file> --psi <Cond1.psi> <Cond2.psi> --expression-file <Cond1_expression-file> <Cond2_expression-file> --area <1000> --lower-bound <0.05> -gc -o <output-file>
python3 suppa2.py diffSplice --method <empirical> --input <ioe-file> --psi <Cond1.psi> <Cond2.psi> --expression-file <Cond1_expression-file> <Cond2_expression-file> --area <1000> --lower-bound <0.05> -gc -o <output-file>
```

### **Output files** ###
@@ -661,7 +690,7 @@ event3 <psi_value> <psi_value> <psi_value> <psi_value> <psi_value> <psi_value
SUPPA will use the psivec file to cluster events according to their PSI values across samples, using those events that show a significant change in at least one pairwise comparison in the dpsi file. Two methods are available: DBSCAN and OPTICS. Both methods require as input the minimum number of events in a cluster. OPTICS also requires the maximum reachability distance (s), which represents the maximum distance in PSI space of an event to a cluster. To perform the clustering from the *dpsi* and the *psivec* one has to run the following command:

```
python3.4 suppa.py clusterEvents [options]
python3 suppa2.py clusterEvents [options]

```
List of options available:
@@ -693,7 +722,7 @@ List of options available:
An example of the usage of the program is:

```
python3.4 suppa.py clusterEvents --dpsi <dpsi-file> --psivec <psivec-file> --sig-threshold <0.05> --eps <0.05> --min-pts <20> --groups <1-3,4-6> -o <output-file>
python3 suppa2.py clusterEvents --dpsi <dpsi-file> --psivec <psivec-file> --sig-threshold <0.05> --eps <0.05> --min-pts <20> --groups <1-3,4-6> -o <output-file>

```

Binary file removed __pycache__/eventClusterer.cpython-35.pyc
Binary file not shown.
Binary file removed __pycache__/eventGenerator.cpython-35.pyc
Binary file not shown.
Binary file removed __pycache__/fileMerger.cpython-35.pyc
Binary file not shown.
Binary file removed __pycache__/psiCalculator.cpython-35.pyc
Binary file not shown.
Binary file removed __pycache__/psiPerGene.cpython-35.pyc
Binary file not shown.
Binary file removed __pycache__/significanceCalculator.cpython-35.pyc
Binary file not shown.
Binary file removed lib/__pycache__/__init__.cpython-35.pyc
Binary file not shown.
Binary file removed lib/__pycache__/cluster_tools.cpython-35.pyc
Binary file not shown.
Binary file removed lib/__pycache__/diff_tools.cpython-35.pyc
Binary file not shown.
Binary file removed lib/__pycache__/event.cpython-35.pyc
Binary file not shown.
Binary file removed lib/__pycache__/gtf_store.cpython-35.pyc
Binary file not shown.
Binary file removed lib/__pycache__/optics.cpython-35.pyc
Binary file not shown.
Binary file removed lib/__pycache__/tools.cpython-35.pyc
Binary file not shown.
Binary file removed lib/__pycache__/var_event.cpython-35.pyc
Binary file not shown.
2 changes: 2 additions & 0 deletions scripts/Volcano_MA_plot.R → scripts/suppa2_Volcano_MA_plot.R
100644 → 100755
@@ -1,3 +1,5 @@
#! /usr/bin/env Rscript

library(scales)
library(ggplot2)
library(ggrepel)
@@ -1,4 +1,5 @@
#!/usr/bin/Rscript
#! /usr/bin/env Rscript

#with this script, running from bash, we want to format the ids of the Ensembl transcripts for running SUPPA

# Parse command line arguments
@@ -1,3 +1,5 @@
#! /usr/bin/env python3

"""
@authors: Juan L. Trincado
@email: [email protected]
4 changes: 3 additions & 1 deletion scripts/generate_boxplot_event.py → scripts/suppa2_generate_boxplot_event.py
100644 → 100755
@@ -1,3 +1,5 @@
#! /usr/bin/env python3

# The next script will format a phenotype table (junctions, events, transcripts...)
# for running FastQTL analysis

@@ -159,4 +161,4 @@ def main():


if __name__ == '__main__':
main()
main()
2 changes: 2 additions & 0 deletions multipleFieldSelection.py → scripts/suppa2_multipleFieldSelection.py
100644 → 100755
@@ -1,3 +1,5 @@
#! /usr/bin/env python3

# -*- coding: utf-8 -*-
"""
Created on Thu May 22 11:24:33 2014
8 changes: 6 additions & 2 deletions scripts/split_file.R → scripts/suppa2_split_file.R
@@ -1,4 +1,4 @@
#!/soft/R/R-3.1.0/bin/Rscript
#! /usr/bin/env Rscript
#Given two pairs of lists of samples, split [1] in two files with the samples indicated in [2] and [3]

#[1] First argument: input file that we want to split
@@ -7,13 +7,17 @@
#[4] Fourth argument: output file of the first condition
#[5] Fifth argument: output file of the second condition

## bug in setuptools, this script is recognized as python if not included
## bugs #355, #1178
data.frame(a="")$a

# Parse command line arguments
print("Parsing samples...")
CHARACTER_command_args <- commandArgs(trailingOnly=TRUE)

#Load the input file
print(paste0("Loading ",CHARACTER_command_args[1],"..."))
input_file <- read.table(CHARACTER_command_args[1],header=TRUE)
input_file <- read.table(CHARACTER_command_args[1],row.names=1,header=TRUE, check.names=FALSE)

#Load the list of samples of the first condition
first_condition <- unlist(strsplit(CHARACTER_command_args[2],","))
36 changes: 36 additions & 0 deletions setup.py
@@ -0,0 +1,36 @@
#! /usr/bin/env python3

from setuptools import setup, find_packages

setup(
    name='SUPPA',
    packages=find_packages(),
    scripts=['suppa2.py',
             'scripts/suppa2_multipleFieldSelection.py',
             'scripts/suppa2_format_Ensembl_ids.R',
             'scripts/suppa2_split_file.R',
             'scripts/suppa2_Volcano_MA_plot.R',
             'scripts/suppa2_generate_boxplot_event.py',
             'scripts/suppa2_format_unique_fasta_RefSeq_annotation.py'],
    version='2.3',
    description='A tool to study splicing across multiple conditions at high speed and accuracy.',
    author='GP Alamancos',
    author_email='[email protected]',
    license='MIT',
    url='https://github.com/comprna/SUPPA',
    download_url='https://github.com/comprna/SUPPA/archive/v2.3.tar.gz',
    keywords=['alternative', 'splicing', 'analysis', 'transcriptomics'],
    classifiers=[
        'Development Status :: 5 - Production/Stable',
        'Topic :: Scientific/Engineering :: Bio-Informatics',
        'Intended Audience :: Science/Research',
        'License :: OSI Approved :: MIT License',
        'Operating System :: POSIX :: Linux',
        'Programming Language :: Python :: 3.5'],
    install_requires=['scipy>=0.15.1',
                      'numpy>=1.11.0',
                      'pandas>=0.18.0',
                      'statsmodels>=0.6.1',
                      'scikit-learn>=0.16.1'],
    python_requires='>=3',
)
File renamed without changes.
2 changes: 1 addition & 1 deletion lib/cluster_tools.py → suppa/cluster_tools.py
@@ -13,7 +13,7 @@
from sklearn.cluster import DBSCAN
from collections import defaultdict
from sklearn.metrics import silhouette_score
from lib.optics import *
from .optics import *



File renamed without changes.
2 changes: 1 addition & 1 deletion lib/event.py → suppa/event.py
@@ -607,7 +607,7 @@ def process_events(my_gene, event, ioe_writer, gtf_writer, edge_len, th):
gtf_writer.write(gtf_line, etype)

# to avoid circular dependencies
from lib.var_event import *
from .var_event import *


def create_event_classes(all_events, b_type):
2 changes: 1 addition & 1 deletion eventClusterer.py → suppa/eventClusterer.py
@@ -9,7 +9,7 @@

import os
import logging
from lib.cluster_tools import cluster_analysis
from .cluster_tools import cluster_analysis
from argparse import ArgumentParser, RawTextHelpFormatter


6 changes: 3 additions & 3 deletions eventGenerator.py → suppa/eventGenerator.py
@@ -9,9 +9,9 @@
import sys
import logging
from argparse import ArgumentParser, RawTextHelpFormatter
from lib.tools import *
from lib.gtf_store import *
from lib.event import *
from .tools import *
from .gtf_store import *
from .event import *

# Setting argument parser
# parser = argparse.ArgumentParser()
File renamed without changes.
File renamed without changes.
File renamed without changes.
6 changes: 3 additions & 3 deletions psiCalculator.py → suppa/psiCalculator.py
@@ -10,7 +10,7 @@
import logging
import numpy as np
from argparse import ArgumentParser, RawTextHelpFormatter
from lib.tools import *
from .tools import *


description = \
@@ -158,7 +158,7 @@ def main():
writer = Writer.getWriter("PSI")
logger.info("Generating output %s" % (output_file + ".psi"))
writer.openFile(output_file)
writer.writeLine("\t".join(col_ids), False)
writer.writeLine("\t".join(["Name"] + col_ids), False)
for key, value in sorted(psi_dictionary.items()):
logger.debug("Calculating psi for %s" % key)
psi_line = PsiWriter.lineGenerator(key, value, col_ids)
@@ -183,4 +183,4 @@ def main():
logger.info("Done")

if __name__ == '__main__':
main()
main()
4 changes: 2 additions & 2 deletions psiPerGene.py → suppa/psiPerGene.py
@@ -10,8 +10,8 @@
import sys
import logging
from argparse import ArgumentParser, RawTextHelpFormatter
from lib.tools import *
from lib.gtf_store import *
from .tools import *
from .gtf_store import *


description = \
@@ -12,7 +12,7 @@

import os
import logging
from lib.diff_tools import multiple_conditions_analysis
from .diff_tools import multiple_conditions_analysis
from argparse import *


6 changes: 3 additions & 3 deletions lib/tools.py → suppa/tools.py
@@ -9,7 +9,7 @@
import sys
import logging
from abc import ABCMeta, abstractmethod
from lib.event import *
from .event import *


#Setting logging preferences
@@ -376,9 +376,9 @@ def readLine(self, header = True):
fields = line.rstrip("\n").split("\t")
if lineNumber == 0 and header: #Skip the header line
#Calculating the number of fields required
min_fields = (len(line.rstrip("\n").split("\t")) + 1)
min_fields = len(fields)
#Storing column_id for the expression fields
colIds = fields
colIds = fields[1:]
continue
if line.startswith('#'):
logger.debug("Line %i starts with #. Skipping line..." % (