DeepCIP is a Deep learning method for CircRNA IRES Prediction.
DeepCIP needs to run on a Linux operating system (e.g. Debian 11.3) with the following software installed.
- Python3.8
- Perl (Recommended v5.32.1)
- Anaconda
RNAplfold from ViennaRNA version 2.5.1 is required to predict RNA secondary structure in DeepCIP. You need to install the ViennaRNA package before you start to use DeepCIP.
First, download the ViennaRNA package from ViennaRNA-2.5.1.tar.gz .
Then, install ViennaRNA package:
tar -zxvf ViennaRNA-2.5.1.tar.gz
cd ViennaRNA-2.5.1
./configure
make
make install
For more details, see https://github.com/ViennaRNA/ViennaRNA
DeepCIP relies on Seqkit to process RNA sequences (eg. subsequence segmentation).
You can download Seqkit from https://bioinf.shenwei.me/seqkit/download/
Reference: W Shen, S Le, Y Li*, F Hu*. SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation. PLOS ONE. doi:10.1371/journal.pone.0163962.
First, download the repository and create the environment.
git clone https://github.com/zjupgx/DeepCIP.git
cd ./DeepCIP
conda env create -f environment.yml
Then, activate the "DeepCIP_pytorch" environment.
conda activate DeepCIP_pytorch
The file prepared for prediction should be put into folder ./data. Then, type the following command to start the prediction, with parameters set according to your needs.
For example:
python DeepCIP.py -n example -i circires_case.fa
For more options:
python DeepCIP.py --help
usage: DeepCIP.py [-h] -n -i [-b] [-c] [-m] [-w] [-s] [-r ]
optional arguments:
-h, --help show this help message and exit
-n , --data_name The name of your input dataset.
-i , --input_file Input file for prediction. (*.fasta or *.fa file)
-b , --batch_size Batch size. (default=16) "--bs 16" means every 32 sampels constitute a prediction batch. This parameter affects the speed and results of the prediction. The larger the batch size, the faster the prediction, as far as your machine allows. (If the lengths of your input sequences vary greatly, it is recommended that you do not use a large batch size, or you can put sequences of similar lengths together for prediction)
-c , --cut_off Prediction threshold. (default=0.5)
-m , --mode The mode of prediction. (default=0) mode 0: Prediction directly on the input sequence. mode 1: The input sequence is partitioned by length w and interval s, and then the partitioned sequence is predicted. (w and s can be set by --w and --s, respectively)
mode 2:
-w , --window_size window size (default=174). See --mode description. It can be ignore when mode not is 1.
-s , --step step (default=50). See --mode description. It can be ignore when mode not is 1.
-r , --region region of circRNA detection. e.g -r 1 12 for first 12 bases, -r -12 -1 for last 12 bases, -r 13 -1 for cutting first 12 bases. See --mode description. It can be ignore when mode not is 2.
The result will be saved as a csv format file.
Raw data used in our study can be avaliable in GSE178718_Oligo_eGFP_expression and 55k_oligos