This repository contains the source code to reproduce the results of the paper "FoldHsphere: Deep Hyperspherical Embeddings for Protein Fold Recognition" (see citation below).
Input data, features and trained models can be found at http://sigmat.ugr.es/~amelia/FoldHSphere/.
./Run_train_test.sh
- Train the ResCNN-BGRU model using Thomson-LMCL approach:
TRAINFILE="data/train/train.list"
FOLDLABELFILE="data/train/fold_label_relation_1154.txt"
FEATSDIR="features/train"
MODELDIR="models/ResCNN-BGRU/tanh_thomson_lmcl_m06_s30"; mkdir -p $MODELDIR
CENTDIR="models/prototypes_thomson/optim_rescnn-bgru-softmax/thl_sum"
python scripts/main_lightning.py --phase="train" \
--train_file=${TRAINFILE} --fold_label_file=${FOLDLABELFILE} \
--feats_dir=${FEATSDIR} --model_dir=${MODELDIR} \
--model_type="rescnn_gru" --loss_type="lmcl_fixed" \
--centroids_file="${CENTDIR}/prototypes_ep1020.npy" \
--input_dim=45 --channel_dims="64_256_64_256" --kernel_sizes="5_5" \
--gru_dim=1024 --gru_bidirec=True --hidden_dims="512" \
--drop_prob=0.2 --activation_last="tanh" --batch_norm=False \
--batch_size_class=64 --loss_margin=0.6 --loss_scale=30 \
--ndata_workers=2
- Extract embeddings for the LINDAHL dataset using the ResCNN-BGRU pre-trained model:
TESTFILE="data/lindahl/lindahl.list"
FEATSDIRTEST="features/lindahl"
python scripts/main_lightning.py --phase="extract" \
--test_file=${TESTFILE} --scop_separation="_" \
--feats_dir_test=${FEATSDIRTEST} --model_dir=${MODELDIR} \
--model_file="${MODELDIR}/checkpoint/model_epoch80.ckpt" \
--model_type="rescnn_gru" --loss_type="lmcl_fixed" \
--centroids_file="${CENTDIR}/prototypes_ep1020.npy" \
--input_dim=45 --channel_dims="64_256_64_256" --kernel_sizes="5_5" \
--gru_dim=1024 --gru_bidirec=True --hidden_dims="512" \
--drop_prob=0.2 --activation_last="tanh" --batch_norm=False \
--ndata_workers=2
- Compute cosine similarity scores and evaluate:
EMBEDFILE="${MODELDIR}/embeddings/lindahl.pkl"
SCORESDIR="${MODELDIR}/scores"
./Run_eval_pairs_cosine.sh "lindahl" ${EMBEDFILE} ${SCORESDIR}
./Run_random_forest.sh "lindahl" 4
- Python 3.7.7
- Numpy 1.19.0
- Scikit-Learn 0.23.1
- Matplotlib 3.2.2
- PyTorch 1.4.0
- Tensorboard 2.2.0
- PyTorch-Lightning 0.10.0
Villegas-Morcillo, A., Sanchez, V. & Gomez, A.M. FoldHSphere: deep hyperspherical embeddings for protein fold recognition. BMC Bioinformatics 22, 490 (2021). https://doi.org/10.1186/s12859-021-04419-7
BibTex:
@article{villegas2021fold,
author = {Villegas-Morcillo, Amelia and Sanchez, Victoria and Gomez, Angel M.},
title = {FoldHSphere: deep hyperspherical embeddings for protein fold recognition},
journal = {BMC Bioinformatics},
year = {2021},
month = {Oct},
volume = {22},
number = {1},
pages = {490},
doi = {10.1186/s12859-021-04419-7}
}