These are datasets used by the P2Rank ligand binding site prediction tool for training and evaluation. Each `*.ds` file contains a list of items that form a dataset, with the actual data stored in subdirectories.

Please note that the `*.ds` files that define the datasets from our papers may contain only subsets of the PDB files in the individual directories. For example, the `holo4k/` directory contains 4543 pdb files but `holo4k.ds` contains 4009 lines. For reproducibility, 4009 is the correct number of proteins in the HOLO4K dataset used in the P2Rank/PrankWeb papers.
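
The difference between a dataset definition and its directory can be checked with a short script. The following is a minimal sketch, not part of the repository: the helper names `count_ds_items` and `count_pdb_files` are hypothetical, and it assumes that blank lines, lines starting with `#`, and lines starting with `HEADER:` or `PARAM.` in a `*.ds` file are not dataset items.

```python
from pathlib import Path


def count_ds_items(ds_path):
    """Count item lines in a .ds file.

    Assumption: blank lines, '#' comment lines, and 'HEADER:' / 'PARAM.'
    lines are metadata rather than dataset items.
    """
    count = 0
    for raw in Path(ds_path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(("HEADER:", "PARAM.")):
            continue
        count += 1
    return count


def count_pdb_files(directory):
    """Count *.pdb files directly inside a directory (non-recursive)."""
    return sum(1 for p in Path(directory).glob("*.pdb") if p.is_file())
```

If the assumptions hold, calling `count_ds_items("holo4k.ds")` and `count_pdb_files("holo4k")` from the repository root should correspond to the two numbers quoted above; the first one is the number to use when reproducing published results.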
Main sets of proteins:
- CHEN11: a dataset of 251 proteins harboring 476 ligands, introduced in an LBS prediction benchmarking study
- ASTEX: Astex Diverse set
- metapocket2 datasets
- U/B48: Datasets that contain a set of 48 proteins in a bound and unbound state
- DT198: a dataset of 198 drug-target complexes
- B210: a benchmarking dataset of 210 proteins in a bound state
- FPTRAIN: dataset used by Fpocket for training its pocket scoring function
- HOLO4K: a large dataset of protein-ligand complexes. Contains larger multi-chain structures downloaded directly from PDB. Disjoint from CHEN11 and JOINED.
- "standard" ... 1 column of liganded proteins
- `*(mlig)*` datasets: datasets that contain explicitly specified relevant ligands. Valid ligand codes come from MOAD 2013 database. Proteins unknown to MOAD and proteins with conflicting ligand codes (valid & invalid) were removed.
- datasets with predictions: include predictions by other ligand binding site prediction methods (`-fpocket.ds`, `-sitehound.ds`, etc. suffixes)
- `*-XXsubset-*` datasets: contain a subset of the original datasets for which the given method finished successfully and produced predictions (`mp`: MetaPocket2, `sh`: SiteHound, `ds`: DeepSite)
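
The naming conventions above can be turned into a small classifier. This is only a sketch under a literal reading of the patterns in this list: `describe_dataset` is a hypothetical helper, the `XX` placeholder is assumed to be replaced directly by `mp`, `sh`, or `ds`, and only the two prediction suffixes named in the text are recognized.

```python
# Two-letter method codes used in the subset pattern, as listed above.
SUBSET_CODES = {"mp": "MetaPocket2", "sh": "SiteHound", "ds": "DeepSite"}

# Only the prediction suffixes explicitly named in the text; others may exist.
PREDICTION_SUFFIXES = ("fpocket", "sitehound")


def describe_dataset(filename):
    """Return a list of tags inferred from a dataset file name."""
    tags = []
    if "(mlig)" in filename:
        tags.append("mlig")
    for code, method in SUBSET_CODES.items():
        # Literal reading of '*-XXsubset-*' with XX replaced by the code.
        if f"-{code}subset-" in filename:
            tags.append(f"subset:{method}")
    stem = filename[:-3] if filename.endswith(".ds") else filename
    for method in PREDICTION_SUFFIXES:
        if stem.endswith(f"-{method}"):
            tags.append(f"predictions:{method}")
    return tags or ["standard"]
```

For a made-up name such as `example-fpocket.ds` the function returns `["predictions:fpocket"]`; a name that matches none of the patterns is reported as `["standard"]`.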
This repository also contains binding site predictions produced by some other methods.
- Fpocket
  - used version: v1.0 with default parameters
- SiteHound
  - used version: version labeled as
  - command used to generate predictions: `ls *.pdb | xargs -i python ../auto.py -i {} -p CMET -k` (executed in a directory with pdb files)
  - default probe and parameters were used
- MetaPocket 2.0
  - obtained from MetaPocket 2.0 web server by a Python script in Fall 2017 using default parameters
- DeepSite
  - obtained from DeepSite web server by a Python script in Fall 2017 using default parameters
- P2Rank
  - correspond to P2Rank 2.0 with default parameters
- 1xgf.pdb removed from holo4k datasets (all UNK groups, no ligands)
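
For readers who prefer not to use the shell one-liner, the SiteHound invocation listed above can be approximated in Python. This is a minimal sketch, not the script that was actually used: `build_sitehound_commands` and `run_all` are hypothetical names, and the script path `../auto.py` and the flags `-i`, `-p CMET`, `-k` are copied unchanged from the command in the list.

```python
import subprocess
from pathlib import Path


def build_sitehound_commands(pdb_dir, script="../auto.py"):
    """Build one argument list per *.pdb file, mirroring the shell one-liner."""
    commands = []
    for pdb in sorted(Path(pdb_dir).glob("*.pdb")):
        # File names are passed relative to pdb_dir, as 'ls *.pdb' would.
        commands.append(["python", script, "-i", pdb.name, "-p", "CMET", "-k"])
    return commands


def run_all(pdb_dir):
    """Run the commands one by one inside pdb_dir, stopping on the first failure."""
    for cmd in build_sitehound_commands(pdb_dir):
        subprocess.run(cmd, cwd=pdb_dir, check=True)
```

Building the argument lists separately from running them makes it possible to inspect the exact commands before anything is executed.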