This repo provides examples of virtual screening workflows. The idea is to provide well-defined and well-documented examples that can be run by third parties, and to work out how to produce a toolset that allows these workflows to be readily executed.
A key related project is Pipelines, which defines a number of components (currently mostly based on Python and RDKit).
A key part of the strategy is to perform the execution in Docker containers so that you do NOT need to install lots of different tools on your host machine. Currently the only tools you need installed are:
- Docker or Singularity
- Nextflow
- Java (needed by Nextflow).
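A quick way to confirm these prerequisites are present is a small shell check. This is only a sketch: `check_tool` is a hypothetical helper name, not something provided by this repo, and it only tests whether each command is on your PATH, not whether the installed version is suitable.

```shell
# check_tool is a hypothetical helper: it reports whether a command is on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Either container runtime is sufficient, so only one of the two needs to be found.
container_runtime=none
for tool in docker singularity; do
  if check_tool "$tool"; then
    container_runtime=available
  fi
done
echo "container runtime: $container_runtime"

# Nextflow and the Java runtime it depends on.
check_tool nextflow || true
check_tool java || true
```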
Many of the Docker images can be found on the Informatics Matters Docker Hub repository.
The majority of this work uses Docker containers and the scripts are written for Docker. However, most if not all of them should also work with Singularity. To run with Singularity, first create the necessary Singularity images from the Docker images using a command like this:
singularity pull docker://informaticsmatters/rdock:latest
Then convert the docker run commands to singularity exec commands. The current directory is automatically mounted and the process runs as the current user, so the command is a little simpler. For example, for a Docker command like this:
docker run -it --rm -v $PWD:/work:z -w /work -u $(id -u):$(id -g) informaticsmatters/rdock-mini:latest rbcavity -r 1sj0_rdock.prm -was
you would need a Singularity command like this:
singularity exec ~/rdock-mini_latest.sif rbcavity -r 1sj0_rdock.prm -was
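The conversion described above can be sketched as a pair of shell functions. Both `sif_name` and `singularity_equivalent` are hypothetical helpers written for illustration, and `SIF_DIR` is an assumed variable naming the directory that holds your pulled images (defaulting to your home directory, as in the example above). The sketch assumes the image file is named after the image and tag, as in rdock-mini_latest.sif, and it only prints the command rather than running it.

```shell
# Hypothetical helper: derive the .sif file name from a Docker image reference.
# Assumes the file is named <image>_<tag>.sif, e.g. rdock-mini_latest.sif.
sif_name() {
  ref="${1##*/}"            # drop the repository prefix, e.g. informaticsmatters/
  case "$ref" in
    *:*) ;;                 # a tag is already present
    *) ref="$ref:latest" ;; # no tag given, so assume latest
  esac
  printf '%s.sif\n' "$(printf '%s' "$ref" | tr ':' '_')"
}

# Hypothetical helper: print the singularity exec equivalent of a docker run
# command. The volume, working directory and user options are dropped because
# Singularity mounts the current directory and runs as the current user.
singularity_equivalent() {
  docker_image="$1"
  shift
  echo "singularity exec ${SIF_DIR:-$HOME}/$(sif_name "$docker_image") $*"
}

singularity_equivalent informaticsmatters/rdock-mini:latest rbcavity -r 1sj0_rdock.prm -was
```

Printing the command first makes it easy to check the image path before executing anything.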
When running with Nextflow, the Docker images defined in a workflow are automatically pulled and converted to Singularity images. You might want to set the NXF_SINGULARITY_CACHEDIR environment variable to define where Nextflow places the Singularity images so that you do not end up with copies in every 'project'.
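As a minimal sketch, the variable can be set in your shell before running Nextflow. The directory name used here ($HOME/singularity-images) is an arbitrary choice for illustration, not a location this repo requires; any writable directory shared across your projects should do.

```shell
# Use an existing setting if there is one, otherwise fall back to an
# illustrative default location (an assumption, pick whatever suits you).
export NXF_SINGULARITY_CACHEDIR="${NXF_SINGULARITY_CACHEDIR:-$HOME/singularity-images}"

# Make sure the directory exists before Nextflow tries to write images to it.
mkdir -p "$NXF_SINGULARITY_CACHEDIR"
echo "Singularity images will be cached in $NXF_SINGULARITY_CACHEDIR"
```

Adding the export line to your shell profile keeps the setting across sessions.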
This is an upstream project for the Squonk computational notebook, as is Pipelines. The aim is that these workflows are generated in a way that makes them easy to integrate into Squonk. As such it provides a playground where new methodologies can be developed and benchmarked.
Included in this repo are a number of public datasets that are useful for testing and validation studies. You can find them in the datasets directory. Feel free to contribute additional datasets, but if doing so please include documentation describing the source of the dataset and attribute ownership appropriately.
- CDK2 virtual screening with rDock
- Docking validation with rDock and Smina using DEKOIS data: DHFR - also for CDK2, SARS-COV, Thrombin
- Generating ROC curves
- Docking pose validation for ESR
- rDock setup for use in Squonk
- Protein selection for docking
We welcome contributions, but want to make sure they follow a well-defined set of patterns and conventions. Unfortunately these are still being established.
We will insist that all examples are well documented.
Contact Tim Dudgeon <tdudgeon at informaticsmatters dot com> if you want to get involved.