Limsande/PRosettaC: PROTAC (PROteolysis TArgeting Chimeras) modelling on a PBS or Open Grid Engine cluster

forked from LondonLab/PRosettaC

MIT license


Installation requirements for PRosettaC:

1. Load python 3 by using:
module load python/3.6.4

2. Install the RDKit, NumPy and PyMOL Python packages. If you're using conda, use:
conda install -c rdkit rdkit
conda install -c anaconda numpy
conda install -c schrodinger pymol

3. Download and install OpenBabel (http://openbabel.org/wiki/Main_Page), PatchDock (https://bioinfo3d.cs.tau.ac.il/PatchDock/) and Rosetta (https://www.rosettacommons.org/). The Rosetta version that was used in this study is: 2019.27.post.dev+132.master.966c9eb 966c9eb6b3ab993de7aa3af5988125b7c2e464af git@github.com:RosettaCommons/main.git 2019-07-18T11:43:25.

4. For job scheduling, our scripts rely on PBS (https://en.wikipedia.org/wiki/Portable_Batch_System) or Sun Grid Engine/Open Grid Engine (https://en.wikipedia.org/wiki/Oracle_Grid_Engine).

5. Set the following environment variables:
export PATCHDOCK="/Path_to_PatchDock/"
export OB="/Path_to_OpenBabel/"
export SCRIPTS_FOL="/Path_to_the_git_folder/"
export ROSETTA_FOL="/Path_to_Rosetta/"
Additionally, set either PBS_HOME="/Path_to_PBS_bin_folder/" or SGE_HOME="/Path_to_SGE_bin_folder/", depending on the scheduler you are using.
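
Before starting a run, it can help to confirm that the variables from step 5 are actually visible to Python. The following is a minimal sketch, not part of PRosettaC; the function name check_environment is hypothetical, and it only tests whether the variables are set, not whether the paths exist.

```python
import os

# Variables named in step 5; one scheduler variable is expected in addition.
REQUIRED = ("PATCHDOCK", "OB", "SCRIPTS_FOL", "ROSETTA_FOL")
SCHEDULER = ("PBS_HOME", "SGE_HOME")


def check_environment(env=None):
    """Return a list of problems; an empty list means all variables are set.

    Hypothetical helper for illustration, not part of PRosettaC.
    """
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]
    if not any(env.get(name) for name in SCHEDULER):
        problems.append("neither PBS_HOME nor SGE_HOME is set")
    return problems
```

Calling check_environment() without arguments inspects the current shell environment; passing a dict makes the check easy to try out in isolation.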

6. When working with PBS, the environment variable PBS_O_WORKDIR=/Path_to_the_git_folder/Patchdok_Results has to be set wherever a job is executed. You can achieve this by appending

export PBS_O_WORKDIR=/Path_to_the_git_folder/Patchdok_Results

to /etc/pbs.conf (on all cluster nodes). Another option is to write this line to a file, e.g. env.txt, and then set the environment variable SCHEDULER_PARAMS=/Path_to_env.txt before starting PRosettaC (see "Additional parameters" below).
When using SGE, do the same with /etc/sge.conf instead.
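
The env.txt option from step 6 could be scripted roughly as follows. This is only a sketch under stated assumptions: the helper names write_scheduler_params and launch_environment are hypothetical, and the folder name Patchdok_Results is copied as written in step 6.

```python
import os
from pathlib import Path


def write_scheduler_params(git_folder, env_file="env.txt"):
    """Write the PBS_O_WORKDIR export line to env_file and return its absolute path.

    Hypothetical helper for illustration, not part of PRosettaC.
    """
    workdir = git_folder.rstrip("/") + "/Patchdok_Results"
    path = Path(env_file).resolve()
    path.write_text(f"export PBS_O_WORKDIR={workdir}\n")
    return path


def launch_environment(env_file, base_env=None):
    """Return a copy of the environment with SCHEDULER_PARAMS pointing at env_file."""
    env = dict(os.environ if base_env is None else base_env)
    env["SCHEDULER_PARAMS"] = str(env_file)
    return env
```

The mapping returned by launch_environment could then be passed as the env argument of subprocess.run when starting main.py or auto.py, so that SCHEDULER_PARAMS is set only for that run.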

Usage: python main.py/auto.py Protac_params.txt

To also submit the main job to the job scheduler, use:
python extended.py/short.py Protac_params.txt

Protac_params.txt lists the input files and should look like this:

/////////////////////////////
For main.py / extended.py:
Structures: StructA.pdb StructB.pdb
Chains: A B
Heads: HeadA.sdf HeadB.sdf
Anchor atoms: 11 23
Protac: protac.smi
Full: True
////////////////////////////

Details of files for the extended version:
StructA.pdb is the structure of the E3 ligase (e.g. CRBN)
StructB.pdb is the structure of the target protein (e.g. BRD4)
A and B are the chain IDs for the E3 ligase and the target. If your structures have more than one chain, write them like this: Chains: AC B. The E3 ligase and the target must not share a chain ID.
HeadA.sdf is the E3 ligase binder (e.g. thalidomide)
HeadB.sdf is the target binder (e.g. JQ1)
Both HeadA and HeadB should be positioned in their appropriate binding modes within StructA and StructB.
Anchor atoms are the atom numbers of the anchor atoms within HeadA and HeadB, respectively. An easy way to find the number of a desired atom is to open HeadA.sdf in pymol, press the L button in the toolbar, then choose "atom identifiers" and "ID". The anchor number must be >= 1, so if the numbering in pymol starts from 0, add one to the number of the chosen atom. The anchor atoms must be uniquely defined in the SMILES representation of the molecule.
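
As an alternative to reading the numbers off in pymol, the atoms of a head group can be listed directly from the file. The sketch below assumes that the anchor number is the atom's 1-based position in the atom block of a V2000 .sdf file, which the description above suggests but does not state explicitly; list_atoms is a hypothetical helper, not a PRosettaC function.

```python
def list_atoms(sdf_text):
    """Return [(atom_number, element), ...] for the first V2000 mol block.

    Atom numbers are 1-based positions in the atom block. Hypothetical
    helper for illustration, not part of PRosettaC.
    """
    lines = sdf_text.splitlines()
    counts = lines[3]  # the counts line follows the three header lines
    if "V2000" not in counts:
        raise ValueError("only V2000 mol blocks are handled in this sketch")
    n_atoms = int(counts[0:3])  # atom count occupies the first three columns
    atoms = []
    for number, line in enumerate(lines[4:4 + n_atoms], start=1):
        # x, y and z take 10 columns each; the element symbol is in columns 32-34
        atoms.append((number, line[31:34].strip()))
    return atoms
```

For example, list_atoms(open("HeadA.sdf").read()) would return one (number, element) pair per atom, from which candidate anchor atoms can be picked and then checked visually in pymol.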

/////////////////////////////
For auto.py / short.py:
PDB: PDB_ID1 PDB_ID2
LIG: LIG_ID1 LIG_ID2
PROTAC: protac.smi
Full: True
/////////////////////////////

For a more flexible run with .pdb or .sdf files instead of the PDB or LIG IDs, for either one or both proteins, use auto.py / short.py with:

/////////////////////////////
PDB: [PDB_ID1 ; pdb1.pdb] [PDB_ID2 ; pdb2.pdb]
LIG: [LIG_ID1 ; lig1.sdf] [LIG_ID2 ; lig2.sdf]
PROTAC: [SMILES_STRING ; protac.smi]
Full: True
/////////////////////////////

NOTES:
[X ; Y] means that either one can be chosen. The only restriction is that an .sdf file cannot be combined with a PDB_ID.
protac.smi is a file with a single line containing the SMILES representation of the full PROTAC.
Because the name is reserved, the params file must not be called params.txt.
Use "Full: True" for a full, long run, and "Full: False" for a shorter run of lower quality with fewer global and local docking runs.
Use "ClusterName: [PBS|SGE]" to set the scheduler. PBS is the default, so you don't need to specify it when using PBS.
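
The rules in these notes can be checked before a job is submitted. The following sketch reads "Key: value" lines into a dict and applies only the rules stated above; parse_params and check_params are hypothetical names, and this is not how PRosettaC itself reads the file.

```python
from pathlib import Path


def parse_params(text):
    """Split 'Key: value' lines into a dict; lines without a colon are ignored.

    Hypothetical helper for illustration, not PRosettaC's own parser.
    """
    params = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep and key.strip():
            params[key.strip()] = value.strip()
    return params


def check_params(filename, params):
    """Return a list of problems based only on the rules in the notes above."""
    problems = []
    if Path(filename).name == "params.txt":
        problems.append("the params file must not be called params.txt")
    if "Full" in params and params["Full"] not in ("True", "False"):
        problems.append("Full must be True or False")
    # PBS is the documented default when ClusterName is absent
    if params.get("ClusterName", "PBS") not in ("PBS", "SGE"):
        problems.append("ClusterName must be PBS or SGE")
    return problems
```

An empty list from check_params means none of the stated rules is violated; it does not verify that the listed structure or ligand files exist.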

Additional parameters:
In the config file you can also specify the optional parameters RosettaDockMemory and ProtacModelMemory. These set the
amount of memory assigned per job at steps 3 and 4 described in the paper, and default to 8000mb and 4000mb, respectively.

You can also set the environment variable SCHEDULER_PARAMS to a file whose content is prepended to each job file sent to the scheduler. This is useful for scheduler-specific job-level configuration (e.g. the number of dedicated CPUs) or for setup steps such as activating a conda environment to run the job in.
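
To illustrate what "prepended to each job file" means, the sketch below builds a job script whose first lines come from the SCHEDULER_PARAMS file. How PRosettaC actually assembles its job files is not shown in this README, so build_job_script is a hypothetical stand-in.

```python
import os


def build_job_script(commands, env=None):
    """Return job-script text with the SCHEDULER_PARAMS file content placed first.

    Hypothetical helper for illustration, not PRosettaC's own job writer.
    """
    env = os.environ if env is None else env
    prefix = ""
    params_file = env.get("SCHEDULER_PARAMS")
    if params_file:
        with open(params_file) as handle:
            prefix = handle.read()
        # make sure the first command starts on its own line
        if prefix and not prefix.endswith("\n"):
            prefix += "\n"
    return prefix + "\n".join(commands) + "\n"
```

If SCHEDULER_PARAMS is unset, the script consists of the commands alone, which matches the description of the variable as optional.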

How to implement a scheduler:
To add support for a specific scheduler, implement the abstract class Cluster in cluster/Cluster.py, like cluster.PBS.PBS and cluster.SGE.SGE do. Then add the scheduler as an entry in cluster/__init__.py.
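
The general pattern (an abstract base class plus a registry of implementations) might look like the sketch below. The abstract methods that Cluster actually requires are defined in cluster/Cluster.py and are not reproduced here: the class names, the method submit_command and the SCHEDULERS registry are all hypothetical, and Slurm is used only as an example of a scheduler that is not yet supported.

```python
from abc import ABC, abstractmethod


class SchedulerSketch(ABC):
    """Stand-in for an abstract scheduler class; NOT the real cluster.Cluster interface."""

    @abstractmethod
    def submit_command(self, job_file):
        """Return the command (as a list) that submits job_file."""


class PBSSketch(SchedulerSketch):
    def submit_command(self, job_file):
        return ["qsub", job_file]


class SlurmSketch(SchedulerSketch):
    """Hypothetical new scheduler added by implementing the abstract method."""

    def submit_command(self, job_file):
        return ["sbatch", job_file]


# Hypothetical registry, playing the role of the entries in cluster/__init__.py
SCHEDULERS = {"PBS": PBSSketch, "SLURM": SlurmSketch}


def get_scheduler(name="PBS"):
    """Instantiate the scheduler registered under name (PBS by default)."""
    return SCHEDULERS[name]()
```

In the real code base, the new class would implement whatever abstract methods Cluster declares, and the name used in the registry would be the value given to ClusterName in the params file.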

Authors:
Daniel Zaidman, Nir London