HADDOCK2.4 manual - Frequently Asked Questions

We collect here frequently occurring problems and solutions. The following topics are currently available:

What about missing atoms or chain breaks?
What about point mutations?
What about ions?
Domain definition for docking
Clustering issues
Increasing the number of flexible segments
Running in batch mode without a queuing system
Running HADDOCK on a cluster using a queuing system (e.g. Torque or Slurm)
Small ligand docking with HADDOCK

What about missing atoms or chain breaks?

Missing atoms are generated automatically by HADDOCK when it builds the topologies and PDB files of the molecules in the begin directory. This is done when running the generate.inp CNS scripts. If residues are missing, chain breaks are introduced, which might cause segments of your molecule to move with respect to each other during the refinement stage. To avoid this, you can define a few specific distance restraints, for example between CA atoms. Our haddock-tools (https://github.com/haddocking/haddock-tools) contain a script, restrain_bodies.py, that detects such breaks and defines distance restraints. Those restraints can be given to HADDOCK, for example as unambiguous restraints. The HADDOCK server does this automatically.
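As an illustration of the idea, the sketch below finds gaps in the residue numbering of one chain and writes a CNS-style distance restraint between the CA atoms flanking each gap, fixed at their current distance. It is a simplified sketch and not the algorithm used by restrain_bodies.py; the function names and the coordinates are hypothetical.

```python
import math


def find_chain_breaks(residue_numbers):
    """Return (last_before_gap, first_after_gap) pairs for gaps in the numbering."""
    ordered = sorted(set(residue_numbers))
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > 1]


def ca_restraint(segid, res_a, xyz_a, res_b, xyz_b):
    """Format a CNS 'assign' statement keeping two CA atoms at their current distance."""
    dist = math.dist(xyz_a, xyz_b)
    return (
        f"assign (segid {segid} and resid {res_a} and name CA) "
        f"(segid {segid} and resid {res_b} and name CA) {dist:.3f} 0.0 0.0"
    )


# Hypothetical CA coordinates keyed by residue number; residues 4 to 6 are missing.
ca_coords = {
    1: (0.0, 0.0, 0.0),
    2: (3.8, 0.0, 0.0),
    3: (7.6, 0.0, 0.0),
    7: (7.6, 6.0, 8.0),
}

for before, after in find_chain_breaks(ca_coords):
    print(ca_restraint("A", before, ca_coords[before], after, ca_coords[after]))
```

The printed lines can be collected in a restraint file and supplied as unambiguous restraints. For real input, prefer the haddock-tools script, which works directly on a PDB file.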

What about point mutations?

To introduce mutations in your input PDB files you can do the following:

edit the PDB file and rename the mutated residue to the proper amino acid name
keep or rename appropriately the matching side-chain atoms

The missing atoms will be generated automatically. It is important to have at least the backbone atoms and the CB atom of the side-chain defined, since their average position is used as the starting point to “grow” the missing atoms. Always check that the sequences of the various PDB files match!
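The two editing steps can also be scripted. The following is a minimal sketch, assuming standard fixed-column PDB ATOM records; it renames one residue and keeps only its backbone and CB atoms, which is the most conservative choice (matching side-chain atoms beyond CB could also be kept by hand). The helper names are hypothetical and the demo lines are truncated after the residue number for brevity.

```python
BACKBONE_AND_CB = {"N", "CA", "C", "O", "CB"}


def atom_line(serial, name, resname, chain, resid):
    """Build the first 26 columns of a PDB ATOM record (demo helper only)."""
    return f"ATOM  {serial:>5}  {name:<3} {resname:>3} {chain}{resid:>4}"


def mutate_residue(pdb_lines, chain, resid, new_name):
    """Rename one residue and drop its side-chain atoms beyond CB."""
    out = []
    for line in pdb_lines:
        is_atom = line.startswith(("ATOM", "HETATM"))
        if is_atom and line[21] == chain and int(line[22:26]) == resid:
            if line[12:16].strip() not in BACKBONE_AND_CB:
                continue  # HADDOCK will rebuild the missing side-chain atoms
            line = line[:17] + f"{new_name:>3}" + line[20:]
        out.append(line)
    return out


# Mutate LEU 10 of chain A to ALA in a small hypothetical fragment.
names = ["N", "CA", "C", "O", "CB", "CG", "CD1", "CD2"]
fragment = [atom_line(i + 1, n, "LEU", "A", 10) for i, n in enumerate(names)]
fragment.append(atom_line(9, "N", "GLY", "A", 11))

for line in mutate_residue(fragment, "A", 10, "ALA"):
    print(line)
```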

What about ions?

Some proteins contain ions, for example calcium. Including them can be important for docking, in particular for proper electrostatics! In principle, they should be recognized in the topology generation step, provided their names in the PDB file match the ion names defined in the ion.top file in the toppar directory. The list of supported ions can be found here.

Domain definition for docking

In general, it is recommended to remove any parts of your system, such as flexible linkers, that are not involved in the interaction with the partner. Keeping them might cause trouble in the sorting of solutions. For example, such a linker can make contacts with the partner molecule, resulting in a lower total energy, so that “bad” solutions could still be kept.

Clustering issues

When choosing which of the two molecules will have the first segid (e.g. “A”), it is recommended to choose the largest and/or most rigid one. This should give better clustering results when RMSD clustering is selected: in the RMSD calculation for clustering, the structures are first fitted on the semi-flexible segments of the first molecule, and the RMSDs are then calculated on the semi-flexible segments of the remaining molecules (defined as “ligand interface RMSD”).

Defining the largest and best defined (most rigid) molecule first should thus result in a better fitting.

Note that this is not an issue if fraction of common contacts (FCC) clustering is used.

Increasing the number of flexible segments

In the current distribution, the number of flexible segments is set to 10 for the semi-flexible interface and 5 for the fully flexible segments. If needed, these numbers can be increased. The only file that you will need to modify is run.cns.

As an example, say you wish to increase the number of fully flexible segments for molecule A to 10. Locate in run.cns the section concerning the fully flexible segments, e.g.:

{=========== Definition of fully flexible segments ==========}
{* Define the fully flexible segment of each molecule.*}
{* These segments will be allowed to move at all stages of it1 *}

{* Number of fully flexible segments for molecule 1                  *}
{* Note that current max is 5 (edit the run.cns to add more segments *}

{===>} nfle_1=0;

{* Fully flexible segments of molecule 1 *}

{===>} start_fle_1_1="";
{===>} end_fle_1_1="";
{===>} start_fle_1_2="";
{===>} end_fle_1_2="";
{===>} start_fle_1_3="";
{===>} end_fle_1_3="";
{===>} start_fle_1_4="";
{===>} end_fle_1_4="";
{===>} start_fle_1_5="";
{===>} end_fle_1_5="";

Increase the number of rows to match the number of segments you want to define, e.g. for 10:

{=========== Definition of fully flexible segments ==========}
{* Define the fully flexible segment of each molecule.*}
{* These segments will be allowed to move at all stages of it1 *}

{* Number of fully flexible segments for molecule (protein) A            *}
{* Note that current max is 5 (edit the run.cns to add more segments     *}

{===>} nfle_A=0;

{* Fully flexible segments of molecule (protein) A *}
{+ table: rows=10 "segment 1" "segment 2" "segment 3" "segment 4" "segment 5" "segment 6" "segment 7" "segment 8" "segment 9" "segment 10" cols=2 "Start residue" "End residue" +}

{===>} A_start_fle_1="";
{===>} A_end_fle_1="";
{===>} A_start_fle_2="";
{===>} A_end_fle_2="";
{===>} A_start_fle_3="";
{===>} A_end_fle_3="";
{===>} A_start_fle_4="";
{===>} A_end_fle_4="";
{===>} A_start_fle_5="";
{===>} A_end_fle_5="";
{===>} A_start_fle_6="";
{===>} A_end_fle_6="";
{===>} A_start_fle_7="";
{===>} A_end_fle_7="";
{===>} A_start_fle_8="";
{===>} A_end_fle_8="";
{===>} A_start_fle_9="";
{===>} A_end_fle_9="";
{===>} A_start_fle_10="";
{===>} A_end_fle_10="";
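Typing the extra rows by hand is error-prone, so they can also be generated. The sketch below prints empty start/end rows following the start_fle_1_N naming of the first listing above; the second listing uses A_start_fle_N instead, so treat the naming as an assumption and adapt the format strings to whatever your own run.cns uses. The function name is hypothetical.

```python
def flexible_segment_rows(mol, n_segments):
    """Return run.cns lines defining n_segments empty fully flexible segments."""
    rows = []
    for i in range(1, n_segments + 1):
        rows.append(f'{{===>}} start_fle_{mol}_{i}="";')
        rows.append(f'{{===>}} end_fle_{mol}_{i}="";')
    return rows


# Rows for 10 fully flexible segments of molecule 1.
print("\n".join(flexible_segment_rows(1, 10)))
```

Paste the output over the existing rows for that molecule in run.cns.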

Running in batch mode without a queuing system

A contribution from Melissa Stauffer, Vanderbilt University, USA

Here is a description of the setup on a Linux cluster that has no queuing system installed. It has a head node/firewall with 16 dual-CPU compute nodes behind it. From the head node, processes can be spawned on the compute nodes using rsh. Authentication to the compute nodes is done via the /etc/hosts.equiv mechanism, so there is no prompting for passwords. HADDOCK was set up to use each of the 32 compute-node CPUs by creating a simple csh wrapper script called “haddock_wrapper” that contains two lines like this:

cd _project_directory_
csh -f $1

Then, in the run.cns file, we define our “queues” like:

{===>} queue_1="rsh node1 ~/bin/haddock_wrapper";
{===>} cns_exe_1="cns";
{===>} cpunumber_1=2;

{===>} queue_2="rsh node2 ~/bin/haddock_wrapper";
{===>} cns_exe_2="cns";
{===>} cpunumber_2=2;

...

and so on: one “queue” for each compute node to be included in this HADDOCK run.
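Since the queue definitions differ only in their index and host name, they can be generated rather than typed. The following sketch prints one block per node in the format shown above; the node names and the function name are hypothetical, and the number of queues your run.cns accepts should be checked before adding more.

```python
def queue_block(index, node, cpus=2, wrapper="~/bin/haddock_wrapper", cns_exe="cns"):
    """Return the three run.cns lines that define one rsh-based queue."""
    return "\n".join(
        [
            f'{{===>}} queue_{index}="rsh {node} {wrapper}";',
            f'{{===>}} cns_exe_{index}="{cns_exe}";',
            f"{{===>}} cpunumber_{index}={cpus};",
        ]
    )


nodes = ["node1", "node2", "node3"]  # hypothetical host names
print("\n\n".join(queue_block(i, node) for i, node in enumerate(nodes, start=1)))
```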

If rsh is blocked, you can use ssh instead, but for this to work you need to set up ssh so that no password is needed. For that, see:
https://xeny.net/Passwordless_S_S_H
(Thanks to Andrea Spitaleri)

Running HADDOCK on a cluster using a queuing system (e.g. Torque or Slurm)

To submit to the queuing system, we typically use a wrapper script that adds some directives to the job files. Here is an example of such a wrapper script, called ssub, that submits to the default queue:

#!/bin/csh -f
if ($#argv < 1) then
  echo "Usage : ssub jobname"
  exit 1
endif

# check if job exists + make it executable
set jobname=$1
if (! -e $1) then
  echo "job file does not exist"
  exit 1
endif
if (! -x $jobname) chmod +x $jobname

# write temporary pbs script
# $$ expands to the process ID of this shell, which keeps the file names unique
set pbsjob=$jobname.pbsjob.$$
if (! -e $pbsjob) then
  touch $pbsjob
else
  \rm $pbsjob
  touch $pbsjob
endif
set PWD=`pwd`

echo "#PBS -S" $SHELL >> $pbsjob
set outfile=$PWD/$jobname.out.$$
echo "#PBS -o $outfile" >> $pbsjob
set errorfile=$PWD/$jobname.err.$$
echo "#PBS -e $errorfile" >> $pbsjob
echo "#PBS -m n" >> $pbsjob
echo "cd $PWD" >> $pbsjob
echo "./$jobname" >>$pbsjob

chmod +x $pbsjob
qsub -j eo $pbsjob
rm -rf $pbsjob
exit

Change your run.cns script (here we assume ssub is in your path, otherwise give the full path to it):

{===>} queue_1="ssub";
{===>} cns_exe_1="/home/software/software/cns_solve_1.31-UU/intel-x86_64bit-linux/bin/cns";
{===>} cpunumber_1=100;

This will cause HADDOCK to submit 100 jobs simultaneously to the batch system. Once a structure comes back the next job will be submitted. You can adapt the wrapper script to submit to specific queues.
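As one possible adaptation, a Slurm version of the wrapper needs to write `#SBATCH` directives instead of `#PBS` ones and submit the result with sbatch. The sketch below only builds the text of such a job script; the function name, the paths and the partition name are hypothetical, and the directives should be checked against your cluster's configuration.

```python
def build_slurm_script(jobname, workdir, partition=None):
    """Return the text of a Slurm batch script that runs one HADDOCK job file."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={jobname}",
        # %j is replaced by Slurm with the job ID, keeping the file names unique
        f"#SBATCH --output={workdir}/{jobname}.out.%j",
        f"#SBATCH --error={workdir}/{jobname}.err.%j",
    ]
    if partition:
        lines.append(f"#SBATCH --partition={partition}")
    lines += [f"cd {workdir}", f"./{jobname}"]
    return "\n".join(lines) + "\n"


# Hypothetical job file, run directory and partition.
print(build_slurm_script("example.job", "/home/user/run1", partition="short"))
```

A wrapper would write this text to a temporary file, pass it to sbatch, and then remove it, mirroring what ssub does with qsub.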

Note that since the rigid-body docking jobs (it0) are usually very fast, it is possible to bundle a number of them into a single job to avoid overloading the batch system. To do so, edit the file Haddock/Main/MHaddock.py in your HADDOCK installation and change the value of jobconcat["0"]. In the following example, 5 docking jobs are concatenated into one job for it0:

jobconcat["0"] = 5
jobconcat["1"] = 1
jobconcat["2"] = 1
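To get a feel for the effect on the batch system, the number of submitted jobs can be estimated as the number of structures divided by the concatenation value, rounded up. This is a back-of-the-envelope sketch that assumes the last bundle may be only partly filled; the structure count is a hypothetical example.

```python
import math


def submitted_jobs(n_structures, jobconcat):
    """Estimate how many batch jobs are needed for n_structures docking runs."""
    return math.ceil(n_structures / jobconcat)


print(submitted_jobs(1000, 5))  # 200 jobs when bundling 5 runs per job
print(submitted_jobs(1000, 1))  # 1000 jobs without bundling
```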

Small ligand docking with HADDOCK

It is possible to dock small ligands with HADDOCK, but topology and parameter files for the ligand must then be provided in CNS format. Several sources exist for such files:

the PRODRG server maintained by Daan van Aalten at Dundee University: https://prodrg2.dyndns.org
This server allows you to draw your molecule or paste coordinates and returns topology and parameter files in various formats, including CNS. You should turn on electrostatics to obtain partial charges. Save the resulting PDB file and the corresponding CNS parameter and topology files for use in HADDOCK.

Important: The generated parameter file contains a CNS NBONds statement which should be removed prior to use in HADDOCK. Look in the parameter file for:
```
     NBONds
       CUTNB=7.0 WMIN=1.5
       REPEL=1.0 REXPONENT=4
       IREXPONENT=1 RCONST=16.0
       TOLERANCE=0.5 NBXMOD=5
       CTONNB=5.5 CTOFNB=6.0
     END
```
and remove it or comment it out (by adding ! at the start of each line).

the Automated Topology Builder (ATB) and Repository developed in Prof. Alan Mark’s group at the University of Queensland in Brisbane: https://compbio.biosci.uq.edu.au/atb
Note: we have not yet tested those parameters in HADDOCK.
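Commenting out the NBONds statement of a generated parameter file can be automated. The sketch below prefixes every line from the NBONds keyword up to its closing END with `!`; it assumes the statement is laid out one keyword group per line, as in the listing above, and the function name and sample parameter lines are hypothetical.

```python
def comment_out_nbonds(param_text):
    """Prefix each line of the NBONds ... END statement with '!'."""
    out, in_block = [], False
    for line in param_text.splitlines():
        stripped = line.strip().upper()
        if not in_block and stripped.startswith("NBON"):
            in_block = True
        if in_block:
            out.append("!" + line)
            if stripped == "END":
                in_block = False  # the statement ends at its END keyword
        else:
            out.append(line)
    return "\n".join(out) + ("\n" if param_text.endswith("\n") else "")


# Made-up parameter file fragment for demonstration.
sample = (
    "BOND CA CB 1000.0 1.53\n"
    "     NBONds\n"
    "       CUTNB=7.0 WMIN=1.5\n"
    "     END\n"
    "ANGLe CA CB CG 500.0 109.5\n"
)
print(comment_out_nbonds(sample))
```

Work on a copy of the parameter file and inspect the result before using it in a run.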

To dock a small ligand with HADDOCK using custom-made topology and parameter files, perform the following steps:

Set up your HADDOCK run in the usual way, i.e. by generating the new.html file and running haddock first to generate the run directory structure.
Place your custom topology and parameter files in the toppar directory.
When modifying the docking parameters in run.cns, specify the proper topology and parameter files for your ligand. Alternatively, place the topology and parameter files of your ligand in the toppar/ligand.top and toppar/ligand.param files, respectively.
To prevent an N- or C-terminal patch from being applied to your ligand, add “first IONS” and “last IONS” statements containing the name of your ligand to the topallhdg5.3.pep file in the toppar directory (look for the existing “first IONS” and “last IONS” statements).

We also recommend setting the number of MD steps for the first two parts of the semi-flexible refinement (rigid-body high-temperature dynamics and slow-cooling annealing) to 0.

HADDOCK2.4 comes with an example for protein-ligand docking. Check the settings in that example.

Important: When starting a run, always check for error messages in the begin directory in the various generate…out files, especially for your ligand.
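Such a check can be scripted. The sketch below returns the numbered lines of an output file that contain an error marker; it assumes that error messages contain the string "ERR", which you should verify against your own output files. The function name is hypothetical and the sample lines are invented for the demonstration, not real CNS output.

```python
def find_error_lines(lines, marker="ERR"):
    """Return (line_number, text) for every line containing the marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if marker in line
    ]


# Invented sample content; in practice, pass an open file from the begin directory.
sample_output = [
    "reading topology\n",
    "example-ERR: something went wrong with the ligand\n",
    "done\n",
]

for number, text in find_error_lines(sample_output):
    print(f"line {number}: {text}")
```

Because an open text file iterates line by line, it can be passed directly as the `lines` argument.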