HADDOCK2.2 manual

FAQ

We collect here frequently occurring problems and their solutions. The following topics are currently available:

What about missing atoms or point mutations?
What about ions?
Domain definition for docking
Clustering issues
Increasing the number of flexible segments
Running in batch mode without a queuing system
Running HADDOCK on a cluster using a queuing system (e.g. PBS or Torque)
Small ligand docking with HADDOCK

What about missing atoms or point mutations?

Missing atoms will be automatically generated by HADDOCK when generating the topologies and PDB files of the molecules in the begin directory. This is done when running the generate_X.inp CNS scripts (these are originally based on the generate_easy.inp script from the CNS distribution). HADDOCK uses the PDB files defined in the new.html file. In case of missing residues, chain breaks will be introduced.

When setting up the docking from an ensemble of structures, coordinate files of single point mutants may be available. These can be used as well, provided you do the following:

edit the PDB file and rename the mutated residue to the proper amino acid name
keep or rename appropriately the matching side-chain atoms

The missing atoms will be generated automatically. It is important to have at least the backbone atoms defined, since their average position will be used as the starting point to "grow" the missing atoms. Always check that the sequences of the various PDB files match!
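The sequence check recommended above can be sketched in Python as follows. This is a minimal illustration, not part of HADDOCK: the function names are made up for this example, and it assumes standard fixed-column PDB ATOM records.

```python
def residue_sequence(pdb_lines):
    """Return the residues of a PDB file as (chain, residue number, residue name) tuples."""
    residues = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        # Fixed PDB columns: residue name 18-20, chain 22,
        # residue number plus insertion code 23-27
        residue = (line[21], line[22:27].strip(), line[17:20].strip())
        if not residues or residues[-1] != residue:
            residues.append(residue)
    return residues


def sequence_mismatches(reference, other):
    """Return (position, reference residue, other residue) for every difference."""
    mismatches = [
        (index, ref, oth)
        for index, (ref, oth) in enumerate(zip(reference, other))
        if ref != oth
    ]
    # A length difference means one file has residues the other lacks
    shorter, longer = sorted((reference, other), key=len)
    for index in range(len(shorter), len(longer)):
        ref = reference[index] if index < len(reference) else None
        oth = other[index] if index < len(other) else None
        mismatches.append((index, ref, oth))
    return mismatches
```

To compare two members of an ensemble, read both files with open(...).readlines(), pass each to residue_sequence and hand the results to sequence_mismatches. An empty list means the residue names and numbering agree.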

What about ions?

Some proteins contain ions, for example calcium. Their inclusion might be important for docking purposes, in particular for proper electrostatics! In principle, they should be recognized in the topology generation step provided their name in the PDB file matches the ion names defined in the ion.top file in the toppar directory. To prevent an N- or C-terminal patch from being applied to them, they should also be defined in the topallhdg5.3.pep file (look for the "first IONS" and "last IONS" statements).

Another problem can occur with ions in torsion angle dynamics since they are unconnected single atoms. In the new HADDOCK 2.X version, for the torsion angle dynamics part of the docking protocol (it1), a covalent bond will be automatically defined to the closest ligand atom (only for cations). This is done in the covalions.cns CNS script in the protocols directory; the following cations are currently defined: MG⁺², CA⁺², FE⁺², FE⁺³, NI⁺², CO⁺², CO⁺³, CU⁺¹, CU⁺² and ZN⁺². If your system contains other ions add them to the covalions.cns file (they should however be defined in ion.top).
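Before starting a run it can help to list the ions in a PDB file whose names are not among those you expect to be supported. The Python sketch below is an illustration only: the function name is made up, and the set of known names must be supplied by you after reading ion.top and covalions.cns, since the exact residue names used there are not reproduced here.

```python
from collections import defaultdict


def unlisted_ions(pdb_lines, known_names):
    """Return residue names of single-atom HETATM residues that are not in known_names."""
    atoms_per_residue = defaultdict(int)
    for line in pdb_lines:
        if line.startswith("HETATM"):
            # (chain, residue number, residue name) identifies one hetero residue
            key = (line[21], line[22:27].strip(), line[17:20].strip())
            atoms_per_residue[key] += 1
    # Ions are unconnected single atoms, so keep residues with exactly one atom
    single_atom = set(key[2] for key, count in atoms_per_residue.items() if count == 1)
    return sorted(single_atom - set(known_names))
```

Note that water molecules stored without hydrogens are also single-atom HETATM residues and will be reported unless their residue name is included in known_names.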

Domain definition for docking

In general, it is recommended to remove any parts of your system, such as flexible linkers, that are not involved in the interaction with the partner. Keeping them might cause trouble in the sorting of solutions. For example, such a linker can make contacts with the partner molecule, resulting in a lower total energy, so that "bad" solutions could still be kept.
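Removing a linker from a PDB file before docking can be sketched in Python as below. The function name and the residue range in the usage note are hypothetical; the sketch assumes fixed-column PDB records and ignores insertion codes.

```python
def drop_residues(pdb_lines, chain, first, last):
    """Return the PDB lines without the atoms of residues first..last of the given chain."""
    kept = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and line[21] == chain:
            # Residue number sits in columns 23-26 of a PDB atom record
            resnum = int(line[22:26])
            if first <= resnum <= last:
                continue
        kept.append(line)
    return kept
```

For example, drop_residues(lines, "A", 45, 60) would remove residues 45 to 60 of chain A. Cutting out an internal stretch leaves missing residues, for which a chain break is introduced during topology generation.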

Clustering issues

When choosing which of the two molecules will be in the first segid (e.g. "A"), it is recommended to choose the largest and/or most rigid one of the two. This should give better clustering results since in the rmsd calculation for clustering (rmsd.inp CNS script) the structures are first fitted on the semi-flexible segments of the first molecule and then the rmsds are calculated on the semi-flexible segments of the remaining molecules (defined as "ligand interface RMSD").

Defining the largest and best defined (most rigid) molecule first should thus result in a better fitting.
Note that this is not an issue if fractions of common contact (FCC) clustering is used.

Increasing the number of flexible segments

In the current distribution, the number of flexible segments is set to 10 for the semi-flexible interface and 5 for the fully flexible segments. If needed, these numbers can be increased. The only file that you will need to modify is run.cns.

As an example, say you wish to increase the number of fully flexible segments for molecule A to 10. Locate in run.cns the section concerning the fully flexible segments, e.g.:

{=========== Definition of fully flexible segments ==========}
{* Define the fully flexible segment of each molecule.*}
{* These segments will be allowed to move at all stages of it1 *}

{* Number of fully flexible segments for molecule (protein) A            *}
{* Note that current max is 5 (edit the run.cns to add more segments     *}

{===>} nfle_A=0; 

{* Fully flexible segments of molecule (protein) A *}
{+ table: rows=5 "segment 1" "segment 2" "segment 3" "segment 4" "segment 5" cols=2 "Start residue" "End residue" +}

{===>} A_start_fle_1=""; 
{===>} A_end_fle_1=""; 
{===>} A_start_fle_2=""; 
{===>} A_end_fle_2=""; 
{===>} A_start_fle_3=""; 
{===>} A_end_fle_3=""; 
{===>} A_start_fle_4=""; 
{===>} A_end_fle_4=""; 
{===>} A_start_fle_5=""; 
{===>} A_end_fle_5="";

Increase the number of rows in the table definition to 10, add the additional row headers and add the additional segment definitions. The result should look like:

{=========== Definition of fully flexible segments ==========}
{* Define the fully flexible segment of each molecule.*}
{* These segments will be allowed to move at all stages of it1 *}

{* Number of fully flexible segments for molecule (protein) A            *}
{* Note that current max is 10 (edit the run.cns to add more segments    *}

{===>} nfle_A=0; 

{* Fully flexible segments of molecule (protein) A *}
{+ table: rows=10 "segment 1" "segment 2" "segment 3" "segment 4" "segment 5" "segment 6" "segment 7" "segment 8" "segment 9" "segment 10" cols=2 "Start residue" "End residue" +}

{===>} A_start_fle_1=""; 
{===>} A_end_fle_1=""; 
{===>} A_start_fle_2=""; 
{===>} A_end_fle_2=""; 
{===>} A_start_fle_3=""; 
{===>} A_end_fle_3=""; 
{===>} A_start_fle_4=""; 
{===>} A_end_fle_4=""; 
{===>} A_start_fle_5=""; 
{===>} A_end_fle_5=""; 
{===>} A_start_fle_6=""; 
{===>} A_end_fle_6=""; 
{===>} A_start_fle_7=""; 
{===>} A_end_fle_7=""; 
{===>} A_start_fle_8=""; 
{===>} A_end_fle_8=""; 
{===>} A_start_fle_9=""; 
{===>} A_end_fle_9=""; 
{===>} A_start_fle_10=""; 
{===>} A_end_fle_10="";

You will also need to add additional lines further down in run.cns at the location where the segment definitions are stored in the toppar variable, i.e.:

evaluate (&toppar.A_start_fle_1=&A_start_fle_1)
evaluate (&toppar.A_start_fle_2=&A_start_fle_2)
evaluate (&toppar.A_start_fle_3=&A_start_fle_3)
evaluate (&toppar.A_start_fle_4=&A_start_fle_4)
evaluate (&toppar.A_start_fle_5=&A_start_fle_5)
evaluate (&toppar.A_start_fle_6=&A_start_fle_6)
evaluate (&toppar.A_start_fle_7=&A_start_fle_7)
evaluate (&toppar.A_start_fle_8=&A_start_fle_8)
evaluate (&toppar.A_start_fle_9=&A_start_fle_9)
evaluate (&toppar.A_start_fle_10=&A_start_fle_10)

Repeat this for protein B if needed, or to increase the number of flexible segments for the interface definition.
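Typing these repetitive lines by hand is error-prone, so they can also be generated. The Python sketch below reproduces the patterns shown above for a given molecule and number of segments. The function names are made up for this example, and the output is meant to be pasted into run.cns by hand, not written there automatically.

```python
def flexible_segment_block(molecule, count):
    """Return the table header and the {===>} lines for 'count' fully flexible segments."""
    headers = " ".join('"segment %d"' % i for i in range(1, count + 1))
    lines = [
        '{+ table: rows=%d %s cols=2 "Start residue" "End residue" +}' % (count, headers),
        "",
    ]
    for i in range(1, count + 1):
        lines.append('{===>} %s_start_fle_%d="";' % (molecule, i))
        lines.append('{===>} %s_end_fle_%d="";' % (molecule, i))
    return lines


def toppar_start_lines(molecule, count):
    """Return the evaluate lines that store the start residues in the toppar variable."""
    return [
        "evaluate (&toppar.%s_start_fle_%d=&%s_start_fle_%d)" % (molecule, i, molecule, i)
        for i in range(1, count + 1)
    ]


if __name__ == "__main__":
    print("\n".join(flexible_segment_block("A", 10)))
    print("\n".join(toppar_start_lines("A", 10)))
```

Only the patterns that appear in the examples above are generated. Compare the output with the existing lines in your own run.cns before pasting it in.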

Running in batch mode without a queuing system

A contribution from Melissa Stauffer, Vanderbilt University, USA

Here is a description of the setup on a Linux cluster that has no queuing system installed. It has a head node/firewall with 16 dual-CPU compute nodes behind it. From the head node, processes can be spawned to the compute nodes using rsh. Authentication to compute nodes is done via the /etc/hosts.equiv mechanism, so there is no prompting for passwords. HADDOCK was set up to use each of the 32 compute-node CPUs by creating a simple csh wrapper script called "haddock_wrapper" that contains two lines like this:

cd project_directory
csh -f $1

Then, in the run.cns file, we define our "queues" like:

{===>} queue_1="rsh node1 ~/bin/haddock_wrapper";
{===>} cns_exe_1="cns";
{===>} cpunumber_1=2;

{===>} queue_2="rsh node2 ~/bin/haddock_wrapper";
{===>} cns_exe_2="cns";
{===>} cpunumber_2=2;

...

etc - one "queue" for each compute node to be included in this haddock run.

If rsh is blocked, you can use ssh instead, but for this to work you need to set up ssh so that no password is needed. See:
https://www.astro.caltech.edu/~mbonati/WIRC/manual/DATARED/setting_up_no-password_ssh.html
(Thanks to Andrea Spitaleri)
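With many compute nodes, the queue definitions can be generated instead of typed. The following Python sketch prints one "queue" block per node in the format shown above. The function name and its parameters are made up for this example, the node names are placeholders, and the sketch does not check how many queue entries your run.cns provides.

```python
def queue_block(nodes, wrapper="~/bin/haddock_wrapper", cns_exe="cns",
                cpus_per_node=2, remote_shell="rsh"):
    """Return run.cns queue definitions, one numbered block per compute node."""
    lines = []
    for index, node in enumerate(nodes, start=1):
        lines.append('{===>} queue_%d="%s %s %s";' % (index, remote_shell, node, wrapper))
        lines.append('{===>} cns_exe_%d="%s";' % (index, cns_exe))
        lines.append('{===>} cpunumber_%d=%d;' % (index, cpus_per_node))
        lines.append("")  # blank line between blocks, as in the example above
    return lines


if __name__ == "__main__":
    # Hypothetical node names node1, node2, node3
    print("\n".join(queue_block(["node%d" % i for i in range(1, 4)])))
```

Passing remote_shell="ssh" gives the ssh variant described above.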

Running HADDOCK on a cluster using a queuing system (e.g. PBS or Torque)

In order to submit to the queuing system we typically use a wrapper script that adds some directives to the job files. Here is one example of such a wrapper script, called ssub, that submits to the default queue:

#!/bin/csh -f
if ($#argv < 1) then
  echo "Usage : ssub jobname"
  exit 1
endif

# check if job exists + make it executable
set jobname=$1
if (! -e $1) then
  echo "job file does not exist"
  exit 1
endif
if (! -x $jobname) chmod +x $jobname

# write temporary pbs script
set pbsjob=$jobname.pbsjob.$$
if (! -e $pbsjob) then
  touch $pbsjob
else
  \rm $pbsjob
  touch $pbsjob
endif
set PWD=`pwd`

# output and error files are named after the job and the process id
echo "#PBS -S" $SHELL >> $pbsjob
set outfile=$PWD/$jobname.out.$$
echo "#PBS -o $outfile" >> $pbsjob
set errorfile=$PWD/$jobname.err.$$
echo "#PBS -e $errorfile" >> $pbsjob
echo "#PBS -m n" >> $pbsjob
echo "cd $PWD" >> $pbsjob
echo "./$jobname" >>$pbsjob

chmod +x $pbsjob
qsub -j eo $pbsjob
rm -rf $pbsjob
exit

Change your run.cns script (here we assume ssub is in your path, otherwise give the full path to it):

{===>} queue_1="ssub";
{===>} cns_exe_1="/home/software/software/cns_solve_1.31-UU/intel-x86_64bit-linux/bin/cns";
{===>} cpunumber_1=100;

This will cause HADDOCK to submit 100 jobs simultaneously to the batch system. Once a structure comes back the next job will be submitted. You can adapt the wrapper script to submit to specific queues.
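As an illustration of adapting the wrapper to a specific queue, the Python sketch below builds the same job file content as ssub and optionally adds a "#PBS -q" directive, which names the destination queue. This is a sketch under assumptions: the function name is made up, the queue name in the usage note is a placeholder, and the submission itself (qsub) is left out.

```python
def pbs_job_lines(jobname, workdir, shell, suffix, queue=None):
    """Return the lines of a PBS job file that runs 'jobname' from 'workdir'."""
    lines = [
        "#PBS -S %s" % shell,
        "#PBS -o %s/%s.out.%s" % (workdir, jobname, suffix),
        "#PBS -e %s/%s.err.%s" % (workdir, jobname, suffix),
        "#PBS -m n",
    ]
    if queue is not None:
        # Directives must come before the first command of the job file
        lines.append("#PBS -q %s" % queue)
    lines.append("cd %s" % workdir)
    lines.append("./%s" % jobname)
    return lines
```

For example, pbs_job_lines("job.csh", "/home/user/run1", "/bin/csh", "1234", queue="long") yields a job file for a queue called "long". The file name, directory and queue name here are made up.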

Note that since the rigid-body docking jobs (it0) are usually very fast, it is possible to bundle a number of them into a single job to avoid overloading the batch system. To do so, edit the file Haddock/Main/QueueSubmit_concat.py in your HADDOCK installation and change the value of jobmax["it0"]. In the following example, 5 docking jobs would be concatenated into one job for it0:

jobmax["it0"] = 5
jobmax["it1"] = 1
jobmax["water"]= 1
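The effect of these settings on the number of submitted batch jobs is a simple ceiling division, sketched below in Python. The function name is made up, and the structure counts in the demo are illustrative, not taken from any particular run.

```python
import math


def batch_jobs(structures, jobs_per_batch):
    """Return how many batch jobs are needed when docking jobs are bundled."""
    return int(math.ceil(structures / float(jobs_per_batch)))


if __name__ == "__main__":
    jobmax = {"it0": 5, "it1": 1, "water": 1}
    # Illustrative numbers of structures per stage
    structures = {"it0": 1000, "it1": 200, "water": 200}
    for stage in ("it0", "it1", "water"):
        print(stage, batch_jobs(structures[stage], jobmax[stage]))
```

With 1000 it0 structures and jobmax["it0"] = 5, the batch system receives 200 jobs instead of 1000.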

Small ligand docking with HADDOCK

It is possible to dock small ligands using HADDOCK, but topology and parameter files for the ligand must then be provided in CNS format. Several sources exist for such files:

the PRODRG server maintained by Daan van Aalten at Dundee University: https://davapc1.bioch.dundee.ac.uk/programs/prodrg/prodrg.html
This server allows you to draw your molecule or paste coordinates and will return topology and parameter files in various formats, including CNS. You should turn on electrostatics to obtain partial charges. Save the resulting PDB file and the corresponding CNS parameter and topology files to use in HADDOCK.

Important: The generated parameter file contains a CNS NBONds statement which should be removed prior to use in HADDOCK. Look in the parameter file for:
```
     NBONds
       CUTNB=7.0 WMIN=1.5
       REPEL=1.0 REXPONENT=4
       IREXPONENT=1 RCONST=16.0
       TOLERANCE=0.5 NBXMOD=5
       CTONNB=5.5 CTOFNB=6.0
     END
```
and remove or comment it out (by adding ! before each line).

the Automated Topology Builder (ATB) and Repository developed in Prof. Alan Mark's group at the University of Queensland in Brisbane: https://compbio.biosci.uq.edu.au/atb
Note: we have not yet tested those parameters in HADDOCK.

the HIC-Up database maintained by Gerard Kleywegt at Uppsala University: https://xray.bmc.uu.se/hicup
One problem with those files for use in docking is that typically all partial charges are set to zero, meaning that the electrostatic interaction energies will be zero unless you change the partial charges. Still, the HIC-Up topology and parameter files provide a good starting point.
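For parameter files obtained from the PRODRG server, commenting out the NBONds statement as described above can be sketched in Python as follows. The function name is made up. The sketch assumes that the statement starts on a line beginning with NBONds (matched on its first four letters, ignoring case) and ends at the first line that contains only END.

```python
def comment_out_nbonds(lines):
    """Return the parameter file lines with the NBONds ... END block commented out."""
    result = []
    inside = False
    for line in lines:
        keyword = line.strip().upper()
        if not inside and keyword.startswith("NBON"):
            inside = True
        if inside:
            # A leading "!" turns the line into a CNS comment
            result.append("! " + line)
            if keyword == "END":
                inside = False
        else:
            result.append(line)
    return result
```

Lines outside the block are returned unchanged, and a block that is already commented out is left as it is. Inspect the result before using it in a run.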

To dock a small ligand with HADDOCK using custom-made topology and parameter files, perform the following steps:

Setup your HADDOCK run in the usual way, i.e. generating the new.html file and running haddock first to generate the run directory structure.

Place your custom topology and parameter files in the toppar directory.

When modifying the docking parameters in run.cns, specify the proper topology and parameter files for your ligand. Alternatively, place the topology and parameter files of your ligand in the toppar/ligand.top and toppar/ligand.param files, respectively.

To prevent an N- or C-terminal patch from being applied to your ligand, add "first IONS" and "last IONS" statements with the name of your ligand to the topallhdg5.3.pep file in the toppar directory (look for the existing "first IONS" and "last IONS" statements).

We also recommend setting the number of MD steps for the first two parts of the semi-flexible refinement (rigid-body high-temperature dynamics and slow-cooling annealing) to 0.

Important: When starting a run, always check for error messages in the begin directory in the various generate...out files, especially for your ligand.
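This check can be partly automated. The Python sketch below scans the output files in the begin directory and reports suspicious lines. It rests on two assumptions that you should verify on your own system: that the output files match the pattern generate*.out, and that error messages contain the substring ERR. The function name is made up.

```python
import glob
import os


def error_lines(begin_dir, marker="ERR"):
    """Return (file name, line number, text) for each line containing the marker."""
    hits = []
    pattern = os.path.join(begin_dir, "generate*.out")
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            for number, line in enumerate(handle, start=1):
                if marker in line:
                    hits.append((os.path.basename(path), number, line.rstrip()))
    return hits
```

Calling error_lines("run1/begin") (the directory name is an example) returns an empty list when no line matches. An empty list only means that the marker was not found, so a quick look at the files themselves remains worthwhile.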