Welcome to the haddock-runner docs

The haddock-runner is a powerful tool for running large-scale HADDOCK docking experiments. It automates the execution of HADDOCK3 workflows across multiple protein complexes, enabling comprehensive benchmarking and performance evaluation.

HADDOCK (High Ambiguity Driven protein-protein DOCKing) is a widely used software suite for flexible docking of biomolecular complexes, and is particularly useful for studying protein-protein interactions.

Key Features

  • Large-scale Benchmarking: Execute HADDOCK workflows on multiple molecular complexes simultaneously
  • Scenario Testing: Run different docking scenarios (workflows, parameters) on the same datasets
  • Concurrent Execution: Process multiple targets concurrently for efficient resource utilization
  • Input Validation: Automatic checksum validation to ensure data integrity
  • Flexible Configuration: YAML-based configuration for complex benchmarking setups

How It Works

haddock-runner takes a YAML configuration file that defines:

  • General settings: Maximum concurrent jobs, core allocation, working directory
  • Input datasets: List of molecular structures and associated files
  • Docking scenarios: Different HADDOCK workflows and parameters to test

The tool then automatically:

  1. Validates all input files using checksums
  2. Creates individual HADDOCK jobs for each target-scenario combination
  3. Executes jobs concurrently according to resource constraints
  4. Organizes results in a structured working directory
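
The steps above can be sketched in a few lines of Python. This is an illustration only, not haddock-runner's implementation: plan_jobs, run_all, and the run_job callback are hypothetical names, and a thread pool stands in for whatever scheduler the tool actually uses.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def plan_jobs(targets, scenarios):
    """Build one (scenario, target) job per combination."""
    return list(product(scenarios, targets))


def run_all(jobs, max_concurrent, run_job):
    """Run jobs with at most max_concurrent active at once, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(run_job, jobs))


jobs = plan_jobs(["1A2K", "1GGR"], ["true-interface", "center-of-mass"])
# Placeholder job (assumption): a real run would launch HADDOCK3 here.
labels = run_all(jobs, max_concurrent=2, run_job=lambda job: "/".join(job))
print(labels)
```

Two targets and two scenarios give four jobs, which is why benchmark size grows quickly as either list gets longer.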

Quick Start

Prerequisites

  • HADDOCK3 installed and properly configured
  • Input molecular structures in PDB format
  • Optional restraint files (TBL format) for guided docking

Basic Usage

haddock-runner benchmark_config.yaml

Common Options

Setup mode (validate and prepare without execution):

haddock-runner --setup benchmark_config.yaml

Debug mode (verbose logging):

haddock-runner --debug benchmark_config.yaml

Typical Use Cases

When running benchmarks, researchers typically investigate:

  • Parameter Optimization: How different sampling parameters affect docking quality
  • Workflow Comparison: Performance of different docking protocols
  • Method Validation: Testing new restraint strategies or scoring functions
  • Performance Benchmarking: Execution time and resource usage patterns
  • Reproducibility Studies: Consistent results across different computational environments

Example Workflow

A typical benchmark might include:

  • 5-10 different protein complexes
  • 3-5 different docking scenarios (true interface, center-of-mass, random restraints)
  • 100-1000 docking runs per scenario
  • Concurrent execution on 4-8 CPU cores

Results are organized by scenario and target, making it easy to compare performance across different conditions.

Getting Started with Your Own Benchmark

  1. Prepare your molecular structures in PDB format
  2. Create restraint files if using guided docking
  3. Write a configuration file defining your scenarios
  4. List your input files in the required format
  5. Run the benchmark and analyze results

See the Setting Up a Benchmark and Writing a Benchmark YAML File sections for detailed instructions.

Getting Help

If you encounter any issues or have questions, the HADDOCK team and community are available to help with setup, configuration, and analysis of your benchmarks.

Installation

The haddock-runner is designed for researchers, developers, and advanced users who are familiar with HADDOCK and command-line computing. It is particularly suited for those with access to HPC infrastructure for running large-scale docking experiments.

Prerequisites

HADDOCK3 Installation

IMPORTANT: haddock-runner requires HADDOCK3 to be installed on your system.

This tool is not a replacement for HADDOCK itself, but rather a benchmarking framework that automates the execution of multiple HADDOCK runs.

If you are new to HADDOCK, we recommend:

  • Completing the basic HADDOCK3 tutorials
  • Familiarizing yourself with HADDOCK3 workflows and configuration

For single target docking or small-scale experiments, consider using:

System Requirements

  • Operating System: Linux (recommended), macOS, or Windows with WSL
  • Memory: Minimum 8GB RAM (16GB+ recommended for concurrent execution)
  • Storage: Sufficient disk space for input structures and results
  • HPC Access: Recommended for large-scale benchmarks

Installation Methods

Method 1: Install from crates.io

The easiest way to install haddock-runner is through cargo, Rust’s package manager:

# Install directly from crates.io
cargo install haddock-runner

# This will install the binary to ~/.cargo/bin/haddock-runner

Note: If you don’t have cargo installed, you can install Rust from https://www.rust-lang.org/tools/install

After installation, ensure the cargo bin directory is in your PATH:

# Add cargo bin to your PATH (add this to your ~/.bashrc or ~/.zshrc)
export PATH="$HOME/.cargo/bin:$PATH"

# Load cargo's environment into the current shell, then verify the installation
source "$HOME/.cargo/env"
haddock-runner --version

Method 2: Install Pre-built Binary from GitHub Releases (Coming Soon)

Pre-compiled binaries will be available for each release on GitHub:

# Download the latest release for your platform
# Check https://github.com/haddocking/haddock-runner/releases for the latest version
VERSION="v3.0.0"  # Update to latest version
OS_ARCH="x86_64-unknown-linux-gnu"  # Choose your platform

wget https://github.com/haddocking/haddock-runner/releases/download/${VERSION}/haddock-runner-${OS_ARCH}

# Make it executable
chmod +x haddock-runner-${OS_ARCH}

# Move to your PATH (optional)
sudo mv haddock-runner-${OS_ARCH} /usr/local/bin/haddock-runner

# Verify installation
haddock-runner --version

Available platforms will include:

  • x86_64-unknown-linux-gnu (Linux 64-bit)
  • x86_64-apple-darwin (macOS Intel)
  • aarch64-apple-darwin (macOS Apple Silicon)

Note: Pre-built binaries are coming soon. For now, please use Method 1 (crates.io) or see the Development section for building from source.

Post-Installation Setup

Add to PATH (Optional)

If you built haddock-runner from source, make the binary available system-wide:

# Run from the repository root: symlink the release binary into a directory in your PATH
sudo ln -s "$(pwd)/target/release/haddock-runner" /usr/local/bin/haddock-runner

# Verify it's accessible
which haddock-runner
haddock-runner --version

Verify HADDOCK3 Integration

Before running benchmarks, ensure HADDOCK3 is properly installed and accessible:

# Check HADDOCK3 installation
haddock3 --version

# Verify required modules are available
haddock3 --list-modules
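
The same availability check can be scripted, for example at the start of a pipeline. The sketch below is a minimal example that only looks up the two executable names used in this guide on your PATH; find_tools is a hypothetical helper name, and the check does not confirm that either tool actually runs.

```python
import shutil


def find_tools(names=("haddock-runner", "haddock3")):
    """Map each executable name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_tools().items():
    print(f"{name}: {path or 'NOT FOUND on PATH'}")
```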

Troubleshooting

Common Issues

Rust installation problems:

  • Ensure you have proper internet connectivity
  • Check that you have required system dependencies (build-essential, curl, etc.)
  • Try rustup update if you already have Rust installed

Missing HADDOCK3:

  • Ensure HADDOCK3 is installed and in your PATH
  • Check that all required HADDOCK modules are available
  • Verify your HADDOCK3 configuration files are properly set up

Permission issues:

  • Ensure you have read/write access to the working directory
  • Check that input files are readable
  • Verify you have execution permissions for the binary

Getting Help

If you encounter installation issues:

Next Steps

Now that you have haddock-runner installed, you’re ready to:

  1. Set up your first benchmark - See Setting Up a Benchmark
  2. Write a configuration file - See Writing a Benchmark YAML File
  3. Prepare your input files - See Writing an Input List File
  4. Run your benchmark - See Running Haddock Runner

Usage Guide

This guide provides a comprehensive, step-by-step introduction to using haddock-runner for running large-scale HADDOCK docking benchmarks. No prior experience with previous versions is assumed.

Quick Start Workflow

Using haddock-runner involves three main steps:

  1. Prepare your input files
  2. Configure your benchmark
  3. Run the benchmark

Complete Usage Guide

Step 1: Prepare Your Molecular Data

Before using haddock-runner, you need:

  • Protein structures: PDB files for your docking targets
  • Restraint files (optional): TBL files for guided docking
  • Topology/parameter files (optional): For ligands or special molecules

Organize your files:

your_project/
├── structures/
│   ├── target1_r_u.pdb    # Receptor structure
│   ├── target1_l_u.pdb    # Ligand structure
│   ├── target1_ti.tbl     # True interface restraints
│   └── target1_ref.pdb    # Reference structure (for evaluation)
└── ...

Step 2: Create the Input List File

The input list file specifies all files needed for each docking target.

Key points:

  • One target per section (separated by comments)
  • List all required files for each target
  • Paths can be relative or absolute
  • Use consistent naming conventions

Example (input_list.txt):

# Target 1A2K - Protein-protein complex
structures/1A2K/1A2K_r_u.pdb
structures/1A2K/1A2K_l_u.pdb
structures/1A2K/1A2K_ti.tbl
structures/1A2K/1A2K_unambig.tbl
structures/1A2K/1A2K_ref.pdb

# Target 1GGR - Another complex
structures/1GGR/1GGR_r_u.pdb
structures/1GGR/1GGR_l_u.pdb
structures/1GGR/1GGR_ti.tbl

Step 3: Write the Benchmark Configuration

The YAML configuration file defines your benchmark scenarios and settings.

Main sections:

  • general: Global settings (concurrency, resources, directories)
  • scenarios: Different docking workflows to test
  • Each scenario defines a complete HADDOCK workflow

Example (benchmark.yaml):

general:
  max_concurrent: 4        # How many jobs to run simultaneously
  ncores: 2               # CPU cores per job
  execution: local        # Execution mode (local or slurm)
  mol_suffixes: [_r_u, _l_u]  # File name suffixes for molecules
  input_list: input_list.txt  # Path to your input list file
  work_dir: ./results     # Where to store results

scenarios:
  - name: true-interface
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 1000
        ambig_fname: _ti.tbl
      flexref:
        ambig_fname: _ti.tbl
      caprieval:
        reference_fname: _ref.pdb

  - name: center-of-mass
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 500
        cmrest: true

See Configuration Reference for complete configuration options.

Step 4: Run the Benchmark

Execute haddock-runner with your configuration:

# Basic execution
haddock-runner benchmark.yaml

# Setup mode (validate without running)
haddock-runner --setup benchmark.yaml

# Debug mode (verbose logging)
haddock-runner --debug benchmark.yaml

What happens during execution:

  1. Input validation and checksum verification
  2. Job creation for each target-scenario combination
  3. Concurrent execution according to resource limits
  4. Results organization in the working directory
  5. Progress logging and error handling
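
To illustrate the first step, the sketch below shows one way checksum verification can detect a changed input file. This guide does not say which algorithm haddock-runner uses, so SHA-256 is an assumption here, and sha256_of and changed_files are hypothetical helper names.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large structure files are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(paths, recorded):
    """Return the paths whose current digest differs from the recorded one."""
    return [p for p in paths if sha256_of(p) != recorded.get(str(p))]


with tempfile.TemporaryDirectory() as tmp:
    pdb = Path(tmp) / "1A2K_r_u.pdb"
    pdb.write_text("ATOM original\n")
    recorded = {str(pdb): sha256_of(pdb)}
    unchanged = changed_files([pdb], recorded)  # nothing modified yet
    pdb.write_text("ATOM edited\n")
    modified = changed_files([pdb], recorded)   # the edit is detected
print(len(unchanged), len(modified))
```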

See Running the Benchmark for runtime details.

Step 5: Analyze Results

After completion, results are organized by scenario and target:

results/
├── true-interface/
│   ├── 1A2K/
│   │   ├── haddock3.cfg
│   │   ├── run1/
│   │   └── ...
│   └── 1GGR/
│       └── ...
└── center-of-mass/
    ├── 1A2K/
    └── 1GGR/
        └── ...
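
Because results follow a scenario/target layout, they are easy to enumerate from a script. The sketch below assumes only the two directory levels shown above; list_runs is a hypothetical helper, and the demo builds an empty copy of the layout in a temporary directory.

```python
import tempfile
from pathlib import Path


def list_runs(work_dir):
    """Return sorted (scenario, target) pairs found under work_dir."""
    root = Path(work_dir)
    return sorted(
        (scenario.name, target.name)
        for scenario in root.iterdir() if scenario.is_dir()
        for target in scenario.iterdir() if target.is_dir()
    )


with tempfile.TemporaryDirectory() as tmp:
    for scenario in ("true-interface", "center-of-mass"):
        for target in ("1A2K", "1GGR"):
            (Path(tmp) / scenario / target).mkdir(parents=True)
    runs = list_runs(tmp)
print(runs)
```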

Result analysis tips:

  • Compare docking success rates between scenarios
  • Analyze CAPRI metrics for quality assessment
  • Examine computation times and resource usage
  • Use HADDOCK analysis tools for detailed evaluation

Practical Tips

Starting Small

For your first benchmark:

  • Use 2-3 well-characterized targets
  • Test 2 different scenarios
  • Start with small sampling numbers (100-500)
  • Use --setup mode to validate before full execution

Resource Management

  • Memory: Each job needs ~2-4GB RAM
  • CPU: Allocate cores based on your system capacity
  • Storage: Results can be large (1-10GB per target)
  • Time: Docking runs can take hours to days
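
As a rough planning aid, the memory figure above can be turned into an upper bound for max_concurrent. The sketch below is back-of-the-envelope arithmetic, not a measurement: the 4 GB default takes the upper end of the range quoted above, and the 2 GB reserve for the operating system is an assumption.

```python
def max_jobs_for_memory(total_ram_gb, per_job_gb=4.0, reserve_gb=2.0):
    """Estimate how many jobs fit in RAM after setting aside a reserve."""
    usable = max(total_ram_gb - reserve_gb, 0)
    return int(usable // per_job_gb)


print(max_jobs_for_memory(16))                  # 16 GB machine, 4 GB per job
print(max_jobs_for_memory(32, per_job_gb=2.0))  # 32 GB machine, 2 GB per job
```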

Common Workflows

Parameter optimization:

scenarios:
  - name: sampling-500
    workflow:
      rigidbody:
        sampling: 500
  - name: sampling-1000
    workflow:
      rigidbody:
        sampling: 1000
  - name: sampling-2000
    workflow:
      rigidbody:
        sampling: 2000
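
When a parameter sweep has many values, the scenarios block can be generated rather than written by hand. The sketch below is one possible approach using only the standard library. sampling_scenarios and to_yaml are hypothetical helpers, and the tiny emitter handles only the nested name/workflow shape with integer values shown above.

```python
def sampling_scenarios(values):
    """Build one scenario per rigidbody sampling value."""
    return [
        {"name": f"sampling-{v}", "workflow": {"rigidbody": {"sampling": v}}}
        for v in values
    ]


def to_yaml(scenarios):
    """Render scenarios as YAML text with the indentation used in this guide."""
    lines = ["scenarios:"]
    for scenario in scenarios:
        lines.append(f"  - name: {scenario['name']}")
        lines.append("    workflow:")
        for module, params in scenario["workflow"].items():
            lines.append(f"      {module}:")
            for key, value in params.items():
                lines.append(f"        {key}: {value}")
    return "\n".join(lines)


text = to_yaml(sampling_scenarios([500, 1000, 2000]))
print(text)
```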

Restraint strategy comparison:

scenarios:
  - name: true-interface
    workflow:
      rigidbody:
        ambig_fname: _ti.tbl
  - name: hbond-only
    workflow:
      rigidbody:
        ambig_fname: _hb.tbl
  - name: center-of-mass
    workflow:
      rigidbody:
        cmrest: true

Troubleshooting

Common issues and solutions:

Input file errors:

  • Verify all files exist and are readable
  • Check file paths in your input list
  • Use absolute paths if relative paths don’t work

HADDOCK module errors:

  • Ensure HADDOCK3 is properly installed
  • Verify all required modules are available
  • Check your HADDOCK3 configuration

Resource limitations:

  • Reduce max_concurrent if running out of memory
  • Lower sampling numbers for faster testing
  • Use --setup to validate before full runs

Permission issues:

  • Ensure write access to working directory
  • Check execution permissions for the binary
  • Verify HADDOCK3 has proper file access

Best Practices

File Organization

benchmark_project/
├── configs/
│   ├── benchmark.yaml
│   └── input_list.txt
├── structures/
│   ├── target1/
│   ├── target2/
│   └── ...
├── results/
│   └── (auto-generated)
└── analysis/
    └── (your analysis scripts)

Version Control

  • Keep configuration files in Git
  • Store input structures separately (large files)
  • Document changes between benchmark runs
  • Use meaningful commit messages

Reproducibility

  • Fix random seeds when comparing methods
  • Document exact HADDOCK3 version used
  • Record system specifications
  • Archive complete configuration files

Next Steps

Now that you understand the basic workflow:

  1. Set up your first benchmark - See Setting Up a Benchmark
  2. Explore example configurations - See Examples
  3. Learn about advanced features - See Development
  4. Get help with specific issues - See Getting Help

Getting Help

If you encounter any issues:

Examples

This page provides comprehensive examples of haddock-runner configurations for various benchmarking scenarios. These examples demonstrate the tool’s flexibility and help you design your own benchmarks.

Basic Examples

Restraint Strategy Comparison

Compare different restraint approaches for the same targets:

general:
  mol_suffixes: [_r_u, _l_u]
  input_list: input_list.txt
  work_dir: ./results/restraint-comparison
  max_concurrent: 2
  ncores: 4
  execution: local

scenarios:
  - name: true-interface-restraints
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 500
        ambig_fname: _ti.tbl
      flexref:
        ambig_fname: _ti.tbl
      caprieval:
        reference_fname: _ref.pdb

  - name: hbond-only-restraints
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 500
        ambig_fname: _hb.tbl
      flexref:
        ambig_fname: _hb.tbl
      caprieval:
        reference_fname: _ref.pdb

  - name: center-of-mass
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 500
        cmrest: true
      flexref:
        cmrest: true
      caprieval:
        reference_fname: _ref.pdb

  - name: random-air-restraints
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 500
        ranair: true
      flexref:
        ranair: true
      caprieval:
        reference_fname: _ref.pdb

Advanced Examples

This scenario samples antibody-peptide complexes by re-docking experimental structures. The workflow runs rigid-body docking, flexible refinement, and a final energy minimisation (EM) refinement. Unambiguous restraints keep the antibody heavy and light chains together, while ambiguous restraints target the CDR loops and the whole peptide.

general:
  input_list: /trinity/csbdevel/jvillave/deeprank-ab-pep/haddock3/2_real/2_config/data_2_cutoff_043/input_test_scenario_0.list
  mol_suffixes: [_antibody, _antigen]
  work_dir: /trinity/csbdevel/jvillave/deeprank-ab-pep/haddock3/2_real/3_2_results/scenario_0_results
  execution: slurm 
  max_concurrent: 100
  ncores: 24

scenarios:
  # ------------------------------------------------------------
  # 1) scenario 0, ab initio ground truth
  # ------------------------------------------------------------
  - name: ground-truth
    workflow:
      topoaa:
        tolerance : 20
      rigidbody:
        tolerance : 20
        crossdock: false
        sampling: 10000
        ambig_fname: _ti.tbl
        unambig_fname: _antibody-unambig.tbl
      clustfcc:
        plot_matrix: true
      # select up to 100 clusters per target,
      # keeping 5 top models each (max 500 models)
      seletopclusts:
        top_clusters: 100
        top_models: 5
      flexref:
        tolerance : 20
        ambig_fname: _ti.tbl
        unambig_fname: _antibody-unambig.tbl
      # final energy minimisation
      emref:
        tolerance : 20
        ambig_fname: _ti.tbl
        unambig_fname: _antibody-unambig.tbl
      caprieval:
        reference_fname: _matched.pdb
        fnat_cutoff: 4.0
        irmsd_cutoff: 8.0
      emscoring:
        tolerance : 20
        per_interface_scoring: true

Configuration Variations

HPC Cluster Configuration

Optimized for SLURM workload manager:

general:
  mol_suffixes: [_r_u, _l_u]
  input_list: large_input_list.txt
  work_dir: /scratch/results/large-benchmark
  max_concurrent: 20
  ncores: 8
  execution: slurm

scenarios:
  - name: hpc-optimized
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 2000
        ambig_fname: _ti.tbl
      flexref:
        ambig_fname: _ti.tbl
      caprieval:
        reference_fname: _ref.pdb

Minimal Configuration

Simple setup for quick testing:

general:
  mol_suffixes: [_r_u, _l_u]
  input_list: test_input.txt
  work_dir: ./test-results
  max_concurrent: 2
  ncores: 2
  execution: local

scenarios:
  - name: quick-test
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 100
        cmrest: true

Real-world Example: BM5 Benchmark

A configuration similar to the BM5 benchmark setup:

general:
  mol_suffixes: [_r_u, _l_u]
  input_list: bm5_input_list.txt
  work_dir: ./results/bm5-style
  max_concurrent: 10
  ncores: 4
  execution: local

scenarios:
  - name: bm5-true-interface
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 1000
        ambig_fname: _ti.tbl
        unambig_fname: _unambig.tbl
      seletop:
        select: 200
        sort_by: score
      flexref:
        ambig_fname: _ti.tbl
        unambig_fname: _unambig.tbl
      emref:
        mdsteps: 500
      caprieval:
        reference_fname: _ref.pdb
        clusters: 4

  - name: bm5-center-mass
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 1000
        cmrest: true
      seletop:
        select: 200
        sort_by: score
      flexref:
        cmrest: true
      emref:
        mdsteps: 500
      caprieval:
        reference_fname: _ref.pdb
        clusters: 4

Input List Examples

Simple Input List

# Target 1A2K
structures/1A2K/1A2K_r_u.pdb
structures/1A2K/1A2K_l_u.pdb
structures/1A2K/1A2K_ti.tbl
structures/1A2K/1A2K_ref.pdb

# Target 1GGR
structures/1GGR/1GGR_r_u.pdb
structures/1GGR/1GGR_l_u.pdb
structures/1GGR/1GGR_ti.tbl
structures/1GGR/1GGR_ref.pdb

Complex Input List with Multiple File Types

# Target 1PPE - Protein-protein with multiple restraint types
structures/1PPE/1PPE_r_u.pdb
structures/1PPE/1PPE_l_u.pdb
structures/1PPE/1PPE_ti.tbl
structures/1PPE/1PPE_hb.tbl
structures/1PPE/1PPE_unambig.tbl
structures/1PPE/1PPE_ref.pdb

# Target 2OOB - With ligand files
structures/2OOB/2OOB_r_u.pdb
structures/2OOB/2OOB_l_u.pdb
structures/2OOB/2OOB_x_u.pdb
structures/2OOB/2OOB_ti.tbl
structures/2OOB/2OOB_hb.tbl
structures/2OOB/2OOB_ligand.top
structures/2OOB/2OOB_ligand.param
structures/2OOB/2OOB_ref.pdb

Best Practices for Examples

Starting with Examples

  1. Begin with simple configurations and gradually add complexity
  2. Test with small datasets before scaling up
  3. Use --setup mode to validate configurations before full runs
  4. Start with low sampling numbers for initial testing

Adapting Examples

  • Modify scenarios to match your research questions
  • Adjust resource settings based on your hardware
  • Customize workflows for your specific docking needs
  • Scale parameters appropriately for your system size

Creating Your Own

Use these examples as templates and:

  1. Replace file paths with your actual data
  2. Adjust sampling parameters for your needs
  3. Add or remove workflow steps as required
  4. Configure resource limits for your environment

Troubleshooting Examples

Common Configuration Issues

Problem: Jobs fail with missing file errors
Solution: Verify that all files in the input list exist and that their paths are correct

Problem: Out of memory errors
Solution: Reduce max_concurrent or increase the available system memory

Problem: HADDOCK module not found
Solution: Ensure HADDOCK3 is properly installed and in your PATH

Problem: Slow execution
Solution: Adjust max_concurrent and ncores for optimal resource usage

Additional Resources

Setting Up a BM5 Benchmark: Step-by-Step Guide

This guide explains how to set up and run a BM5 (Protein-Protein Docking Benchmark v5) benchmark using haddock-runner. The BM5 benchmark (Vreven et al., 2015) is a widely used set of 144 non-redundant, high-quality protein-protein complexes for evaluating docking methods.

Prerequisites

Before starting, ensure you have:

  • haddock-runner installed (see Installation)
  • HADDOCK3 properly installed and configured
  • Access to a computing environment with sufficient resources
  • Basic familiarity with command-line tools

Step 1: Set Up Your Project Directory

Create a dedicated directory structure for your benchmark:

# Create project directory
mkdir -p ~/bm5-benchmark && cd ~/bm5-benchmark

# Create subdirectories
mkdir -p {data,configs,results,scripts}

Your project structure will look like:

bm5-benchmark/
├── data/          # BM5 dataset files
├── configs/       # Configuration files
├── results/       # Benchmark results (auto-created)
├── scripts/       # Custom scripts
└── README.md      # Your notes and documentation

Step 2: Download and Prepare BM5 Dataset

The BonvinLab provides a HADDOCK-ready version of BM5:

# Clone the BM5-clean repository
git clone https://github.com/haddocking/BM5-clean.git ~/bm5-benchmark/data/BM5-clean

# Check out a specific version for reproducibility (run inside the cloned repository)
git -C ~/bm5-benchmark/data/BM5-clean checkout v1.1

# Create input list file
find ~/bm5-benchmark/data/BM5-clean/HADDOCK-ready \( -name "*.pdb" -o -name "*.tbl" \) \
  | grep -E "(_r_u|_l_u|_ti|_unambig|_ref)" \
  | sort > ~/bm5-benchmark/configs/bm5-input.list
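
If you prefer to build the list from a script, the same filter can be written with the Python standard library. This is a sketch under assumptions: build_input_list is a hypothetical helper, and unlike the grep above it matches against the file name only, not the whole path. The demo runs on empty files in a temporary directory.

```python
import re
import tempfile
from pathlib import Path

# Same name fragments as the grep filter above.
KEEP = re.compile(r"(_r_u|_l_u|_ti|_unambig|_ref)")


def build_input_list(root):
    """Collect .pdb/.tbl files whose names contain one of the expected fragments."""
    paths = [
        p for p in Path(root).rglob("*")
        if p.suffix in {".pdb", ".tbl"} and KEEP.search(p.name)
    ]
    return sorted(str(p) for p in paths)


with tempfile.TemporaryDirectory() as tmp:
    for name in ("1A2K_r_u.pdb", "1A2K_l_u.pdb", "1A2K_ti.tbl",
                 "1A2K_notes.txt", "1A2K_extra.pdb"):
        (Path(tmp) / name).write_text("")
    names = [Path(p).name for p in build_input_list(tmp)]
print(names)
```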

Step 3: Create the Benchmark Configuration

Create a bm5-benchmark.yaml configuration file:

# File: ~/bm5-benchmark/configs/bm5-benchmark.yaml
general:
  # File patterns and locations
  mol_suffixes: [_r_u, _l_u]          # Standard BM5 naming
  input_list: configs/bm5-input.list   # Path to input list
  work_dir: results/bm5-results       # Where to store results
 
  # Resource management
  max_concurrent: 8                  # Adjust based on your system
  ncores: 4                          # Cores per HADDOCK job
  execution: local                   # Use 'slurm' for HPC clusters

scenarios:
  # Scenario 1: True Interface
  - name: true-interface
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 1000
        ambig_fname: _ti.tbl
        unambig_fname: _unambig.tbl
      seletop:
        select: 200
        sort_by: score
      flexref:
        ambig_fname: _ti.tbl
        unambig_fname: _unambig.tbl
      emref:
        mdsteps: 500
      caprieval:
        reference_fname: _ref.pdb
        clusters: 4

  # Scenario 2: Center of Mass
  - name: center-of-mass
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 2000
        cmrest: true
      seletop:
        select: 200
        sort_by: score
      flexref:
        cmrest: true
      emref:
        mdsteps: 500
      caprieval:
        reference_fname: _ref.pdb
        clusters: 4

  # Scenario 3: Random Air Restraints
  - name: random-restraints
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 2000
        ranair: true
      seletop:
        select: 200
        sort_by: score
      flexref:
        ranair: true
      emref:
        mdsteps: 500
      caprieval:
        reference_fname: _ref.pdb

Step 4: Validate Your Setup

Before running the full benchmark, validate your configuration:

# Check that haddock-runner is working
haddock-runner --version

# Validate configuration without execution
haddock-runner --setup configs/bm5-benchmark.yaml

# Check input file count
wc -l configs/bm5-input.list
# Should show ~1000-1500 files for full BM5

Step 5: Run the Benchmark

# Run with progress monitoring
nohup haddock-runner configs/bm5-benchmark.yaml > benchmark.log 2>&1 &

# Monitor progress
tail -f benchmark.log

# Check resource usage
htop  # or your preferred system monitor

HPC Cluster Execution

For SLURM clusters, modify your config:

general:
  execution: slurm
  ncores: 4
  partition: long

Step 6: Monitor and Manage the Benchmark

Monitoring Progress

# Check running jobs
ps aux | grep haddock

# For SLURM
squeue -u $USER

# Check disk usage
du -sh results/

Handling Interruptions

If the benchmark is interrupted:

# Check what completed
find results/ -name "*.done" | wc -l

# Resume from where it left off
haddock-runner configs/bm5-benchmark.yaml

Step 7: Analyze Results

To be added soon.

Troubleshooting

Common Issues and Solutions

Problem: “File not found” errors

  • Solution: Verify all paths in bm5-input.list are correct
  • Check: head configs/bm5-input.list and verify files exist

Problem: HADDOCK3 module errors

  • Solution: Ensure HADDOCK3 is properly installed and in PATH
  • Check: haddock3 --version works from command line

Problem: Out of memory errors

  • Solution: Reduce max_concurrent or increase system memory
  • Check: Monitor memory with free -h or htop

Problem: Slow progress

  • Solution: Adjust max_concurrent and ncores for optimal balance
  • Check: Monitor CPU usage with htop

Best Practices

Reproducibility

# Record exact versions
echo "haddock-runner $(haddock-runner --version)" > VERSION.txt
echo "HADDOCK3 $(haddock3 --version)" >> VERSION.txt
echo "Date: $(date)" >> VERSION.txt

# Save complete configuration
cp configs/bm5-benchmark.yaml results/config-used.yaml

Data Management

# Compress completed results
tar -czvf bm5-results-$(date +%Y%m%d).tar.gz results/

# Clean up intermediate files (if needed)
find results/ -name "*.tmp" -delete

Documentation

# BM5 Benchmark Notes

## Setup
- Date: YYYY-MM-DD
- System: Describe your hardware
- HADDOCK3 version: X.X.X
- haddock-runner version: X.X.X

## Configuration
- Scenarios: true-interface, center-of-mass, random-restraints
- Sampling: 1000-2000 per scenario
- Targets: Full BM5 set (144 complexes)

## Results Summary
- Start time: 
- End time: 
- Total runtime: 
- Success rate: 

Complete Example: From Start to Finish

Here’s a complete workflow example:

# 1. Set up project
mkdir -p ~/bm5-benchmark/{data,configs,results,scripts} && cd ~/bm5-benchmark

# 2. Get data
git clone https://github.com/haddocking/BM5-clean.git data/BM5-clean
find data/BM5-clean/HADDOCK-ready \( -name "*.pdb" -o -name "*.tbl" \) \
  | grep -E "(_r_u|_l_u|_ti|_unambig|_ref)" \
  | sort > configs/bm5-input.list

# 3. Create configuration (use the YAML example above)

# 4. Run full benchmark
haddock-runner configs/bm5-benchmark.yaml

# 5. Analyze results
# to be updated

Additional Resources

Getting Help

If you encounter issues specific to BM5 setup:

This guide provides a complete, up-to-date approach to setting up BM5 benchmarks with the current version of haddock-runner, focusing on clarity, reproducibility, and practical execution.

Configuration Reference

This document provides a comprehensive reference for the haddock-runner configuration YAML file format. It describes all available options, their purpose, valid values, and examples.

Keep in mind that YAML format is indentation-sensitive!

Configuration File Structure

The configuration file is a YAML document with two main sections:

general:
  # Global configuration options

scenarios:
  # Benchmark scenarios to execute

General Configuration

The general section contains global settings that apply to all scenarios and targets.

Options

| Option | Type | Required | Description |
|---|---|---|---|
| max_concurrent | integer | Yes | Maximum number of jobs to run simultaneously. Controls how many target-scenario combinations execute in parallel. |
| ncores | integer | Yes | Number of CPU cores to allocate per job. |
| execution | string | Yes | Execution backend. Valid values: local, slurm. |
| partition | string | No | SLURM partition to submit jobs to when execution: slurm. If omitted, the cluster default partition is used. |
| mol_suffixes | array of strings | Yes | File suffixes used to identify molecule files. Must contain at least 2 suffixes (typically receptor and ligand). |
| input_list | string | Yes | Path to the input list file containing file paths for all targets. |
| work_dir | string | Yes | Directory where benchmark results will be stored. Created automatically if it doesn’t exist. |
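
The constraints in the options table can be checked before launching anything. The sketch below assumes the general section has already been loaded into a Python dict; check_general is a hypothetical helper, not part of haddock-runner. Requiring positive integers is an assumption, since the table only says "integer".

```python
REQUIRED = ("max_concurrent", "ncores", "execution",
            "mol_suffixes", "input_list", "work_dir")


def check_general(general):
    """Return a list of problems found in a 'general' mapping (empty if none)."""
    errors = [f"missing required option: {key}" for key in REQUIRED if key not in general]
    if "execution" in general and general["execution"] not in ("local", "slurm"):
        errors.append("execution must be 'local' or 'slurm'")
    if "mol_suffixes" in general and len(general["mol_suffixes"]) < 2:
        errors.append("mol_suffixes needs at least 2 entries")
    for key in ("max_concurrent", "ncores"):
        value = general.get(key)
        if key in general and not (isinstance(value, int) and value > 0):
            errors.append(f"{key} must be a positive integer")
    return errors


good = {"max_concurrent": 4, "ncores": 2, "execution": "local",
        "mol_suffixes": ["_r_u", "_l_u"], "input_list": "input_list.txt",
        "work_dir": "./results"}
bad = {**good, "execution": "pbs", "mol_suffixes": ["_r_u"]}
print(check_general(good))
print(check_general(bad))
```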

Example

general:
  max_concurrent: 4
  ncores: 2
  execution: local
  mol_suffixes: [_r_u, _l_u, _x_u]
  input_list: docking/input_list.txt
  work_dir: ./results

Notes

  • Local execution: When using execution: local, the total number of CPU cores required is max_concurrent * ncores. Ensure your system has enough cores.
  • SLURM execution: When using execution: slurm, ensure SLURM is installed and configured. The sbatch and sacct commands must be available in your PATH.
  • File suffixes: The mol_suffixes array defines patterns used to identify molecule files in the input list. Files matching these patterns are grouped together as molecules for each target.
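
The core requirement for local execution is simple arithmetic, sketched below. cores_needed and fits_locally are hypothetical helper names; os.cpu_count() reports logical CPUs, which may differ from the cores actually available to you on a shared machine.

```python
import os


def cores_needed(max_concurrent, ncores):
    """Total cores used when every concurrent slot is busy."""
    return max_concurrent * ncores


def fits_locally(max_concurrent, ncores, available=None):
    """Check the requirement against the available cores (default: this machine)."""
    if available is None:
        available = os.cpu_count() or 1
    return cores_needed(max_concurrent, ncores) <= available


print(cores_needed(4, 2))                # the example above needs 8 cores
print(fits_locally(8, 4, available=16))  # 32 cores requested on a 16-core machine
```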

Scenarios Configuration

The scenarios section defines the different docking workflows to test. Each scenario is executed for every target specified in the input list.

Scenario Options

| Option | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique identifier for this scenario. Used as directory name in the results. |
| workflow | mapping | Yes | HADDOCK3 workflow configuration defining modules and their parameters. |

Example

scenarios:
  - name: true-interface
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 1000
        ambig_fname: _ti.tbl
      flexref:
        ambig_fname: _ti.tbl
      caprieval:
        reference_fname: _ref.pdb

  - name: center-of-mass
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 500
        cmrest: true

Input List File Format

The input list file (specified by general.input_list) contains paths to all files required for each docking target. Files are automatically grouped into targets by a shared identifier derived from the filename: for molecule files, the identifier is the part of the filename before the configured mol_suffixes match; for restraints, topology/parameter, shape, and miscellaneous files, grouping typically uses the part before the first underscore.

Important note about ensembles: if an input file contains multiple models, HADDOCK3 handles the ensemble with its own tooling. List the file as a single molecule that follows the naming scheme:

❌ complexA_ens_l_u.pdb # containing 10 models
✅ complexA_l_u.pdb     # containing 10 models
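
To check whether a structure file is an ensemble, you can count its MODEL records, which the PDB format uses to delimit models. The sketch below is a minimal example; count_models is a hypothetical helper, and it treats a file without any MODEL record as a single model.

```python
def count_models(pdb_text):
    """Count MODEL records in PDB text; no MODEL record means one model."""
    records = sum(1 for line in pdb_text.splitlines() if line.startswith("MODEL"))
    return max(records, 1)


ensemble = "MODEL        1\nATOM ...\nENDMDL\nMODEL        2\nATOM ...\nENDMDL\n"
single = "ATOM ...\nEND\n"
print(count_models(ensemble), count_models(single))
```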

File Classification

Files in the input list are automatically categorized based on their extensions and patterns:

| File Type | Pattern | Description |
|-----------|---------|-------------|
| Molecules | Matches mol_suffixes patterns | Structure files (PDB format) |
| Restraints | _*.tbl | Distance restraint files |
| Topology/Parameters | .top, .param | Topology and parameter files for ligands |
| Shape | _shape* or configured pattern | Shape files for shape-based docking |
| Miscellaneous | All other files | Any additional files (reference structures, etc.) |
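
The classification table can be read as a chain of checks. The Python sketch below is only an illustration of that reading: the function name `classify`, the order of the checks, and the default suffixes are assumptions, and the actual implementation in haddock-runner may differ.

```python
from pathlib import PurePosixPath


def classify(path, mol_suffixes=("_r_u", "_l_u")):
    """Assign a file to one category, loosely following the table above."""
    p = PurePosixPath(path)
    stem, ext = p.stem, p.suffix.lower()
    # Molecules: PDB files whose name ends with a configured suffix
    if ext == ".pdb" and any(stem.endswith(s) for s in mol_suffixes):
        return "molecule"
    if ext == ".tbl":
        return "restraint"
    if ext in (".top", ".param"):
        return "topology"
    # Assumed shape pattern: "_shape" anywhere in the file stem
    if "_shape" in stem:
        return "shape"
    return "misc"


for name in ["1A2K_r_u.pdb", "1A2K_ti.tbl", "1A2K_ref.pdb", "1A2K_ligand.top"]:
    print(name, "->", classify(name))
```

Note that a reference structure such as 1A2K_ref.pdb falls through to the miscellaneous category in this sketch, because its name does not end with a molecule suffix.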

Example Input List

# Target 1A2K - Protein-protein complex
structures/1A2K/1A2K_r_u.pdb
structures/1A2K/1A2K_l_u.pdb
structures/1A2K/1A2K_ti.tbl
structures/1A2K/1A2K_unambig.tbl
structures/1A2K/1A2K_ref.pdb

# Target 1GGR - Another complex
structures/1GGR/1GGR_r_u.pdb
structures/1GGR/1GGR_l_u.pdb
structures/1GGR/1GGR_ti.tbl

Notes

  • Lines starting with # are treated as comments and ignored.
  • Empty lines are ignored.
  • Paths can be relative to the configuration file location or absolute.
  • Files are grouped by their root identifier: for molecule files, the part of the filename before the matching mol_suffixes entry; for all other files, typically the part before the first underscore.
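
The parsing and grouping rules above can be sketched in Python as follows. The helpers `root_id` and `group_targets` are hypothetical names used for illustration, and the grouping logic is a simplified reading of the rules in this section rather than the tool's actual code.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def root_id(path, mol_suffixes=("_r_u", "_l_u")):
    """Derive the target identifier from a file path."""
    stem = PurePosixPath(path).stem
    for suffix in mol_suffixes:
        if stem.endswith(suffix):
            # Molecule file: everything before the suffix
            return stem[: -len(suffix)]
    # Any other file: everything before the first underscore
    return stem.split("_", 1)[0]


def group_targets(text, mol_suffixes=("_r_u", "_l_u")):
    """Group the paths in an input list by target, skipping comments and blanks."""
    groups = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        groups[root_id(line, mol_suffixes)].append(line)
    return dict(groups)


INPUT_LIST = """\
# Target 1A2K - Protein-protein complex
structures/1A2K/1A2K_r_u.pdb
structures/1A2K/1A2K_l_u.pdb
structures/1A2K/1A2K_ti.tbl

# Target 1GGR - Another complex
structures/1GGR/1GGR_r_u.pdb
structures/1GGR/1GGR_l_u.pdb
"""

targets = group_targets(INPUT_LIST)
print(sorted(targets))
```

Under these assumptions, a file named complexA_ens_l_u.pdb would receive the identifier complexA_ens rather than complexA, which is one way an extra marker in the filename could place a molecule outside its intended target.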

HADDOCK3 Workflow Modules

The workflow section within each scenario defines the HADDOCK3 modules to execute and their parameters. Each module is specified as a YAML key, with its parameters as nested key-value pairs.

IMPORTANT: Do not set any of HADDOCK’s “General Default Parameters” in a workflow. haddock-runner handles these internally.
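
To make the mapping from YAML keys to HADDOCK3 modules concrete, the Python sketch below renders a workflow mapping as TOML-style module sections. It is an illustration only: `render_workflow` and `toml_value` are hypothetical helpers, and the exact layout of the file that haddock-runner generates is not described in this document.

```python
def toml_value(value):
    """Format a scalar the way TOML expects it."""
    # bool must be tested before int, since bool is a subclass of int
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def render_workflow(workflow):
    """Render each module as a [section] followed by its parameters."""
    lines = []
    for module, params in workflow.items():
        lines.append(f"[{module}]")
        # A module listed without parameters (e.g. "flexref:") parses as None
        for key, value in (params or {}).items():
            lines.append(f"{key} = {toml_value(value)}")
        lines.append("")
    return "\n".join(lines)


workflow = {
    "topoaa": {"autohis": True},
    "rigidbody": {"sampling": 1000},
    "flexref": None,
}
rendered = render_workflow(workflow)
print(rendered)
```

The sketch relies on the mapping keeping its insertion order, so modules appear in the order they are written in the scenario.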

Haddock Module Patterns

See the HADDOCK3 repository for the available modules and the parameters each one accepts.

Module Parameter Patterns

Many parameters accept filename patterns instead of explicit paths. These patterns are matched against the files available for each target. The pattern matching uses regular expressions.

Common filename patterns:

| Pattern | Matches |
|---------|---------|
| _ti.tbl | Files ending with _ti.tbl |
| _unambig.tbl | Files ending with _unambig.tbl |
| _ref.pdb | Files ending with _ref.pdb |
| _ligand.top | Files ending with _ligand.top |
| _ligand.param | Files ending with _ligand.param |

Note: When using filename patterns, ensure the corresponding files are listed in the input list and have consistent naming conventions across all targets.
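
Since the patterns are regular expressions, a dot matches any character and an unanchored pattern can match in the middle of a name. The Python sketch below illustrates this with Python's `re.search`, assuming unanchored search semantics; whether haddock-runner escapes or anchors patterns internally is not stated here, and the file names are made up for the example.

```python
import re

files = ["1A2K_ti.tbl", "1A2K_unambig.tbl", "1A2K_ti_tbl.bak"]


def matches(pattern, names):
    """Return the names in which the regular expression is found."""
    return [name for name in names if re.search(pattern, name)]


# Plain pattern: "." matches any character and the match is not anchored
loose = matches("_ti.tbl", files)
print(loose)   # ['1A2K_ti.tbl', '1A2K_ti_tbl.bak']

# Escaped dot plus "$" anchor: only names that really end with "_ti.tbl"
strict = matches(r"_ti\.tbl$", files)
print(strict)  # ['1A2K_ti.tbl']
```

Keeping file names distinct enough that the plain pattern matches only one file per target avoids relying on either behavior.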


Complete Configuration Examples

Basic Benchmark

general:
  max_concurrent: 2
  ncores: 2
  execution: local
  mol_suffixes: [_r_u, _l_u]
  input_list: input_list.txt
  work_dir: ./results

scenarios:
  - name: standard
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 500
      flexref:
      emref:

Parameter Optimization with Shape Docking

general:
  max_concurrent: 4
  ncores: 4
  execution: slurm
  partition: long
  mol_suffixes: [_r_u, _l_u, _shape]
  input_list: shape/input.txt
  work_dir: shape-results

scenarios:
  - name: sampling-500
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 500
        mol_shape_3: true

  - name: sampling-1000
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 1000
        mol_shape_3: true

  - name: sampling-2000
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 2000
        mol_shape_3: true

Restraint Strategy Comparison

general:
  max_concurrent: 2
  ncores: 2
  execution: local
  mol_suffixes: [_r_u, _l_u]
  input_list: input_list.txt
  work_dir: restraint-comparison

scenarios:
  - name: true-interface
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 1000
        ambig_fname: _ti.tbl
        unambig_fname: _unambig.tbl
      flexref:
        ambig_fname: _ti.tbl
      caprieval:
        reference_fname: _ref.pdb

  - name: center-of-mass
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 1000
        cmrest: true
      flexref:
      caprieval:
        reference_fname: _ref.pdb

  - name: random-restart
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 1000
        ranair: true
      flexref:
      caprieval:
        reference_fname: _ref.pdb

Validation Rules

The configuration file is validated before execution. The following rules apply:

General Section

  1. mol_suffixes: Must be an array with at least 2 entries.
  2. mol_suffixes: Must contain unique values (no duplicates).
  3. work_dir: Must not be an empty string.
  4. input_list: Must not be an empty string, and the file must exist.
  5. max_concurrent: Must be greater than 0.
  6. ncores: Must be greater than 0.
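
The six rules above can be expressed as a single validation pass. The following Python sketch is illustrative only; `validate_general` and its error messages are hypothetical and do not reproduce the tool's actual output.

```python
import os


def validate_general(general):
    """Return a list of problems found in the general section (empty if valid)."""
    errors = []
    suffixes = general.get("mol_suffixes") or []
    if len(suffixes) < 2:
        errors.append("mol_suffixes must have at least 2 entries")
    if len(set(suffixes)) != len(suffixes):
        errors.append("mol_suffixes must be unique")
    if not general.get("work_dir"):
        errors.append("work_dir must not be empty")
    input_list = general.get("input_list") or ""
    if not input_list:
        errors.append("input_list must not be empty")
    elif not os.path.isfile(input_list):
        errors.append(f"input_list not found: {input_list}")
    for key in ("max_concurrent", "ncores"):
        if (general.get(key) or 0) <= 0:
            errors.append(f"{key} must be greater than 0")
    return errors


sample = {
    "mol_suffixes": ["_r_u", "_r_u"],  # duplicate entry
    "work_dir": "",
    "input_list": "",
    "max_concurrent": 0,
    "ncores": 2,
}
problems = validate_general(sample)
for problem in problems:
    print(problem)
```

Collecting every problem before reporting, rather than stopping at the first one, is a design choice of this sketch that makes a misconfigured file quicker to fix.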

Local Execution

When execution: local:

  • max_concurrent * ncores must not exceed the available CPU cores on the system.

SLURM Execution

When execution: slurm:

  • The sbatch and sacct commands must be available in the system PATH.
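
The two backend-specific checks can be sketched in Python as below. `check_execution` is a hypothetical helper; the `cpu_count` and `which` parameters exist only so the checks can be exercised without a real cluster, and the handling of an unknown execution mode is an assumption.

```python
import os
import shutil


def check_execution(execution, max_concurrent, ncores, cpu_count=None, which=shutil.which):
    """Return a list of problems for the chosen execution backend."""
    if execution == "local":
        available = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
        needed = max_concurrent * ncores
        if needed > available:
            return [f"need {needed} cores, only {available} available"]
        return []
    if execution == "slurm":
        # Both commands must be resolvable through PATH
        return [f"{cmd} not found in PATH" for cmd in ("sbatch", "sacct") if which(cmd) is None]
    # Assumption: any other value is rejected
    return [f"unknown execution mode: {execution}"]


print(check_execution("local", max_concurrent=2, ncores=2, cpu_count=4))
print(check_execution("local", max_concurrent=4, ncores=4, cpu_count=8))
```

For example, max_concurrent: 4 with ncores: 4 asks for 16 cores, so it would fail the local check on an 8-core machine.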

File Resolution

Path Resolution

  • Relative paths in the configuration file (for input_list and work_dir) are resolved relative to the current working directory.
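
As a small illustration of this rule, the Python sketch below resolves a configured path against a base directory. `resolve_config_path` is a hypothetical helper, and it assumes that absolute paths are left unchanged.

```python
import os


def resolve_config_path(path, cwd=None):
    """Resolve a configured path against the current working directory."""
    base = cwd if cwd is not None else os.getcwd()
    # os.path.join returns `path` unchanged when it is already absolute
    return os.path.normpath(os.path.join(base, path))


print(resolve_config_path("./results"))
print(resolve_config_path("input_list.txt"))
```

A practical consequence is that the same configuration file can point at different locations depending on the directory from which haddock-runner is launched.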

Filename Pattern Resolution

When a module parameter ends with _fname and contains a pattern (e.g., _ti.tbl), the pattern is matched against all files available for the target. The matching is done using regular expressions.

IMPORTANT: If multiple files match the pattern, the match is treated as ambiguous and the resolver returns None, so the parameter is omitted from the generated run TOML.
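
The ambiguity rule can be sketched in Python as follows. The helpers `resolve_fname` and `resolve_params` are hypothetical, matching is assumed to be an unanchored regular-expression search, and returning None when no file matches is an assumption of this sketch (the text above only describes the multiple-match case).

```python
import re


def resolve_fname(pattern, target_files):
    """Return the single file matching the pattern, or None otherwise."""
    hits = [f for f in target_files if re.search(pattern, f)]
    return hits[0] if len(hits) == 1 else None


def resolve_params(params, target_files):
    """Replace *_fname patterns with file paths, dropping unresolved ones."""
    resolved = {}
    for key, value in params.items():
        if key.endswith("_fname"):
            value = resolve_fname(value, target_files)
            if value is None:
                continue  # ambiguous or missing: parameter is omitted
        resolved[key] = value
    return resolved


params = {"sampling": 1000, "ambig_fname": "_ti.tbl"}
unique = resolve_params(params, ["1A2K_ti.tbl", "1A2K_ref.pdb"])
ambiguous = resolve_params(params, ["1A2K_ti.tbl", "1A2K_alt_ti.tbl"])
print(unique)
print(ambiguous)
```

In the ambiguous case the module still receives its other parameters, so a run can start without the intended restraints. Checking the generated files after a --setup run is a reasonable way to catch this.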


Directory Structure

After running a benchmark, results are organized as follows:

work_dir/
├── scenario1/
│   ├── target1/
│   │   ├── run1/
│   │   │   └── ... (HADDOCK3 output)
│   │   └── job.sh (only for SLURM execution)
│   └── target2/
│       ├── run1/
│       └── job.sh
└── scenario2/
    ├── target1/
    └── target2/
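
For post-processing scripts, the layout above can be captured in a small path helper. This Python sketch assumes the run directory is always named run1, as shown in the tree; `run_dir` is a hypothetical function and not part of haddock-runner.

```python
from pathlib import PurePosixPath


def run_dir(work_dir, scenario, target, run_name="run1"):
    """Build the path work_dir/scenario/target/run_name."""
    return PurePosixPath(work_dir) / scenario / target / run_name


for scenario in ["true-interface", "center-of-mass"]:
    for target in ["1A2K", "1GGR"]:
        print(run_dir("results", scenario, target))
```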

Tips for Configuration

  1. Start small: Begin with a few targets and simple scenarios to validate your setup.
  2. Use --setup mode: Always run with haddock-runner --setup configuration.yaml first to validate the configuration before full execution.
  3. Check file patterns: Ensure your mol_suffixes patterns correctly match your molecule filenames.
  4. Resource planning: Calculate total CPU requirements as max_concurrent * ncores and ensure your system can handle it.
  5. Consistent naming: Use consistent file naming conventions across all targets for filename patterns to work correctly.

Development Guide

This guide provides comprehensive information for developers contributing to haddock-runner, including setup instructions, architecture overview, coding standards, and CI/CD workflows.

Project Overview

haddock-runner is a Rust application for running large-scale HADDOCK docking benchmarks. It features:

  • Modern Rust stack: Using Cargo, clap, and serde
  • Concurrent execution: Multi-threaded job processing
  • Flexible configuration: YAML-based benchmark definitions
  • Multiple backends: Local, SLURM, and other HPC integrations

Getting Started with Development

Prerequisites

  • Rust toolchain: Latest stable version
  • HADDOCK3: For testing and development
  • Git: For version control
  • SLURM: For developing the HPC integration

Development Environment Setup

# Install Rust toolchain
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"

# Clone the repository
git clone https://github.com/haddocking/haddock-runner.git
cd haddock-runner

# Install development tools
rustup component add rustfmt clippy rust-analyzer
cargo install cargo-edit cargo-audit

Project Structure

src/
├── main.rs              # Main application entry point
├── input.rs             # Input file parsing and validation
├── dataset.rs           # Dataset loading and management
├── job.rs               # Job creation and management
├── queue.rs             # Job queue and scheduling
├── runner/              # Execution backends
│   ├── mod.rs           # Runner interface
│   ├── local.rs         # Local execution backend
│   ├── slurm.rs         # SLURM backend
│   └── status.rs        # Job status tracking
├── logging.rs           # Logging configuration
├── checksum.rs          # File integrity checking
└── utils.rs             # Utility functions

Cargo.toml              # Rust package configuration
Cargo.lock              # Dependency versions
.example/               # Example configurations
.docs/                  # Documentation
.github/workflows/      # CI/CD workflows

Development Workflow

Building the Project

# Build in development mode
cargo build

# Build with optimizations
cargo build --release

# Build with all features
cargo build --all-features

Running Tests

# Run all tests
cargo test

# Run tests with coverage (requires cargo-tarpaulin)
cargo tarpaulin

# Run specific test
cargo test test_specific_function

Code Quality

# Format code
cargo fmt

# Check for linting issues
cargo clippy

# Audit dependencies for vulnerabilities
cargo audit

# Check for outdated dependencies (requires cargo-outdated)
cargo outdated

Architecture Overview

Core Components

  1. Input System: Parses YAML configuration and validates inputs
  2. Dataset Manager: Loads and organizes molecular data
  3. Job Creator: Generates HADDOCK jobs from scenarios
  4. Queue System: Manages job scheduling and execution
  5. Runner Backends: Local, SLURM, and other execution environments
  6. Monitoring: Tracks job progress and status

Data Flow

Input Files → Configuration Parsing → Dataset Loading → Job Creation → Queue Scheduling → Job Execution → Result Collection