Welcome to the haddock-runner docs

The haddock-runner is a powerful tool for running large-scale HADDOCK docking experiments. It automates the execution of HADDOCK3 workflows across multiple protein complexes, enabling comprehensive benchmarking and performance evaluation.
HADDOCK (High Ambiguity Driven protein-protein DOCKing) is a widely-used software suite for flexible docking of biomolecular complexes, particularly useful for studying protein-protein interactions.
Key Features
- Large-scale Benchmarking: Execute HADDOCK workflows on multiple molecular complexes simultaneously
- Scenario Testing: Run different docking scenarios (workflows, parameters) on the same datasets
- Concurrent Execution: Process multiple targets concurrently for efficient resource utilization
- Input Validation: Automatic checksum validation to ensure data integrity
- Flexible Configuration: YAML-based configuration for complex benchmarking setups
How It Works
haddock-runner takes a YAML configuration file that defines:
- General settings: Maximum concurrent jobs, core allocation, working directory
- Input datasets: List of molecular structures and associated files
- Docking scenarios: Different HADDOCK workflows and parameters to test
The tool then automatically:
- Validates all input files using checksums
- Creates individual HADDOCK jobs for each target-scenario combination
- Executes jobs concurrently according to resource constraints
- Organizes results in a structured working directory
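The fan-out described above can be pictured with a short Python sketch. It is not haddock-runner's own code (the tool is distributed as a Rust crate). It only models the two rules stated here: one job per target-scenario combination, and never more than max_concurrent jobs running at once. run_job is a hypothetical placeholder for preparing and launching a single HADDOCK3 run.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def run_job(target: str, scenario: str) -> str:
    # Hypothetical placeholder: a real runner would prepare a HADDOCK3 run
    # directory and execute it; here we only return a label.
    return f"{scenario}/{target}"


def run_benchmark(targets, scenarios, max_concurrent):
    # One job per (target, scenario) combination.
    jobs = list(product(targets, scenarios))
    # The pool never runs more than max_concurrent jobs at the same time.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        futures = [pool.submit(run_job, t, s) for t, s in jobs]
        # Results are collected in submission order.
        return [f.result() for f in futures]


if __name__ == "__main__":
    print(run_benchmark(["1A2K", "1GGR"], ["true-interface", "center-of-mass"], max_concurrent=4))
```

A thread pool is enough for the sketch because each real job would spend its time in an external HADDOCK3 process rather than in Python.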
Quick Start
Prerequisites
- HADDOCK3 installed and properly configured
- Input molecular structures in PDB format
- Optional restraint files (TBL format) for guided docking
Basic Usage
haddock-runner benchmark_config.yaml
Common Options
Setup mode (validate and prepare without execution):
haddock-runner --setup benchmark_config.yaml
Debug mode (verbose logging):
haddock-runner --debug benchmark_config.yaml
Typical Use Cases
When running benchmarks, researchers typically investigate:
- Parameter Optimization: How different sampling parameters affect docking quality
- Workflow Comparison: Performance of different docking protocols
- Method Validation: Testing new restraint strategies or scoring functions
- Performance Benchmarking: Execution time and resource usage patterns
- Reproducibility Studies: Consistent results across different computational environments
Example Workflow
A typical benchmark might include:
- 5-10 different protein complexes
- 3-5 different docking scenarios (true interface, center-of-mass, random restraints)
- 100-1000 docking runs per scenario
- Concurrent execution on 4-8 CPU cores
Results are organized by scenario and target, making it easy to compare performance across different conditions.
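As a rough sizing aid: the number of jobs is targets × scenarios, and, reading "docking runs" as the rigidbody sampling value, the number of rigid-body models is targets × the sum of the scenarios' sampling values. The Python sketch below encodes only that arithmetic; the numbers in the usage line are illustrative.

```python
def benchmark_size(n_targets: int, sampling_per_scenario: list[int]) -> dict:
    """Count HADDOCK3 jobs and rigid-body models for a benchmark."""
    # One job per target-scenario combination.
    jobs = n_targets * len(sampling_per_scenario)
    # Each job samples its scenario's number of rigid-body models.
    models = n_targets * sum(sampling_per_scenario)
    return {"jobs": jobs, "rigidbody_models": models}


# 5 targets, three scenarios sampling 1000, 500 and 500 models
print(benchmark_size(5, [1000, 500, 500]))
# {'jobs': 15, 'rigidbody_models': 10000}
```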
Getting Started with Your Own Benchmark
- Prepare your molecular structures in PDB format
- Create restraint files if using guided docking
- Write a configuration file defining your scenarios
- List your input files in the required format
- Run the benchmark and analyze results
See the Setting Up a Benchmark and Writing a Benchmark YAML File sections for detailed instructions.
Getting Help
If you encounter any issues or have questions:
- Open an issue on the GitHub repository
- Contact us at bonvinlab.support@uu.nl
- Join the BioExcel forum and post your question
The HADDOCK team and community are available to help with setup, configuration, and analysis of your benchmarks.
Installation
The haddock-runner is designed for researchers, developers, and advanced users who are familiar with HADDOCK and command-line computing. It is particularly suited for those with access to HPC infrastructure for running large-scale docking experiments.
Prerequisites
HADDOCK3 Installation
IMPORTANT:
haddock-runner requires HADDOCK3 to be installed on your system. This tool is not a replacement for HADDOCK itself, but rather a benchmarking framework that automates the execution of multiple HADDOCK runs.
If you are new to HADDOCK, we recommend:
- Completing the basic HADDOCK3 tutorials
- Familiarizing yourself with HADDOCK3 workflows and configuration
For single target docking or small-scale experiments, consider using:
- HADDOCK2.4 web server for interactive use
- HADDOCK3 command-line interface for small batches
System Requirements
- Operating System: Linux (recommended), macOS, or Windows with WSL
- Memory: Minimum 8GB RAM (16GB+ recommended for concurrent execution)
- Storage: Sufficient disk space for input structures and results
- HPC Access: Recommended for large-scale benchmarks
Installation Methods
Method 1: Install via crates.io (Recommended)
The easiest way to install haddock-runner is through cargo, Rust’s package manager:
# Install directly from crates.io
cargo install haddock-runner
# This will install the binary to ~/.cargo/bin/haddock-runner
Note: If you don’t have cargo installed, you can install Rust from https://www.rust-lang.org/tools/install
After installation, ensure the cargo bin directory is in your PATH:
# Add cargo bin to your PATH (add this to your ~/.bashrc or ~/.zshrc)
export PATH="$HOME/.cargo/bin:$PATH"
# Load the cargo environment into the current shell
source $HOME/.cargo/env
# Verify installation
haddock-runner --version
Method 2: Install Pre-built Binary from GitHub Releases (Coming Soon)
Pre-compiled binaries will be available for each release on GitHub:
# Download the latest release for your platform
# Check https://github.com/haddocking/haddock-runner/releases for the latest version
VERSION="v3.0.0" # Update to latest version
OS_ARCH="x86_64-unknown-linux-gnu" # Choose your platform
wget https://github.com/haddocking/haddock-runner/releases/download/${VERSION}/haddock-runner-${OS_ARCH}
# Make it executable
chmod +x haddock-runner-${OS_ARCH}
# Move to your PATH (optional)
sudo mv haddock-runner-${OS_ARCH} /usr/local/bin/haddock-runner
# Verify installation
haddock-runner --version
Available platforms will include:
- x86_64-unknown-linux-gnu (Linux 64-bit)
- x86_64-apple-darwin (macOS Intel)
- aarch64-apple-darwin (macOS Apple Silicon)
Note: Pre-built binaries are coming soon. For now, please use Method 1 (crates.io) or see the Development section for building from source.
Post-Installation Setup
Add to PATH (Optional)
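If you built haddock-runner from source, the binary is in target/release/ inside the repository (installing with cargo install already places it in ~/.cargo/bin). To make a source build available system-wide:
# From the repository root, symlink the binary into a directory in your PATH (or copy it there)
sudo ln -s $(pwd)/target/release/haddock-runner /usr/local/bin/haddock-runner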
# Verify it's accessible
which haddock-runner
haddock-runner --version
Verify HADDOCK3 Integration
Before running benchmarks, ensure HADDOCK3 is properly installed and accessible:
# Check HADDOCK3 installation
haddock3 --version
# Verify required modules are available
haddock3 --list-modules
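To script this pre-flight check, for example at the top of a job submission script, the Python sketch below uses shutil.which to confirm that both executables resolve on your PATH. It checks presence only; it does not run either tool or inspect HADDOCK3 modules.

```python
import shutil


def missing_tools(names=("haddock-runner", "haddock3")) -> list[str]:
    # shutil.which returns None when an executable cannot be found on PATH.
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("haddock-runner and haddock3 are on PATH")
```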
Troubleshooting
Common Issues
Rust installation problems:
- Ensure you have proper internet connectivity
- Check that you have the required system dependencies (build-essential, curl, etc.)
- Try rustup update if you already have Rust installed
Missing HADDOCK3:
- Ensure HADDOCK3 is installed and in your PATH
- Check that all required HADDOCK modules are available
- Verify your HADDOCK3 configuration files are properly set up
Permission issues:
- Ensure you have read/write access to the working directory
- Check that input files are readable
- Verify you have execution permissions for the binary
Getting Help
If you encounter installation issues:
- Check the GitHub Issues for known problems
- Consult the HADDOCK3 documentation for HADDOCK-specific requirements
Next Steps
Now that you have haddock-runner installed, you’re ready to:
- Set up your first benchmark - See Setting Up a Benchmark
- Write a configuration file - See Writing a Benchmark YAML File
- Prepare your input files - See Writing an Input List File
- Run your benchmark - See Running Haddock Runner
Usage Guide
This guide provides a comprehensive, step-by-step introduction to using haddock-runner for running large-scale HADDOCK docking benchmarks. No prior experience with previous versions is assumed.
Quick Start Workflow
Using haddock-runner involves three main steps:
- Prepare your input files
- Configure your benchmark
- Run the benchmark
Complete Usage Guide
Step 1: Prepare Your Molecular Data
Before using haddock-runner, you need:
- Protein structures: PDB files for your docking targets
- Restraint files (optional): TBL files for guided docking
- Topology/parameter files (optional): For ligands or special molecules
Organize your files:
your_project/
├── structures/
│ ├── target1_r_u.pdb # Receptor structure
│ ├── target1_l_u.pdb # Ligand structure
│ ├── target1_ti.tbl # True interface restraints
│ └── target1_ref.pdb # Reference structure (for evaluation)
└── ...
Step 2: Create the Input List File
The input list file specifies all files needed for each docking target.
Key points:
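- Keep each target's files together; comment lines (starting with #) are ignored and serve only as labels, since files are assigned to targets by the identifier in their file names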
- List all required files for each target
- Paths can be relative or absolute
- Use consistent naming conventions
Example (input_list.txt):
# Target 1A2K - Protein-protein complex
structures/1A2K/1A2K_r_u.pdb
structures/1A2K/1A2K_l_u.pdb
structures/1A2K/1A2K_ti.tbl
structures/1A2K/1A2K_unambig.tbl
structures/1A2K/1A2K_ref.pdb
# Target 1GGR - Another complex
structures/1GGR/1GGR_r_u.pdb
structures/1GGR/1GGR_l_u.pdb
structures/1GGR/1GGR_ti.tbl
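The list format is simple enough to read with a few lines of code, which is handy for inspecting a list before a run. This Python sketch applies the rules given in this documentation (one path per line, blank lines and lines starting with # ignored, relative paths resolved against the configuration file's directory). It is an illustration, not haddock-runner's parser, and base_dir is a hypothetical parameter you set yourself.

```python
from pathlib import Path


def read_input_list(text: str, base_dir: str = ".") -> list[Path]:
    paths = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines starting with '#'.
        if not line or line.startswith("#"):
            continue
        path = Path(line)
        # Relative paths are resolved against base_dir (assumed to be the
        # directory containing the configuration file).
        paths.append(path if path.is_absolute() else Path(base_dir) / path)
    return paths


example = """# Target 1A2K
structures/1A2K/1A2K_r_u.pdb

structures/1A2K/1A2K_l_u.pdb
"""
print([p.as_posix() for p in read_input_list(example, "configs")])
```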
Step 3: Write the Benchmark Configuration
The YAML configuration file defines your benchmark scenarios and settings.
Main sections:
- general: Global settings (concurrency, resources, directories)
- scenarios: Different docking workflows to test; each scenario defines a complete HADDOCK workflow
Example (benchmark.yaml):
general:
  max_concurrent: 4 # How many jobs to run simultaneously
  ncores: 2 # CPU cores per job
  execution: local # Execution mode: local or slurm
  mol_suffixes: [_r_u, _l_u] # File name suffixes for molecules
  input_list: input_list.txt # Path to your input list file
  work_dir: ./results # Where to store results
scenarios:
  - name: true-interface
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 1000
        ambig_fname: _ti.tbl
      flexref:
        ambig_fname: _ti.tbl
      caprieval:
        reference_fname: _ref.pdb
  - name: center-of-mass
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 500
        cmrest: true
See Configuration Reference for complete configuration options.
Step 4: Run the Benchmark
Execute haddock-runner with your configuration:
# Basic execution
haddock-runner benchmark.yaml
# Setup mode (validate without running)
haddock-runner --setup benchmark.yaml
# Debug mode (verbose logging)
haddock-runner --debug benchmark.yaml
What happens during execution:
- Input validation and checksum verification
- Job creation for each target-scenario combination
- Concurrent execution according to resource limits
- Results organization in the working directory
- Progress logging and error handling
See Running the Benchmark for runtime details.
Step 5: Analyze Results
After completion, results are organized by scenario and target:
results/
├── true-interface/
│   ├── 1A2K/
│   │   ├── haddock3.cfg
│   │   ├── run1/
│   │   └── ...
│   └── 1GGR/
│       └── ...
└── center-of-mass/
    ├── 1A2K/
    └── 1GGR/
        └── ...
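Because results follow a work_dir/scenario/target layout, a short script can report which target-scenario combinations have no output directory yet, for example after an interrupted run. This Python sketch relies only on that layout; an existing directory does not prove the run finished successfully.

```python
from pathlib import Path


def missing_results(work_dir: str, scenarios: list[str], targets: list[str]) -> list[str]:
    missing = []
    for scenario in scenarios:
        for target in targets:
            # Expected location of one target-scenario job.
            if not (Path(work_dir) / scenario / target).is_dir():
                missing.append(f"{scenario}/{target}")
    return missing


if __name__ == "__main__":
    print(missing_results("results", ["true-interface", "center-of-mass"], ["1A2K", "1GGR"]))
```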
Result analysis tips:
- Compare docking success rates between scenarios
- Analyze CAPRI metrics for quality assessment
- Examine computation times and resource usage
- Use HADDOCK analysis tools for detailed evaluation
Practical Tips
Starting Small
For your first benchmark:
- Use 2-3 well-characterized targets
- Test 2 different scenarios
- Start with small sampling numbers (100-500)
- Use --setup mode to validate before full execution
Resource Management
- Memory: Each job needs ~2-4GB RAM
- CPU: Allocate cores based on your system capacity
- Storage: Results can be large (1-10GB per target)
- Time: Docking runs can take hours to days
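The memory guideline above translates into an upper bound for max_concurrent. This Python sketch takes the per-job memory as an input (the 2-4GB figure is a rough guide, not a measurement for your targets) and reserves some headroom for the operating system; the 2 GB default headroom is an assumption.

```python
def max_concurrent_for_memory(total_gb: float, per_job_gb: float, headroom_gb: float = 2.0) -> int:
    """Largest number of simultaneous jobs that fits in total_gb of RAM."""
    usable = total_gb - headroom_gb  # memory left after the OS headroom
    if usable <= 0 or per_job_gb <= 0:
        return 0
    # Only whole jobs can run, so round down.
    return int(usable // per_job_gb)


# 16 GB machine, assuming the upper estimate of 4 GB per job
print(max_concurrent_for_memory(16, 4))  # 3
```

A result of 0 means even a single job may not fit, which is worth resolving before starting, since max_concurrent must be greater than 0.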
Common Workflows
Parameter optimization:
scenarios:
  - name: sampling-500
    workflow:
      rigidbody:
        sampling: 500
  - name: sampling-1000
    workflow:
      rigidbody:
        sampling: 1000
  - name: sampling-2000
    workflow:
      rigidbody:
        sampling: 2000
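For longer sweeps, the scenarios block can be generated rather than typed by hand. This Python sketch prints the same block-style YAML as the example above for any list of sampling values; it only produces text and does not validate the result against haddock-runner or HADDOCK3.

```python
def sampling_scenarios(values: list[int]) -> str:
    """Build a scenarios block with one rigidbody sampling value per scenario."""
    lines = ["scenarios:"]
    for n in values:
        lines += [
            f"  - name: sampling-{n}",
            "    workflow:",
            "      rigidbody:",
            f"        sampling: {n}",
        ]
    return "\n".join(lines)


print(sampling_scenarios([500, 1000, 2000]))
```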
Restraint strategy comparison:
scenarios:
  - name: true-interface
    workflow:
      rigidbody:
        ambig_fname: _ti.tbl
  - name: hbond-only
    workflow:
      rigidbody:
        ambig_fname: _hb.tbl
  - name: center-of-mass
    workflow:
      rigidbody:
        cmrest: true
Troubleshooting
Common issues and solutions:
Input file errors:
- Verify all files exist and are readable
- Check file paths in your input list
- Use absolute paths if relative paths don’t work
HADDOCK module errors:
- Ensure HADDOCK3 is properly installed
- Verify all required modules are available
- Check your HADDOCK3 configuration
Resource limitations:
- Reduce max_concurrent if running out of memory
- Lower sampling numbers for faster testing
- Use --setup to validate before full runs
Permission issues:
- Ensure write access to working directory
- Check execution permissions for the binary
- Verify HADDOCK3 has proper file access
Best Practices
File Organization
benchmark_project/
├── configs/
│ ├── benchmark.yaml
│ └── input_list.txt
├── structures/
│ ├── target1/
│ ├── target2/
│ └── ...
├── results/
│ └── (auto-generated)
└── analysis/
└── (your analysis scripts)
Version Control
- Keep configuration files in Git
- Store input structures separately (large files)
- Document changes between benchmark runs
- Use meaningful commit messages
Reproducibility
- Fix random seeds when comparing methods
- Document exact HADDOCK3 version used
- Record system specifications
- Archive complete configuration files
Next Steps
Now that you understand the basic workflow:
- Set up your first benchmark → Setting Up a Benchmark
- Explore example configurations → Examples
- Learn about advanced features → Development
- Get help with specific issues → Getting Help
Getting Help
If you encounter any issues:
- Check the Troubleshooting section above
- Consult the GitHub Issues
- Review the HADDOCK3 documentation
- Contact the support team via the channels mentioned in the main documentation
Examples
This page provides comprehensive examples of haddock-runner configurations for various benchmarking scenarios. These examples demonstrate the tool’s flexibility and help you design your own benchmarks.
Basic Examples
Restraint Strategy Comparison
Compare different restraint approaches for the same targets:
general:
  mol_suffixes: [_r_u, _l_u]
  input_list: input_list.txt
  work_dir: ./results/restraint-comparison
  max_concurrent: 2
  ncores: 4
  execution: local
scenarios:
  - name: true-interface-restraints
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 500
        ambig_fname: _ti.tbl
      flexref:
        ambig_fname: _ti.tbl
      caprieval:
        reference_fname: _ref.pdb
  - name: hbond-only-restraints
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 500
        ambig_fname: _hb.tbl
      flexref:
        ambig_fname: _hb.tbl
      caprieval:
        reference_fname: _ref.pdb
  - name: center-of-mass
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 500
        cmrest: true
      flexref:
        cmrest: true
      caprieval:
        reference_fname: _ref.pdb
  - name: random-air-restraints
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 500
        ranair: true
      flexref:
        ranair: true
      caprieval:
        reference_fname: _ref.pdb
Advanced Examples
This scenario samples antibody-peptide complexes by re-docking experimental structures, using rigid-body docking, flexible refinement, and energy-minimisation refinement. Unambiguous restraints keep the antibody heavy and light chains together, while ambiguous restraints target the CDR loops and the whole peptide.
general:
  input_list: /trinity/csbdevel/jvillave/deeprank-ab-pep/haddock3/2_real/2_config/data_2_cutoff_043/input_test_scenario_0.list
  mol_suffixes: [_antibody, _antigen]
  work_dir: /trinity/csbdevel/jvillave/deeprank-ab-pep/haddock3/2_real/3_2_results/scenario_0_results
  execution: slurm
  max_concurrent: 100
  ncores: 24
scenarios:
  # ------------------------------------------------------------
  # 1) scenario 0, ab initio ground truth
  # ------------------------------------------------------------
  - name: ground-truth
    workflow:
      topoaa:
        tolerance: 20
      rigidbody:
        tolerance: 20
        crossdock: false
        sampling: 10000
        ambig_fname: _ti.tbl
        unambig_fname: _antibody-unambig.tbl
      clustfcc:
        plot_matrix: true
      # select up to 100 clusters per target,
      # keeping 5 top models each (max 500 models)
      seletopclusts:
        top_clusters: 100
        top_models: 5
      flexref:
        tolerance: 20
        ambig_fname: _ti.tbl
        unambig_fname: _antibody-unambig.tbl
      # final energy minimisation
      emref:
        tolerance: 20
        ambig_fname: _ti.tbl
        unambig_fname: _antibody-unambig.tbl
      caprieval:
        reference_fname: _matched.pdb
        fnat_cutoff: 4.0
        irmsd_cutoff: 8.0
      emscoring:
        tolerance: 20
        per_interface_scoring: true
…
Configuration Variations
HPC Cluster Configuration
Optimized for SLURM workload manager:
general:
  mol_suffixes: [_r_u, _l_u]
  input_list: large_input_list.txt
  work_dir: /scratch/results/large-benchmark
  max_concurrent: 20
  ncores: 8
  execution: slurm
scenarios:
  - name: hpc-optimized
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 2000
        ambig_fname: _ti.tbl
      flexref:
        ambig_fname: _ti.tbl
      caprieval:
        reference_fname: _ref.pdb
Minimal Configuration
Simple setup for quick testing:
general:
  mol_suffixes: [_r_u, _l_u]
  input_list: test_input.txt
  work_dir: ./test-results
  max_concurrent: 2
  ncores: 2
  execution: local
scenarios:
  - name: quick-test
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 100
        cmrest: true
Real-world Example: BM5 Benchmark
A configuration similar to the BM5 benchmark setup:
general:
  mol_suffixes: [_r_u, _l_u]
  input_list: bm5_input_list.txt
  work_dir: ./results/bm5-style
  max_concurrent: 10
  ncores: 4
  execution: local
scenarios:
  - name: bm5-true-interface
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 1000
        ambig_fname: _ti.tbl
        unambig_fname: _unambig.tbl
      seletop:
        select: 200
        sort_by: score
      flexref:
        ambig_fname: _ti.tbl
        unambig_fname: _unambig.tbl
      emref:
        mdsteps: 500
      caprieval:
        reference_fname: _ref.pdb
        clusters: 4
  - name: bm5-center-mass
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 1000
        cmrest: true
      seletop:
        select: 200
        sort_by: score
      flexref:
        cmrest: true
      emref:
        mdsteps: 500
      caprieval:
        reference_fname: _ref.pdb
        clusters: 4
Input List Examples
Simple Input List
# Target 1A2K
structures/1A2K/1A2K_r_u.pdb
structures/1A2K/1A2K_l_u.pdb
structures/1A2K/1A2K_ti.tbl
structures/1A2K/1A2K_ref.pdb
# Target 1GGR
structures/1GGR/1GGR_r_u.pdb
structures/1GGR/1GGR_l_u.pdb
structures/1GGR/1GGR_ti.tbl
structures/1GGR/1GGR_ref.pdb
Complex Input List with Multiple File Types
# Target 1PPE - Protein-protein with multiple restraint types
structures/1PPE/1PPE_r_u.pdb
structures/1PPE/1PPE_l_u.pdb
structures/1PPE/1PPE_ti.tbl
structures/1PPE/1PPE_hb.tbl
structures/1PPE/1PPE_unambig.tbl
structures/1PPE/1PPE_ref.pdb
# Target 2OOB - With ligand files
structures/2OOB/2OOB_r_u.pdb
structures/2OOB/2OOB_l_u.pdb
structures/2OOB/2OOB_x_u.pdb
structures/2OOB/2OOB_ti.tbl
structures/2OOB/2OOB_hb.tbl
structures/2OOB/2OOB_ligand.top
structures/2OOB/2OOB_ligand.param
structures/2OOB/2OOB_ref.pdb
Best Practices for Examples
Starting with Examples
- Begin with simple configurations and gradually add complexity
- Test with small datasets before scaling up
- Use --setup mode to validate configurations before full runs
- Start with low sampling numbers for initial testing
Adapting Examples
- Modify scenarios to match your research questions
- Adjust resource settings based on your hardware
- Customize workflows for your specific docking needs
- Scale parameters appropriately for your system size
Creating Your Own
Use these examples as templates and:
- Replace file paths with your actual data
- Adjust sampling parameters for your needs
- Add or remove workflow steps as required
- Configure resource limits for your environment
Troubleshooting Examples
Common Configuration Issues
Problem: Jobs fail with missing file errors
Solution: Verify that all files in the input list exist and that the paths are correct
Problem: Out of memory errors
Solution: Reduce max_concurrent so fewer jobs run at the same time, or run on a machine with more memory
Problem: HADDOCK module not found
Solution: Ensure HADDOCK3 is properly installed and in PATH
Problem: Slow execution
Solution: Adjust max_concurrent and ncores for optimal resource usage
Additional Resources
- Complete configuration reference: Writing a Benchmark YAML File
- Input list format guide: Writing an Input List File
- Running benchmarks: Running Haddock Runner
- Real-world setup: Setting Up a Benchmark
Setting Up a BM5 Benchmark: Step-by-Step Guide
This guide explains how to set up and run a BM5 (Protein-Protein Docking Benchmark v5) benchmark using haddock-runner. The BM5 benchmark (Vreven et al., 2015) is a widely used set of 144 non-redundant, high-quality protein-protein complexes for evaluating docking methods.
Prerequisites
Before starting, ensure you have:
- haddock-runner installed (see Installation)
- HADDOCK3 properly installed and configured
- Access to a computing environment with sufficient resources
- Basic familiarity with command-line tools
Step 1: Set Up Your Project Directory
Create a dedicated directory structure for your benchmark:
# Create project directory
mkdir -p ~/bm5-benchmark && cd ~/bm5-benchmark
# Create subdirectories
mkdir -p {data,configs,results,scripts}
Your project structure will look like:
bm5-benchmark/
├── data/ # BM5 dataset files
├── configs/ # Configuration files
├── results/ # Benchmark results (auto-created)
├── scripts/ # Custom scripts
└── README.md # Your notes and documentation
Step 2: Download and Prepare BM5 Dataset
The BonvinLab provides a HADDOCK-ready version of BM5:
# Clone the BM5-clean repository
git clone https://github.com/haddocking/BM5-clean.git ~/bm5-benchmark/data/BM5-clean
# Check out a specific version for reproducibility (inside the cloned repository)
git -C ~/bm5-benchmark/data/BM5-clean checkout v1.1
# Create input list file
find ~/bm5-benchmark/data/BM5-clean/HADDOCK-ready -name "*.pdb" -o -name "*.tbl" \
| grep -E "(r_u|_l_u|_ti|_unambig|_ref)" \
| sort > ~/bm5-benchmark/configs/bm5-input.list
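If you prefer to build the list in Python, for example to adjust the selection, the sketch below mirrors the find/grep pipeline above: it collects .pdb and .tbl files under the HADDOCK-ready directory and keeps those whose names contain one of the same substrings. Unlike the pipeline, it matches file names rather than full paths, and Python's sort order may differ slightly from the locale-dependent sort command.

```python
import re
from pathlib import Path

# Same alternatives as the grep -E pattern in the shell pipeline above.
KEEP = re.compile(r"(r_u|_l_u|_ti|_unambig|_ref)")


def build_input_list(root: str) -> list[str]:
    """Return sorted paths of .pdb/.tbl files under root whose names match KEEP."""
    selected = [
        p
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in {".pdb", ".tbl"} and KEEP.search(p.name)
    ]
    return sorted(str(p) for p in selected)


if __name__ == "__main__":
    root = Path("~/bm5-benchmark/data/BM5-clean/HADDOCK-ready").expanduser()
    out = Path("~/bm5-benchmark/configs/bm5-input.list").expanduser()
    if root.is_dir():
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text("\n".join(build_input_list(str(root))) + "\n")
        print("wrote", out)
```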
Step 3: Create the Benchmark Configuration
Create the configuration file configs/bm5-benchmark.yaml:
# File: ~/bm5-benchmark/configs/bm5-benchmark.yaml
general:
  # File patterns and locations
  mol_suffixes: [_r_u, _l_u] # Standard BM5 naming
  input_list: configs/bm5-input.list # Path to input list
  work_dir: results/bm5-results # Where to store results
  # Resource management
  max_concurrent: 8 # Adjust based on your system
  ncores: 4 # Cores per HADDOCK job
  execution: local # Use 'slurm' for HPC clusters
scenarios:
  # Scenario 1: True Interface
  - name: true-interface
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 1000
        ambig_fname: _ti.tbl
        unambig_fname: _unambig.tbl
      seletop:
        select: 200
        sort_by: score
      flexref:
        ambig_fname: _ti.tbl
        unambig_fname: _unambig.tbl
      emref:
        mdsteps: 500
      caprieval:
        reference_fname: _ref.pdb
        clusters: 4
  # Scenario 2: Center of Mass
  - name: center-of-mass
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 2000
        cmrest: true
      seletop:
        select: 200
        sort_by: score
      flexref:
        cmrest: true
      emref:
        mdsteps: 500
      caprieval:
        reference_fname: _ref.pdb
        clusters: 4
  # Scenario 3: Random Air Restraints
  - name: random-restraints
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 2000
        ranair: true
      seletop:
        select: 200
        sort_by: score
      flexref:
        ranair: true
      emref:
        mdsteps: 500
      caprieval:
        reference_fname: _ref.pdb
Step 4: Validate Your Setup
Before running the full benchmark, validate your configuration:
# Check that haddock-runner is working
haddock-runner --version
# Validate configuration without execution
haddock-runner --setup configs/bm5-benchmark.yaml
# Check input file count
wc -l configs/bm5-input.list
# Should show ~1000-1500 files for full BM5
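A line count does not reveal a target that is missing one of its files. The Python sketch below reports, for each target, which expected files are absent from the list. REQUIRED is an assumption based on the files the Step 3 configuration references (the true-interface scenario uses _ti.tbl and _unambig.tbl); adjust it to your own scenarios. It does not check that the files exist on disk.

```python
from pathlib import Path

# Assumed per-target files for this guide's configuration; adjust as needed.
REQUIRED = ("_r_u.pdb", "_l_u.pdb", "_ti.tbl", "_unambig.tbl", "_ref.pdb")


def incomplete_targets(paths, required=REQUIRED):
    names = {Path(p).name for p in paths}
    anchor = required[0]
    # Every file ending with the first required suffix defines a target.
    targets = sorted(n[: -len(anchor)] for n in names if n.endswith(anchor))
    report = {}
    for target in targets:
        absent = [s for s in required if target + s not in names]
        if absent:
            report[target] = absent
    return report


if __name__ == "__main__":
    list_file = Path("configs/bm5-input.list")
    if list_file.is_file():
        lines = [line.strip() for line in list_file.read_text().splitlines()]
        entries = [line for line in lines if line and not line.startswith("#")]
        for target, absent in incomplete_targets(entries).items():
            print(target, "is missing", ", ".join(absent))
```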
Step 5: Run the Benchmark
# Run in the background, writing all output to benchmark.log
nohup haddock-runner configs/bm5-benchmark.yaml > benchmark.log 2>&1 &
# Monitor progress
tail -f benchmark.log
# Check resource usage
htop # or your preferred system monitor
HPC Cluster Execution
For SLURM clusters, modify your config:
general:
  execution: slurm
  ncores: 4
  partition: long
Step 6: Monitor and Manage the Benchmark
Monitoring Progress
# Check running jobs
ps aux | grep haddock
# For SLURM
squeue -u $USER
# Check disk usage
du -sh results/
Handling Interruptions
If the benchmark is interrupted:
# Check what completed
find results/ -name "*.done" | wc -l
# Resume from where it left off
haddock-runner configs/bm5-benchmark.yaml
Step 7: Analyze Results
To be added soon.
Troubleshooting
Common Issues and Solutions
Problem: “File not found” errors
- Solution: Verify that all paths in bm5-input.list are correct
- Check: Run head configs/bm5-input.list and verify the files exist
Problem: HADDOCK3 module errors
- Solution: Ensure HADDOCK3 is properly installed and in PATH
- Check: haddock3 --version works from the command line
Problem: Out of memory errors
- Solution: Reduce max_concurrent or increase system memory
- Check: Monitor memory with free -h or htop
Problem: Slow progress
- Solution: Adjust max_concurrent and ncores for an optimal balance
- Check: Monitor CPU usage with htop
Best Practices
Reproducibility
# Record exact versions
echo "haddock-runner $(haddock-runner --version)" > VERSION.txt
echo "HADDOCK3 $(haddock3 --version)" >> VERSION.txt
echo "Date: $(date)" >> VERSION.txt
# Save complete configuration
cp configs/bm5-benchmark.yaml results/config-used.yaml
Data Management
# Compress completed results
tar -czvf bm5-results-$(date +%Y%m%d).tar.gz results/
# Clean up intermediate files (if needed)
find results/ -name "*.tmp" -delete
Documentation
# BM5 Benchmark Notes
## Setup
- Date: YYYY-MM-DD
- System: Describe your hardware
- HADDOCK3 version: X.X.X
- haddock-runner version: X.X.X
## Configuration
- Scenarios: true-interface, center-of-mass, random-restraints
- Sampling: 1000-2000 per scenario
- Targets: Full BM5 set (144 complexes)
## Results Summary
- Start time:
- End time:
- Total runtime:
- Success rate:
Complete Example: From Start to Finish
Here’s a complete workflow example:
# 1. Set up project
mkdir -p ~/bm5-benchmark/{data,configs,results,scripts} && cd ~/bm5-benchmark
# 2. Get data
git clone https://github.com/haddocking/BM5-clean.git data/BM5-clean
find data/BM5-clean/HADDOCK-ready -name "*.pdb" -o -name "*.tbl" \
| grep -E "(r_u|_l_u|_ti|_unambig|_ref)" \
| sort > configs/bm5-input.list
# 3. Create configuration (use the YAML example above)
# 4. Run full benchmark
haddock-runner configs/bm5-benchmark.yaml
# 5. Analyze results
# to be updated
Additional Resources
- BM5 Original Publication: Vreven et al. (2015)
- BM5 Dataset: https://zlab.umassmed.edu/benchmark/
- BM5-clean Repository: https://github.com/haddocking/BM5-clean
- HADDOCK3 Documentation: https://github.com/haddocking/haddock3
Getting Help
If you encounter issues specific to BM5 setup:
- Check the BM5-clean issues
- Consult the HADDOCK forum
- Review the haddock-runner issues
- Contact the BonvinLab support team
This guide provides a complete, up-to-date approach to setting up BM5 benchmarks with the current version of haddock-runner, focusing on clarity, reproducibility, and practical execution.
Configuration Reference
This document provides a comprehensive reference for the haddock-runner configuration YAML file format. It describes all available options, their purpose, valid values, and examples.
Keep in mind that YAML format is indentation-sensitive!
Configuration File Structure
The configuration file is a YAML document with two main sections:
general:
# Global configuration options
scenarios:
# Benchmark scenarios to execute
General Configuration
The general section contains global settings that apply to all scenarios and targets.
Options
| Option | Type | Required | Description |
|---|---|---|---|
| max_concurrent | integer | Yes | Maximum number of jobs to run simultaneously. Controls how many target-scenario combinations execute in parallel. |
| ncores | integer | Yes | Number of CPU cores to allocate per job. |
| execution | string | Yes | Execution backend. Valid values: local, slurm. |
| partition | string | No | SLURM partition to submit jobs to when execution: slurm. If omitted, the cluster default partition is used. |
| mol_suffixes | array of strings | Yes | File suffixes used to identify molecule files. Must contain at least 2 suffixes (typically receptor and ligand). |
| input_list | string | Yes | Path to the input list file containing file paths for all targets. |
| work_dir | string | Yes | Directory where benchmark results will be stored. Created automatically if it doesn’t exist. |
Example
general:
  max_concurrent: 4
  ncores: 2
  execution: local
  mol_suffixes: [_r_u, _l_u, _x_u]
  input_list: docking/input_list.txt
  work_dir: ./results
Notes
- Local execution: When using execution: local, the total number of CPU cores required is max_concurrent * ncores. Ensure your system has enough cores.
- SLURM execution: When using execution: slurm, ensure SLURM is installed and configured. The sbatch and sacct commands must be available in your PATH.
- File suffixes: The mol_suffixes array defines patterns used to identify molecule files in the input list. Files matching these patterns are grouped together as molecules for each target.
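For local runs, the max_concurrent * ncores rule can be checked against the machine before starting. This Python sketch uses os.cpu_count(), which reports logical CPUs and does not account for limits set by containers or batch schedulers, so treat the result as a rough check.

```python
import os


def cores_required(max_concurrent: int, ncores: int) -> int:
    """Peak core demand when max_concurrent jobs each use ncores."""
    return max_concurrent * ncores


def fits_locally(max_concurrent: int, ncores: int, available=None) -> bool:
    if available is None:
        # Logical CPUs; os.cpu_count() may return None, so fall back to 1.
        available = os.cpu_count() or 1
    return cores_required(max_concurrent, ncores) <= available


if __name__ == "__main__":
    print(cores_required(4, 2), fits_locally(4, 2))
```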
Scenarios Configuration
The scenarios section defines the different docking workflows to test. Each scenario is executed for every target specified in the input list.
Scenario Options
| Option | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique identifier for this scenario. Used as directory name in the results. |
| workflow | mapping | Yes | HADDOCK3 workflow configuration defining modules and their parameters. |
Example
scenarios:
  - name: true-interface
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 1000
        ambig_fname: _ti.tbl
      flexref:
        ambig_fname: _ti.tbl
      caprieval:
        reference_fname: _ref.pdb
  - name: center-of-mass
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 500
        cmrest: true
Input List File Format
The input list file (specified by general.input_list) contains paths to all files required for each docking target. Files are automatically grouped into targets by a shared identifier derived from the filename: for molecule files, the identifier is the part of the filename before the configured mol_suffixes match; for restraints, topology/parameter, shape, and miscellaneous files, grouping typically uses the part before the first underscore.
Important note about ensembles: If an input structure contains multiple models, HADDOCK3 handles the ensemble with its own tooling. In the input list, define it as a single molecule that follows the naming scheme; an extra tag such as _ens would become part of the target identifier and separate the file from the rest of its target:
❌ complexA_ens_l_u.pdb # containing 10 models
✅ complexA_l_u.pdb # containing 10 models
File Classification
Files in the input list are automatically categorized based on their extensions and patterns:
| File Type | Pattern | Description |
|---|---|---|
| Molecules | Matches mol_suffixes patterns | Structure files (PDB format) |
| Restraints | _*.tbl | Distance restraint files |
| Topology/Parameters | .top, .param | Topology and parameter files for ligands |
| Shape | _shape* or configured pattern | Shape files for shape-based docking |
| Miscellaneous | All other files | Any additional files (reference structures, etc.) |
Example Input List
# Target 1A2K - Protein-protein complex
structures/1A2K/1A2K_r_u.pdb
structures/1A2K/1A2K_l_u.pdb
structures/1A2K/1A2K_ti.tbl
structures/1A2K/1A2K_unambig.tbl
structures/1A2K/1A2K_ref.pdb
# Target 1GGR - Another complex
structures/1GGR/1GGR_r_u.pdb
structures/1GGR/1GGR_l_u.pdb
structures/1GGR/1GGR_ti.tbl
Notes
- Lines starting with # are treated as comments and ignored.
- Empty lines are ignored.
- Paths can be relative to the configuration file location or absolute.
- Restraint, topology/parameter, shape, and miscellaneous files are grouped by their root identifier, extracted by splitting the file name on underscores and taking the first part; molecule files use the part before the matching mol_suffixes entry.
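The grouping rules can be illustrated with a short Python sketch. It follows the description on this page (molecule files: the text before the matching mol_suffixes entry; all other files: the text before the first underscore) and assumes the suffix sits at the end of the file stem. It is not haddock-runner's implementation, and the tool may treat edge cases differently.

```python
from collections import defaultdict
from pathlib import Path


def group_by_target(paths, mol_suffixes):
    """Group input-list paths into targets following the rules described above."""
    groups = defaultdict(lambda: {"molecules": [], "other": []})
    for p in paths:
        name = Path(p).name
        stem = Path(p).stem  # file name without its extension
        for suffix in mol_suffixes:
            if stem.endswith(suffix):
                # Molecule file: identifier is the text before the suffix.
                groups[stem[: -len(suffix)]]["molecules"].append(name)
                break
        else:
            # Any other file: identifier is the text before the first underscore.
            groups[name.split("_", 1)[0]]["other"].append(name)
    return dict(groups)


files = ["s/1A2K_r_u.pdb", "s/1A2K_l_u.pdb", "s/1A2K_ti.tbl", "s/1A2K_ref.pdb"]
print(group_by_target(files, ["_r_u", "_l_u"]))

# The ensemble-naming pitfall: the extra tag creates a second identifier.
print(sorted(group_by_target(["complexA_r_u.pdb", "complexA_ens_l_u.pdb"], ["_r_u", "_l_u"])))
# ['complexA', 'complexA_ens']
```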
HADDOCK3 Workflow Modules
The workflow section within each scenario defines the HADDOCK3 modules to execute and their parameters. Each module is specified as a YAML key, with its parameters as nested key-value pairs.
IMPORTANT! You should not set any of HADDOCK’s “General Default Parameters”; these are handled by haddock-runner internally!
Haddock Module Patterns
See the HADDOCK3 repository and documentation for the available modules and the parameters each module accepts.
Module Parameter Patterns
Many parameters accept filename patterns instead of explicit paths. These patterns are matched against the files available for each target. The pattern matching uses regular expressions.
Common filename patterns:
| Pattern | Matches |
|---|---|
| _ti.tbl | Files ending with _ti.tbl |
| _unambig.tbl | Files ending with _unambig.tbl |
| _ref.pdb | Files ending with _ref.pdb |
| _ligand.top | Files ending with _ligand.top |
| _ligand.param | Files ending with _ligand.param |
Note: When using filename patterns, ensure the corresponding files are listed in the input list and have consistent naming conventions across all targets.
Complete Configuration Examples
Basic Benchmark
general:
  max_concurrent: 2
  ncores: 2
  execution: local
  mol_suffixes: [_r_u, _l_u]
  input_list: input_list.txt
  work_dir: ./results
scenarios:
  - name: standard
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 500
      flexref:
      emref:
Parameter Optimization with Shape Docking
general:
  max_concurrent: 4
  ncores: 4
  execution: slurm
  partition: long
  mol_suffixes: [_r_u, _l_u, _shape]
  input_list: shape/input.txt
  work_dir: shape-results
scenarios:
  - name: sampling-500
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 500
        mol_shape_3: true
  - name: sampling-1000
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 1000
        mol_shape_3: true
  - name: sampling-2000
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 2000
        mol_shape_3: true
Restraint Strategy Comparison
general:
  max_concurrent: 2
  ncores: 2
  execution: local
  mol_suffixes: [_r_u, _l_u]
  input_list: input_list.txt
  work_dir: restraint-comparison
scenarios:
  - name: true-interface
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 1000
        ambig_fname: _ti.tbl
        unambig_fname: _unambig.tbl
      flexref:
        ambig_fname: _ti.tbl
      caprieval:
        reference_fname: _ref.pdb
  - name: center-of-mass
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 1000
        cmrest: true
      flexref:
      caprieval:
        reference_fname: _ref.pdb
  - name: random-restart
    workflow:
      topoaa:
        autohis: true
      rigidbody:
        sampling: 1000
        ranair: true
      flexref:
      caprieval:
        reference_fname: _ref.pdb
Validation Rules
The configuration file is validated before execution. The following rules apply:
General Section
- mol_suffixes: Must be an array with at least 2 entries.
- mol_suffixes: Must contain unique values (no duplicates).
- work_dir: Must not be an empty string.
- input_list: Must not be an empty string, and the file must exist.
- max_concurrent: Must be greater than 0.
- ncores: Must be greater than 0.
Local Execution
When execution: local:
- max_concurrent * ncores must not exceed the number of CPU cores available on the system.
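The General-section and local-execution rules translate into a straightforward check. The following Rust sketch applies them in order and returns the first violation it finds. The General struct and validate function are hypothetical stand-ins that mirror the YAML keys. The order of the checks, and the use of thread::available_parallelism as "available CPU cores", are assumptions; they do not describe how haddock-runner actually counts cores.

```rust
use std::collections::HashSet;
use std::path::Path;
use std::thread;

/// Hypothetical mirror of the general section (field names follow the YAML keys).
#[derive(Clone)]
struct General {
    max_concurrent: usize,
    ncores: usize,
    execution: String,
    mol_suffixes: Vec<String>,
    input_list: String,
    work_dir: String,
}

/// Apply the validation rules; returns the first violation found.
fn validate(g: &General, available_cores: usize) -> Result<(), String> {
    if g.mol_suffixes.len() < 2 {
        return Err("mol_suffixes must have at least 2 entries".to_string());
    }
    let unique: HashSet<&String> = g.mol_suffixes.iter().collect();
    if unique.len() != g.mol_suffixes.len() {
        return Err("mol_suffixes must not contain duplicates".to_string());
    }
    if g.work_dir.is_empty() {
        return Err("work_dir must not be empty".to_string());
    }
    if g.input_list.is_empty() || !Path::new(&g.input_list).is_file() {
        return Err("input_list must point to an existing file".to_string());
    }
    if g.max_concurrent == 0 || g.ncores == 0 {
        return Err("max_concurrent and ncores must be greater than 0".to_string());
    }
    // Local execution only: the total core request must fit on this machine.
    let requested = g.max_concurrent * g.ncores;
    if g.execution == "local" && requested > available_cores {
        return Err(format!(
            "max_concurrent * ncores = {requested} exceeds {available_cores} available cores"
        ));
    }
    Ok(())
}

fn main() {
    // Logical CPUs visible to this process; falls back to 1 if unknown.
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let general = General {
        max_concurrent: 2,
        ncores: 2,
        execution: "local".to_string(),
        mol_suffixes: vec!["_r_u".to_string(), "_l_u".to_string()],
        input_list: "input_list.txt".to_string(),
        work_dir: "./results".to_string(),
    };
    match validate(&general, cores) {
        Ok(()) => println!("configuration OK ({cores} cores available)"),
        Err(e) => println!("invalid configuration: {e}"),
    }
}
```

For example, the shape-docking configuration (max_concurrent: 4, ncores: 4) requests 16 cores. It runs under SLURM, so the local core check would not apply to it.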
SLURM Execution
When execution: slurm:
- The sbatch and sacct commands must be available in the system PATH.
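A PATH lookup of this kind can be sketched with only the Rust standard library. The find_in_path helper is hypothetical, and it is deliberately simplified. It only confirms that a regular file with the given name exists in some PATH directory. It does not check execute permissions or handle Windows PATHEXT, and it is not necessarily how haddock-runner performs the check.

```rust
use std::env;
use std::path::PathBuf;

/// Hypothetical: return the first PATH entry containing a file named `cmd`.
/// Simplification: existence of a regular file only; permission bits are not checked.
fn find_in_path(cmd: &str) -> Option<PathBuf> {
    let path_var = env::var_os("PATH")?;
    env::split_paths(&path_var)
        .map(|dir| dir.join(cmd))
        .find(|candidate| candidate.is_file())
}

fn main() {
    for cmd in ["sbatch", "sacct"] {
        match find_in_path(cmd) {
            Some(p) => println!("{cmd}: found at {}", p.display()),
            None => println!("{cmd}: not found in PATH"),
        }
    }
}
```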
File Resolution
Path Resolution
- Relative paths in the configuration file (for input_list and work_dir) are resolved relative to the current working directory.
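The rule above is easy to state in code. This Rust sketch keeps absolute paths as they are and joins relative ones onto the current working directory. The name resolve_config_path is hypothetical. The sketch only illustrates the documented behavior and does not reproduce the tool's source.

```rust
use std::env;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical: absolute paths are returned unchanged; relative paths are
/// joined onto the current working directory (not the config file's folder).
/// Note: join() does not normalize, so "./results" becomes "<cwd>/./results".
fn resolve_config_path(raw: &str) -> io::Result<PathBuf> {
    let path = Path::new(raw);
    if path.is_absolute() {
        Ok(path.to_path_buf())
    } else {
        Ok(env::current_dir()?.join(path))
    }
}

fn main() -> io::Result<()> {
    for raw in ["input_list.txt", "./results"] {
        println!("{raw} -> {}", resolve_config_path(raw)?.display());
    }
    Ok(())
}
```

In practice this means running haddock-runner from a different directory changes where a relative input_list or work_dir points.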
Filename Pattern Resolution
When a module parameter's name ends with _fname and its value is a pattern (e.g., _ti.tbl), the pattern is matched against all files available for the target using regular expressions.
IMPORTANT: If multiple files match the pattern, the match is treated as ambiguous and the resolver returns None, so the parameter is omitted from the generated run TOML.
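The resolution rule, including the ambiguity case, can be sketched in Rust. The tool uses regular expressions, but this dependency-free sketch substitutes a plain substring test (str::contains). That test behaves like an unanchored regex search for literal patterns such as _ti.tbl, except that "." is not treated as a wildcard. The function name resolve_pattern is hypothetical. Returning None when nothing matches is an assumption; the text above only specifies the multiple-match case.

```rust
/// Hypothetical resolver: exactly one match yields Some(file); zero matches
/// (assumption) or several matches (ambiguous) yield None.
/// Simplification: substring matching stands in for regex matching.
fn resolve_pattern<'a>(pattern: &str, files: &[&'a str]) -> Option<&'a str> {
    let mut matches = files.iter().filter(|f| f.contains(pattern));
    match (matches.next(), matches.next()) {
        (Some(only), None) => Some(*only),
        _ => None,
    }
}

fn main() {
    let files = ["1A2K_r_u.pdb", "1A2K_l_u.pdb", "1A2K_ti.tbl", "1A2K_ref.pdb"];
    for (param, pattern) in [("ambig_fname", "_ti.tbl"), ("unambig_fname", "_unambig.tbl")] {
        match resolve_pattern(pattern, &files) {
            // A resolved file would be written into the run TOML.
            Some(f) => println!("{param} = \"{f}\""),
            // Otherwise the parameter is left out entirely.
            None => println!("{param} omitted"),
        }
    }
}
```

Here ambig_fname resolves to 1A2K_ti.tbl, while unambig_fname has no matching file for this target and is omitted.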
Directory Structure
After running a benchmark, results are organized as follows:
work_dir/
├── scenario1/
│   ├── target1/
│   │   ├── run1/
│   │   │   └── ... (HADDOCK3 output)
│   │   └── job.sh (only for SLURM execution)
│   └── target2/
│       ├── run1/
│       └── job.sh
└── scenario2/
    ├── target1/
    └── target2/
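The layout maps directly onto path joins. This Rust sketch derives the directories for one scenario/target pair. It is a hypothetical helper (job_paths), not part of haddock-runner's API. The run1 and job.sh names are copied from the tree above.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper mirroring the layout above: the target directory,
/// its HADDOCK3 run directory, and the SLURM job script beside it.
fn job_paths(work_dir: &Path, scenario: &str, target: &str) -> (PathBuf, PathBuf, PathBuf) {
    let target_dir = work_dir.join(scenario).join(target);
    let run_dir = target_dir.join("run1"); // name taken from the tree above
    let job_script = target_dir.join("job.sh"); // only present for SLURM execution
    (target_dir, run_dir, job_script)
}

fn main() {
    let (target_dir, run_dir, job_script) = job_paths(Path::new("results"), "standard", "1A2K");
    println!("{}", target_dir.display());
    println!("{}", run_dir.display());
    println!("{}", job_script.display());
}
```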
Tips for Configuration
- Start small: Begin with a few targets and simple scenarios to validate your setup.
- Use --setup mode: Always run haddock-runner --setup configuration.yaml first to validate the configuration before full execution.
- Check file patterns: Ensure your mol_suffixes patterns correctly match your molecule filenames.
- Resource planning: Calculate total CPU requirements as max_concurrent * ncores and ensure your system can handle it.
- Consistent naming: Use consistent file naming conventions across all targets so that filename patterns work correctly.
Development Guide
This guide provides comprehensive information for developers contributing to haddock-runner, including setup instructions, architecture overview, coding standards, and CI/CD workflows.
Project Overview
haddock-runner is a Rust application for running large-scale HADDOCK docking benchmarks. It features:
- Modern Rust stack: Using Cargo, clap, and serde
- Concurrent execution: Multi-threaded job processing
- Flexible configuration: YAML-based benchmark definitions
- Multiple backends: Local, SLURM, and other HPC integrations
Getting Started with Development
Prerequisites
- Rust toolchain: Latest stable version
- HADDOCK3: For testing and development
- Git: For version control
- SLURM: For developing and testing the SLURM integration
Development Environment Setup
# Install Rust toolchain
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env
# Clone the repository
git clone https://github.com/haddocking/haddock-runner.git
cd haddock-runner
# Install development tools
rustup component add rustfmt clippy rust-analyzer
cargo install cargo-edit cargo-audit cargo-outdated cargo-tarpaulin
Project Structure
src/
├── main.rs # Main application entry point
├── input.rs # Input file parsing and validation
├── dataset.rs # Dataset loading and management
├── job.rs # Job creation and management
├── queue.rs # Job queue and scheduling
├── runner/ # Execution backends
│   ├── mod.rs # Runner interface
│   ├── local.rs # Local execution backend
│   ├── slurm.rs # SLURM backend
│   └── status.rs # Job status tracking
├── logging.rs # Logging configuration
├── checksum.rs # File integrity checking
└── utils.rs # Utility functions
Cargo.toml # Rust package configuration
Cargo.lock # Dependency versions
.example/ # Example configurations
.docs/ # Documentation
.github/workflows/ # CI/CD workflows
Development Workflow
Building the Project
# Build in development mode
cargo build
# Build with optimizations
cargo build --release
# Build with all features
cargo build --all-features
Running Tests
# Run all tests
cargo test
# Run tests with coverage (requires cargo-tarpaulin)
cargo tarpaulin
# Run specific test
cargo test test_specific_function
Code Quality
# Format code
cargo fmt
# Check for linting issues
cargo clippy
# Audit dependencies for vulnerabilities
cargo audit
# Check for outdated dependencies (requires cargo-outdated)
cargo outdated
Architecture Overview
Core Components
- Input System: Parses YAML configuration and validates inputs
- Dataset Manager: Loads and organizes molecular data
- Job Creator: Generates HADDOCK jobs from scenarios
- Queue System: Manages job scheduling and execution
- Runner Backends: Local, SLURM, and other execution environments
- Monitoring: Tracks job progress and status
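The Job Creator's role, one HADDOCK job per target-scenario combination, amounts to a Cartesian product. The following Rust sketch illustrates that idea. The Job struct and create_jobs function are hypothetical. They are not the types defined in src/job.rs.

```rust
/// Hypothetical job record: one per (scenario, target) pair.
#[derive(Debug, Clone, PartialEq)]
struct Job {
    scenario: String,
    target: String,
}

/// Sketch of job creation: every scenario is paired with every target.
fn create_jobs(scenarios: &[&str], targets: &[&str]) -> Vec<Job> {
    scenarios
        .iter()
        .flat_map(|s| {
            targets.iter().map(move |t| Job {
                scenario: s.to_string(),
                target: t.to_string(),
            })
        })
        .collect()
}

fn main() {
    let scenarios = ["true-interface", "center-of-mass"];
    let targets = ["1A2K", "1GGR"];
    for job in create_jobs(&scenarios, &targets) {
        println!("{}/{}", job.scenario, job.target);
    }
}
```

With three scenarios and two targets, as in the restraint comparison example with targets 1A2K and 1GGR, this yields six jobs.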
Data Flow
Input Files → Configuration Parsing → Dataset Loading → Job Creation → Queue Scheduling → Job Execution → Result Collection
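The Queue Scheduling and Job Execution stages can be illustrated with a bounded worker pool built from standard-library threads. In this sketch, max_concurrent workers pull jobs from a shared queue and report results over a channel. It is a simplified stand-in, not the code in src/queue.rs or src/runner/. The run_queue function is hypothetical, and the execute closure is a placeholder for launching HADDOCK3 through a backend.

```rust
use std::collections::VecDeque;
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Hypothetical scheduler: runs every job with at most `max_concurrent` in flight.
/// `execute` is a placeholder for launching a HADDOCK3 job; it returns success.
fn run_queue<F>(jobs: Vec<String>, max_concurrent: usize, execute: F) -> Vec<(String, bool)>
where
    F: Fn(&str) -> bool + Send + Sync + 'static,
{
    let queue = Arc::new(Mutex::new(VecDeque::from(jobs)));
    let execute = Arc::new(execute);
    let (tx, rx) = mpsc::channel();

    let workers: Vec<_> = (0..max_concurrent.max(1))
        .map(|_| {
            let queue = Arc::clone(&queue);
            let execute = Arc::clone(&execute);
            let tx = tx.clone();
            thread::spawn(move || loop {
                // The lock guard is dropped at the end of this statement,
                // so the queue is not held while the job runs.
                let next = queue.lock().unwrap().pop_front();
                match next {
                    Some(job) => {
                        let ok = execute(&job);
                        tx.send((job, ok)).unwrap();
                    }
                    None => break, // queue drained: this worker exits
                }
            })
        })
        .collect();

    drop(tx); // only worker clones remain, so the channel closes when they finish
    for w in workers {
        w.join().unwrap();
    }
    // Result Collection: completion order, not submission order.
    rx.into_iter().collect()
}

fn main() {
    let jobs = vec![
        "standard/1A2K".to_string(),
        "standard/1GGR".to_string(),
        "sampling-1000/1A2K".to_string(),
    ];
    let results = run_queue(jobs, 2, |job| {
        println!("running {job}");
        true
    });
    println!("{} jobs finished", results.len());
}
```

A fixed pool of workers, rather than one thread per job, is what enforces the max_concurrent limit in this sketch.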