Modules

HADDOCK3 allows users to compose modular simulation workflows. Workflows are composed in steps, and each step is a HADDOCK3 module. There are modules for sampling, refinement, analysis, etc.

Parent code for modules

HADDOCK3 modules.

class haddock.modules.BaseHaddockModule(order: int, path: Path, params_fname: str | Path)[source]

Bases: ABC

HADDOCK3 module’s base class.

add_parent_to_paths() None[source]

Add parent path to paths.

clean_output() None[source]

Clean module output folder.

abstract classmethod confirm_installation() None[source]

Confirm the third-party software needed for the module is installed.

HADDOCK3’s own modules should just return.
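As a minimal sketch of this contract, the snippet below uses a stand-in base class rather than the real `BaseHaddockModule`. The class names and the executable name are hypothetical and chosen only for illustration. It shows a subclass with no external dependencies that simply returns, and another that checks for an external tool on `PATH`.

```python
import shutil
from abc import ABC, abstractmethod


class BaseModuleSketch(ABC):
    """Stand-in for the real base class; only the abstract hook is modelled."""

    @classmethod
    @abstractmethod
    def confirm_installation(cls) -> None:
        """Raise if required third-party software is missing."""


class NativeModule(BaseModuleSketch):
    """A module with no external dependencies simply returns."""

    @classmethod
    def confirm_installation(cls) -> None:
        return


class ExternalToolModule(BaseModuleSketch):
    """A module wrapping a (hypothetical) external executable."""

    executable = "hypothetical-tool-that-is-not-installed"

    @classmethod
    def confirm_installation(cls) -> None:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(cls.executable) is None:
            raise RuntimeError(f"{cls.executable!r} was not found on PATH")
```

Because the method is a classmethod, it can be called on the class itself, before any instance is created.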

export_io_models(faulty_tolerance=0)[source]

Export input/output to the ModuleIO interface.

Modules that do not perform any operation on PDB files should have input = output.

This function implements a common interface for all modules.

Parameters:

faulty_tolerance (int, default 0) – The percentage of missing output allowed. If 20 is given, raises an error if 20% of the expected output is missing (not saved to disk).

finish_with_error(reason: object = 'Module has failed.') None[source]

Finish with error message.

static last_step_folder(folders, index)[source]

Retrieve last step folder.

log(msg: str, level: str = 'info') None[source]

Log a message with a common header.

Currently the header is the [MODULE NAME] in square brackets.

Parameters:
  • msg (str) – The log message.

  • level (str) – The log level: ‘debug’, ‘info’, … Defaults to ‘info’.
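The behaviour described above can be approximated with the standard `logging` module. This is only a sketch: the function name, the logger name, and the exact spacing between header and message are assumptions, not the library's implementation.

```python
import logging

logger = logging.getLogger("module_sketch")  # hypothetical logger name


def log_with_header(module_name: str, msg: str, level: str = "info") -> str:
    """Prefix ``msg`` with ``[module_name]`` and emit it at ``level``."""
    line = f"[{module_name}] {msg}"
    # logging.Logger exposes one method per level name: debug, info, warning, ...
    getattr(logger, level)(line)
    return line
```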

property params: dict[str, Any]

Configuration parameters.

previous_path() Path[source]

Give the path from the previous calculation.

reset_params() None[source]

Reset parameters to the ones used to instantiate the class.

run(**params: Any) None[source]

Execute the module.

save_config(path: str | Path) None[source]

Save current parameters to a HADDOCK3 config file.

update_params(update_from_cfg_file: str | Path | None = None, **params: Any) None[source]

Update the module’s parameters.

Add/update to the current module’s parameters the ones given in the function call. If you want to entirely reset the module’s parameters to their default values, use the reset_params() method.

Update takes place recursively, that is, nested dictionaries will be updated accordingly.

To update the current config with the parameters defined in a HADDOCK3 configuration file, use the update_from_cfg_file parameter.

To update from a JSON file, first load the JSON into a dictionary and unpack the dictionary to the function call.

Examples

>>> m.update_params(param1=value1, param2=value2)
>>> m.update_params(**param_dict)
>>> m.update_params(update_from_cfg_file=path_to_file)

# if you wish to start from scratch
>>> m.reset_params()
>>> m.update_params(…)
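To make "recursively" concrete, here is a minimal standalone sketch of a nested-dictionary update, including the JSON route described above. The helper `recursive_update` and the parameter names in the example are hypothetical; the real method may differ in detail.

```python
import json
from copy import deepcopy
from typing import Any, Dict


def recursive_update(current: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of ``current`` with ``new`` merged in, level by level."""
    merged = deepcopy(current)
    for key, value in new.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dictionaries: descend instead of replacing.
            merged[key] = recursive_update(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical parameters, for illustration only.
params = {"sampling": 1000, "mol1": {"segid": "A", "fix": False}}

# Parameters coming from JSON are first loaded into a dictionary.
from_json = json.loads('{"mol1": {"fix": true}}')
updated = recursive_update(params, from_json)
```

Only `mol1["fix"]` changes; `mol1["segid"]` and `sampling` are kept, which is the difference between a recursive update and a plain `dict.update`.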

haddock.modules.get_engine(mode: str, params: dict[Any, Any]) partial[HPCScheduler | Scheduler | MPIScheduler][source]

Create an engine to run the jobs.

Parameters:
  • mode (str) – The type of engine to create

  • params (dict) – A dictionary containing parameters for the engine. get_engine will retrieve from params only those parameters needed and ignore the others.
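The pattern described here, selecting only the needed parameters and returning a `functools.partial`, can be sketched as follows. The scheduler classes, the `ENGINES` table, and the function name `make_engine` are hypothetical stand-ins; which parameters each real engine consumes is not stated in this page.

```python
from functools import partial
from typing import Any, Callable, Dict, List


class LocalSchedulerSketch:
    """Hypothetical engine that runs tasks on local cores."""

    def __init__(self, tasks: List[str], ncores: int = 1) -> None:
        self.tasks = tasks
        self.ncores = ncores


class BatchSchedulerSketch:
    """Hypothetical engine that submits tasks to a batch queue."""

    def __init__(self, tasks: List[str], queue: str = "", queue_limit: int = 100) -> None:
        self.tasks = tasks
        self.queue = queue
        self.queue_limit = queue_limit


# For each mode: the engine class and the parameter names it needs (assumed).
ENGINES = {
    "local": (LocalSchedulerSketch, ("ncores",)),
    "batch": (BatchSchedulerSketch, ("queue", "queue_limit")),
}


def make_engine(mode: str, params: Dict[str, Any]) -> Callable[..., Any]:
    """Return the engine class for ``mode`` with its parameters pre-filled."""
    if mode not in ENGINES:
        raise ValueError(f"unknown mode: {mode!r}")
    engine_cls, wanted = ENGINES[mode]
    # Keep only the parameters this engine needs; ignore everything else.
    selected = {key: params[key] for key in wanted if key in params}
    return partial(engine_cls, **selected)


engine = make_engine("local", {"ncores": 4, "queue": "short", "clean": True})
job = engine(["task_1", "task_2"])
```

Returning a partial lets the caller pass the full parameter dictionary once and supply the task list later.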

haddock.modules.get_module_steps_folders(folder: str | Path, modules: Container[int] | None = None) list[str][source]

Return a sorted list of the step folders in a running directory.

Example

Consider the folder structure:

run_dir/
    0_topoaa/
    1_rigidbody/
    2_caprieval/
    3_bad_module_name/
    data/

>>> get_module_steps_folders("run_dir")
["0_topoaa", "1_rigidbody", "2_caprieval"]

Parameters:

folder (str or Path) – Path to the run directory, or to the folder containing the step folders.

Returns:

list of str – List containing strings with the names of the step folders.
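A standalone sketch of the same idea, scanning a directory and keeping only step folders, is shown below. It uses a shortened pattern with three module names, and it assumes the result is ordered by the integer prefix; both the helper name and that ordering rule are assumptions rather than documented behaviour.

```python
import re
import tempfile
from pathlib import Path
from typing import List

# Shortened pattern for illustration; the real one lists every module name.
STEP_RE = re.compile(r"[0-9]+_(topoaa|rigidbody|caprieval)")


def list_step_folders(folder: Path) -> List[str]:
    """Return names of sub-folders that look like '<index>_<module>'."""
    names = [
        entry.name
        for entry in folder.iterdir()
        if entry.is_dir() and STEP_RE.fullmatch(entry.name)
    ]
    # Assumption: order by the integer prefix, so that 10_ comes after 2_.
    return sorted(names, key=lambda name: int(name.split("_", 1)[0]))


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    for name in ("10_caprieval", "0_topoaa", "1_rigidbody", "3_bad_module_name", "data"):
        (run_dir / name).mkdir()
    steps = list_step_folders(run_dir)
```

Sorting on the integer prefix rather than on the raw string avoids placing `10_caprieval` before `1_rigidbody`'s successors in runs with ten or more steps.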

haddock.modules.is_step_folder(path: str | Path) bool[source]

Assess whether a folder is a possible step folder.

The folder is considered a step folder if it has a zero or positive integer index followed by the name of a module.

Parameters:

path (str or pathlib.Path) – The path to the folder.

Returns:

bool – Whether the folder is a step folder or not.

haddock.modules.modules_category = {'alascan': 'analysis', 'caprieval': 'analysis', 'clustfcc': 'analysis', 'clustrmsd': 'analysis', 'contactmap': 'analysis', 'emref': 'refinement', 'emscoring': 'scoring', 'exit': 'extras', 'flexref': 'refinement', 'gdock': 'sampling', 'ilrmsdmatrix': 'analysis', 'lightdock': 'sampling', 'mdref': 'refinement', 'mdscoring': 'scoring', 'rigidbody': 'sampling', 'rmsdmatrix': 'analysis', 'seletop': 'analysis', 'seletopclusts': 'analysis', 'topoaa': 'topology', 'topocg': 'topology'}

Indexes each module in its specific category. Keys are module names, values are their categories. Categories are the modules’ parent folders.

haddock.modules.step_folder_regex = '([0-9]+_topocg|[0-9]+_topoaa|[0-9]+_exit|[0-9]+_emscoring|[0-9]+_mdscoring|[0-9]+_caprieval|[0-9]+_rmsdmatrix|[0-9]+_ilrmsdmatrix|[0-9]+_contactmap|[0-9]+_clustfcc|[0-9]+_clustrmsd|[0-9]+_seletopclusts|[0-9]+_seletop|[0-9]+_alascan|[0-9]+_emref|[0-9]+_mdref|[0-9]+_flexref|[0-9]+_lightdock|[0-9]+_rigidbody|[0-9]+_gdock)'

String for regular expression to match module folders in a run directory.

It will match folders with a numeric prefix followed by underscore (“_”) followed by the name of a module.

Example: https://regex101.com/r/roHls9/1
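The pattern above has one `[0-9]+_<name>` alternative per module. A minimal sketch of how such a pattern could be assembled from the module names follows; it copies only four entries of `modules_category`, and the helper `looks_like_step_folder` and the use of `fullmatch` are assumptions made for the example.

```python
import re

# A few entries copied from modules_category; the real mapping is longer.
modules_category = {
    "topoaa": "topology",
    "rigidbody": "sampling",
    "seletopclusts": "analysis",
    "seletop": "analysis",
}

# One "<digits>_<name>" alternative per module, wrapped in a single group.
pattern = "(" + "|".join(f"[0-9]+_{name}" for name in modules_category) + ")"
step_re = re.compile(pattern)


def looks_like_step_folder(name: str) -> bool:
    """True when the whole name is '<index>_<module name>'."""
    return step_re.fullmatch(name) is not None
```

Using `fullmatch` here means a name such as `3_bad_module_name` is rejected even if it happens to begin with digits and an underscore.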

haddock.modules.step_folder_regex_re = re.compile('([0-9]+_topocg|[0-9]+_topoaa|[0-9]+_exit|[0-9]+_emscoring|[0-9]+_mdscoring|[0-9]+_caprieval|[0-9]+_rmsdmatrix|[0-9]+_ilrmsdmatrix|[0-9]+_contactmap|[0-9]+_clustfcc|[0-9]+_clustrmsd|[0-9]+_seletopclus)

Compiled regular expression from step_folder_regex.

It will match folders with a numeric prefix followed by underscore (“_”) followed by the name of a module.

Example: https://regex101.com/r/roHls9/1

Parent code for CNS modules

Functionalities related to CNS modules.

class haddock.modules.base_cns_module.BaseCNSModule(order: int, path: Path, initial_params: str | Path, cns_script: str | Path)[source]

Bases: BaseHaddockModule

Operation module for CNS.

Contains additional functionalities exclusive to CNS modules.

add_parent_to_paths() None

Add parent path to paths.

clean_output() None

Clean module output folder.

abstract classmethod confirm_installation() None

Confirm the third-party software needed for the module is installed.

HADDOCK3’s own modules should just return.

default_envvars() dict[str, str][source]

Return default env vars updated to envvars (if given).

export_io_models(faulty_tolerance=0)

Export input/output to the ModuleIO interface.

Modules that do not perform any operation on PDB files should have input = output.

This function implements a common interface for all modules.

Parameters:

faulty_tolerance (int, default 0) – The percentage of missing output allowed. If 20 is given, raises an error if 20% of the expected output is missing (not saved to disk).

finish_with_error(reason: object = 'Module has failed.') None

Finish with error message.

get_ambig_fnames(prev_ambig_fnames: list[Union[NoneType, str, pathlib.Path]]) list[Union[str, pathlib.Path]] | None[source]

Get the correct ambiguous restraint names.

Parameters:

prev_ambig_fnames (list) – list of ambig_fname files encoded in previous models

Returns:

ambig_fnames (list or None) – list of ambig_fname files to be used by the CNS module

static last_step_folder(folders, index)

Retrieve last step folder.

log(msg: str, level: str = 'info') None

Log a message with a common header.

Currently the header is the [MODULE NAME] in square brackets.

Parameters:
  • msg (str) – The log message.

  • level (str) – The log level: ‘debug’, ‘info’, … Defaults to ‘info’.

make_self_contained() None[source]

Create folders to make run self-contained.

property params: dict[str, Any]

Configuration parameters.

previous_path() Path

Give the path from the previous calculation.

reset_params() None

Reset parameters to the ones used to instantiate the class.

run(**params: Any) None[source]

Execute the module.

save_config(path: str | Path) None

Save current parameters to a HADDOCK3 config file.

save_envvars(filename: str | Path = 'envvars') None[source]

Save envvars needed for CNS to a file in the module’s folder.

update_params(update_from_cfg_file: str | Path | None = None, **params: Any) None

Update the module’s parameters.

Add/update to the current module’s parameters the ones given in the function call. If you want to entirely reset the module’s parameters to their default values, use the reset_params() method.

Update takes place recursively, that is, nested dictionaries will be updated accordingly.

To update the current config with the parameters defined in a HADDOCK3 configuration file, use the update_from_cfg_file parameter.

To update from a JSON file, first load the JSON into a dictionary and unpack the dictionary to the function call.

Examples

>>> m.update_params(param1=value1, param2=value2)
>>> m.update_params(**param_dict)
>>> m.update_params(update_from_cfg_file=path_to_file)

# if you wish to start from scratch
>>> m.reset_params()
>>> m.update_params(…)

General Default parameters

General default parameters can be defined in the main section of the configuration file, but can also be defined for each individual module (step). In the latter case, the module-level value overrides the general definition.
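The override rule can be pictured as a dictionary merge in which the module-level values are applied last. The sketch below uses plain dictionaries rather than a real configuration file; the general values are the documented defaults, while the per-step override is made up for illustration.

```python
# General parameters, as they would appear in the main section.
general = {"mode": "local", "ncores": 4, "clean": True}

# Per-step parameters; rigidbody overrides ncores, caprieval overrides nothing.
steps = {
    "rigidbody": {"ncores": 8},
    "caprieval": {},
}

# Later dictionaries win, so a module-level value replaces the general one.
effective = {name: {**general, **own} for name, own in steps.items()}
```

Here `effective["rigidbody"]["ncores"]` is 8, while `caprieval` keeps the general value of 4.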

Easy

batch_type

default: ‘slurm’
type: string
title: Batch system
choices: [‘slurm’, ‘torque’]
short description: Type of batch system running on your server
long description: Type of batch system running on your server. Only slurm and torque are supported at this time
group: execution
explevel: easy

clean

default: True
type: boolean
title: Clean the module output files.
short description: Clean the module if run succeeds by compressing or removing output files.
long description: When running haddock through the command-line, the ‘clean’ parameter will instruct the workflow to clean the output files of the module if the whole run succeeds. In this process, PDB and PSF files are compressed to gzip, with the extension .gz, while files with the extensions .seed, .inp, and .out are archived and the original files deleted. The time needed for a cleaning operation depends on the number and size of the files in the folders. However, it should not be a limiting step in the workflow. For example, cleaning a rigidbody step that sampled 10,000 structures takes about 4 minutes on our servers. This operation uses as many cores as allowed by the user in the ‘ncores’ parameter. SSD disks will perform faster. See also the ‘haddock3-clean’ and ‘haddock3-unpack’ command-line clients.
group: clean
explevel: easy
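A rough, serial sketch of the cleaning step described for `clean` is given below: gzip for PDB and PSF files, one archive per extension for `.seed`, `.inp`, and `.out`. The function names, the archive file names, and the `.tgz` format are assumptions; the real operation also runs in parallel over `ncores`, which is not modelled here.

```python
import gzip
import shutil
import tarfile
import tempfile
from pathlib import Path
from typing import Optional


def compress_file(path: Path) -> Path:
    """Gzip ``path`` to ``<name>.gz`` and delete the original."""
    target = path.with_name(path.name + ".gz")
    with path.open("rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    path.unlink()
    return target


def archive_by_extension(folder: Path, ext: str) -> Optional[Path]:
    """Bundle every ``*ext`` file into one archive, then delete the originals."""
    files = sorted(folder.glob(f"*{ext}"))
    if not files:
        return None
    archive = folder / f"{ext.lstrip('.')}.tgz"  # archive name is an assumption
    with tarfile.open(archive, "w:gz") as tar:
        for file in files:
            tar.add(file, arcname=file.name)
    for file in files:
        file.unlink()
    return archive


def clean_folder(folder: Path) -> None:
    """Compress structure files and archive auxiliary files in ``folder``."""
    for ext in (".pdb", ".psf"):
        for file in sorted(folder.glob(f"*{ext}")):
            compress_file(file)
    for ext in (".seed", ".inp", ".out"):
        archive_by_extension(folder, ext)


with tempfile.TemporaryDirectory() as tmp:
    step = Path(tmp)
    for name in ("a.pdb", "a.psf", "a.inp", "b.inp", "a.out"):
        (step / name).write_text("placeholder")
    clean_folder(step)
    names = sorted(entry.name for entry in step.iterdir())
```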

cns_exec

default: ‘’
type: file
title: Path to the CNS executable
short description: If not provided, HADDOCK3 will use the cns path configured during the installation.
long description: CNS is a required component to run HADDOCK. Ideally it should have been configured during installation. If not, you can specify its path with the cns_exec parameter.
group: execution
explevel: easy

concat

default: 1
type: integer
title: Number of models to produce per job.
min: 1
max: 9999
short description: Multiple models can be calculated within one job
long description: This defines the number of models that will be generated within one job script, which makes it possible to concatenate the generation of several models into one script. In that way jobs might run longer in the batch system and reduce the load on the scheduler.
group: execution
explevel: easy

less_io

default: False
type: boolean
title: Reduce the amount of I/O operations.
short description: Reduce the amount of I/O operations.
long description: This option will reduce the amount of I/O operations by writing fewer files to disk. This can be useful, for example, when running on a network file system where I/O operations are slow.
group: execution
explevel: easy

mode

default: ‘local’
type: string
title: Mode of execution
choices: [‘local’, ‘batch’]
short description: Mode of execution of the jobs, either local or using a batch system.
long description: Mode of execution of the jobs, either local or using a batch system. Currently slurm and torque are supported. For the batch mode the queue command must be specified in the queue parameter.
group: execution
explevel: easy

ncores

default: 4
type: integer
title: Number of CPU cores
min: 1
max: 500
short description: Number of CPU cores to use for the CNS calculations. It is truncated to max available CPUs minus 1.
long description: Number of CPU cores to use for the CNS calculations. This will define the number of concurrent jobs being executed. Note that it is truncated to the total number of available CPUs minus 1.
group: execution
explevel: easy

offline

default: False
type: boolean
title: Isolate haddock3 from internet.
short description: Completely isolate the haddock3 run & results from internet.
long description: For interactive plots, haddock3 uses the plotly library. The plotly.js library can either be linked and fetched from the web, or copied directly into the HTML files at the cost of ~3 MB per file. Setting this parameter to true embeds the JavaScript library in the generated files, therefore completely isolating haddock3 from any web call.
group: execution
explevel: easy

queue

default: ‘’
type: string
title: Queue name
short description: Name of the batch queue to which jobs will be submitted
long description: Name of the batch queue to which jobs will be submitted. If not defined the batch system default will be used.
group: execution
explevel: easy

queue_limit

default: 100
type: integer
title: Number of jobs to submit to the batch system
min: 1
max: 9999
short description: Number of jobs to submit to the batch system
long description: This parameter controls the number of jobs that will be submitted to the batch system. In combination with the concat parameter, this allows limiting the load on the queueing system and also makes sure jobs remain in the queue for some time (if concat > 1), avoiding high system loads on the batch system.
group: execution
explevel: easy
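The interplay between `concat` and `queue_limit` comes down to simple arithmetic. The sketch below assumes that each job produces `concat` models and that at most `queue_limit` jobs are submitted at a time; the function name and this batching interpretation are assumptions, not a description of the actual scheduler.

```python
import math
from typing import List


def plan_jobs(n_models: int, concat: int = 1, queue_limit: int = 100) -> List[int]:
    """Return the number of jobs in each submission batch."""
    # Each job script produces `concat` models; the last job may produce fewer.
    n_jobs = math.ceil(n_models / concat)
    batches = []
    remaining = n_jobs
    while remaining > 0:
        size = min(queue_limit, remaining)
        batches.append(size)
        remaining -= size
    return batches
```

For 1000 models, `plan_jobs(1000)` yields ten batches of 100 jobs, whereas `plan_jobs(1000, concat=5)` yields two batches of 100 longer-running jobs.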

Expert

max_cpus

default: True
type: boolean
title: The max number of CPUs allowed.
short description: By default the max number of CPUs allowed is the max available on the system.
long description: If you want to spare a minimum amount of resources for daily tasks, set max_cpus to false; in that case the maximum number of CPUs allowed will be the total available in the machine minus 1. This calculation is done automatically.
group: execution
explevel: expert
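One possible reading of how `ncores` and `max_cpus` combine is sketched below. It follows the `max_cpus` description: the ceiling is all available CPUs when `max_cpus` is true, and all but one otherwise. The function name and the floor of one core are assumptions.

```python
import os


def effective_ncores(requested: int, available: int, max_cpus: bool = True) -> int:
    """Cap the requested core count by what the machine can offer."""
    # With max_cpus disabled, one CPU is left free for other tasks.
    ceiling = available if max_cpus else available - 1
    # Never go below one core, even on a single-CPU machine.
    return max(1, min(requested, ceiling))


# os.cpu_count() may return None, hence the fallback to 1.
on_this_machine = effective_ncores(4, os.cpu_count() or 1, max_cpus=False)
```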

Guru

self_contained

default: False
type: boolean
title: Create a self-contained run
short description: This option will copy the CNS scripts and executable to the run folder.
long description: This option will copy the CNS scripts and executable to the run folder to ensure that all scripts are available within the run dir. This can be useful, for example, for remote execution of a job, or for debugging, since the scripts can be edited without touching the main installation.
group: execution
explevel: guru