RMSD Clustering module

Module contents

RMSD clustering module.

This module takes as input the RMSD matrix calculated in the previous step and performs hierarchical clustering on it using scipy routines.

Essentially, the procedure groups the input models into a progressively coarser hierarchy of clusters, called the dendrogram.

Five parameters can be defined in this context:

  • linkage: governs the way clusters are merged together in the creation of the dendrogram

  • criterion: defines the prescription to cut the dendrogram and obtain the desired clusters

  • n_clusters: number of desired clusters (if criterion is maxclust).

  • clust_cutoff: value of distance that separates distinct clusters (if criterion is distance)

  • min_population: as in the clustfcc module, the minimum number of models a cluster must contain to be considered. If criterion is maxclust, the value is ignored.
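
The interplay of linkage, criterion, clust_cutoff and n_clusters can be illustrated with a minimal scipy sketch. It is not the module's own code: the toy coordinates, the variable names and the use of pdist to build a stand-in distance matrix are assumptions made for illustration, and only the linkage and fcluster calls are real scipy routines.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Toy stand-in for an RMSD matrix (assumption): six "models" placed on a line
# so that they form two well-separated groups. pdist returns the condensed
# (upper-triangle) distance vector that scipy's linkage accepts.
coords = np.array([[0.0], [1.0], [2.0], [20.0], [21.0], [22.0]])
condensed_rmsd = pdist(coords)

# linkage: build the dendrogram, here with average linkage.
dendrogram = linkage(condensed_rmsd, method="average")

# criterion "distance": cut the dendrogram at a cophenetic distance,
# playing the role of clust_cutoff.
labels_distance = fcluster(dendrogram, t=7.5, criterion="distance")

# criterion "maxclust": request at most a given number of clusters,
# playing the role of n_clusters.
labels_maxclust = fcluster(dendrogram, t=2, criterion="maxclust")

print(labels_distance, labels_maxclust)
```

With this toy input both cuts separate the first three models from the last three, because the two groups only merge at a distance well above the cutoff.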

This module passes the path to the RMSD matrix to the next step of the workflow through the rmsd_matrix.json file, which makes it possible to run several clustrmsd modules (possibly with different parameters) on the same RMSD matrix.

class haddock.modules.analysis.clustrmsd.HaddockModule(order: int, path: Path, initial_params: Path | str = PosixPath('/home/runner/work/haddock3/haddock3/src/haddock/modules/analysis/clustrmsd/defaults.yaml'))[source]

Bases: BaseHaddockModule

HADDOCK3 module for clustering with RMSD.

classmethod confirm_installation() None[source]

Confirm if contact executable is compiled.

name: str = 'clustrmsd'

Default Parameters

Easy

clust_cutoff

default: 7.5
type: float
title: Clustering cutoff distance
min: 1
max: 9999
short description: Value of cutoff cophenetic distance.
long description: Value of cutoff cophenetic distance. When criterion is maxclust, this value is ignored.
group: analysis
explevel: easy

min_population

default: 4
type: integer
title: Clustering population threshold
min: 1
max: 9999
short description: Clusters with fewer than this number of members are excluded. Defaults to 4.
long description: Clusters with fewer than this number of members are excluded. Defaults to 4. When criterion is maxclust, this value is ignored.
group: analysis
explevel: easy
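
The population threshold can be pictured as a filter applied to the flat cluster labels. The sketch below is only an illustration of that idea: filter_by_population is a hypothetical helper, not a function of the module, and the label list is made up.

```python
from collections import Counter


def filter_by_population(labels, min_population=4):
    """Return the sorted cluster labels with at least min_population members.

    Hypothetical helper for illustration only.
    """
    counts = Counter(labels)
    return sorted(
        label for label, size in counts.items() if size >= min_population
    )


# Made-up labels: cluster 1 has 4 members, cluster 2 has 2, cluster 3 has 5.
labels = [1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3]
kept = filter_by_population(labels, min_population=4)
print(kept)
```

Here cluster 2 is dropped because it has only two members, while clusters 1 and 3 are kept.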

n_clusters

default: 4
type: integer
title: Number of clusters
min: 1
max: 9999
short description: Number of clusters to be formed
long description: Number of clusters to be formed. When criterion is distance, this value is ignored.
group: analysis
explevel: easy

plot_matrix

default: False
type: boolean
title: Plot matrix of members
short description: Plot matrix of members. Defaults to false.
long description: Plot matrix of members. Defaults to false.
group: analysis
explevel: easy

Expert

criterion

default: 'distance'
type: string
title: Criterion for fcluster
choices: ['distance', 'maxclust']
short description: Criterion to be used to cut the dendrogram
long description: If criterion is maxclust, the dendrogram is cut when a certain number of clusters is formed. If criterion is distance, the dendrogram is cut based on the value of the cophenetic distance.
group: analysis
explevel: expert

linkage

default: 'average'
type: string
title: Linkage type
choices: ['average', 'centroid', 'complete', 'median', 'single', 'ward', 'weighted']
short description: How to lump together clusters in hierarchical clustering
group: analysis
explevel: expert
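
To show why the linkage choice matters, the following sketch cuts the same toy distances at the same cutoff with three of the listed methods. The coordinates and the cutoff of 2.0 are assumptions chosen for illustration. Only scipy's linkage and fcluster are used, and nothing here reflects how the module calls them internally.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Toy input (assumption): two tight pairs of "models" separated by a small gap.
coords = np.array([[0.0], [1.0], [2.5], [3.5]])
condensed = pdist(coords)

# Cut each dendrogram at the same cophenetic distance and count the clusters.
n_clusters_by_method = {}
for method in ("single", "average", "complete"):
    dendrogram = linkage(condensed, method=method)
    labels = fcluster(dendrogram, t=2.0, criterion="distance")
    n_clusters_by_method[method] = len(set(labels))

print(n_clusters_by_method)
```

Single linkage joins the two pairs through their closest members (distance 1.5), so a single cluster survives the cut. Average and complete linkage place the merge above the cutoff (2.5 and 3.5 respectively) and keep two clusters. According to the scipy documentation, the centroid, median and ward methods are correctly defined only for Euclidean distances, which may be worth keeping in mind when applying them to an RMSD matrix.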