libclust: functions related to clustering

Library of functions related to the clustering modules.

Main functions

haddock.libs.libclust.add_cluster_info(sorted_score_dic, clt_dic)

Add cluster information to the models.

Parameters:
  • sorted_score_dic (list) – List of tuples with the cluster ID and the average score, sorted by the average score.

  • clt_dic (dict) – Dictionary with the clusters.

Returns:

output_models (list) – List of models with the cluster information.
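
The annotation step described above can be sketched as follows. This is a minimal illustration, not the library's code: the `Model` class, the `annotate_models` name, and the attribute names `clt_id`, `clt_rank`, and `clt_model_rank` are assumptions made for the example, as is the convention that a lower score is better.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Model:
    """Hypothetical stand-in for a scored model; not a HADDOCK3 class."""

    name: str
    score: float
    clt_id: Optional[int] = None
    clt_rank: Optional[int] = None
    clt_model_rank: Optional[int] = None


def annotate_models(sorted_score_dic, clt_dic):
    """Attach cluster ID, cluster rank, and within-cluster rank to each model."""
    output_models = []
    # Clusters arrive best-first, so the enumeration index is the cluster rank.
    for cluster_rank, (cluster_id, _avg) in enumerate(sorted_score_dic, start=1):
        # Assumption: a lower score is better within a cluster.
        members = sorted(clt_dic[cluster_id], key=lambda m: m.score)
        for model_rank, model in enumerate(members, start=1):
            model.clt_id = cluster_id
            model.clt_rank = cluster_rank
            model.clt_model_rank = model_rank
            output_models.append(model)
    return output_models


clusters = {
    1: [Model("a.pdb", -10.0), Model("b.pdb", -20.0)],
    2: [Model("c.pdb", -30.0)],
}
ranked = [(2, -30.0), (1, -15.0)]
models = annotate_models(ranked, clusters)
print([(m.name, m.clt_id, m.clt_rank, m.clt_model_rank) for m in models])
```

The returned list is flat and ordered by cluster rank first, then by rank within the cluster.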

haddock.libs.libclust.clustrmsd_tolerance_params(parameters: ParamDictT) → tuple[str, int | float]

Provide the tolerance parameter of interest for the clustrmsd module.

Parameters:

parameters (ParamDictT) – The clustrmsd module parameters

Returns:

tuple[str, Union[int, float]] – Name of the tolerance parameter and its value.
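
One way such a lookup could work is sketched below. The key names `criterion`, `n_clusters`, and `clust_cutoff`, and the value `"maxclust"` (borrowed from SciPy's `fcluster` criterion names), are assumptions for illustration; check the clustrmsd module's parameter file for the actual names.

```python
def tolerance_params(parameters):
    """Return the name and value of the parameter acting as the tolerance.

    Hypothetical sketch: the key names used here are assumptions.
    """
    # Assumption: "criterion" decides which key holds the tolerance.
    if parameters["criterion"] == "maxclust":
        name = "n_clusters"  # tolerance is a number of clusters (int)
    else:
        name = "clust_cutoff"  # tolerance is a distance cutoff (float)
    return name, parameters[name]


params = {"criterion": "maxclust", "n_clusters": 4, "clust_cutoff": 7.5}
print(tolerance_params(params))
```

Returning the name together with the value explains the `int | float` in the return type: the caller learns which kind of tolerance it received.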

haddock.libs.libclust.get_cluster_matrix_plot_clt_dt(cluster_ids: list[int]) → tuple[list[list[list[int]]], list[dict[str, float]]]

Generate cluster matrix data for plotly.

Parameters:

cluster_ids (list[int]) – List containing ordered cluster ids.

Returns:

  • matrix_cluster_dt (list[list[list[int]]]) – A matrix of cluster ids, used for plotly.

  • cluster_limits (list[dict[str, float]]) – Boundaries to draw lines between clusters with plotly.
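
The shape of the two return values can be illustrated with a small sketch. It assumes each matrix cell holds the pair of cluster ids for its row and column, and that boundaries sit on half-integer positions between cells; the function name and the dictionary keys `lower` and `upper` are hypothetical, not taken from the library.

```python
def cluster_matrix_data(cluster_ids):
    """Build per-cell cluster-id pairs and per-cluster boundaries."""
    n = len(cluster_ids)
    # Cell (i, j) stores [cluster id of row i, cluster id of column j].
    matrix = [[[cluster_ids[i], cluster_ids[j]] for j in range(n)] for i in range(n)]
    limits = []
    start = 0
    for i in range(1, n + 1):
        # Close the current run at the end of the list or when the id changes.
        if i == n or cluster_ids[i] != cluster_ids[start]:
            # Assumption: heatmap cells are centred on integers, so edges
            # fall on half-integer coordinates.
            limits.append({"lower": start - 0.5, "upper": i - 0.5})
            start = i
    return matrix, limits


matrix, limits = cluster_matrix_data([1, 1, 2])
print(matrix[0][2])
print(limits)
```

The input must already be ordered so that members of the same cluster are contiguous; otherwise one cluster produces several boundary entries.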

haddock.libs.libclust.plot_cluster_matrix(matrix_path: Path | str, final_order_idx: list[int], labels: list[str], dttype: str = '', diag_fill: int | float = 1, color_scale: str = 'Blues', reverse: bool = False, output_fname: str | Path = 'clust_matrix', matrix_cluster_dt: list[list[list[int]]] | None = None, cluster_limits: list[dict[str, float]] | None = None) → str | None

Plot a plotly heatmap of a matrix file.

Parameters:
  • matrix_path (Union[Path, FilePath, str]) – Path to a half-matrix

  • final_order_idx (list[int]) – Index orders

  • labels (list[str]) – Ordered labels

  • dttype (str) – Name of the data type, by default '' (empty string)

  • color_scale (str, optional) – Color scale for the plot, by default “Blues”

  • reverse (bool, optional) – Whether to reverse the color scale, by default False

  • output_fname (Union[str, Path, FilePath], optional) – Name of the output file to generate, by default ‘clust_matrix’

  • matrix_cluster_dt (Optional[list[list[list[int]]]]) – A matrix of cluster ids, used for extra hover annotation in plotly.

  • cluster_limits (Optional[list[dict[str, float]]]) – A list of dict enabling to draw lines separating cluster ids.

Returns:

output_fname_ext (str) – Path to the generated file containing the figure.
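
Before a half-matrix can be drawn as a heatmap it has to be expanded into a full symmetric matrix and reordered. The sketch below shows one way to do that with plain lists. It assumes the half-matrix is the strict upper triangle in row-major order and that `diag_fill` is the value placed on the diagonal; both are assumptions about the layout, and the helper names are hypothetical.

```python
def expand_half_matrix(values, n, diag_fill=1.0):
    """Expand a strict upper triangle (row-major) into an n x n symmetric matrix."""
    expected = n * (n - 1) // 2
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    # Start from diag_fill everywhere; every off-diagonal cell is overwritten below.
    full = [[diag_fill] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            full[i][j] = full[j][i] = values[k]
            k += 1
    return full


def reorder(matrix, order):
    """Apply the same index order to rows and columns."""
    return [[matrix[i][j] for j in order] for i in order]


full = expand_half_matrix([0.2, 0.5, 0.9], n=3)
print(reorder(full, [2, 0, 1]))
```

Applying one order to both axes keeps the matrix symmetric, which is what makes cluster blocks appear along the diagonal.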

haddock.libs.libclust.rank_clusters(clt_dic, threshold)

Rank the clusters by their average score.

Parameters:
  • clt_dic (dict) – Dictionary with the clusters.

  • threshold (int) – Number of models to consider for the average score.

Returns:

  • score_dic (dict) – Dictionary with the cluster ID as key and the average score as value.

  • sorted_score_dic (list) – List of tuples with the cluster ID and the average score, sorted by the average score.
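
The ranking described above can be sketched as follows. The example assumes that each model exposes a numeric `score`, that a lower score is better, and that every cluster is non-empty with `threshold` at least 1; the `Model` class and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Model:
    """Hypothetical stand-in for a scored model; not a HADDOCK3 class."""

    score: float


def rank_by_average(clt_dic, threshold):
    """Average the best `threshold` scores per cluster and sort the clusters."""
    score_dic = {}
    for clt_id, members in clt_dic.items():
        # Keep the `threshold` best (lowest) scores; small clusters use all members.
        best = sorted(m.score for m in members)[:threshold]
        score_dic[clt_id] = sum(best) / len(best)
    # Sort (cluster ID, average score) pairs, best average first.
    sorted_score_dic = sorted(score_dic.items(), key=lambda item: item[1])
    return score_dic, sorted_score_dic


clusters = {
    1: [Model(-10.0), Model(-20.0), Model(0.0)],
    2: [Model(-30.0)],
}
score_dic, sorted_score_dic = rank_by_average(clusters, threshold=2)
print(score_dic)
print(sorted_score_dic)
```

Dividing by the number of scores actually kept, rather than by `threshold`, avoids penalising clusters that have fewer than `threshold` members.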

haddock.libs.libclust.write_structure_list(input_models: list[PDBFile], clustered_models: list[PDBFile], out_fname: str | Path) → None

Get the list of unclustered structures.

Parameters:
  • input_models (list) – List of input models.

  • clustered_models (list) – List of clustered models.
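
Finding the unclustered structures amounts to a set difference between the two lists. The sketch below works on plain file names rather than `PDBFile` objects and writes one name per line; the helper names and the output format are assumptions for illustration, not the format the library writes.

```python
from pathlib import Path


def find_unclustered(input_names, clustered_names):
    """Return the input names that are absent from the clustered list, in input order."""
    clustered = set(clustered_names)
    return [name for name in input_names if name not in clustered]


def write_list(names, out_fname):
    """Write one name per line (hypothetical format) and return the path."""
    path = Path(out_fname)
    path.write_text("".join(f"{name}\n" for name in names))
    return path


left = find_unclustered(["m1.pdb", "m2.pdb", "m3.pdb"], ["m2.pdb"])
print(left)
```

Converting the clustered list to a set keeps the membership test cheap when there are many models.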