libplots: plotting functionalities

Plotting functionalities.

haddock.libs.libplots.ClRank

A dict representing clusters’ rank.

key (int): cluster’s id

value (int): cluster’s rank
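
As a minimal illustration, a mapping of this shape can be built from a list of cluster ids ordered from best to worst. The ids below are made up, and starting the ranks at 1 is an assumption of this sketch, not something stated above.

```python
# Hypothetical cluster ids, ordered from best to worst.
ranked_cluster_ids = [7, 2, 5]

# Map each cluster id (key) to its rank (value).
# Ranks start at 1 here by assumption.
cl_rank = {
    cluster_id: rank
    for rank, cluster_id in enumerate(ranked_cluster_ids, start=1)
}

print(cl_rank)
```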

haddock.libs.libplots.box_plot_data(capri_df: DataFrame, cl_rank: dict[int, int]) DataFrame[source]

Retrieve box plot data.

Parameters:
  • capri_df (pandas DataFrame) – capri table dataframe

  • cl_rank (dict) – {cluster_id : cluster_rank} dictionary

Returns:

gb_full (pandas DataFrame) – DataFrame of all the clusters to be plotted

haddock.libs.libplots.box_plot_handler(capri_filename: str | Path, cl_rank: dict[int, int], format: Literal['png', 'pdf', 'svg', 'jpeg', 'webp'] | None, scale: float | None, offline: bool = False) list[plotly.graph_objs._figure.Figure][source]

Create box plots.

The idea is that for each of the top X-ranked clusters we create a box plot showing how the basic statistics are distributed within each model.

Parameters:
  • capri_filename (str or Path) – capri single structure filename

  • cl_rank (dict) – {cluster_id : cluster_rank} dictionary

  • format (str) – Produce images in the selected format.

  • scale (float) – scale for images.
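
To make the idea of a per-cluster box plot concrete, the sketch below computes the summary values a box plot displays (minimum, quartiles, maximum) for each ranked cluster. It uses plain Python instead of pandas and plotly, the rows are made up, and the quartile method is whatever `statistics.quantiles` uses by default, which may differ from what plotly draws.

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical (cluster_id, score) pairs standing in for rows of a capri table.
rows = [
    (1, -120.0), (1, -110.0), (1, -100.0), (1, -90.0),
    (2, -80.0), (2, -75.0), (2, -70.0), (2, -60.0),
]
cl_rank = {1: 1, 2: 2}  # {cluster_id: cluster_rank}

# Collect the scores of each cluster under its rank.
scores_by_rank = defaultdict(list)
for cluster_id, score in rows:
    scores_by_rank[cl_rank[cluster_id]].append(score)

# One set of box statistics per ranked cluster.
box_stats = {}
for rank, scores in sorted(scores_by_rank.items()):
    q1, q2, q3 = quantiles(scores, n=4)
    box_stats[rank] = {
        "min": min(scores),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": max(scores),
    }

for rank, stats in box_stats.items():
    print(rank, stats)
```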

haddock.libs.libplots.box_plot_plotly(gb_full: DataFrame, y_ax: str, cl_rank: dict[int, int], format: Literal['png', 'pdf', 'svg', 'jpeg', 'webp'] | None, scale: float | None, offline: bool = False) Figure[source]

Create a box plot in plotly.

Parameters:
  • gb_full (pandas DataFrame) – data to box plot

  • y_ax (str) – variable to plot

  • cl_rank (dict) – {cluster_id : cluster_rank} dictionary

  • format (str) – Produce images in the selected format.

  • scale (float) – scale of image

Returns:

fig – an instance of plotly.graph_objects.Figure

haddock.libs.libplots.clean_capri_table(df: DataFrame) DataFrame[source]

Create a tidy capri table for the report.

It also combines the mean and std values into a single column and drops the columns that are not needed in the report.

The dataframe is modified in place.

Parameters:

df (pandas DataFrame) – dataframe of capri values

Returns:

pandas DataFrame – DataFrame of capri table with new column names
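
The mean/std merging can be pictured with the sketch below. It works on a plain dict instead of a DataFrame, and the `<metric>` / `<metric>_std` column naming, the `±` separator and the two-decimal rounding are all assumptions made for illustration. Unlike `clean_capri_table`, the hypothetical helper `combine_mean_std` returns a copy and does not modify its input.

```python
# Hypothetical cluster-level row; the "<metric>" / "<metric>_std"
# naming is an assumption of this sketch.
row = {
    "cluster_id": 1,
    "score": -105.237,
    "score_std": 4.812,
    "irmsd": 1.204,
    "irmsd_std": 0.331,
}


def combine_mean_std(row, metrics, ndigits=2):
    """Return a copy of row with each mean/std pair merged into one string."""
    std_keys = {f"{metric}_std" for metric in metrics}
    merged = {}
    for key, value in row.items():
        if key in std_keys:
            continue  # folded into the matching mean column below
        if key in metrics:
            std = row[f"{key}_std"]
            merged[key] = f"{value:.{ndigits}f} ± {std:.{ndigits}f}"
        else:
            merged[key] = value
    return merged


tidy = combine_mean_std(row, metrics=["score", "irmsd"])
print(tidy)
```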

haddock.libs.libplots.clt_table_handler(clt_file, ss_file, is_cleaned=False)[source]

Create a dataframe including data for tables.

The idea is to create tidy tables that report statistics available in capri_clt.tsv and capri_ss.tsv files.

Parameters:
  • clt_file (str or Path) – path to capri_clt.tsv file

  • ss_file (str or Path) – path to capri_ss.tsv file

  • is_cleaned (bool) – is the run going to be cleaned?

Returns:

df_merged (pandas DataFrame) – a data frame including data for tables

haddock.libs.libplots.create_html(json_content: str, plot_id: int = 1, figure_height: int = 800, figure_width: int = 1000, offline: bool = False) str[source]

Create html content given a plotly json.

Parameters:
  • json_content (str) – plotly json content

  • plot_id (int) – plot id to be used in the html content

  • figure_height (int) – figure height (in pixels)

  • figure_width (int) – figure width (in pixels)

Returns:

html_content (str) – html content
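
One way to picture the result is a `div` sized from `figure_height` and `figure_width`, followed by a script that hands the JSON to plotly.js. The function below is a hypothetical stand-in written for this illustration, not the library's implementation; it assumes plotly.js is already loaded on the page and that the JSON has top-level `data` and `layout` keys.

```python
import json


def sketch_create_html(json_content, plot_id=1, figure_height=800, figure_width=1000):
    """Hypothetical stand-in: wrap plotly JSON in a div plus a plotly.js call."""
    div_id = f"plot{plot_id}"
    return (
        f'<div id="{div_id}" '
        f'style="height:{figure_height}px; width:{figure_width}px;"></div>\n'
        "<script>\n"
        f"  const fig{plot_id} = {json_content};\n"
        f'  Plotly.newPlot("{div_id}", fig{plot_id}.data, fig{plot_id}.layout);\n'
        "</script>"
    )


# Made-up figure content, serialized the way a plotly figure would be.
json_content = json.dumps(
    {"data": [{"type": "bar", "x": [1, 2], "y": [3, 4]}], "layout": {"title": "demo"}}
)
html_content = sketch_create_html(json_content, plot_id=2)
print(html_content)
```

Using `plot_id` in both the `div` id and the variable name keeps several plots on one page from colliding.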

haddock.libs.libplots.create_other_cluster(clusters_df: DataFrame, structs_df: DataFrame, max_clusters: int) tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame][source]

Combine all clusters with rank >= max_clusters into an “Other” cluster.

Parameters:
  • clusters_df (pandas DataFrame) – DataFrame of clusters

  • structs_df (pandas DataFrame) – DataFrame of structures

  • max_clusters (int) – From which cluster rank to consider as “Other”

Returns:

tuple with clusters_df and structs_df
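
The `rank >= max_clusters` rule can be sketched on plain dicts as follows. The column names, the `"Other"` label placed in `cluster_id`, and the choice to sum the cluster sizes are assumptions for illustration; the actual function operates on DataFrames and also updates `structs_df`.

```python
# Hypothetical cluster table; column names are assumptions of this sketch.
clusters = [
    {"cluster_id": 4, "cluster_rank": 1, "n": 10},
    {"cluster_id": 1, "cluster_rank": 2, "n": 8},
    {"cluster_id": 9, "cluster_rank": 3, "n": 5},
    {"cluster_id": 3, "cluster_rank": 4, "n": 4},
]
max_clusters = 3

# Clusters ranked below max_clusters are kept as they are.
kept = [c for c in clusters if c["cluster_rank"] < max_clusters]

# Clusters with rank >= max_clusters are folded into a single entry.
folded = [c for c in clusters if c["cluster_rank"] >= max_clusters]
if folded:
    kept.append(
        {
            "cluster_id": "Other",
            "cluster_rank": max_clusters,
            "n": sum(c["n"] for c in folded),
        }
    )

for cluster in kept:
    print(cluster)
```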

haddock.libs.libplots.export_plotly_figure(fig: Figure, output_fname: str | Path, figure_height: int = 1000, figure_width: int = 1000, offline: bool = False) None[source]
haddock.libs.libplots.fig_to_html(fig: Figure, fpath: str | Path, plot_id: int = 1, figure_height: int = 800, figure_width: int = 1000, offline: bool = False) None[source]

Workaround for plotly html file generation.

Parameters:
  • fig (plotly Figure) – figure to write

  • fpath (str or Path) – path of the html file to generate

  • plot_id (int) – plot id to be used in the html content

  • figure_height (int) – figure height (in pixels)

  • figure_width (int) – figure width (in pixels)

haddock.libs.libplots.find_best_struct(df: DataFrame, max_best_structs: int = 4) DataFrame[source]

Find best structures for each cluster.

Parameters:
  • df (pd.DataFrame) – The loaded capri_ss.tsv dataframe

  • max_best_structs (int) – The maximum number of best structures to return.

Returns:

best_df (pd.DataFrame) – DataFrame of best structures with cluster_id and best<model-cluster_ranking> columns and empty strings for missing values.
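
The shape of the returned table can be sketched without pandas. The `model` column and the example file names are assumptions; `model-cluster_ranking` is taken from the column naming described above.

```python
# Hypothetical rows of a capri_ss.tsv table.
structures = [
    {"model": "a.pdb", "cluster_id": 1, "model-cluster_ranking": 1},
    {"model": "b.pdb", "cluster_id": 1, "model-cluster_ranking": 2},
    {"model": "c.pdb", "cluster_id": 2, "model-cluster_ranking": 1},
    {"model": "d.pdb", "cluster_id": 1, "model-cluster_ranking": 3},
]
max_best_structs = 2

best = {}
for struct in structures:
    rank = struct["model-cluster_ranking"]
    if rank > max_best_structs:
        continue  # not among the best structures of its cluster
    # Start every cluster with empty strings so missing values stay "".
    row = best.setdefault(
        struct["cluster_id"],
        {f"best{i}": "" for i in range(1, max_best_structs + 1)},
    )
    row[f"best{rank}"] = struct["model"]

best_rows = [{"cluster_id": cid, **cols} for cid, cols in sorted(best.items())]
for row in best_rows:
    print(row)
```

Cluster 2 has a single structure here, so its `best2` entry stays an empty string.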

haddock.libs.libplots.heatmap_plotly(matrix: ndarray[Any, dtype[float64]], labels: dict | None = None, xlabels: list | None = None, ylabels: list | None = None, color_scale: str = 'Greys_r', title: str | None = None, output_fname: Path = PosixPath('contacts.html'), offline: bool = False, hovertemplate: str | None = None, customdata: list[list[Any]] | None = None, delineation_traces: list[dict[str, float]] | None = None) Path[source]

Generate a plotly heatmap based on matrix content.

Parameters:
  • matrix (NDFloat) – The 2D matrix containing data to be shown.

  • labels (dict) – Labels of the horizontal (x), vertical (y) and colorscale (color) axis.

  • xlabels (list) – List of columns names.

  • ylabels (list) – List of row names.

  • color_scale (str) – Color scale to use.

  • title (str) – Title of the figure.

  • output_fname (Path) – Path to the output filename to generate.

  • hovertemplate (Optional[str]) – Custom string used to format data for hover annotation in plotly.

  • customdata (Optional[list[list[list[int]]]]) – A matrix of cluster ids, used for extra hover annotation in plotly.

  • delineation_traces (Optional[list[dict[str, float]]]) – A list of dict enabling to draw lines separating cluster ids.

Returns:

output_fname (Path) – Path to the generated filename

haddock.libs.libplots.in_capri(column: str, df_columns: Index) bool[source]

Check if the selected column is in the set of available columns.

Parameters:
  • column (str) – column name

  • df_columns (pandas.DataFrame.columns) – columns of a pandas.DataFrame

Returns:

resp (bool) – if True, the column is present

haddock.libs.libplots.make_alascan_plot(df: DataFrame, clt_id: int, scan_res: str = 'ALA', offline: bool = False) None[source]

Make a plotly interactive plot.

Score components are here weighted by their respective contribution to the total score.

Parameters:
  • df (pandas.DataFrame) – DataFrame containing the results of the alanine scan.

  • clt_id (int) – Cluster ID.

  • scan_res (str, optional) – Residue name used for the scan, by default “ALA”

haddock.libs.libplots.make_traceback_plot(tr_subset, plot_filename)[source]

Create a traceback barplot with the 40 best ranked models.

Parameters:
  • tr_subset (pandas.DataFrame) – DataFrame containing the top traceback results

  • plot_filename (Path) – Path to the output filename to generate

haddock.libs.libplots.read_capri_table(capri_filename: str | Path, comment: str = '#') DataFrame[source]

Read capri table with pandas.

Parameters:
  • capri_filename (str or Path) – capri single structure filename

  • comment (str) – the string used to denote a commented line in capri tables

Returns:

capri_df (pandas DataFrame) – dataframe of capri values
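
The role of the `comment` argument can be illustrated with the standard library alone. This sketch drops whole lines that start with the comment string and parses the rest as tab-separated values; the file content is made up, and all values come back as strings because no type inference is done, unlike with pandas.

```python
import csv
import io

# Made-up content in the style of a tab-separated capri table.
capri_text = (
    "#######\n"
    "# header comment\n"
    "model\tcluster_id\tscore\n"
    "a.pdb\t1\t-105.2\n"
    "b.pdb\t2\t-98.7\n"
)
comment = "#"

# Skip commented lines, then parse what is left.
data_lines = [
    line for line in io.StringIO(capri_text) if not line.startswith(comment)
]
records = list(csv.DictReader(data_lines, delimiter="\t"))

for record in records:
    print(record)
```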

haddock.libs.libplots.report_generator(boxes, scatters, tables, step)[source]

Create a figure that includes plots and tables.

The idea is to create a report.html file that includes all the plots and tables generated by the command analyse.

Parameters:
  • boxes (list) – list of box plots generated by box_plot_handler

  • scatters (list) – list of scatter plots generated by scatter_plot_handler

  • tables (list) – a list including tables generated by clt_table_handler

haddock.libs.libplots.report_plots_handler(plots, shared_xaxes=False, shared_yaxes=False)[source]

Create a figure that holds subplots.

The idea is that for each type (scatters or boxes), the individual plots are considered subplots. In the report, some of the axes are shared. The settings for sharing axes depend on the type (scatters or boxes).

Parameters:
  • plots (list) – list of plots generated by analyse command

  • shared_xaxes (boolean or str (default False)) – a parameter of plotly.subplots.make_subplots

  • shared_yaxes (boolean or str (default False)) – a parameter of plotly.subplots.make_subplots

Returns:

fig – an instance of plotly.graph_objects.Figure

haddock.libs.libplots.scatter_plot_data(capri_df: DataFrame, cl_rank: dict[int, int]) tuple[pandas.core.groupby.generic.DataFrameGroupBy, pandas.core.frame.DataFrame][source]

Retrieve scatter plot data.

Parameters:
  • capri_df (pandas DataFrame) – capri table dataframe

  • cl_rank (dict) – {cluster_id : cluster_rank} dictionary

Returns:

  • gb_cluster (pandas DataFrameGroupBy) – capri DataFrame grouped by cluster_id

  • gb_other (pandas DataFrame) – DataFrame of clusters not in the top cluster ranking
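
The split into ranked clusters and everything else can be sketched in plain Python. A dict of lists stands in for the pandas `DataFrameGroupBy`, and the rows and column names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows of a capri table.
rows = [
    {"model": "a.pdb", "cluster_id": 1, "score": -105.2},
    {"model": "b.pdb", "cluster_id": 2, "score": -98.7},
    {"model": "c.pdb", "cluster_id": 1, "score": -101.3},
    {"model": "d.pdb", "cluster_id": 5, "score": -60.4},
]
cl_rank = {1: 1, 2: 2}  # only the top-ranked clusters appear here

gb_cluster = defaultdict(list)  # rows grouped by cluster_id
gb_other = []                   # rows of clusters not in cl_rank
for row in rows:
    if row["cluster_id"] in cl_rank:
        gb_cluster[row["cluster_id"]].append(row)
    else:
        gb_other.append(row)

print(dict(gb_cluster))
print(gb_other)
```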

haddock.libs.libplots.scatter_plot_handler(capri_filename: str | Path, cl_rank: dict[int, int], format: Literal['png', 'pdf', 'svg', 'jpeg', 'webp'] | None, scale: float | None, offline: bool = False) list[plotly.graph_objs._figure.Figure][source]

Create scatter plots.

The idea is that for each pair of variables of interest (SCATTER_PAIRS, declared as global) we create a scatter plot.

If available, each scatter plot contains cluster information.

Parameters:
  • capri_filename (str or Path) – capri single structure filename

  • cl_rank (dict) – {cluster_id : cluster_rank} dictionary

  • format (str) – Produce images in the selected format.

  • scale (float) – scale for images.

Returns:

fig_list (list) – a list of figures

haddock.libs.libplots.scatter_plot_plotly(gb_cluster: DataFrameGroupBy, gb_other: DataFrame, cl_rank: dict[int, int], x_ax: str, y_ax: str, colors: list[str], format: Literal['png', 'pdf', 'svg', 'jpeg', 'webp'] | None, scale: float | None, offline: bool = False) Figure[source]

Create a scatter plot in plotly.

Parameters:
  • gb_cluster (pandas DataFrameGroupBy) – capri DataFrame grouped by cluster_id

  • gb_other (pandas DataFrame) – DataFrame of clusters not in the top cluster ranking

  • cl_rank (dict) – {cluster_id : cluster_rank} dictionary

  • x_ax (str) – name of the x column

  • y_ax (str) – name of the y column

  • colors (list) – list of colors to be used

  • format (str) – Produce images in the selected format.

  • scale (float) – scale for images.

Returns:

fig – an instance of plotly.graph_objects.Figure

haddock.libs.libplots.update_layout_plotly(fig: Figure, x_label: str, y_label: str, title: str | None = None) Figure[source]

Update layout of plotly plot.

Parameters:
  • fig (plotly Figure) – figure

  • x_label (str) – x axis name

  • y_label (str) – y axis name

  • title (str or None) – plot title