libplots: plotting functionalities

Plotting functionalities.

haddock.libs.libplots.ClRank

A dict representing clusters’ rank.

key (int): cluster’s id

value (int): cluster’s rank
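As an illustration, a mapping of this shape can be built from per-cluster scores. The sketch below is not taken from haddock: `rank_clusters` and the example scores are hypothetical, and it assumes that a lower score means a better cluster and that ranks start at 1.

```python
def rank_clusters(scores: dict[int, float]) -> dict[int, int]:
    """Map each cluster id to its rank (1 = best), assuming lower score is better."""
    ordered = sorted(scores, key=scores.get)
    return {cluster_id: rank for rank, cluster_id in enumerate(ordered, start=1)}


# Hypothetical cluster scores keyed by cluster id.
example_scores = {3: -95.2, 1: -120.7, 7: -88.4}
cl_rank = rank_clusters(example_scores)
print(cl_rank)  # {1: 1, 3: 2, 7: 3}
```

The resulting dictionary has the {cluster_id : cluster_rank} shape expected by the `cl_rank` parameters documented below.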

haddock.libs.libplots.box_plot_data(capri_df: DataFrame, cl_rank: dict[int, int]) → DataFrame

Retrieve box plot data.

Parameters:

capri_df (pandas DataFrame) – capri table dataframe
cl_rank (dict) – {cluster_id : cluster_rank} dictionary

Returns:

gb_full (pandas DataFrame) – DataFrame of all the clusters to be plotted

haddock.libs.libplots.box_plot_handler(capri_filename: str | Path, cl_rank: dict[int, int], format: Literal['png', 'pdf', 'svg', 'jpeg', 'webp'] | None, scale: float | None, offline: bool = False) → list[Figure]

Create box plots.

The idea is that for each of the top X-ranked clusters we create a box plot showing how the basic statistics are distributed within each model.

Parameters:

capri_filename (str or Path) – capri single structure filename
cl_rank (dict) – {cluster_id : cluster_rank} dictionary
format (str) – Produce images in the selected format.
scale (float) – scale for images.

haddock.libs.libplots.box_plot_plotly(gb_full: DataFrame, y_ax: str, cl_rank: dict[int, int], format: Literal['png', 'pdf', 'svg', 'jpeg', 'webp'] | None, scale: float | None, offline: bool = False) → Figure

Create a box plot in plotly.

Parameters:

gb_full (pandas DataFrame) – data to box plot
y_ax (str) – variable to plot
cl_rank (dict) – {cluster_id : cluster_rank} dictionary
format (str) – Produce images in the selected format.
scale (float) – scale of image

Returns:

fig – an instance of plotly.graph_objects.Figure

haddock.libs.libplots.clean_capri_table(df: DataFrame) → DataFrame

Create a tidy capri table for the report.

It also combines the mean and std values into a single column and drops the columns that are not needed in the report.

Modifies the dataframe in place.

Parameters:

df (pandas DataFrame) – dataframe of capri values

Returns:

pandas DataFrame – DataFrame of capri table with new column names
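To make the "mean and std in one column" step concrete, here is a minimal plain-Python sketch. It is not the library's implementation: `tidy_row`, the `_std` column suffix, the "mean ± std" layout and the two-decimal rounding are all assumptions chosen for illustration, and it returns a new row rather than modifying its input.

```python
def tidy_row(row: dict[str, float], keep: list[str]) -> dict[str, str]:
    """Build 'mean ± std' strings for the columns in `keep`; drop everything else."""
    return {
        name: f"{row[name]:.2f} ± {row[name + '_std']:.2f}"
        for name in keep
    }


# Hypothetical cluster-level row with mean and std columns.
row = {"score": -120.713, "score_std": 4.256, "irmsd": 1.5, "irmsd_std": 0.25, "n": 10}
tidy = tidy_row(row, keep=["score", "irmsd"])
print(tidy)  # {'score': '-120.71 ± 4.26', 'irmsd': '1.50 ± 0.25'}
```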

haddock.libs.libplots.clt_table_handler(clt_file: str | Path, ss_file: str | Path, is_cleaned: bool = False, topX_clusters: int = 10, clustered_topX: int = 4, unclustered_topX: int = 10, top_ranked_mapping: dict[Path, Path] | None = None) → DataFrame

Create a dataframe including data for tables.

The idea is to create tidy tables that report statistics available in capri_clt.tsv and capri_ss.tsv files.

Parameters:

clt_file (str or Path) – path to capri_clt.tsv file
ss_file (str or Path) – path to capri_ss.tsv file
is_cleaned (bool) – is the run going to be cleaned?

Returns:

df_merged (pandas DataFrame) – a data frame including data for tables

haddock.libs.libplots.create_html(json_content: str, plot_id: int = 1, plotly_js_import: str | None = None, figure_height: int = 800, figure_width: int = 1000) → str

Create html content given a plotly json.

Parameters:

json_content (str) – plotly json content
plot_id (int) – plot id to be used in the html content
figure_height (int) – figure height (in pixels)
figure_width (int) – figure width (in pixels)

Returns:

html_content (str) – html content
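The general idea of wrapping a plotly JSON string in HTML can be sketched as follows. This is a simplified stand-in, not the markup that create_html actually emits: `embed_plotly_json` and the `plot<id>` div naming are hypothetical, and the snippet assumes the plotly.js library is loaded elsewhere on the page so that `Plotly.newPlot` is available.

```python
import json


def embed_plotly_json(
    json_content: str,
    plot_id: int = 1,
    figure_height: int = 800,
    figure_width: int = 1000,
) -> str:
    """Return an HTML fragment that renders a plotly figure from its JSON."""
    json.loads(json_content)  # fail early if the content is not valid JSON
    div_id = f"plot{plot_id}"  # hypothetical naming scheme
    return (
        f'<div id="{div_id}" '
        f'style="height:{figure_height}px; width:{figure_width}px;"></div>\n'
        "<script>\n"
        f"  const fig{plot_id} = {json_content};\n"
        f'  Plotly.newPlot("{div_id}", fig{plot_id}.data, fig{plot_id}.layout);\n'
        "</script>"
    )


html = embed_plotly_json('{"data": [], "layout": {}}', plot_id=2)
print(html)
```

Giving each figure its own `plot_id` keeps the div ids and JavaScript variable names distinct when several fragments are placed in the same page.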

haddock.libs.libplots.create_other_cluster(clusters_df: DataFrame, structs_df: DataFrame, max_clusters: int) → tuple[DataFrame, DataFrame]

Combine all clusters with rank >= max_clusters into an “Other” cluster.

Parameters:

clusters_df (pandas DataFrame) – DataFrame of clusters
structs_df (pandas DataFrame) – DataFrame of structures
max_clusters (int) – From which cluster rank to consider as “Other”

Returns:

tuple with clusters_df and structs_df
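The folding rule described above (rank >= max_clusters becomes “Other”) can be sketched on plain lists of dictionaries. This is an illustration only: `fold_into_other`, the `cluster_id` and `cluster_rank` keys, and the choice to give the “Other” entry the rank `max_clusters` are assumptions, and the real function operates on pandas DataFrames.

```python
def fold_into_other(clusters: list[dict], structs: list[dict], max_clusters: int):
    """Fold every cluster with rank >= max_clusters into a single 'Other' entry."""
    other_ids = {
        c["cluster_id"] for c in clusters if c["cluster_rank"] >= max_clusters
    }
    kept = [dict(c) for c in clusters if c["cluster_id"] not in other_ids]
    if other_ids:
        # Assumption: the merged entry takes the first folded rank.
        kept.append({"cluster_id": "Other", "cluster_rank": max_clusters})
    relabelled = [
        {**s, "cluster_id": "Other"} if s["cluster_id"] in other_ids else dict(s)
        for s in structs
    ]
    return kept, relabelled


# Hypothetical inputs.
clusters = [
    {"cluster_id": 5, "cluster_rank": 1},
    {"cluster_id": 2, "cluster_rank": 2},
    {"cluster_id": 9, "cluster_rank": 3},
    {"cluster_id": 4, "cluster_rank": 4},
]
structs = [
    {"model": "a.pdb", "cluster_id": 5},
    {"model": "b.pdb", "cluster_id": 9},
    {"model": "c.pdb", "cluster_id": 4},
]
new_clusters, new_structs = fold_into_other(clusters, structs, max_clusters=3)
print([c["cluster_id"] for c in new_clusters])  # [5, 2, 'Other']
```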

haddock.libs.libplots.export_plotly_figure(fig: Figure, output_fname: str | Path, figure_height: int = 1000, figure_width: int = 1000, offline: bool = False) → None

Write a plotly figure.

Parameters:

fig (Figure) – The plotly Figure object
output_fname (Union[str, Path]) – Where to write it
figure_height (int, optional) – Height of the figure (in pixels), by default 1000
figure_width (int, optional) – Width of the figure (in pixels), by default 1000
offline (bool, optional) – If True add the plotly js library to the file, by default False

haddock.libs.libplots.fig_to_html(fig: Figure, fpath: str | Path, plot_id: int = 1, figure_height: int = 800, figure_width: int = 1000, offline: bool = False) → None

Work around plotly HTML file generation.

Parameters:

fig (Figure) – A Figure object created by Plotly
fpath (Union[str, Path]) – Where to write the content
plot_id (int) – plot id to be used in the html content
figure_height (int) – figure height (in pixels)
figure_width (int) – figure width (in pixels)
offline (bool) – If set to False, use the cdn url to obtain the javascript content for the rendering.

haddock.libs.libplots.find_best_struct(df: DataFrame, max_best_structs: int = 4) → DataFrame

Find best structures for each cluster.

Parameters:

df (pd.DataFrame) – The loaded capri_ss.tsv dataframe
max_best_structs (int) – The maximum number of best structures to return.

Returns:

best_df (pd.DataFrame) – DataFrame of best structures with cluster_id and best<model-cluster_ranking> columns and empty strings for missing values.
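The shape of the returned table can be illustrated with a plain-Python sketch. It is not the pandas-based implementation: `best_structures` is hypothetical, and the `model`, `cluster_id` and `model-cluster_ranking` keys, as well as 1-based integer rankings, are assumptions about the capri_ss.tsv columns.

```python
def best_structures(rows: list[dict], max_best_structs: int = 4) -> list[dict]:
    """Collect, per cluster, the models ranked 1..max_best_structs.

    Missing positions are filled with empty strings.
    """
    table: dict[int, dict[str, str]] = {}
    for row in rows:
        # Every cluster gets an entry, pre-filled with empty strings.
        entry = table.setdefault(
            row["cluster_id"],
            {f"best{i}": "" for i in range(1, max_best_structs + 1)},
        )
        rank = row["model-cluster_ranking"]
        if rank <= max_best_structs:
            entry[f"best{rank}"] = row["model"]
    return [{"cluster_id": cid, **cols} for cid, cols in sorted(table.items())]


# Hypothetical single-structure rows.
rows = [
    {"model": "a.pdb", "cluster_id": 1, "model-cluster_ranking": 1},
    {"model": "b.pdb", "cluster_id": 1, "model-cluster_ranking": 2},
    {"model": "c.pdb", "cluster_id": 1, "model-cluster_ranking": 3},
    {"model": "d.pdb", "cluster_id": 2, "model-cluster_ranking": 1},
]
best = best_structures(rows, max_best_structs=2)
print(best)
```

Here cluster 2 has only one model, so its `best2` entry stays an empty string.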

haddock.libs.libplots.heatmap_plotly(matrix: ndarray[tuple[int, ...], dtype[float64]], labels: dict | None = None, xlabels: list | None = None, ylabels: list | None = None, color_scale: str = 'Greys_r', title: str | None = None, output_fname: Path = PosixPath('contacts.html'), offline: bool = False, hovertemplate: str | None = None, customdata: list[list[Any]] | None = None, delineation_traces: list[dict[str, float]] | None = None) → Path

Generate a plotly heatmap based on matrix content.

Parameters:

matrix (NDFloat) – The 2D matrix containing data to be shown.
labels (dict) – Labels of the horizontal (x), vertical (y) and colorscale (color) axis.
xlabels (list) – List of column names.
ylabels (list) – List of row names.
color_scale (str) – Color scale to use.
title (str) – Title of the figure.
output_fname (Path) – Path to the output filename to generate.
hovertemplate (Optional[str]) – Custom string used to format data for hover annotation in plotly.
customdata (Optional[list[list[list[int]]]]) – A matrix of cluster ids, used for extra hover annotation in plotly.
delineation_traces (Optional[list[dict[str, float]]]) – A list of dict enabling to draw lines separating cluster ids.

Returns:

output_fname (Path) – Path to the generated filename

haddock.libs.libplots.in_capri(column: str, df_columns: Index) → bool

Check if the selected column is in the set of available columns.

Parameters:

column (str) – column name
df_columns (pandas.DataFrame.columns) – columns of a pandas.DataFrame

Returns:

resp (bool) – if True, the column is present

haddock.libs.libplots.make_alascan_plot(df: DataFrame, clt_id: int, scan_res: str = 'ALA', offline: bool = False) → str

Make a plotly interactive plot.

Score components are here weighted by their respective contribution to the total score.

Parameters:

df (pandas.DataFrame) – DataFrame containing the results of the alanine scan.
clt_id (int) – Cluster ID.
scan_res (str, optional) – Residue name used for the scan, by default “ALA”

Returns:

html_output_filename (str) – Name of the plot generated

haddock.libs.libplots.make_traceback_plot(tr_subset, plot_filename, offline=False)

Create a traceback barplot with the 40 best ranked models.

Parameters:

tr_subset (pandas.DataFrame) – DataFrame containing the top traceback results
plot_filename (Path) – Path to the output filename to generate

haddock.libs.libplots.offline_js_manager(fpath: str | Path, offline: bool) → str

Build string to access plotly javascript content.

Parameters:

fpath (FilePath) – Path to the figure about to be written.
offline (bool) – if True use the offline approach.

Returns:

plotly_js_import (str) – HTML solution for the importation of the plotly javascript content.
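The online/offline choice can be sketched as a function that returns a different `<script>` tag in each case. This is a guess at the general pattern, not the library's code: `plotly_js_tag`, the bundle file name and the CDN URL constant are placeholders, and copying the JavaScript bundle next to the figure is left out of the sketch.

```python
from pathlib import Path

# Placeholder URL; in practice a pinned plotly.js version would be preferable.
PLOTLY_CDN_URL = "https://cdn.plot.ly/plotly-latest.min.js"


def plotly_js_tag(fpath, offline: bool, bundle_name: str = "plotly_bundle.js") -> str:
    """Return the HTML tag that imports plotly.js for the figure at `fpath`."""
    if not offline:
        return f'<script src="{PLOTLY_CDN_URL}"></script>'
    # Assumption: a local copy of plotly.js sits in the same directory as the
    # figure, so it can be referenced by file name only.
    bundle = Path(fpath).parent / bundle_name
    return f'<script src="{bundle.name}"></script>'


online_tag = plotly_js_tag("run/analysis/report.html", offline=False)
offline_tag = plotly_js_tag("run/analysis/report.html", offline=True)
print(online_tag)
print(offline_tag)
```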

haddock.libs.libplots.read_capri_table(capri_filename: str | Path, comment: str = '#') → DataFrame

Read capri table with pandas.

Parameters:

capri_filename (str or Path) – capri single structure filename
comment (str) – the string used to denote a commented line in capri tables

Returns:

capri_df (pandas DataFrame) – dataframe of capri values
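For readers without pandas at hand, the effect of the `comment` parameter can be approximated with the standard library. This sketch is not the function above: `read_tsv_skip_comments` is hypothetical, it assumes tab-separated input, it only skips lines that start with the comment string, and it leaves all values as strings.

```python
import csv
import io


def read_tsv_skip_comments(handle, comment: str = "#") -> list[dict[str, str]]:
    """Parse a tab-separated table, ignoring lines that start with `comment`."""
    lines = (line for line in handle if not line.startswith(comment))
    return list(csv.DictReader(lines, delimiter="\t"))


# Hypothetical table content with comment lines.
text = "# generated table\nmodel\tscore\na.pdb\t-1.5\n# note\nb.pdb\t-0.5\n"
table = read_tsv_skip_comments(io.StringIO(text))
print(table)
```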

haddock.libs.libplots.report_generator(boxes: list[Figure], scatters: list[Figure], tables: list, step: str, directory: str | Path = '.', offline: bool = False) → None

Create a figure that includes plots and tables.

The idea is to create a report.html file that includes all the plots and tables generated by the command analyse.

Parameters:

boxes (list) – list of box plots generated by box_plot_handler
scatters (list) – list of scatter plots generated by scatter_plot_handler
tables (list) – a list including tables generated by clt_table_handler
directory (Path) – path to the output folder
offline (bool) – If True, the HTML will be generated for offline use.

haddock.libs.libplots.report_plots_handler(plots, shared_xaxes=False, shared_yaxes=False)

Create a figure that holds subplots.

The idea is that for each type (scatters or boxes), the individual plots are considered subplots. In the report, some of the axes are shared. The settings for sharing axes depend on the type (scatters or boxes).

Parameters:

plots (list) – list of plots generated by analyse command
shared_xaxes (boolean or str (default False)) – a parameter of plotly.subplots.make_subplots
shared_yaxes (boolean or str (default False)) – a parameter of plotly.subplots.make_subplots

Returns:

fig – an instance of plotly.graph_objects.Figure

haddock.libs.libplots.scatter_plot_data(capri_df: DataFrame, cl_rank: dict[int, int]) → tuple[DataFrameGroupBy, DataFrame]

Retrieve scatter plot data.

Parameters:

capri_df (pandas DataFrame) – capri table dataframe
cl_rank (dict) – {cluster_id : cluster_rank} dictionary

Returns:

gb_cluster (pandas DataFrameGroupBy) – capri DataFrame grouped by cluster_id
gb_other (pandas DataFrame) – DataFrame of clusters not in the top cluster ranking
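The two return values correspond to a simple partition of the table, sketched here on plain dictionaries. `split_by_rank` is hypothetical and stands in for the pandas group-by; it assumes a `cluster_id` key in each row and that `cl_rank` only contains the top-ranked clusters.

```python
from collections import defaultdict


def split_by_rank(rows: list[dict], cl_rank: dict[int, int]):
    """Group rows of ranked clusters by cluster id; collect the rest separately."""
    grouped: dict[int, list[dict]] = defaultdict(list)
    other: list[dict] = []
    for row in rows:
        if row["cluster_id"] in cl_rank:
            grouped[row["cluster_id"]].append(row)
        else:
            other.append(row)
    return dict(grouped), other


# Hypothetical rows and ranking.
rows = [
    {"model": "a.pdb", "cluster_id": 1},
    {"model": "b.pdb", "cluster_id": 3},
    {"model": "c.pdb", "cluster_id": 1},
    {"model": "d.pdb", "cluster_id": 8},
]
by_cluster, others = split_by_rank(rows, cl_rank={1: 1, 3: 2})
print(sorted(by_cluster))  # [1, 3]
print(others)  # [{'model': 'd.pdb', 'cluster_id': 8}]
```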

haddock.libs.libplots.scatter_plot_handler(capri_filename: str | Path, cl_rank: dict[int, int], format: Literal['png', 'pdf', 'svg', 'jpeg', 'webp'] | None, scale: float | None, offline: bool = False) → list[Figure]

Create scatter plots.

The idea is that for each pair of variables of interest (SCATTER_PAIRS, declared as global) we create a scatter plot.

If available, each scatter plot contains cluster information.

Parameters:

capri_filename (str or Path) – capri single structure filename
cl_rank (dict) – {cluster_id : cluster_rank} dictionary
format (str) – Produce images in the selected format.
scale (float) – scale for images.

Returns:

fig_list (list) – a list of figures

haddock.libs.libplots.scatter_plot_plotly(gb_cluster: DataFrameGroupBy, gb_other: DataFrame, cl_rank: dict[int, int], x_ax: str, y_ax: str, colors: list[str], format: Literal['png', 'pdf', 'svg', 'jpeg', 'webp'] | None, scale: float | None, offline: bool = False) → Figure

Create a scatter plot in plotly.

Parameters:

gb_cluster (pandas DataFrameGroupBy) – capri DataFrame grouped by cluster_id
gb_other (pandas DataFrame) – DataFrame of clusters not in the top cluster ranking
cl_rank (dict) – {cluster_id : cluster_rank} dictionary
x_ax (str) – name of the x column
y_ax (str) – name of the y column
colors (list) – list of colors to be used
format (str) – Produce images in the selected format.
scale (float) – scale for images.

Returns:

fig – an instance of plotly.graph_objects.Figure

haddock.libs.libplots.update_layout_plotly(fig: Figure, x_label: str, y_label: str, title: str | None = None) → Figure

Update layout of plotly plot.

Parameters:

fig (plotly Figure) – figure
x_label (str) – x axis name
y_label (str) – y axis name
title (str or None) – plot title