libalign: sequence and structural alignments

Library of functions to perform sequence and structural alignments.

Main functions

calc_rmsd()
centroid()
kabsch()
load_coords()
pdb2fastadic()
get_atoms()
get_align()
align_strct()
align_seq()
make_range()
dump_as_izone()

exception haddock.libs.libalign.ALIGNError(msg: object = '')

Bases: Exception

Raised when something goes wrong with the ALIGNMENT library.

haddock.libs.libalign.ResCode

The single letter code of a residue.

Unrecognized residues are assigned the code X.

alias of Literal['C', 'D', 'S', 'Q', 'K', 'I', 'P', 'T', 'F', 'N', 'G', 'H', 'L', 'R', 'W', 'A', 'V', 'E', 'Y', 'M', 'X']

class haddock.libs.libalign.SeqAlign

Bases: object

SeqAlign class.

postprocess_alignment(ref_ch, mod_ch, align_id)

Postprocess the alignment.

Parameters:

ref_ch (str) – reference chain
mod_ch (str) – model chain
align_id (int) – alignment id (index of the alignment)

haddock.libs.libalign.align_seq(reference, model, output_path)

Sequence align and get the numbering relationship.

Parameters:

reference (PosixPath or haddock.libs.libontology.PDBFile)
model (PosixPath or haddock.libs.libontology.PDBFile)
output_path (Path)

Returns:

align_dic (dict) – dictionary of sequence alignments (one per chain)

haddock.libs.libalign.align_strct(reference: PDBFile, model: PDBFile, output_path: str | Path, lovoalign_exec: str | Path | None = None) → dict[str, dict[int, int]]

Structurally align and get the numbering relationship.

Parameters:

reference (haddock.libs.libontology.PDBFile)
model (haddock.libs.libontology.PDBFile)
output_path (Path)
lovoalign_exec (Path) – lovoalign executable

Returns:

numbering_dic (dict) – dict of numbering dictionaries (one dictionary per chain)

haddock.libs.libalign.calc_rmsd(V: ndarray[tuple[int, ...], dtype[float64]], W: ndarray[tuple[int, ...], dtype[float64]]) → float

Calculate the RMSD between two sets of coordinates.

Parameters:

V (np.array dtype=float, shape=(n_atoms,3))
W (np.array dtype=float, shape=(n_atoms,3))

Returns:

rmsd (float)
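As an illustration of what an RMSD over two `(n_atoms, 3)` arrays computes, here is a minimal NumPy sketch. The function name `rmsd_sketch` is hypothetical and this is not the library's implementation; it assumes the two arrays are already superimposed and paired row by row.

```python
import numpy as np


def rmsd_sketch(V: np.ndarray, W: np.ndarray) -> float:
    """RMSD between two (n_atoms, 3) arrays whose rows are paired atoms."""
    if V.shape != W.shape:
        raise ValueError("V and W must have the same shape")
    diff = V - W
    # Sum the squared displacement of every atom, average over the
    # number of atoms, then take the square root.
    return float(np.sqrt((diff * diff).sum() / len(V)))


# Two atoms, each displaced by exactly 1 unit along z.
V = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
W = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 1.0]])
rmsd_value = rmsd_sketch(V, W)
```

Because every atom in this toy input moves by the same distance, the RMSD equals that distance.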

haddock.libs.libalign.centroid(X: ndarray[tuple[int, ...], dtype[float64]]) → ndarray[tuple[int, ...], dtype[float64]]

Get the centroid.

Parameters:

X (np.array dtype=float, shape=(n_atoms,3))

Returns:

C (np.array dtype=float, shape=(3,))
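The centroid of an `(n_atoms, 3)` array is the per-axis mean of its rows. The sketch below uses a hypothetical helper, `centroid_sketch`, and is meant only to show the shapes involved, not the library's code.

```python
import numpy as np


def centroid_sketch(X: np.ndarray) -> np.ndarray:
    """Mean position of an (n_atoms, 3) array, returned with shape (3,)."""
    return X.mean(axis=0)


X = np.array([[0.0, 0.0, 0.0], [2.0, 4.0, 6.0]])
C = centroid_sketch(X)

# Subtracting the centroid moves the coordinates to the origin, which is
# the usual preparation step before computing a rotation.
X_centered = X - C
```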

haddock.libs.libalign.check_chains(obs_chains, inp_r_chain, inp_l_chains)

Check observed chains against the expected ones.

Logic: if at least one of inp_l_chains is among the observed chains and is not selected as the receptor chain, then ligand_chains is equal to this intersection. Otherwise, ligand_chains becomes equal to all the other chains (once the receptor chain is removed).

Parameters:

obs_chains (list) – List of observed chains.
inp_r_chain (str) – Receptor chain.
inp_l_chains (list) – List of ligand chains.
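The ligand-chain selection rule described above can be sketched as follows. The helper name `select_ligand_chains` is hypothetical, and the sketch assumes the receptor chain has already been settled; it covers only the ligand logic stated in the description, not the library's full behavior.

```python
def select_ligand_chains(obs_chains, r_chain, inp_l_chains):
    """Pick ligand chains following the rule described above."""
    # Requested ligand chains that were actually observed and that are
    # not the receptor chain.
    intersection = [
        ch for ch in inp_l_chains if ch in obs_chains and ch != r_chain
    ]
    if intersection:
        return intersection
    # Fallback: every observed chain except the receptor.
    return [ch for ch in obs_chains if ch != r_chain]


requested_found = select_ligand_chains(["A", "B", "C"], "A", ["B"])
requested_missing = select_ligand_chains(["A", "B", "C"], "A", ["D"])
```

In the first call the requested chain is observed, so it is kept; in the second it is not, so all non-receptor chains are used.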

haddock.libs.libalign.check_common_atoms(models, filter_resdic, allatoms, atom_similarity)

Check if the models share the same atoms.

Parameters:

models (list) – list of models
filter_resdic (dict) – dictionary of residues to be loaded (one list per chain)
allatoms (bool) – use all the heavy atoms
atom_similarity (float) – minimum atom similarity required between models

Returns:

n_atoms (int) – number of common atoms
common_keys (list) – list of common atom keys

haddock.libs.libalign.dump_as_izone(fname, numbering_dic, model2ref_chain_dict=None)

Dump the numbering dictionary as .izone.

Parameters:

fname (str) – output filename
numbering_dic (dict) – dict of numbering dictionaries (one dictionary per chain)

haddock.libs.libalign.get_align(method: str, lovoalign_exec: str | Path) → partial[dict[str, dict[int, int]]]

Get the alignment function.

Parameters:

method (str) – Available options: sequence and structure.
lovoalign_exec (str) – Path to the lovoalign executable.

Returns:

align_func (functools.partial) – desired alignment function
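Returning a `functools.partial` lets the caller use the same three-argument call regardless of the alignment method. The sketch below shows that pattern with stand-in functions; `align_seq_stub`, `align_strct_stub`, `get_align_sketch`, the error handling, and the example path are all hypothetical and do not reproduce the library's code.

```python
from functools import partial


def align_seq_stub(reference, model, output_path):
    """Stand-in for a sequence-based aligner."""
    return ("sequence", None)


def align_strct_stub(reference, model, output_path, lovoalign_exec=None):
    """Stand-in for a structure-based aligner that needs an executable."""
    return ("structure", lovoalign_exec)


def get_align_sketch(method, lovoalign_exec=None):
    """Return a callable with the uniform (reference, model, output_path) API."""
    if method == "structure":
        # Bind the executable now so callers never have to pass it again.
        return partial(align_strct_stub, lovoalign_exec=lovoalign_exec)
    if method == "sequence":
        return partial(align_seq_stub)
    raise ValueError(f"Unknown alignment method: {method!r}")


align_func = get_align_sketch("structure", lovoalign_exec="/path/to/lovoalign")
outcome = align_func("ref.pdb", "model.pdb", "out_dir")
```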

haddock.libs.libalign.get_atoms(pdb: PDBFile | Path, full: bool = False) → dict[str, list[str]]

Identify the molecule type of each PDB.

Parameters:

pdb (PosixPath or haddock.libs.libontology.PDBFile) – PDB file to have its atoms identified
full (bool) – Whether or not to take full atoms into consideration. If False, only main-chain atoms are retrieved. If True, all heavy atoms are retrieved.

Returns:

atom_dic (dict) – dictionary of atoms

haddock.libs.libalign.kabsch(P: ndarray[tuple[int, ...], dtype[float64]], Q: ndarray[tuple[int, ...], dtype[float64]]) → ndarray[tuple[int, ...], dtype[float64]]

Find the rotation matrix using the Kabsch algorithm.

Parameters:

P (np.array dtype=float, shape=(n_atoms,3))
Q (np.array dtype=float, shape=(n_atoms,3))

Returns:

U (np.array dtype=float, shape=(3,3))
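For readers unfamiliar with the Kabsch algorithm, here is a minimal NumPy sketch of the textbook SVD formulation. The name `kabsch_sketch` is hypothetical, and the sketch assumes both coordinate sets are already centered on the origin and that the rotation is applied as `P @ U`; the library's own conventions may differ.

```python
import numpy as np


def kabsch_sketch(P: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """Return the 3x3 rotation U minimizing the RMSD between P @ U and Q."""
    # Covariance matrix between the two centered coordinate sets.
    C = P.T @ Q
    V, _, Wt = np.linalg.svd(C)
    # If the product would be an improper rotation (a reflection), flip
    # the axis associated with the smallest singular value.
    if np.linalg.det(V) * np.linalg.det(Wt) < 0.0:
        V[:, -1] = -V[:, -1]
    return V @ Wt


# Four non-coplanar points whose centroid is the origin.
P = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 3.0],
    [-1.0, -2.0, -3.0],
])

# Build Q by rotating P 30 degrees about the z axis.
theta = np.deg2rad(30.0)
R = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
Q = P @ R

U = kabsch_sketch(P, Q)
```

With noise-free input such as this, the recovered matrix matches the rotation used to build `Q`, and `P @ U` reproduces `Q`.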

haddock.libs.libalign.load_coords(pdb_f, atoms, filter_resdic=None, numbering_dic=None, model2ref_chain_dict=None, add_resname=None)

Load coordinates from PDB.

Parameters:

pdb_f (PDBFile)
atoms (dict) – dictionary of atoms
filter_resdic (dict) – dictionary of residues to be loaded (one list per chain)
numbering_dic (dict) – dict of numbering dictionaries (one dictionary per chain)
add_resname (bool) – use the residue name in the identifier

Returns:

coord_dic (dict) – dictionary of coordinates (one per chain)
chain_ranges (dict) – dictionary of chain ranges

haddock.libs.libalign.make_range(chain_range_dic: dict[str, list[int]]) → dict[str, tuple[int, int]]

Expand a chain dictionary into ranges.

Parameters:

chain_range_dic (dict) – dictionary of chain indexes (one list per chain)

Returns:

chain_ranges (dict) – dictionary of chain ranges (one tuple per chain)
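Given the documented types (`dict[str, list[int]]` in, `dict[str, tuple[int, int]]` out), one plausible reading is that each list of indexes is reduced to its first and last index. The sketch below assumes that `(min, max)` interpretation; `make_range_sketch` is a hypothetical name, not the library function.

```python
def make_range_sketch(chain_range_dic):
    """Reduce each chain's list of indexes to a (lowest, highest) tuple."""
    return {
        chain: (min(indexes), max(indexes))
        for chain, indexes in chain_range_dic.items()
    }


chain_ranges = make_range_sketch({"A": [0, 1, 2, 3], "B": [4, 5, 6]})
```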

haddock.libs.libalign.pdb2fastadic(pdb_f: PDBFile | Path) → dict[str, dict[int, str]]

Write the sequence as a fasta.

Parameters:

pdb_f (PosixPath or haddock.libs.libontology.PDBFile)

Returns:

seq_dic (dict) – dict of fasta sequences (one per chain)
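To make the return shape `dict[str, dict[int, str]]` concrete, the sketch below builds a chain → residue number → one-letter code mapping from PDB-style ATOM lines. Everything here is illustrative: `pdb_lines_to_seqdic`, `atom_line`, and the deliberately partial `THREE_TO_ONE` table are hypothetical, and the sketch reads in-memory lines rather than a file. Unknown residue names map to X, matching the `ResCode` convention.

```python
# Deliberately partial lookup table, for illustration only.
THREE_TO_ONE = {"ALA": "A", "GLY": "G", "SER": "S"}


def atom_line(serial, name, resname, chain, resseq):
    """Build the leading fixed-width columns of a PDB ATOM record.

    Atom-name placement is simplified; only the residue name, chain ID,
    and residue number columns matter for this example.
    """
    return f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}"


def pdb_lines_to_seqdic(lines):
    """Map chain -> residue number -> one-letter residue code."""
    seq_dic = {}
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        resname = line[17:20].strip()  # PDB columns 18-20
        chain = line[21]               # PDB column 22
        resnum = int(line[22:26])      # PDB columns 23-26
        # Several atoms share a residue; the assignment is idempotent.
        seq_dic.setdefault(chain, {})[resnum] = THREE_TO_ONE.get(resname, "X")
    return seq_dic


lines = [
    atom_line(1, "N", "ALA", "A", 1),
    atom_line(2, "CA", "ALA", "A", 1),
    atom_line(3, "N", "GLY", "A", 2),
    atom_line(4, "N", "UNK", "B", 5),
]
seq_dic = pdb_lines_to_seqdic(lines)
```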

haddock.libs.libalign.rearrange_xyz_files(output_name: str | Path, path: str | Path, ncores: int) → None

Combine different xyz outputs in a single file.

Parameters:

output_name (FilePath) – output name
path (FilePath) – path to the output files
ncores (int) – number of cores

haddock.libs.libalign.sequence_alignment(seq_ref, seq_model)

Perform a sequence alignment.

Parameters:

seq_ref (str) – reference sequence
seq_model (str) – model sequence

Returns:

identity (float) – sequence identity
top_aln (Bio.Align.PairwiseAlignments) – alignment object
aln_ref_seg (tuple) – aligned reference segment
aln_mod_seg (tuple) – aligned model segment

haddock.libs.libalign.write_alignment(top_aln, output_path, ref_ch)

Write the alignment to a file.

Parameters:

top_aln (Bio.Align.PairwiseAlignments) – alignment object
ref_ch (str) – reference chain