libalign: sequence and structural alignments

Library of functions to perform sequence and structural alignments.

Main functions

exception haddock.libs.libalign.ALIGNError(msg: object = '')[source]

Bases: Exception

Raised when something goes wrong with the ALIGNMENT library.

haddock.libs.libalign.ResCode

The single letter code of a residue.

Unrecognized residues are assigned the code X.

alias of Literal['C', 'D', 'S', 'Q', 'K', 'I', 'P', 'T', 'F', 'N', 'G', 'H', 'L', 'R', 'W', 'A', 'V', 'E', 'Y', 'M', 'X']
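As a minimal sketch of how such a Literal alias can be used to normalise residue codes: the names ResCodeSketch, VALID_CODES and to_rescode below are hypothetical and not part of libalign, and the upper-casing step is an assumption made for illustration.

```python
from typing import Literal, get_args

# Hypothetical re-statement of the alias, for illustration only.
ResCodeSketch = Literal[
    "C", "D", "S", "Q", "K", "I", "P", "T", "F", "N", "G",
    "H", "L", "R", "W", "A", "V", "E", "Y", "M", "X",
]

# get_args() returns the literal values, so the set of valid codes
# does not have to be written out a second time.
VALID_CODES = frozenset(get_args(ResCodeSketch))


def to_rescode(letter: str) -> str:
    """Return a valid single-letter code, falling back to 'X'."""
    letter = letter.upper()
    return letter if letter in VALID_CODES else "X"


known = to_rescode("a")    # a standard residue letter
unknown = to_rescode("B")  # not in the alias, so it falls back to "X"
```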

class haddock.libs.libalign.SeqAlign[source]

Bases: object

SeqAlign class.

postprocess_alignment(ref_ch, mod_ch, align_id)[source]

Postprocess the alignment.

Parameters:
  • ref_ch (str) – reference chain

  • mod_ch (str) – model chain

  • align_id (int) – alignment id (index of the alignment)

haddock.libs.libalign.align_seq(reference, model, output_path)[source]

Sequence align and get the numbering relationship.

Parameters:
  • reference (PDBFile)

  • model (PDBFile)

  • output_path (Path)

Returns:

align_dic (dict) – dictionary of sequence alignments (one per chain)

haddock.libs.libalign.align_strct(reference: PDBFile, model: PDBFile, output_path: str | Path, lovoalign_exec: str | Path | None = None) → dict[str, dict[int, int]][source]

Structurally align and get the numbering relationship.

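Parameters:
  • reference (PDBFile)

  • model (PDBFile)

  • output_path (str or Path)

  • lovoalign_exec (str or Path, optional) – Path to the lovoalign executable.

Returns: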

numbering_dic (dict) – dict of numbering dictionaries (one dictionary per chain)
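The return annotation, dict[str, dict[int, int]], describes one residue-number lookup per chain. As a minimal sketch of how such a numbering relationship can be derived from a gapped pairwise alignment: the function name numbering_from_alignment, its arguments, and the direction of the mapping (model residue number to reference residue number) are assumptions for illustration, not the library's implementation.

```python
def numbering_from_alignment(aln_ref, aln_mod, ref_resnums, mod_resnums, gap="-"):
    """Map model residue numbers onto reference residue numbers.

    aln_ref and aln_mod are two aligned strings of equal length;
    ref_resnums and mod_resnums hold the residue numbers of the
    ungapped sequences, in order.
    """
    mapping = {}
    ref_i = mod_i = 0  # positions in the ungapped sequences
    for ref_char, mod_char in zip(aln_ref, aln_mod):
        # Only columns where both sequences have a residue are paired.
        if ref_char != gap and mod_char != gap:
            mapping[mod_resnums[mod_i]] = ref_resnums[ref_i]
        if ref_char != gap:
            ref_i += 1
        if mod_char != gap:
            mod_i += 1
    return mapping


# Toy alignment for a single chain "A" (made-up data).
aln_ref = "MKT-AY"
aln_mod = "MKTGA-"
ref_resnums = [10, 11, 12, 13, 14]  # M K T A Y
mod_resnums = [1, 2, 3, 4, 5]       # M K T G A

numbering_dic = {
    "A": numbering_from_alignment(aln_ref, aln_mod, ref_resnums, mod_resnums)
}
```

Residues that face a gap in the other sequence (model residue 4 and reference residue 14 here) simply have no entry in the mapping.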

haddock.libs.libalign.calc_rmsd(V: ndarray[Any, dtype[float64]], W: ndarray[Any, dtype[float64]]) → float[source]

Calculate the RMSD from two vectors.

Parameters:
  • V (np.array dtype=float, shape=(n_atoms,3))

  • W (np.array dtype=float, shape=(n_atoms,3))

Returns:

rmsd (float)
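For reference, the RMSD between two coordinate sets of n_atoms points is the square root of the mean squared distance between paired atoms. A minimal NumPy sketch under that standard definition follows; rmsd_sketch is a hypothetical name, and the code is not taken from the library source.

```python
import numpy as np


def rmsd_sketch(V, W):
    """Root-mean-square deviation between two (n_atoms, 3) arrays."""
    V = np.asarray(V, dtype=float)
    W = np.asarray(W, dtype=float)
    diff = V - W
    # Sum of squared per-atom distances, averaged over atoms.
    return float(np.sqrt((diff * diff).sum() / len(V)))


V = np.zeros((2, 3))
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
value = rmsd_sketch(V, W)  # each atom is displaced by exactly 1 unit
```

The two arrays are assumed to list the same atoms in the same order; no superposition is performed here.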

haddock.libs.libalign.centroid(X: ndarray[Any, dtype[float64]]) → ndarray[Any, dtype[float64]][source]

Get the centroid.

Parameters:

X (np.array dtype=float, shape=(n_atoms,3))

Returns:

C (np.array dtype=float, shape=(3,))
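The shapes above, (n_atoms, 3) in and (3,) out, correspond to averaging over the atom axis. A minimal sketch, assuming the centroid is the unweighted mean of the coordinates (centroid_sketch is a hypothetical name):

```python
import numpy as np


def centroid_sketch(X):
    """Unweighted mean position of an (n_atoms, 3) coordinate array."""
    return np.asarray(X, dtype=float).mean(axis=0)


X = np.array([[0.0, 0.0, 0.0],
              [2.0, 4.0, 6.0]])
C = centroid_sketch(X)

# Subtracting the centroid moves the coordinates to the origin,
# which is the usual first step before a superposition.
X_centered = X - C
```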

haddock.libs.libalign.check_common_atoms(models, filter_resdic, allatoms, atom_similarity)[source]

Check if the models share the same atoms.

Parameters:
  • models (list) – list of models

  • filter_resdic (dict) – dictionary of residues to be loaded (one list per chain)

  • allatoms (bool) – use all the heavy atoms

  • atom_similarity (float) – minimum atom similarity required between models

Returns:

  • n_atoms (int) – number of common atoms

  • common_keys (list) – list of common atom keys

haddock.libs.libalign.dump_as_izone(fname, numbering_dic, model2ref_chain_dict=None)[source]

Dump the numbering dictionary as .izone.

Parameters:
  • fname (str) – output filename

  • numbering_dic (dict) – dict of numbering dictionaries (one dictionary per chain)

haddock.libs.libalign.get_align(method: str, lovoalign_exec: str | Path) → partial[dict[str, dict[int, int]]][source]

Get the alignment function.

Parameters:
  • method (str) – Available options: sequence and structure.

  • lovoalign_exec (str) – Path to the lovoalign executable.

Returns:

align_func (functools.partial) – desired alignment function
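Returning a functools.partial lets the caller invoke either alignment method with the same arguments. A minimal sketch of that dispatch pattern follows; the two stub functions, the name get_align_sketch, the example executable path, and the ValueError raised for unknown methods are all assumptions for illustration, not the library's code.

```python
from functools import partial


def seq_stub(reference, model, output_path):
    """Hypothetical stand-in for a sequence-based aligner."""
    return {"A": {1: 1}}


def strct_stub(reference, model, output_path, lovoalign_exec=None):
    """Hypothetical stand-in for a structure-based aligner."""
    return {"A": {1: 1}}


def get_align_sketch(method, lovoalign_exec=None):
    """Return a callable taking (reference, model, output_path)."""
    if method == "structure":
        # Bind the executable now so both callables share one signature.
        return partial(strct_stub, lovoalign_exec=lovoalign_exec)
    if method == "sequence":
        return partial(seq_stub)
    raise ValueError(f"unknown alignment method: {method!r}")


align_func = get_align_sketch("structure", lovoalign_exec="/opt/lovoalign")
result = align_func("ref.pdb", "model.pdb", "out")
```

The caller never needs to know which method was chosen: both returned callables accept the same three positional arguments.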

haddock.libs.libalign.get_atoms(pdb: PDBFile | Path, full: bool = False) → dict[str, list[str]][source]

Identify the molecule type of each PDB.

Parameters:
  • pdb (PosixPath or haddock.libs.libontology.PDBFile) – PDB file to have its atoms identified

  • full (bool) – Whether or not to take full atoms into consideration. If False, only main-chain atoms are retrieved. If True, all heavy atoms are retrieved.

Returns:

atom_dic (dict) – dictionary of atoms

haddock.libs.libalign.kabsch(P: ndarray[Any, dtype[float64]], Q: ndarray[Any, dtype[float64]]) → ndarray[Any, dtype[float64]][source]

Find the rotation matrix using Kabsch algorithm.

Parameters:
  • P (np.array dtype=float, shape=(n_atoms,3))

  • Q (np.array dtype=float, shape=(n_atoms,3))

Returns:

U (np.array dtype=float, shape=(3,3))
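The Kabsch algorithm obtains the optimal rotation from a singular value decomposition of the covariance matrix of two centered coordinate sets. The sketch below is a textbook formulation, not the library source; kabsch_sketch is a hypothetical name, and the convention that P @ U approximates Q is an assumption of this example.

```python
import numpy as np


def kabsch_sketch(P, Q):
    """Rotation U (3x3) minimising the RMSD between P @ U and Q.

    P and Q are (n_atoms, 3) arrays that are already centered.
    """
    C = P.T @ Q                   # 3x3 covariance matrix
    V, S, Wt = np.linalg.svd(C)
    # Flip one axis if needed so U is a proper rotation (det = +1)
    # rather than a reflection.
    if np.linalg.det(V) * np.linalg.det(Wt) < 0.0:
        V[:, -1] = -V[:, -1]
    return V @ Wt


# Four non-coplanar points whose centroid is the origin (made-up data).
P = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0],
              [-1.0, -2.0, -3.0]])

# Rotate P by 30 degrees about the z axis to build Q.
theta = np.radians(30.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta), np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
Q = P @ R

U = kabsch_sketch(P, Q)
```

Because Q was generated from P by a pure rotation, the recovered U reproduces R and P @ U coincides with Q up to floating-point error.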

haddock.libs.libalign.load_coords(pdb_f, atoms, filter_resdic=None, numbering_dic=None, model2ref_chain_dict=None, add_resname=None)[source]

Load coordinates from PDB.

Parameters:
  • pdb_f (PDBFile)

  • atoms (dict) – dictionary of atoms

  • filter_resdic (dict) – dictionary of residues to be loaded (one list per chain)

  • numbering_dic (dict) – dict of numbering dictionaries (one dictionary per chain)

  • add_resname (bool) – use the residue name in the identifier

Returns:

  • coord_dic (dict) – dictionary of coordinates (one per chain)

  • chain_ranges (dict) – dictionary of chain ranges

haddock.libs.libalign.make_range(chain_range_dic: dict[str, list[int]]) → dict[str, tuple[int, int]][source]

Expand a chain dictionary into ranges.

Parameters:

chain_range_dic (dict) – dictionary of chain indexes (one list per chain)

Returns:

chain_ranges (dict) – dictionary of chain ranges (one tuple per chain)
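Given the annotated types (a list of indexes in, one tuple of two integers out per chain), a plausible reading is that each range is the (first, last) index pair of its chain. A minimal sketch under that assumption follows; make_range_sketch is a hypothetical name, not the library's code.

```python
def make_range_sketch(chain_range_dic):
    """Collapse each chain's index list into a (min, max) tuple."""
    return {
        chain: (min(indexes), max(indexes))
        for chain, indexes in chain_range_dic.items()
    }


chain_range_dic = {"A": [0, 1, 2, 3], "B": [4, 5, 6]}
chain_ranges = make_range_sketch(chain_range_dic)
```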

haddock.libs.libalign.pdb2fastadic(pdb_f: PDBFile | Path) → dict[str, dict[int, str]][source]

Write the sequence as a fasta.

Parameters:

pdb_f (PosixPath or haddock.libs.libontology.PDBFile)

Returns:

seq_dic (dict) – dict of fasta sequences (one per chain)
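The annotated return type, dict[str, dict[int, str]], suggests one mapping per chain from residue number to single-letter code. A minimal sketch of building such a dictionary from fixed-column PDB ATOM records follows. The names THREE_TO_ONE, pdb_lines_to_fastadic and atom_line are hypothetical, and the sketch takes an iterable of lines rather than a file path, which is a simplification for illustration.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def pdb_lines_to_fastadic(lines):
    """Return {chain: {residue number: one-letter code}}."""
    seq_dic = {}
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        resname = line[17:20].strip()  # PDB columns 18-20
        chain = line[21]               # PDB column 22
        resnum = int(line[22:26])      # PDB columns 23-26
        # Unrecognized residue names fall back to "X".
        seq_dic.setdefault(chain, {})[resnum] = THREE_TO_ONE.get(resname, "X")
    return seq_dic


def atom_line(serial, resname, chain, resnum):
    """Build the leading columns of a CA ATOM record (toy helper)."""
    return f"ATOM  {serial:>5}  CA  {resname:>3} {chain}{resnum:>4}"


lines = [
    atom_line(1, "MET", "A", 1),
    atom_line(2, "LYS", "A", 2),
    atom_line(3, "XYZ", "B", 5),  # made-up residue name
    "TER",
]
seq_dic = pdb_lines_to_fastadic(lines)
```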

haddock.libs.libalign.rearrange_xyz_files(output_name: str | Path, path: str | Path, ncores: int) → None[source]

Combine different xyz outputs in a single file.

Parameters:
  • output_name (FilePath) – output name

  • path (FilePath) – path to the output files

  • ncores (int) – number of cores

haddock.libs.libalign.sequence_alignment(seq_ref, seq_model)[source]

Perform a sequence alignment.

Parameters:
  • seq_ref (str) – reference sequence

  • seq_model (str) – model sequence

Returns:

  • identity (float) – sequence identity

  • top_aln (Bio.Align.PairwiseAlignments) – alignment object

  • aln_ref_seg (tuple) – aligned reference segment

  • aln_mod_seg (tuple) – aligned model segment

haddock.libs.libalign.write_alignment(top_aln, output_path, ref_ch)[source]

Write the alignment to a file.

Parameters:
  • top_aln (Bio.Align.PairwiseAlignments) – alignment object

  • ref_ch (str) – reference chain