workflow

Workflow steps

WhatRetrieve = Literal['pdbe', 'alphafold'] module-attribute

Types of what to retrieve.

what_retrieve_choices = {'pdbe', 'alphafold'} module-attribute

Set of what can be retrieved.
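
As an illustration of how these two attributes relate, the sketch below rebuilds them locally and derives the set from the `Literal` type. The `normalize_what` helper is hypothetical and not part of this module, and treating `None` as "all choices" is an assumption, not documented behaviour.

```python
from __future__ import annotations

from typing import Literal, get_args

# Local copies of the two module attributes, for illustration only.
WhatRetrieve = Literal["pdbe", "alphafold"]
# Deriving the runtime set from the Literal keeps the two from drifting apart.
what_retrieve_choices: set[str] = set(get_args(WhatRetrieve))


def normalize_what(what: set[str] | None) -> set[str]:
    """Hypothetical helper: validate `what` against the allowed choices.

    Assumption: None means "retrieve from every known source".
    """
    if what is None:
        return set(what_retrieve_choices)
    unknown = what - what_retrieve_choices
    if unknown:
        raise ValueError(f"Unknown retrieval targets: {sorted(unknown)}")
    return set(what)
```

A static type checker can use `WhatRetrieve` to reject misspelled literals, while the set supports runtime checks such as validating command-line input.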

DensityFilterSessionResult dataclass

Stats of density filtering.

Parameters:

Name Type Description Default
density_filtered_dir Path

The directory where the filtered PDB files are stored.

required
nr_kept int

The number of structures that were kept after filtering.

required
nr_discarded int

The number of structures that were discarded after filtering.

required
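
A minimal sketch of a dataclass with the three documented fields might look like the following. The field names and types come from the table above; any dataclass options (such as `frozen`) are unknown, and the values used below are made up for illustration.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class DensityFilterSessionResult:
    """Stats of density filtering (local re-declaration for illustration)."""

    density_filtered_dir: Path  # where the filtered PDB files are stored
    nr_kept: int  # structures kept after filtering
    nr_discarded: int  # structures discarded after filtering


# Illustrative values only; a real result would come from density_filter().
result = DensityFilterSessionResult(
    density_filtered_dir=Path("session") / "density_filtered",
    nr_kept=8,
    nr_discarded=2,
)
total = result.nr_kept + result.nr_discarded
print(f"Kept {result.nr_kept} of {total} structures in {result.density_filtered_dir}")
```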

density_filter(session_dir, query)

Filter the AlphaFoldDB structures based on density confidence.

In AlphaFold PDB files, the B-factor column holds the predicted local distance difference test (pLDDT) score. For each structure, the residues with a B-factor above the confidence threshold are counted. If that count falls outside the range set by the minimum and maximum thresholds, the structure is discarded. For each remaining structure, the residues with a B-factor below the confidence threshold are removed, and the result is written to the session_dir / "density_filtered" directory.

Parameters:

Name Type Description Default
session_dir Path

The directory where the session database is stored.

required
query DensityFilterQuery

The density filter query containing the confidence thresholds.

required

Returns:

Type Description
DensityFilterSessionResult

Stats of density filtering.
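
The filtering rule described for density_filter can be sketched on plain PDB text as follows. This is not the library's implementation: the function `filter_by_confidence`, its parameter names, and the `atom_line` helper are hypothetical, and handling a B-factor exactly equal to the threshold as "not confident" is an assumption. The column offsets follow the fixed-width PDB ATOM record layout.

```python
from __future__ import annotations


def residue_key(line: str) -> tuple[str, str]:
    """Chain ID (column 22) plus residue number and insertion code (columns 23-27)."""
    return line[21], line[22:27]


def filter_by_confidence(
    pdb_lines: list[str], confidence: float, min_residues: int, max_residues: int
) -> list[str] | None:
    """Return the lines with low-confidence residues removed, or None if discarded."""
    plddt: dict[tuple[str, str], float] = {}
    for line in pdb_lines:
        if line.startswith("ATOM"):
            # In AlphaFold PDB files the B-factor (columns 61-66) holds the pLDDT.
            plddt.setdefault(residue_key(line), float(line[60:66]))
    confident = {key for key, value in plddt.items() if value > confidence}
    if not (min_residues <= len(confident) <= max_residues):
        return None  # count outside the allowed range: structure is filtered out
    return [
        line
        for line in pdb_lines
        if not line.startswith("ATOM") or residue_key(line) in confident
    ]


def atom_line(serial: int, res_seq: int, plddt: float) -> str:
    """Hypothetical helper that builds one fixed-width CA ATOM record for testing."""
    return (
        f"ATOM  {serial:>5}  CA  ALA A{res_seq:>4}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}           C"
    )


example = [atom_line(1, 1, 90.0), atom_line(2, 2, 95.0), atom_line(3, 3, 40.0)]
kept = filter_by_confidence(example, confidence=70.0, min_residues=1, max_residues=10)
discarded = filter_by_confidence(example, confidence=70.0, min_residues=3, max_residues=10)
```

In this example two of the three residues pass the threshold, so the first call keeps the structure with the low-confidence residue stripped, while the second call discards it because fewer than three residues are confident.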

prune_pdbs(session_dir)

Prune the PDB files to keep only the first chain of the found UniProt entries, and rename that chain to A.
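
The keep-one-chain-and-rename step can be sketched on PDB-format text as below. This is only an illustration under stated assumptions: the function `keep_chain_as_a` is hypothetical, the chain to keep is passed in explicitly rather than looked up from the session, and the real step may operate on a different file format or use a structure-parsing library.

```python
def keep_chain_as_a(pdb_lines: list[str], chain_to_keep: str) -> list[str]:
    """Keep coordinate records of one chain and relabel that chain as 'A'."""
    pruned = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM", "TER")):
            # The chain ID sits in column 22 (index 21) of these records.
            if len(line) > 21 and line[21] == chain_to_keep:
                pruned.append(line[:21] + "A" + line[22:])
        else:
            # Non-coordinate records (HEADER, END, ...) pass through unchanged.
            pruned.append(line)
    return pruned


# Truncated records: only the columns up to the residue number matter here.
lines = [
    "HEADER    EXAMPLE",
    "ATOM      1  CA  ALA B   1",
    "ATOM      2  CA  GLY B   2",
    "ATOM      3  CA  SER C   1",
    "END",
]
pruned = keep_chain_as_a(lines, chain_to_keep="B")
```

Here the two records of chain B survive and are relabelled as chain A, and the chain C record is dropped.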

retrieve_structures(session_dir, what=None, what_af_formats=None)

Retrieve structure files from the PDBe and AlphaFold databases for the UniProt entries in the session.

Parameters:

Name Type Description Default
session_dir Path

The directory to store downloaded files and the session database.

required
what set[WhatRetrieve] | None

A set of strings indicating which databases to retrieve files from.

None
what_af_formats set[DownloadableFormat] | None

A set of formats to download from AlphaFold (e.g., "pdb", "cif").

None

Returns:

Type Description
tuple[Path, int, int]

A tuple containing the download directory, the number of PDBe mmCIF files downloaded, and the number of AlphaFold files downloaded.

search_structures_in_uniprot(query, session_dir, limit=10000)

Search for protein structures in the UniProt database.

Parameters:

Name Type Description Default
query Query

The search query.

required
session_dir Path

The directory to store the search results.

required
limit int

The maximum number of results to return from each database query.

10000

Returns:

Type Description
tuple[int, int, int]

A tuple containing the number of UniProt accessions, the number of PDB structures, and the number of AlphaFold structures found.