
workflow

Workflow steps

WhatRetrieve = Literal['pdbe', 'alphafold'] module-attribute

Types of what to retrieve.

what_retrieve_choices = {'pdbe', 'alphafold'} module-attribute

Set of what can be retrieved.
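The pairing of a `Literal` type with a runtime set of choices can be sketched as follows. This is a minimal illustration, not the module's source: whether the library derives the set from the `Literal` is an assumption, and `validate_what` is a hypothetical helper.

```python
from typing import Literal, get_args

# Static type for type checkers.
WhatRetrieve = Literal["pdbe", "alphafold"]

# Runtime set derived from the Literal so the two cannot drift apart
# (assumption: the library may instead spell the set out by hand).
what_retrieve_choices: set[str] = set(get_args(WhatRetrieve))


def validate_what(values: set[str]) -> set[str]:
    """Hypothetical helper: return values unchanged or raise on unknown entries."""
    unknown = values - what_retrieve_choices
    if unknown:
        raise ValueError(f"Unknown retrieval targets: {sorted(unknown)}")
    return values
```

A check like this is useful at a command-line or API boundary, where the static `Literal` type offers no protection against arbitrary strings.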

UniprotSearchResult dataclass

Result of a UniProt search.

Parameters:

Name Type Description Default
nr_uniprot_accessions int

Number of UniProt accessions found.

required
nr_pdbs int

Number of PDB structures found.

required
nr_prot2pdb int

Number of UniProt to PDB mappings found.

required
nr_afs int

Number of AlphaFold structures found.

required
nr_interaction_partners int

Number of interaction partners found.

required
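As a rough picture of what such a result object looks like in use, the sketch below defines a stand-in dataclass with the five documented fields. It is reconstructed from the field list above rather than copied from the library, and the counts are made-up illustration values.

```python
from dataclasses import dataclass


@dataclass
class UniprotSearchResult:
    """Stand-in mirroring the documented fields; not the library's source."""

    nr_uniprot_accessions: int
    nr_pdbs: int
    nr_prot2pdb: int
    nr_afs: int
    nr_interaction_partners: int


# Illustrative values only; a real search would populate these.
result = UniprotSearchResult(
    nr_uniprot_accessions=3,
    nr_pdbs=5,
    nr_prot2pdb=7,
    nr_afs=3,
    nr_interaction_partners=0,
)

# Fields are accessed by name, which is clearer than positional unpacking.
summary = (
    f"{result.nr_uniprot_accessions} accessions, "
    f"{result.nr_pdbs} PDB structures, "
    f"{result.nr_afs} AlphaFold structures"
)
```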

async_retrieve_structures(session_dir, what=None, what_af_formats=None) async

Asynchronously retrieve structure files from the PDBe and AlphaFold databases for the UniProt entries in the session.

Parameters:

Name Type Description Default
session_dir Path

The directory to store downloaded files and the session database.

required
what set[WhatRetrieve] | None

A set of strings indicating which databases to retrieve files from ("pdbe", "alphafold").

None
what_af_formats set[DownloadableFormat] | None

A set of formats to download from AlphaFold (e.g., "pdb", "cif"). If None, defaults to {"summary", "cif"}.

None

Returns:

Type Description
tuple[Path, int, int]

A tuple containing:
- The download directory (Path)
- The number of PDBe mmCIF files downloaded (int)
- The number of AlphaFold files downloaded (int)
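The calling convention (optional arguments that fall back to defaults, and a three-part tuple as the result) can be sketched with a stand-in coroutine. `fake_async_retrieve_structures` is hypothetical and downloads nothing; the `downloads` directory name, the placeholder counts, and the assumption that `what=None` means both databases are not taken from the library. Only the `{"summary", "cif"}` default comes from the parameter description above.

```python
import asyncio
from pathlib import Path

DEFAULT_AF_FORMATS = {"summary", "cif"}  # default stated in the parameter docs


async def fake_async_retrieve_structures(session_dir, what=None, what_af_formats=None):
    """Hypothetical stand-in with the documented signature; downloads nothing."""
    if what is None:
        what = {"pdbe", "alphafold"}  # assumption: None selects both databases
    if what_af_formats is None:
        what_af_formats = set(DEFAULT_AF_FORMATS)
    download_dir = Path(session_dir) / "downloads"  # assumed directory name
    nr_pdbe = 2 if "pdbe" in what else 0  # placeholder count
    nr_af = len(what_af_formats) if "alphafold" in what else 0  # placeholder count
    return download_dir, nr_pdbe, nr_af


async def main():
    # Restrict retrieval to AlphaFold; formats fall back to the default set.
    return await fake_async_retrieve_structures(Path("session1"), what={"alphafold"})


download_dir, nr_pdbe, nr_af = asyncio.run(main())
```

Unpacking the tuple into three names at the call site keeps the meaning of each position explicit.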

filter_structures(session_dir, options, scheduler_address=None)

Filter the structures in the session based on confidence, number of residues, and secondary structure.

Parameters:

Name Type Description Default
session_dir Path

The directory containing the session data, including structure files.

required
options FilterOptions

The filter options containing confidence and secondary structure filter queries.

required
scheduler_address str | Cluster | None

Address of the Dask scheduler for distributed filtering. If None then a local cluster is used.

None

Returns:

Type Description
tuple[Path, list[FilteredStructure]]

A tuple containing:
- The directory with the filtered structures.
- A list of FilteredStructure objects containing the filtering results for each structure.
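The page does not describe how the filters are evaluated, so the following is only one plausible reading of "confidence and number of residues": count the residues whose per-residue confidence (pLDDT in AlphaFold models) meets a threshold, then keep structures whose count falls in a range. `ConfidenceQuery`, `passes`, and the sample scores are all hypothetical and do not reflect the real `FilterOptions` or `FilteredStructure` classes.

```python
from dataclasses import dataclass


@dataclass
class ConfidenceQuery:
    """Hypothetical query; not the library's FilterOptions."""

    min_confidence: float  # per-residue confidence threshold
    min_residues: int  # fewest confident residues allowed
    max_residues: int  # most confident residues allowed


def count_confident_residues(scores: list[float], min_confidence: float) -> int:
    """Count residues whose confidence score meets the threshold."""
    return sum(1 for score in scores if score >= min_confidence)


def passes(scores: list[float], query: ConfidenceQuery) -> bool:
    """Keep a structure if its confident-residue count is within range."""
    n = count_confident_residues(scores, query.min_confidence)
    return query.min_residues <= n <= query.max_residues


query = ConfidenceQuery(min_confidence=70.0, min_residues=3, max_residues=100)

# Made-up per-residue scores for two structures.
structures = {"A": [90.0, 85.0, 72.0, 40.0], "B": [50.0, 60.0, 95.0]}
kept = [name for name, scores in structures.items() if passes(scores, query)]
```

Each structure is judged independently, which is what makes this kind of filter easy to distribute over a Dask cluster.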

retrieve_structures(session_dir, what=None, what_af_formats=None)

Retrieve structure files from the PDBe and AlphaFold databases for the UniProt entries in the session.

Parameters:

Name Type Description Default
session_dir Path

The directory to store downloaded files and the session database.

required
what set[WhatRetrieve] | None

A set of strings indicating which databases to retrieve files from ("pdbe", "alphafold").

None
what_af_formats set[DownloadableFormat] | None

A set of formats to download from AlphaFold (e.g., "pdb", "cif").

None

Returns:

Type Description
tuple[Path, int, int]

A tuple containing the download directory, the number of PDBe mmCIF files downloaded, and the number of AlphaFold files downloaded.
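A common way to offer a blocking function next to an async one is to run the coroutine to completion with `asyncio.run`. The sketch below shows that pattern with stub functions. Whether `retrieve_structures` is implemented this way is an assumption, and both stubs, the `downloads` directory name, and the zero counts are hypothetical.

```python
import asyncio
from pathlib import Path


async def async_retrieve_stub(session_dir, what=None, what_af_formats=None):
    """Hypothetical coroutine standing in for the async variant."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return Path(session_dir) / "downloads", 0, 0  # assumed directory name, no files


def retrieve_stub(session_dir, what=None, what_af_formats=None):
    """Blocking wrapper: runs the coroutine to completion in a new event loop."""
    return asyncio.run(async_retrieve_stub(session_dir, what, what_af_formats))


result = retrieve_stub(Path("session1"))
```

`asyncio.run` raises a `RuntimeError` when called from a thread that already has a running event loop, as in a Jupyter notebook, so in that setting awaiting the async variant directly is the safer choice.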

search_structures_in_uniprot(query, session_dir, limit=10000)

Search for protein structures in the UniProt database.

Parameters:

Name Type Description Default
query UniprotQuery

The search query.

required
session_dir Path

The directory to store the search results.

required
limit int

The maximum number of results to return from each database query.

10000

Returns:

Type Description
UniprotSearchResult

A UniprotSearchResult holding the number of UniProt accessions, the number of PDB structures, the number of UniProt to PDB mappings, and the number of AlphaFold structures found.