workflow
Workflow steps
WhatRetrieve = Literal['pdbe', 'alphafold']
module-attribute
Types of what to retrieve.
what_retrieve_choices = {'pdbe', 'alphafold'}
module-attribute
Set of what can be retrieved.
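The two attributes above carry the same information twice: once for the type checker and once for runtime checks. A minimal sketch of keeping them in sync is shown here; deriving the set with `typing.get_args` and the `parse_what` helper are illustrative assumptions, not necessarily how this module defines or uses them.

```python
from typing import Literal, get_args

WhatRetrieve = Literal["pdbe", "alphafold"]

# Derive the runtime set from the Literal so the two cannot drift apart.
what_retrieve_choices = set(get_args(WhatRetrieve))


def parse_what(values):
    """Hypothetical helper: validate user-supplied strings against the choices."""
    unknown = set(values) - what_retrieve_choices
    if unknown:
        raise ValueError(f"Unknown choice(s): {sorted(unknown)}")
    return set(values)
```

A caller could then pass `parse_what(["pdbe"])` wherever a `set[WhatRetrieve]` is expected.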
UniprotSearchResult
dataclass
Result of a UniProt search.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `nr_uniprot_accessions` | `int` | Number of UniProt accessions found. | required |
| `nr_pdbs` | `int` | Number of PDB structures found. | required |
| `nr_prot2pdb` | `int` | Number of UniProt to PDB mappings found. | required |
| `nr_afs` | `int` | Number of AlphaFold structures found. | required |
| `nr_interaction_partners` | `int` | Number of interaction partners found. | required |
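As a rough illustration of how such a result might be consumed, the sketch below rebuilds the dataclass from the field list above and formats a one-line summary. The class body is reconstructed from this page rather than copied from the library, the `summarize` helper is hypothetical, and the counts are made-up example values.

```python
from dataclasses import dataclass


@dataclass
class UniprotSearchResult:
    """Reconstructed from the documented fields; all five are required."""

    nr_uniprot_accessions: int
    nr_pdbs: int
    nr_prot2pdb: int
    nr_afs: int
    nr_interaction_partners: int


def summarize(result: UniprotSearchResult) -> str:
    """Hypothetical helper: render the counts as a single log-friendly line."""
    return (
        f"{result.nr_uniprot_accessions} UniProt accessions, "
        f"{result.nr_pdbs} PDB structures ({result.nr_prot2pdb} mappings), "
        f"{result.nr_afs} AlphaFold structures, "
        f"{result.nr_interaction_partners} interaction partners"
    )


# Illustrative values only, not real search output.
example = UniprotSearchResult(
    nr_uniprot_accessions=3,
    nr_pdbs=5,
    nr_prot2pdb=7,
    nr_afs=3,
    nr_interaction_partners=2,
)
print(summarize(example))
```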
async_retrieve_structures(session_dir, what=None, what_af_formats=None)
async
Asynchronously retrieve structure files from the PDBe and AlphaFold databases for the UniProt entries in the session.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `session_dir` | `Path` | The directory to store downloaded files and the session database. | required |
| `what` | `set[WhatRetrieve] \| None` | A set of strings indicating which databases to retrieve files from ("pdbe", "alphafold"). | `None` |
| `what_af_formats` | `set[DownloadableFormat] \| None` | A set of formats to download from AlphaFold (e.g., "pdb", "cif"). If None, defaults to {"summary", "cif"}. | `None` |
Returns:
| Type | Description |
|---|---|
| `tuple[Path, int, int]` | A tuple containing the download directory (Path), the number of PDBe mmCIF files downloaded (int), and the number of AlphaFold files downloaded (int). |
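The following sketch shows one way to await a coroutine with this signature and unpack its three-element result. Because the real function needs a session database and network access, `fake_async_retrieve_structures` is a hypothetical stand-in that mirrors only the documented signature and return shape; it downloads nothing and always reports zero files.

```python
import asyncio
from pathlib import Path


async def fake_async_retrieve_structures(session_dir, what=None, what_af_formats=None):
    """Hypothetical stand-in: same parameters and return shape, no downloads."""
    await asyncio.sleep(0)  # placeholder for the real network I/O
    return session_dir, 0, 0


async def main():
    # Unpack the (download directory, PDBe count, AlphaFold count) tuple.
    download_dir, nr_pdbe_files, nr_af_files = await fake_async_retrieve_structures(
        Path("session"),
        what={"alphafold"},
        what_af_formats={"cif"},
    )
    return download_dir, nr_pdbe_files, nr_af_files


result = asyncio.run(main())
```

Inside an already running event loop (for example a notebook), the coroutine would be awaited directly instead of being passed to `asyncio.run`.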
filter_structures(session_dir, options, scheduler_address=None)
Filter the structures in the session based on confidence, number of residues, and secondary structure.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `session_dir` | `Path` | The directory containing the session data, including structure files. | required |
| `options` | `FilterOptions` | The filter options containing confidence and secondary structure filter queries. | required |
| `scheduler_address` | `str \| Cluster \| None` | Address of the Dask scheduler for distributed filtering. If None then a local cluster is used. | `None` |
Returns:
| Type | Description |
|---|---|
| `tuple[Path, list[FilteredStructure]]` | A tuple containing the directory with the filtered structures and a list of FilteredStructure objects containing the filtering results for each structure. |
retrieve_structures(session_dir, what=None, what_af_formats=None)
Retrieve structure files from the PDBe and AlphaFold databases for the UniProt entries in the session.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `session_dir` | `Path` | The directory to store downloaded files and the session database. | required |
| `what` | `set[WhatRetrieve] \| None` | A set of strings indicating which databases to retrieve files from. | `None` |
| `what_af_formats` | `set[DownloadableFormat] \| None` | A set of formats to download from AlphaFold (e.g., "pdb", "cif"). | `None` |
Returns:
| Type | Description |
|---|---|
| `tuple[Path, int, int]` | A tuple containing the download directory, the number of PDBe mmCIF files downloaded, and the number of AlphaFold files downloaded. |
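The synchronous and asynchronous variants share the same parameters and return shape. A common way to offer both is a blocking wrapper that runs the coroutine with `asyncio.run`; the sketch below illustrates that pattern with hypothetical stand-ins. Whether this module implements `retrieve_structures` this way is an assumption.

```python
import asyncio
from pathlib import Path


async def _async_retrieve_stub(session_dir, what=None, what_af_formats=None):
    """Hypothetical stand-in for the async variant; downloads nothing."""
    await asyncio.sleep(0)
    return session_dir, 0, 0


def retrieve_structures_sketch(session_dir, what=None, what_af_formats=None):
    """Blocking wrapper: run the coroutine to completion on a fresh event loop."""
    return asyncio.run(_async_retrieve_stub(session_dir, what, what_af_formats))


download_dir, nr_pdbe_files, nr_af_files = retrieve_structures_sketch(Path("session"))
```

`asyncio.run` raises `RuntimeError` when called from a thread that already has a running event loop, so code in that situation would await the asynchronous variant instead.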
search_structures_in_uniprot(query, session_dir, limit=10000)
Search for protein structures in the UniProt database.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `query` | `UniprotQuery` | The search query. | required |
| `session_dir` | `Path` | The directory to store the search results. | required |
| `limit` | `int` | The maximum number of results to return from each database query. | `10000` |
Returns:
| Type | Description |
|---|---|
| `UniprotSearchResult` | A result object containing the number of UniProt accessions, the number of PDB structures, the number of UniProt to PDB mappings, and the number of AlphaFold structures found. |