workflow
Workflow steps
WhatRetrieve = Literal['pdbe', 'alphafold']
module-attribute
Types of what to retrieve.
what_retrieve_choices = {'pdbe', 'alphafold'}
module-attribute
Set of what can be retrieved.
DensityFilterSessionResult
dataclass
Stats of density filtering.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
density_filtered_dir | Path | The directory where the filtered PDB files are stored. | required |
nr_kept | int | The number of structures that were kept after filtering. | required |
nr_discarded | int | The number of structures that were discarded after filtering. | required |
density_filter(session_dir, query)
Filter the AlphaFoldDB structures based on density confidence.
In AlphaFold PDB files, the B-factor column holds the predicted local distance difference test (pLDDT) score. Residues with a B-factor above the confidence threshold are counted. If that count falls outside the minimum and maximum thresholds, the structure is filtered out. For the remaining structures, residues with a B-factor below the confidence threshold are removed, and the result is written to the session_dir / "density_filtered" directory.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
session_dir | Path | The directory where the session database is stored. | required |
query | DensityFilterQuery | The density filter query containing the confidence thresholds. | required |
Returns:
Type | Description |
---|---|
DensityFilterSessionResult | Stats of density filtering. |
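The filtering rule described above can be sketched on plain PDB text. This is a minimal illustration, not the library's implementation: `ThresholdsSketch`, its field names, and `filter_pdb_text` are hypothetical, and the sketch assumes legacy fixed-column PDB records and treats a pLDDT exactly equal to the threshold as not confident.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ThresholdsSketch:
    """Hypothetical stand-in for DensityFilterQuery; field names are assumptions."""

    confidence: float  # pLDDT cut-off
    min_residues: int  # fewest confident residues a kept structure may have
    max_residues: int  # most confident residues a kept structure may have


def _residue_key(line: str) -> tuple[str, str]:
    # Chain ID (column 22) plus residue number and insertion code (columns 23-27).
    return line[21], line[22:27]


def filter_pdb_text(pdb_text: str, q: ThresholdsSketch) -> str | None:
    """Return the filtered PDB text, or None if the structure is discarded."""
    lines = pdb_text.splitlines()
    atoms = [line for line in lines if line.startswith("ATOM")]
    # In AlphaFold PDB files the B-factor field (columns 61-66) holds the pLDDT.
    confident = {
        _residue_key(line) for line in atoms if float(line[60:66]) > q.confidence
    }
    # Discard the structure when the confident-residue count is out of range.
    if not q.min_residues <= len(confident) <= q.max_residues:
        return None
    # Keep non-ATOM records as they are; drop atoms of low-confidence residues.
    kept = [
        line
        for line in lines
        if not line.startswith("ATOM") or _residue_key(line) in confident
    ]
    return "\n".join(kept)
```

Counting is done per residue rather than per atom, so a residue with many atoms contributes once to the count that is compared against the minimum and maximum.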
prune_pdbs(session_dir)
Prune the PDB files to keep only the first chain of the found UniProt entries and rename that chain to A.
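As a rough illustration of the chain handling, the sketch below keeps one chain of a legacy-format PDB text and relabels it as A. It assumes the chain to keep is already known (how the first chain of a UniProt entry is determined is not covered here), and `keep_chain_as_a` is a hypothetical helper, not part of the documented API.

```python
def keep_chain_as_a(pdb_text: str, chain_to_keep: str) -> str:
    """Keep only the coordinate records of one chain and relabel it as 'A'."""
    coordinate_records = ("ATOM", "HETATM", "ANISOU", "TER")
    out = []
    for line in pdb_text.splitlines():
        if not line.startswith(coordinate_records):
            # Headers, remarks, END and similar records pass through unchanged.
            out.append(line)
            continue
        # The chain identifier sits in column 22 (index 21) of these records.
        # Records too short to carry a chain ID (e.g. a bare TER) are dropped.
        if len(line) > 21 and line[21] == chain_to_keep:
            out.append(line[:21] + "A" + line[22:])
    return "\n".join(out)
```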
retrieve_structures(session_dir, what=None, what_af_formats=None)
Retrieve structure files from the PDBe and AlphaFold databases for the UniProt entries in the session.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
session_dir | Path | The directory to store downloaded files and the session database. | required |
what | set[WhatRetrieve] | None | A set of strings indicating which databases to retrieve files from. | None |
what_af_formats | set[DownloadableFormat] | None | A set of formats to download from AlphaFold (e.g., "pdb", "cif"). | None |
Returns:
Type | Description |
---|---|
Path | A tuple containing the download directory, the number of PDBe mmCIF files downloaded, |
int | and the number of AlphaFold files downloaded. |
search_structures_in_uniprot(query, session_dir, limit=10000)
Search for protein structures in the UniProt database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
query | Query | The search query. | required |
session_dir | Path | The directory to store the search results. | required |
limit | int | The maximum number of results to return from each database query. | 10000 |
Returns:
Type | Description |
---|---|
int | A tuple containing the number of UniProt accessions, the number of PDB structures, |
int | and the number of AlphaFold structures found. |