filter
Module dealing with filtering of protein structures.
The protein_quest package provides more granular filters; this module combines them into coarse-grained functions.
FilterOptions
dataclass
Filter query containing confidence and secondary structure filters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `confidence` | `ConfidenceFilterQuery` | The confidence filter query. | required |
| `secondary_structure` | `SecondaryStructureFilterQuery` | The secondary structure filter query. | required |
FilteredStructure
dataclass
Filter result of a single uniprot+[pdb] entry.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `uniprot_accession` | `str` | The UniProt accession. | required |
| `pdb_id` | `str \| None` | The PDB ID if applicable. | `None` |
| `confidence` | `ConfidenceFilterResult \| None` | The confidence filter result if applicable. | `None` |
| `chain` | `ChainFilterStatistics \| None` | The chain filter result if applicable. | `None` |
| `residue` | `ResidueFilterStatistics \| None` | The residue filter result if applicable. | `None` |
| `secondary_structure` | `tuple[Path, SecondaryStructureFilterResult, Path \| None] \| None` | A tuple containing, in order: the input file path for the secondary structure filter, the secondary structure filter result, and the output file path for the secondary structure filter, if passed. | `None` |
output_file
property
writable
Get the output file of the last filter that was applied.
Only valid if the structure passed all filters.
passed
property
Whether the structure passed all filters.
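As an illustration of how a `passed`-style property can be derived from optional per-filter results, here is a minimal sketch. `StepResult` and `Record` are hypothetical stand-ins, not classes from this module, and the rule used (every filter that was applied must have passed) is an assumption rather than the module's confirmed logic.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StepResult:
    """Hypothetical outcome of one filter step (not part of this module)."""

    passed: bool


@dataclass
class Record:
    """Hypothetical stand-in for a filter result with optional steps."""

    confidence: StepResult | None = None
    chain: StepResult | None = None
    residue: StepResult | None = None
    secondary_structure: StepResult | None = None

    @property
    def passed(self) -> bool:
        # Assumption: a step that is None was not applied and is ignored.
        steps = (self.confidence, self.chain, self.residue, self.secondary_structure)
        applied = [step for step in steps if step is not None]
        return all(step.passed for step in applied)


ok = Record(confidence=StepResult(True), secondary_structure=StepResult(True))
rejected = Record(confidence=StepResult(True), secondary_structure=StepResult(False))
```

Under this assumed rule, a record with no applied steps counts as passed; the real property may treat that case differently.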
make_relative_to(session_dir)
Make all file paths relative to the given session directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `session_dir` | `Path` | The session directory to make paths relative to. | required |
Returns:
| Type | Description |
|---|---|
| `FilteredStructure` | A new FilteredStructure object with paths made relative to the session directory. |
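The idea of returning a new object with session-relative paths can be sketched with the standard library alone. `StructureRecord` and its `input_file`/`output_file` fields are hypothetical stand-ins, not the real `FilteredStructure`; the sketch only assumes that paths are rewritten with `Path.relative_to` and that the original object is left unchanged.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from pathlib import Path


@dataclass(frozen=True)
class StructureRecord:
    """Hypothetical stand-in for FilteredStructure; not the real class."""

    uniprot_accession: str
    input_file: Path
    output_file: Path | None = None

    def make_relative_to(self, session_dir: Path) -> StructureRecord:
        # Build a new record; the original instance is not modified.
        return replace(
            self,
            input_file=self.input_file.relative_to(session_dir),
            output_file=(
                self.output_file.relative_to(session_dir)
                if self.output_file is not None
                else None
            ),
        )


session = Path("/data/session")
record = StructureRecord(
    uniprot_accession="P12345",
    input_file=session / "downloads" / "AF-P12345-F1.cif",
    output_file=session / "filtered" / "AF-P12345-F1.cif",
)
relative = record.make_relative_to(session)
```

Note that `Path.relative_to` raises `ValueError` when a path is not located under `session_dir`, so a real implementation would need to decide how to treat files stored elsewhere.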
filter_alphafold_structures(afs, session_dir, options, final_dir)
Filter AlphaFold structures in the session directory based on confidence and secondary structure.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `afs` | `list[AlphaFoldEntry]` | The list of AlphaFold entries to filter. | required |
| `session_dir` | `Path` | The directory containing the session data, including AlphaFold structure files. | required |
| `options` | `FilterOptions` | The filter options containing confidence and secondary structure filter queries. | required |
| `final_dir` | `Path` | The directory to store the final filtered structures. | required |
Returns:
| Type | Description |
|---|---|
| `FilterResults` | A dictionary mapping (uniprot_accession, pdb_id) to FilteredStructure objects. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If there are inconsistencies in the filtering results. |
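The returned mapping is keyed by a `(uniprot_accession, pdb_id)` tuple. The sketch below shows how such a dictionary can be queried; `Result` is a hypothetical stand-in for `FilteredStructure`, the accessions and PDB IDs are made up, and the use of `None` as the `pdb_id` for an entry without a PDB ID is an assumption based on the field's documented default.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical stand-in for FilteredStructure."""

    uniprot_accession: str
    pdb_id: str | None
    passed: bool


# Same key shape as the documented return value: (uniprot_accession, pdb_id).
results: dict[tuple[str, str | None], Result] = {
    ("P12345", None): Result("P12345", None, passed=True),
    ("P12345", "1ABC"): Result("P12345", "1ABC", passed=False),
    ("Q67890", "2XYZ"): Result("Q67890", "2XYZ", passed=True),
}

# Keep only the keys whose structure passed every filter.
passed_keys = [key for key, result in results.items() if result.passed]

# Look up a single entry by its composite key.
entry = results[("P12345", "1ABC")]
```

Because the key is a tuple, the same UniProt accession can appear several times, once per associated structure.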
filter_pdbe_structures(proteinpdbs, session_dir, options, final_dir, scheduler_address)
Filter PDBe structures in the session directory based on chain, number of residues, and secondary structure.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `proteinpdbs` | `list[ProteinPdbRow]` | The list of ProteinPdbRow entries to filter. | required |
| `session_dir` | `Path` | The directory containing the session data, including PDBe structure files. | required |
| `options` | `FilterOptions` | The filter options containing confidence and secondary structure filter queries. | required |
| `final_dir` | `Path` | The directory to store the final filtered structures. | required |
| `scheduler_address` | `str \| Cluster \| None` | Address of the Dask scheduler for distributed filtering. If None, a local cluster is used. | required |
Returns:
| Type | Description |
|---|---|
| `FilterResults` | A dictionary mapping (uniprot_accession, pdb_id) to FilteredStructure objects. |