filter

Module dealing with filtering of protein structures.

The protein_quest package provides more granular filters; this module combines them into coarse-grained methods.

FilterOptions dataclass

Filter query containing confidence and secondary structure filters.

Parameters:

Name Type Description Default
confidence ConfidenceFilterQuery

The confidence filter query.

required
secondary_structure SecondaryStructureFilterQuery

The secondary structure filter query.

required

FilteredStructure dataclass

Filter result of a single uniprot+[pdb] entry.

Parameters:

Name Type Description Default
uniprot_accession str

The UniProt accession.

required
pdb_id str | None

The PDB ID if applicable.

None
confidence ConfidenceFilterResult | None

The confidence filter result if applicable.

None
chain ChainFilterStatistics | None

The chain filter result if applicable.

None
residue ResidueFilterStatistics | None

The residue filter result if applicable.

None
secondary_structure tuple[Path, SecondaryStructureFilterResult, Path | None] | None

A tuple containing:
- The input file path for the secondary structure filter.
- The secondary structure filter result.
- The output file path for the secondary structure filter, if passed.

None

output_file property writable

Get the output file of the last filter that was applied.

Only valid if the structure passed all filters.

passed property

Whether the structure passed all filters.
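The relationship between passed and output_file can be illustrated with a minimal sketch. It does not use the protein_quest types: StageResult, its output attribute, and the helper functions are hypothetical stand-ins. The rule that filters set to None are skipped is an assumption based on the "if applicable" wording of the parameters above.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class StageResult:
    """Hypothetical stand-in for the result of one filter stage."""

    name: str
    # Assumption: a stage that rejects the structure produces no output file.
    output: Path | None


def applied_stages(stages: list[StageResult | None]) -> list[StageResult]:
    """Keep only the stages that ran; None marks a filter that did not apply."""
    return [stage for stage in stages if stage is not None]


def passed(stages: list[StageResult | None]) -> bool:
    """True if at least one stage ran and every stage that ran kept the structure."""
    ran = applied_stages(stages)
    return bool(ran) and all(stage.output is not None for stage in ran)


def output_file(stages: list[StageResult | None]) -> Path:
    """Output of the last applied stage; only meaningful if all stages passed."""
    if not passed(stages):
        raise ValueError("output_file is only valid if the structure passed all filters")
    return applied_stages(stages)[-1].output


# Stages in order: confidence, chain, residue, secondary structure.
# Here only the first and last are applicable (illustrative paths).
stages = [
    StageResult("confidence", Path("confidence/a.cif")),
    None,
    None,
    StageResult("secondary_structure", Path("final/a.cif")),
]
last_output = output_file(stages)
```

In this sketch, asking for the output file of a rejected structure raises an error rather than returning a stale path. The real property may behave differently.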

make_relative_to(session_dir)

Make all file paths relative to the given session directory.

Parameters:

Name Type Description Default
session_dir Path

The session directory to make paths relative to.

required

Returns:

Type Description
FilteredStructure

A new FilteredStructure object with paths made relative to the session directory.
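A minimal sketch of this pattern, assuming the paths are rewritten with pathlib's Path.relative_to and the copy is made with dataclasses.replace. The Row class and its fields are hypothetical stand-ins, not the actual FilteredStructure implementation, and the accession and paths are illustrative values only.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from pathlib import Path


@dataclass(frozen=True)
class Row:
    """Hypothetical stand-in for a result object that stores file paths."""

    uniprot_accession: str
    input_file: Path
    output_file: Path | None = None

    def make_relative_to(self, session_dir: Path) -> Row:
        """Return a copy whose paths are relative to session_dir."""

        def rel(path: Path | None) -> Path | None:
            # Path.relative_to raises ValueError if path is not inside session_dir.
            return None if path is None else path.relative_to(session_dir)

        # replace() builds a new instance, so the original row is left unchanged.
        return replace(
            self,
            input_file=rel(self.input_file),
            output_file=rel(self.output_file),
        )


session_dir = Path("/data/session")
row = Row(
    "P12345",
    input_file=session_dir / "downloads" / "a.cif",
    output_file=session_dir / "final" / "a.cif",
)
relative_row = row.make_relative_to(session_dir)
```

Returning a new object instead of mutating in place keeps the absolute paths available to the caller, which matches the "new object" wording of the return description.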

filter_alphafold_structures(afs, session_dir, options, final_dir)

Filter AlphaFold structures in the session directory based on confidence and secondary structure.

Parameters:

Name Type Description Default
afs list[AlphaFoldEntry]

The list of AlphaFold entries to filter.

required
session_dir Path

The directory containing the session data, including AlphaFold structure files.

required
options FilterOptions

The filter options containing confidence and secondary structure filter queries.

required
final_dir Path

The directory to store the final filtered structures.

required

Returns:

Type Description
FilterResults

A dictionary mapping (uniprot_accession, pdb_id) to FilteredStructure objects.

Raises:

Type Description
ValueError

If there are inconsistencies in the filtering results.

filter_pdbe_structures(proteinpdbs, session_dir, options, final_dir, scheduler_address)

Filter PDBe structures in the session directory based on chain, number of residues, and secondary structure.

Parameters:

Name Type Description Default
proteinpdbs list[ProteinPdbRow]

The list of ProteinPdbRow entries to filter.

required
session_dir Path

The directory containing the session data, including PDBe structure files.

required
options FilterOptions

The filter options containing confidence and secondary structure filter queries.

required
final_dir Path

The directory to store the final filtered structures.

required
scheduler_address str | Cluster | None

Address of the Dask scheduler for distributed filtering. If None, a local cluster is used.

required

Returns:

Type Description
FilterResults

A dictionary mapping (uniprot_accession, pdb_id) to FilteredStructure objects.
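Both filter functions return a mapping keyed by a (uniprot_accession, pdb_id) tuple. The sketch below shows one way such a mapping could be consumed. Entry is a hypothetical stand-in for FilteredStructure that stores passed as a plain field, and using None as the pdb_id for AlphaFold entries is an assumption. The accessions and PDB ID are illustrative values only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for a filtered structure."""

    uniprot_accession: str
    pdb_id: str | None
    passed: bool


Key = tuple[str, "str | None"]


def split_by_outcome(results: dict[Key, Entry]) -> tuple[dict[Key, Entry], dict[Key, Entry]]:
    """Split a results mapping into structures that passed and those that did not."""
    kept = {key: entry for key, entry in results.items() if entry.passed}
    dropped = {key: entry for key, entry in results.items() if not entry.passed}
    return kept, dropped


results = {
    # Assumption: an AlphaFold entry has no PDB ID, so the key uses None.
    ("P12345", None): Entry("P12345", None, passed=True),
    ("P12345", "1ABC"): Entry("P12345", "1ABC", passed=False),
    ("Q99999", "1ABC"): Entry("Q99999", "1ABC", passed=True),
}
kept, dropped = split_by_outcome(results)

# Tuple keys allow a direct lookup of one accession and PDB ID combination.
one_entry = results[("Q99999", "1ABC")]
```

Keying on the tuple rather than on the accession alone lets the same UniProt accession appear once per PDB entry without overwriting earlier results.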