result

Module for reformatting/filtering PDB results from UniProtKB SPARQL.

PdbResults = dict[str, set[PdbResult]]

Dictionary with UniProt accessions as keys and sets of PDB results as values.

PdbChainLengthError

Bases: ValueError

Raised when a UniProt chain description does not yield a chain length.

PdbResult dataclass

Result of a PDB search in UniProtKB.

Parameters:

id (str, required): PDB ID (for example "1H3O").

method (str, required): Method used for the PDB entry (for example "X-ray diffraction").

uniprot_chains (str, required): Chains in UniProt format (for example "A/B=1-42,A/B=50-99").

resolution (str | None, default None): Resolution of the PDB entry (for example "2.0" for 2.0 Å). Optional.
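To make the field layout concrete, here is a minimal sketch that mirrors the documented parameters with a stand-in dataclass. The class below is not the library's own PdbResult, the accession "P12345" is a placeholder, and making the dataclass frozen is an assumption made so that instances are hashable and can live in a set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen is an assumption: it makes instances hashable
class PdbResult:
    """Stand-in that mirrors the documented fields; not the library's class."""

    id: str
    method: str
    uniprot_chains: str
    resolution: Optional[str] = None  # documented as `str | None`, default None


# Same shape as the documented PdbResults alias.
PdbResults = dict[str, set[PdbResult]]

# Illustrative values taken from the parameter descriptions above.
results: PdbResults = {
    "P12345": {  # placeholder accession
        PdbResult(
            id="1H3O",
            method="X-ray diffraction",
            uniprot_chains="A/B=1-42,A/B=50-99",
            resolution="2.0",
        ),
    },
}

# An entry without a reported resolution falls back to the default of None.
no_resolution = PdbResult(id="1H3O", method="NMR", uniprot_chains="A=1-42")
```

Note that resolution is kept as a string, as documented, so any numeric comparison needs an explicit conversion.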

chain cached property

The first chain listed in the UniProt chains string (self.uniprot_chains).

chain_length cached property

The length of the chain, derived from the UniProt chains string (self.uniprot_chains).
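The following sketch shows one way the two properties could be derived from a string such as "A/B=1-42,A/B=50-99". The helper names first_chain and chain_length_of, the local ChainLengthError class, and the rule of summing the inclusive residue ranges of every segment that lists the first chain are all assumptions made for illustration; the library's actual parsing may differ.

```python
class ChainLengthError(ValueError):
    """Local stand-in for the documented PdbChainLengthError."""


def first_chain(uniprot_chains: str) -> str:
    """Return the first chain ID, e.g. "A" for "A/B=1-42,A/B=50-99"."""
    first_segment = uniprot_chains.split(",")[0]  # "A/B=1-42"
    chain_ids = first_segment.split("=")[0]       # "A/B"
    return chain_ids.split("/")[0].strip()        # "A"


def chain_length_of(uniprot_chains: str) -> int:
    """Sum the inclusive residue ranges of all segments listing the first chain.

    Assumption: a range "start-end" covers end - start + 1 residues.
    """
    chain = first_chain(uniprot_chains)
    total = 0
    for segment in uniprot_chains.split(","):
        chain_ids, sep, residue_range = segment.strip().partition("=")
        if not sep or chain not in chain_ids.split("/"):
            continue  # no range given, or the segment is for other chains
        start, _, end = residue_range.partition("-")
        try:
            total += int(end) - int(start) + 1
        except ValueError as exc:
            raise ChainLengthError(
                f"Cannot parse residue range in {segment!r}"
            ) from exc
    if total <= 0:
        raise ChainLengthError(f"No chain length found in {uniprot_chains!r}")
    return total


example = "A/B=1-42,A/B=50-99"
chain = first_chain(example)
length = chain_length_of(example)  # 42 residues + 50 residues
```

Under these assumptions the example string yields chain "A" and a length of 92, while a description without any residue range raises the error.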

filter_pdb_results_on_chain_length(pdb_results, min_residues, max_residues, keep_invalid=False)

Filter PDB results based on chain length.

Parameters:

pdb_results (PdbResults, required): Dictionary with UniProt accessions as keys and sets of PDB results as values.

min_residues (int | None, required): Minimum number of residues required in the chain mapped to the UniProt accession. If None, no minimum is applied.

max_residues (int | None, required): Maximum number of residues allowed in the chain mapped to the UniProt accession. If None, no maximum is applied.

keep_invalid (bool, default False): If True, PDB results whose chain length could not be determined are kept. If False, they are filtered out. A warning is logged whenever the length cannot be determined.

Returns:

PdbResults: Filtered dictionary with UniProt accessions as keys and sets of PDB results as values.
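The filtering behaviour described above can be sketched as follows. This is a self-contained illustration, not the library's implementation: the Entry class, the ChainLengthError class, and the function name filter_on_chain_length are stand-ins, and two details are assumptions, namely that both bounds are inclusive and that an accession whose entries are all removed stays in the result with an empty set.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)


class ChainLengthError(ValueError):
    """Local stand-in for the documented PdbChainLengthError."""


@dataclass(frozen=True)
class Entry:
    """Stand-in for a PDB result; length=None means 'cannot be determined'."""

    id: str
    length: Optional[int]

    @property
    def chain_length(self) -> int:
        if self.length is None:
            raise ChainLengthError(f"No chain length for {self.id}")
        return self.length


def filter_on_chain_length(results, min_residues, max_residues, keep_invalid=False):
    filtered = {}
    for accession, entries in results.items():
        kept = set()
        for entry in entries:
            try:
                length = entry.chain_length
            except ChainLengthError as exc:
                # A warning is logged whether or not the entry is kept.
                logger.warning("%s: %s", accession, exc)
                if keep_invalid:
                    kept.add(entry)
                continue
            # Assumption: both bounds are inclusive; None disables a bound.
            if min_residues is not None and length < min_residues:
                continue
            if max_residues is not None and length > max_residues:
                continue
            kept.add(entry)
        filtered[accession] = kept  # assumption: empty sets are retained
    return filtered


results = {"P12345": {Entry("1AAA", 30), Entry("2BBB", 120), Entry("3CCC", None)}}
strict = filter_on_chain_length(results, min_residues=50, max_residues=None)
lenient = filter_on_chain_length(results, 50, None, keep_invalid=True)
```

Here strict keeps only the 120-residue entry, while lenient also keeps the entry whose length is unknown.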

filter_pdb_results_on_resolution(pdb_results, top)

Filter PDB results to top entries per UniProt accession by resolution.

Entries are ranked by lowest resolution value first, then by longest chain length, and finally by PDB ID so that the ordering is deterministic.

Parameters:

pdb_results (PdbResults, required): Dictionary with UniProt accessions mapped to PDB entries.

top (int, required): Maximum number of PDB entries to keep for each accession.

Returns:

PdbResults: Filtered dictionary with top-ranked entries per accession.
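The ranking described above maps naturally onto a tuple sort key. The sketch below is an illustration under stated assumptions rather than the library's code: Entry and top_by_resolution are stand-in names, chain_length is stored as a plain field, "lower resolution" is read as the smaller numeric value, and entries without a resolution are assumed to sort last.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """Stand-in for a PDB result with a precomputed chain length."""

    id: str
    resolution: Optional[str]  # kept as a string, as in the documented field
    chain_length: int


def sort_key(entry: Entry):
    # Assumption: a missing resolution ranks after every numeric value.
    if entry.resolution is None:
        resolution = float("inf")
    else:
        resolution = float(entry.resolution)
    # Smaller resolution first, then longer chain, then PDB ID as tie-breaker.
    return (resolution, -entry.chain_length, entry.id)


def top_by_resolution(results, top: int):
    return {
        accession: set(sorted(entries, key=sort_key)[:top])
        for accession, entries in results.items()
    }


results = {
    "P12345": {  # placeholder accession with illustrative entries
        Entry("AAAA", "2.0", 100),
        Entry("BBBB", "1.5", 50),
        Entry("CCCC", "2.0", 200),
        Entry("DDDD", None, 300),
    }
}
best_two = top_by_resolution(results, top=2)
ranked_ids = [e.id for e in sorted(results["P12345"], key=sort_key)]
```

With these values the full ranking is BBBB, CCCC, AAAA, DDDD, so best_two keeps BBBB and CCCC. Including the PDB ID in the key matters because sets have no stable iteration order, and without it ties could resolve differently between runs.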