uniprot

Module for searching UniProtKB using SPARQL.

ComplexPortalEntry dataclass

A ComplexPortal entry.

Parameters:

query_protein (str, required): The UniProt accession used to find the entry.

complex_id (str, required): The ComplexPortal identifier (for example "CPX-1234").

complex_url (str, required): The URL of the ComplexPortal entry.

complex_title (str, required): The title of the complex.

members (set[str], required): UniProt accessions of the members of the complex.

PdbResult dataclass

Result of a PDB search in UniProtKB.

Parameters:

id (str, required): PDB ID (e.g., "1H3O").

method (str, required): Method used for the PDB entry (e.g., "X-ray diffraction").

uniprot_chains (str, required): Chains in UniProt format (e.g., "A/B=1-42,A/B=50-99").

resolution (str | None, default None): Resolution of the PDB entry (e.g., "2.0" for 2.0 Å).

chain cached property

The first chain listed in self.uniprot_chains.

chain_length cached property

The length, in residues, of the chain taken from self.uniprot_chains.

Query dataclass

Search query for UniProtKB.

Parameters:

taxon_id (str | None, required): NCBI Taxonomy ID used to filter results by organism (e.g., "9606" for human).

reviewed (bool | None, default None): Whether to filter results by review status (True for reviewed, False for unreviewed).

subcellular_location_uniprot (str | None, default None): Subcellular location in UniProt format (e.g., "nucleus").

subcellular_location_go (list[str] | None, default None): Subcellular location as GO terms. Pass a list with a single term (e.g., ["GO:0005634"]) or several terms (e.g., ["GO:0005634", "GO:0005737"]).

molecular_function_go (list[str] | None, default None): Molecular function as GO terms. Pass a list with a single term (e.g., ["GO:0003674"]) or several terms (e.g., ["GO:0003674", "GO:0003677"]).

None

filter_pdb_results_on_chain_length(pdb_results, min_residues, max_residues)

Filter PDB results based on chain length.

Parameters:

pdb_results (PdbResults, required): Dictionary with protein IDs as keys and sets of PDB results as values.

min_residues (int | None, required): Minimum number of residues required in the chain mapped to the UniProt accession. If None, no minimum is applied.

max_residues (int | None, required): Maximum number of residues allowed in the chain mapped to the UniProt accession. If None, no maximum is applied.

Returns:

PdbResults: Filtered dictionary with protein IDs as keys and sets of PDB results as values.

search4af(uniprot_accs, limit=10000, timeout=1800, batch_size=10000)

Search UniProtKB for AlphaFold entries linked to the given UniProt accessions.

Parameters:

uniprot_accs (Collection[str], required): UniProt accessions.

limit (int, default 10000): Maximum number of results to return.

timeout (int, default 1800): Timeout for the SPARQL query in seconds.

batch_size (int, default 10000): Number of UniProt accessions to process per batch.

Returns:

dict[str, set[str]]: Dictionary with protein IDs as keys and sets of AlphaFold IDs as values.
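Here is a minimal sketch of what `batch_size` implies: the accessions are split into chunks of at most `batch_size`, and each chunk could be handled by its own SPARQL query. The helpers `batched_accessions` and `values_clause` are hypothetical. So is the use of a `VALUES` block with the `uniprotkb:` prefix (`http://purl.uniprot.org/uniprot/`); the module may build its queries differently.

```python
from collections.abc import Collection, Iterator


def batched_accessions(uniprot_accs: Collection[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most batch_size accessions."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    accs = list(uniprot_accs)
    for start in range(0, len(accs), batch_size):
        yield accs[start : start + batch_size]


def values_clause(batch: list[str]) -> str:
    # Hypothetical: restrict one SPARQL query to the accessions of one batch
    accs = " ".join(f"uniprotkb:{acc}" for acc in batch)
    return f"VALUES ?protein {{ {accs} }}"


for batch in batched_accessions(["P05067", "P69905", "P68871"], batch_size=2):
    print(values_clause(batch))
```

Smaller batches mean more but lighter requests to the endpoint, which can help keep each one within `timeout`.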

search4emdb(uniprot_accs, limit=10000, timeout=1800)

Search UniProtKB for EMDB entries linked to the given UniProt accessions.

Parameters:

uniprot_accs (Iterable[str], required): UniProt accessions.

limit (int, default 10000): Maximum number of results to return.

timeout (int, default 1800): Timeout for the SPARQL query in seconds.

Returns:

dict[str, set[str]]: Dictionary with protein IDs as keys and sets of EMDB IDs as values.
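This sketch shows how result rows could be folded into the `dict[str, set[str]]` return shape. It assumes, without confirmation from the source, that the SPARQL query yields one `(accession, EMDB ID)` pair per row. `group_ids` is a hypothetical helper, and the EMDB IDs are placeholders.

```python
from collections import defaultdict
from collections.abc import Iterable


def group_ids(rows: Iterable[tuple[str, str]]) -> dict[str, set[str]]:
    """Collect (uniprot_acc, emdb_id) pairs into {uniprot_acc: {emdb_id, ...}}."""
    grouped: defaultdict[str, set[str]] = defaultdict(set)
    for uniprot_acc, emdb_id in rows:
        grouped[uniprot_acc].add(emdb_id)  # the set absorbs duplicate rows
    return dict(grouped)


rows = [("P05067", "EMD-0001"), ("P05067", "EMD-0002"), ("P05067", "EMD-0001"), ("P69905", "EMD-0003")]
print(group_ids(rows))
```

Accessions with no EMDB entry produce no rows, so they do not appear as keys.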

search4interaction_partners(uniprot_acc, excludes=None, limit=10000, timeout=1800)

Search for interaction partners of a given UniProt accession using ComplexPortal database references.

Parameters:

uniprot_acc (str, required): UniProt accession to search interaction partners for.

excludes (set[str] | None, default None): Set of UniProt accessions to exclude from the results, for example already known interaction partners. If None, no complex members are excluded.

limit (int, default 10000): Maximum number of results to return.

timeout (int, default 1800): Timeout for the SPARQL query in seconds.

Returns:

dict[str, set[str]]: Dictionary with UniProt accessions of interaction partners as keys and, as values, sets of the ComplexPortal entry IDs in which the interaction occurs.

search4macromolecular_complexes(uniprot_accs, limit=10000, timeout=1800)

Search for macromolecular complexes by UniProtKB accessions.

Queries the UniProt SPARQL endpoint for references to and from the ComplexPortal database (https://www.ebi.ac.uk/complexportal/).

Parameters:

uniprot_accs (Iterable[str], required): UniProt accessions.

limit (int, default 10000): Maximum number of results to return.

timeout (int, default 1800): Timeout for the SPARQL query in seconds.

Returns:

list[ComplexPortalEntry]: List of ComplexPortalEntry objects.
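The sketch below shows how flat query rows could be assembled into ComplexPortalEntry objects, one per (query protein, complex) pair with the members accumulated into a set. The row keys (`query`, `complex_id`, `complex_url`, `title`, `member`) and the one-member-per-row layout are assumptions, not the module's actual query output. `ComplexPortalEntrySketch` is a hypothetical mirror. It gives `members` a default so it can be filled incrementally, which differs from the documented required field. The URLs are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class ComplexPortalEntrySketch:
    """Hypothetical mirror of ComplexPortalEntry."""

    query_protein: str
    complex_id: str
    complex_url: str
    complex_title: str
    members: set[str] = field(default_factory=set)


def rows_to_entries(rows: list[dict[str, str]]) -> list[ComplexPortalEntrySketch]:
    # Assumed row layout: one complex member per row, repeated complex metadata
    entries: dict[tuple[str, str], ComplexPortalEntrySketch] = {}
    for row in rows:
        key = (row["query"], row["complex_id"])
        if key not in entries:
            entries[key] = ComplexPortalEntrySketch(
                query_protein=row["query"],
                complex_id=row["complex_id"],
                complex_url=row["complex_url"],
                complex_title=row["title"],
            )
        entries[key].members.add(row["member"])
    return list(entries.values())


rows = [
    {"query": "Q1", "complex_id": "CPX-1", "complex_url": "https://example.org/CPX-1", "title": "Example complex", "member": "Q1"},
    {"query": "Q1", "complex_id": "CPX-1", "complex_url": "https://example.org/CPX-1", "title": "Example complex", "member": "B1"},
]
for entry in rows_to_entries(rows):
    print(entry.complex_id, sorted(entry.members))  # CPX-1 ['B1', 'Q1']
```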

search4pdb(uniprot_accs, limit=10000, timeout=1800, batch_size=10000)

Search UniProtKB for PDB entries linked to the given UniProt accessions.

Parameters:

uniprot_accs (Collection[str], required): UniProt accessions.

limit (int, default 10000): Maximum number of results to return.

timeout (int, default 1800): Timeout for the SPARQL query in seconds.

batch_size (int, default 10000): Number of UniProt accessions to process per batch.

Returns:

PdbResults: Dictionary with protein IDs as keys and sets of PDB results as values.

search4uniprot(query, limit=10000, timeout=1800)

Search for UniProtKB entries based on the given query.

Parameters:

query (Query, required): Query object containing search parameters.

limit (int, default 10000): Maximum number of results to return.

timeout (int, default 1800): Timeout for the SPARQL query in seconds.

Returns:

set[str]: Set of UniProt accessions.
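SPARQL results from UniProt identify proteins by IRIs of the form http://purl.uniprot.org/uniprot/<accession>. The sketch below assumes the returned set is obtained by stripping that prefix and de-duplicating; the source does not confirm this. `accession_from_iri` is a hypothetical helper.

```python
UNIPROT_IRI_PREFIX = "http://purl.uniprot.org/uniprot/"


def accession_from_iri(iri: str) -> str:
    """Return the accession part of a UniProtKB IRI."""
    if not iri.startswith(UNIPROT_IRI_PREFIX):
        raise ValueError(f"not a UniProtKB IRI: {iri}")
    return iri[len(UNIPROT_IRI_PREFIX):]


bindings = [
    "http://purl.uniprot.org/uniprot/P05067",
    "http://purl.uniprot.org/uniprot/P69905",
    "http://purl.uniprot.org/uniprot/P05067",  # duplicates collapse in the set
]
accessions = {accession_from_iri(iri) for iri in bindings}
print(sorted(accessions))  # ['P05067', 'P69905']
```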