uniprot
Module for searching UniProtKB using SPARQL.
ComplexPortalEntry
dataclass
A ComplexPortal entry.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
query_protein | str | The UniProt accession used to find the entry. | required |
complex_id | str | The ComplexPortal identifier (for example "CPX-1234"). | required |
complex_url | str | The URL to the ComplexPortal entry. | required |
complex_title | str | The title of the complex. | required |
members | set[str] | UniProt accessions which are members of the complex. | required |
PdbResult
dataclass
Result of a PDB search in UniProtKB.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id | str | PDB ID (e.g., "1H3O"). | required |
method | str | Method used for the PDB entry (e.g., "X-ray diffraction"). | required |
uniprot_chains | str | Chains in UniProt format (e.g., "A/B=1-42,A/B=50-99"). | required |
resolution | str \| None | Resolution of the PDB entry (e.g., "2.0" for 2.0 Å). Optional. | None |
chain
cached
property
The first chain listed in self.uniprot_chains.
chain_length
cached
property
The length of the chain, derived from self.uniprot_chains.
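To make the UniProt chain format concrete, here is a minimal sketch of how a string such as "A/B=1-42,A/B=50-99" could be parsed. The helper names `first_chain` and `chain_length` are hypothetical free functions, not the properties of this module, and the sketch assumes that residue ranges are inclusive and that the length is the sum over all segments that mention the first chain. The actual properties may compute these values differently.

```python
def first_chain(uniprot_chains: str) -> str:
    """Return the first chain ID of the first segment (hypothetical helper)."""
    first_segment = uniprot_chains.split(",")[0].strip()
    chain_ids, _, _ = first_segment.partition("=")
    return chain_ids.split("/")[0]


def chain_length(uniprot_chains: str) -> int:
    """Sum the inclusive residue ranges of all segments containing the first chain.

    Assumption: ranges are inclusive, so "1-42" covers 42 residues.
    """
    chain = first_chain(uniprot_chains)
    total = 0
    for segment in uniprot_chains.split(","):
        chain_ids, _, residue_range = segment.strip().partition("=")
        if chain not in chain_ids.split("/"):
            continue  # segment belongs to other chains only
        start, _, end = residue_range.partition("-")
        total += int(end) - int(start) + 1
    return total


example = "A/B=1-42,A/B=50-99"
print(first_chain(example))   # A
print(chain_length(example))  # 92 (42 residues + 50 residues)
```

Under these assumptions the gap between residues 42 and 50 is not counted, which seems the natural reading when the value is used to filter on the number of resolved residues.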
Query
dataclass
Search query for UniProtKB.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
taxon_id | str \| None | NCBI Taxon ID to filter results by organism (e.g., "9606" for human). | required |
reviewed | bool \| None | Whether to filter results by reviewed status (True for reviewed, False for unreviewed). | None |
subcellular_location_uniprot | str \| None | Subcellular location in UniProt format (e.g., "nucleus"). | None |
subcellular_location_go | list[str] \| None | Subcellular location in GO format. Can be a single GO term (e.g., ["GO:0005634"]) or a collection of GO terms (e.g., ["GO:0005634", "GO:0005737"]). | None |
molecular_function_go | list[str] \| None | Molecular function in GO format. Can be a single GO term (e.g., ["GO:0003674"]) or a collection of GO terms (e.g., ["GO:0003674", "GO:0008150"]). | None |
filter_pdb_results_on_chain_length(pdb_results, min_residues, max_residues)
Filter PDB results based on chain length.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pdb_results | PdbResults | Dictionary with protein IDs as keys and sets of PDB results as values. | required |
min_residues | int \| None | Minimum number of residues required in the chain mapped to the UniProt accession. If None, no minimum is applied. | required |
max_residues | int \| None | Maximum number of residues allowed in the chain mapped to the UniProt accession. If None, no maximum is applied. | required |
Returns:
Type | Description |
---|---|
PdbResults | Filtered dictionary with protein IDs as keys and sets of PDB results as values. |
search4af(uniprot_accs, limit=10000, timeout=1800, batch_size=10000)
Search UniProtKB for AlphaFold entries associated with the given accessions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
uniprot_accs | Collection[str] | UniProt accessions. | required |
limit | int | Maximum number of results to return. | 10000 |
timeout | int | Timeout for the SPARQL query in seconds. | 1800 |
batch_size | int | Size of batches to process the UniProt accessions. | 10000 |
Returns:
Type | Description |
---|---|
dict[str, set[str]] | Dictionary with protein IDs as keys and sets of AlphaFold IDs as values. |
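The batch_size parameter suggests that the accessions are split into chunks, with one query per chunk. How the module does this internally is not documented here, so the following is only a sketch of the general pattern: `batched_accessions` and `merge_results` are hypothetical helpers, and the per-batch results are hand-written placeholders rather than real query output.

```python
from collections.abc import Iterable, Iterator
from itertools import islice


def batched_accessions(uniprot_accs: Iterable[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most batch_size accessions."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(uniprot_accs)
    while batch := list(islice(iterator, batch_size)):
        yield batch


def merge_results(partials: Iterable[dict[str, set[str]]]) -> dict[str, set[str]]:
    """Union per-batch dictionaries into one accession -> IDs mapping."""
    merged: dict[str, set[str]] = {}
    for partial in partials:
        for accession, ids in partial.items():
            merged.setdefault(accession, set()).update(ids)
    return merged


# Placeholder accessions; a real caller would run one SPARQL query per batch.
batches = list(batched_accessions(["ACC_1", "ACC_2", "ACC_3", "ACC_4", "ACC_5"], batch_size=2))
print(batches)  # [['ACC_1', 'ACC_2'], ['ACC_3', 'ACC_4'], ['ACC_5']]

merged = merge_results([{"ACC_1": {"ID_A"}}, {"ACC_1": {"ID_B"}, "ACC_3": {"ID_C"}}])
print(sorted(merged))  # ['ACC_1', 'ACC_3']
```

Whether limit applies per batch or to the merged result is not stated in this reference, so check the behaviour before relying on it with large inputs.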
search4emdb(uniprot_accs, limit=10000, timeout=1800)
Search UniProtKB for EMDB entries associated with the given accessions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
uniprot_accs | Iterable[str] | UniProt accessions. | required |
limit | int | Maximum number of results to return. | 10000 |
timeout | int | Timeout for the SPARQL query in seconds. | 1800 |
Returns:
Type | Description |
---|---|
dict[str, set[str]] | Dictionary with protein IDs as keys and sets of EMDB IDs as values. |
search4interaction_partners(uniprot_acc, excludes=None, limit=10000, timeout=1800)
Search for interaction partners of a given UniProt accession using ComplexPortal database references.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
uniprot_acc | str | UniProt accession to search interaction partners for. | required |
excludes | set[str] \| None | Set of UniProt accessions to exclude from the results, for example already known interaction partners. If None, no complex members are excluded. | None |
limit | int | Maximum number of results to return. | 10000 |
timeout | int | Timeout for the SPARQL query in seconds. | 1800 |
Returns:
Type | Description |
---|---|
dict[str, set[str]] | Dictionary with UniProt accessions of interaction partners as keys and sets of ComplexPortal entry IDs in which the interaction occurs as values. |
search4macromolecular_complexes(uniprot_accs, limit=10000, timeout=1800)
Search for macromolecular complexes by UniProtKB accessions.
Queries the UniProt SPARQL endpoint for references to/from the ComplexPortal database (https://www.ebi.ac.uk/complexportal/).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
uniprot_accs | Iterable[str] | UniProt accessions. | required |
limit | int | Maximum number of results to return. | 10000 |
timeout | int | Timeout for the SPARQL query in seconds. | 1800 |
Returns:
Type | Description |
---|---|
list[ComplexPortalEntry] | List of ComplexPortalEntry objects. |
search4pdb(uniprot_accs, limit=10000, timeout=1800, batch_size=10000)
Search UniProtKB for PDB entries associated with the given accessions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
uniprot_accs | Collection[str] | UniProt accessions. | required |
limit | int | Maximum number of results to return. | 10000 |
timeout | int | Timeout for the SPARQL query in seconds. | 1800 |
batch_size | int | Size of batches to process the UniProt accessions. | 10000 |
Returns:
Type | Description |
---|---|
PdbResults | Dictionary with protein IDs as keys and sets of PDB results as values. |
search4uniprot(query, limit=10000, timeout=1800)
Search for UniProtKB entries based on the given query.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
query | Query | Query object containing search parameters. | required |
limit | int | Maximum number of results to return. | 10000 |
timeout | int | Timeout for the SPARQL query in seconds. | 1800 |
Returns:
Type | Description |
---|---|
set[str] | Set of UniProt accessions. |