search
Search subcommands for protein-quest.
alphafold(uniprot_accessions, output_csv, /, *, min_sequence_length=None, max_sequence_length=None, limit=10000, timeout=1800, _=None)
Search for AlphaFold structures of given UniProt accessions.
Search for AlphaFold structures of given UniProt accessions in the UniProt SPARQL endpoint.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| uniprot_accessions | InputFile | Text file with UniProt accessions (one per line). Use … | required |
| output_csv | OutputFile | Output CSV with AlphaFold IDs per UniProt accession. CSV has columns: … | required |
| min_sequence_length | MinSequenceLength \| None | Minimum length of the canonical sequence. | None |
| max_sequence_length | MaxSequenceLength \| None | Maximum length of the canonical sequence. | None |
| limit | Limit | Maximum number of AlphaFold entry identifiers to return. | 10000 |
| timeout | Timeout | Maximum seconds to wait for query to complete. | 1800 |
| _ | Common \| None | Common CLI options. | None |
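Several subcommands take a text file with one UniProt accession per line. The sketch below shows one way to prepare and read such a file in plain Python. The helper `read_accessions` is hypothetical and not part of protein-quest; skipping blank lines and duplicates is an assumption made for the example, not documented behavior. `Q00001` is a placeholder accession.

```python
import tempfile
from pathlib import Path


def read_accessions(path: Path) -> list[str]:
    """Read accessions from a text file with one accession per line.

    Hypothetical helper: blank lines and repeated accessions are dropped,
    keeping the order in which each accession first appears.
    """
    accessions: list[str] = []
    seen: set[str] = set()
    for line in path.read_text(encoding="utf-8").splitlines():
        accession = line.strip()
        if accession and accession not in seen:
            seen.add(accession)
            accessions.append(accession)
    return accessions


with tempfile.TemporaryDirectory() as tmp:
    input_file = Path(tmp) / "accessions.txt"
    # One accession per line; the blank line and the repeat are ignored.
    input_file.write_text("P12345\nQ00001\n\nP12345\n", encoding="utf-8")
    accessions = read_accessions(input_file)

print(accessions)
```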
complexes(uniprot_accessions, output_csv, /, *, limit=100, timeout=1800, _=None)
Search for complexes in the Complex Portal.
Search for complexes in the Complex Portal (https://www.ebi.ac.uk/complexportal/).
The output CSV file has the following columns:
- query_protein: UniProt accession used as query
- complex_id: Complex Portal identifier
- complex_url: URL to the Complex Portal entry
- complex_title: Title of the complex
- members: Semicolon-separated list of UniProt accessions of complex members
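As a rough illustration of consuming this output, the sketch below parses a CSV with the columns listed above and splits the `members` field on semicolons. The sample row, identifiers, and URL are made up, and a comma-delimited file with a header row is assumed.

```python
import csv
import io

# Made-up sample data in the documented column layout.
sample = io.StringIO(
    "query_protein,complex_id,complex_url,complex_title,members\n"
    "P12345,CPX-0001,https://example.org/CPX-0001,Example complex,"
    "P12345;Q00001;Q00002\n"
)

# Map each query protein to the other members of its complexes.
partners: dict[str, set[str]] = {}
for row in csv.DictReader(sample):
    members = [m for m in row["members"].split(";") if m]
    others = {m for m in members if m != row["query_protein"]}
    partners.setdefault(row["query_protein"], set()).update(others)

print(partners)
```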
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| uniprot_accessions | InputFile | Text file with UniProt accessions (one per line) as query. Use … | required |
| output_csv | OutputFile | Output CSV file with complex results. Use … | required |
| limit | Limit | Maximum number of complex results to return. | 100 |
| timeout | Timeout | Maximum seconds to wait for query to complete. | 1800 |
| _ | Common \| None | Common CLI options. | None |
emdb(uniprot_accessions, output_csv, /, *, limit=10000, timeout=1800, _=None)
Search for EMDB identifiers of given UniProt accessions.
Search for Electron Microscopy Data Bank (EMDB) identifiers of given UniProt accessions in the UniProt SPARQL endpoint.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| uniprot_accessions | InputFile | Text file with UniProt accessions (one per line). Use … | required |
| output_csv | OutputFile | Output CSV with EMDB IDs per UniProt accession. CSV has columns: … | required |
| limit | Limit | Maximum number of EMDB entry identifiers to return. | 10000 |
| timeout | Timeout | Maximum seconds to wait for query to complete. | 1800 |
| _ | Common \| None | Common CLI options. | None |
go(term, output_csv, /, *, aspect=None, limit=100, _=None)
Search for Gene Ontology (GO) terms.
Search for Gene Ontology (GO) terms in the EBI QuickGO API.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| term | str | GO term to search for. For example … | required |
| output_csv | OutputFile | Output CSV with GO term results. CSV has columns: … | required |
| aspect | Aspect \| None | Filter on aspect. | None |
| limit | Limit | Maximum number of GO term results to return. | 100 |
| _ | Common \| None | Common CLI options. | None |
interaction_partners(uniprot_accession, output_csv, /, *, exclude=None, limit=10000, timeout=1800, _=None)
Search for interaction partners of given UniProt accession.
Search for interaction partners of a given UniProt accession in the UniProt SPARQL endpoint and Complex Portal.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| uniprot_accession | str | UniProt accession (for example P12345). | required |
| output_csv | OutputFile | Output CSV with interaction partners per UniProt accession. CSV has columns: … | required |
| exclude | Annotated[list[str] \| None, Parameter(negative='')] | UniProt accessions to exclude from the results. Multiple accessions can be given by repeating the … | None |
| limit | Limit | Maximum number of interaction partner UniProt accessions to return. | 10000 |
| timeout | Timeout | Maximum seconds to wait for query to complete. | 1800 |
| _ | Common \| None | Common CLI options. | None |
pdbe(uniprot_accessions, output_csv, /, *, limit=10000, timeout=1800, min_residues=None, max_residues=None, keep_invalid=False, top_resolution_per_uniprot_accession=None, _=None)
Search for PDB structures of given UniProt accessions.
Search for PDB structures of given UniProt accessions in the UniProt SPARQL endpoint.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| uniprot_accessions | InputFile | Text file with UniProt accessions (one per line). Use … | required |
| output_csv | OutputFile | Output CSV with the following columns: … | required |
| limit | Limit | Maximum number of PDB entry and UniProt accession combinations to return. | 10000 |
| timeout | Timeout | Maximum seconds to wait for query to complete. | 1800 |
| min_residues | MinResidues \| None | Minimum number of residues required in the chain mapped to the UniProt accession. | None |
| max_residues | MaxResidues \| None | Maximum number of residues allowed in the chain mapped to the UniProt accession. | None |
| keep_invalid | Annotated[bool, Parameter(negative='')] | Keep PDB results when chain length could not be determined. | False |
| top_resolution_per_uniprot_accession | PositiveInt \| None | Retain the top N PDB entries per UniProt accession, ranked by best (lowest) resolution first, then by highest residue count. For example use … | None |
| _ | Common \| None | Common CLI options. | None |
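The ranking described for `top_resolution_per_uniprot_accession` (lowest resolution first, ties broken by highest residue count) can be sketched as follows. This is an illustration of the stated rule, not protein-quest's implementation; the `PdbHit` class, its field names, and the sample entries are hypothetical, and entries without a resolution are not handled here.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class PdbHit:
    """Hypothetical record for one PDB entry mapped to a UniProt accession."""

    uniprot_accession: str
    pdb_id: str
    resolution: float  # in angstrom, lower is better
    residue_count: int


def top_n_per_accession(hits: list[PdbHit], n: int) -> list[PdbHit]:
    """Keep the n best hits per accession.

    Hits are ranked by lowest resolution first, then by highest residue count.
    """

    def rank(hit: PdbHit) -> tuple[float, int]:
        # Negate the residue count so larger chains sort first on ties.
        return (hit.resolution, -hit.residue_count)

    def accession(hit: PdbHit) -> str:
        return hit.uniprot_accession

    kept: list[PdbHit] = []
    # groupby needs its input sorted by the grouping key.
    for _, group in groupby(sorted(hits, key=accession), key=accession):
        kept.extend(sorted(group, key=rank)[:n])
    return kept


hits = [
    PdbHit("P12345", "1AAA", 2.5, 300),
    PdbHit("P12345", "2BBB", 1.8, 250),
    PdbHit("P12345", "3CCC", 1.8, 310),
    PdbHit("Q00001", "4DDD", 3.0, 100),
]
kept_ids = [hit.pdb_id for hit in top_n_per_accession(hits, 2)]
print(kept_ids)
```

Here `3CCC` outranks `2BBB` because both have the same resolution and `3CCC` covers more residues, while `1AAA` is dropped as the third-best entry for its accession.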
structure(uniprot_accessions, output_csv, /, *, source=None, min_residues=None, max_residues=None, limit=10000, timeout=1800, raw=None, _=None)
Search for experimentally determined and predicted structures.
Search for experimentally determined and predicted structures of given UniProt accessions in the 3D Beacons Network API.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| uniprot_accessions | InputFile | Text file with UniProt accessions (one per line). Use … | required |
| output_csv | OutputFile | Output CSV with the following columns: … | required |
| source | Annotated[set[Provider \| Literal['all']] \| None, Parameter(negative='')] | Source of the structures to search for. Default … | None |
| min_residues | MinResidues \| None | Minimum number of residues required in the chain mapped to the UniProt accession. | None |
| max_residues | MaxResidues \| None | Maximum number of residues allowed in the chain mapped to the UniProt accession. | None |
| limit | Limit | Maximum number of structures per UniProt accession per source to return. | 10000 |
| timeout | Timeout | Maximum seconds to wait for query to complete. | 1800 |
| raw | OutputFile \| None | Path to write raw 3D Beacons summaries as JSON. | None |
| _ | Common \| None | Common CLI options. | None |
taxonomy(query, output_csv, /, *, field=None, limit=100, _=None)
Search for taxon information in UniProt.
Search for taxon information in UniProt. Uses https://www.uniprot.org/taxonomy?query=*.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Search query for the taxon. Surround multiple words with quotes. | required |
| output_csv | OutputFile | Output CSV with taxonomy results. CSV has columns: … | required |
| field | SearchField \| None | Field to search in. If not given then searches all fields. If "tax_id" then searches by taxon ID. If "parent" then given a parent taxon ID returns all its children. | None |
| limit | Limit | Maximum number of results to return. | 100 |
| _ | Common \| None | Common CLI options. | None |
uniprot(output, /, *, query=None, limit=10000, timeout=1800, _=None)
Search for UniProt accessions.
Search for UniProt accessions based on various criteria in the UniProt SPARQL endpoint.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| output | OutputFile | Output text file for UniProt accessions (one per line). Use … | required |
| query | Query \| None | Search query for UniProtKB. Can be given as a JSON string or as a path to a JSON file. | None |
| limit | Limit | Maximum number of UniProt accessions to return. | 10000 |
| timeout | Timeout | Maximum seconds to wait for query to complete. | 1800 |
| _ | Common \| None | Common CLI options. | None |
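The `query` parameter accepts either a JSON string or a path to a JSON file. One way such dual input could be resolved is sketched below. The function `load_query` is hypothetical and is not how protein-quest necessarily does it, and `example_key` is a placeholder rather than a real field of `Query`.

```python
import json
import tempfile
from pathlib import Path


def load_query(value: str) -> dict:
    """Interpret value as a path to a JSON file if one exists, else as JSON text.

    Hypothetical helper for illustration only.
    """
    try:
        is_file = Path(value).is_file()
    except (OSError, ValueError):
        # Long or unusual JSON strings are not valid file names.
        is_file = False
    text = Path(value).read_text(encoding="utf-8") if is_file else value
    query = json.loads(text)
    if not isinstance(query, dict):
        raise ValueError("query must be a JSON object")
    return query


# Placeholder content; the real fields of Query are not shown in this page.
raw = '{"example_key": "example value"}'
from_string = load_query(raw)

with tempfile.TemporaryDirectory() as tmp:
    query_file = Path(tmp) / "query.json"
    query_file.write_text(raw, encoding="utf-8")
    from_file = load_query(str(query_file))

print(from_string == from_file)
```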
uniprot_details(uniprot_accessions, output_csv, /, *, timeout=1800, batch_size=1000, _=None)
Search for UniProt details for given UniProt accessions from the UniProt SPARQL endpoint.
The output CSV file has the following columns:
- uniprot_accession: UniProt accession.
- uniprot_id: UniProt ID (mnemonic).
- sequence_length: Length of the canonical sequence.
- reviewed: Whether the entry is reviewed (Swiss-Prot) or unreviewed (TrEMBL).
- protein_name: Recommended protein name.
- taxon_id: NCBI Taxonomy ID of the organism.
- taxon_name: Scientific name of the organism.
The order of the output CSV can be different from the input order.
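Because the output order can differ from the input order, a caller who needs the original order can index the rows by accession and look them up again. The sketch below does this with made-up sample rows; the values, including how `reviewed` is serialized, are placeholders and not taken from a real run.

```python
import csv
import io

# Accessions in the order they were given as input.
input_order = ["P12345", "Q00001", "Q00002"]

# Made-up output in the documented column layout, in a different order
# and with one accession missing.
output_csv = io.StringIO(
    "uniprot_accession,uniprot_id,sequence_length,reviewed,"
    "protein_name,taxon_id,taxon_name\n"
    "Q00002,EX2_EXAMPLE,120,true,Example protein 2,9606,Homo sapiens\n"
    "P12345,EX1_EXAMPLE,430,true,Example protein 1,9606,Homo sapiens\n"
)

by_accession = {
    row["uniprot_accession"]: row for row in csv.DictReader(output_csv)
}

# Restore the input order and note accessions without a result row.
ordered = [by_accession[a] for a in input_order if a in by_accession]
missing = [a for a in input_order if a not in by_accession]

print([row["uniprot_accession"] for row in ordered], missing)
```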
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| uniprot_accessions | InputFile | Text file with UniProt accessions (one per line). Use … | required |
| output_csv | OutputFile | Output CSV with UniProt details. CSV has columns: … | required |
| timeout | Timeout | Maximum seconds to wait for query to complete. | 1800 |
| batch_size | BatchSize | Number of accessions to query per batch. | 1000 |
| _ | Common \| None | Common CLI options. | None |
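To illustrate what `batch_size` means in practice, the sketch below splits a list of accessions into consecutive batches of at most `batch_size` items. The `batches` helper and the generated accessions are hypothetical; how protein-quest issues the per-batch queries is not shown here.

```python
from collections.abc import Iterator, Sequence


def batches(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of items with at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]


# Placeholder accessions, only used to show the batch sizes.
accessions = [f"ACC{i:04d}" for i in range(2500)]
sizes = [len(batch) for batch in batches(accessions, 1000)]
print(sizes)
```

With 2500 accessions and the default batch size of 1000, this gives two full batches and a final batch of 500.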
write_csv(path)
Context manager for writing CSV files.
Creates parent directories if they do not exist.
Yields:
| Type | Description |
|---|---|
| | CSV writer object to write rows to. |
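A context manager with this behavior could look roughly like the sketch below. It is named `write_csv_sketch` to make clear that it is an illustration and not the library's code; yielding a plain `csv.writer` and using UTF-8 encoding are assumptions.

```python
import csv
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def write_csv_sketch(path):
    """Open path for writing and yield a csv.writer bound to it.

    Parent directories are created first if they do not exist.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # newline="" lets the csv module control line endings itself.
    with path.open("w", newline="", encoding="utf-8") as handle:
        yield csv.writer(handle)


with tempfile.TemporaryDirectory() as tmp:
    # The nested directories do not exist yet.
    target = Path(tmp) / "results" / "nested" / "out.csv"
    with write_csv_sketch(target) as writer:
        writer.writerow(["uniprot_accession", "sequence_length"])
        writer.writerow(["P12345", 430])
    written_lines = target.read_text(encoding="utf-8").splitlines()

print(written_lines)
```

The file is closed when the `with` block exits, so rows should only be written inside the block.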