search

Search subcommands for protein-quest.

alphafold(uniprot_accessions, output_csv, /, *, min_sequence_length=None, max_sequence_length=None, limit=10000, timeout=1800, _=None)

Search for AlphaFold structures of given UniProt accessions.

Search for AlphaFold structures of given UniProt accessions in the UniProt SPARQL endpoint.

Parameters:

Name Type Description Default
uniprot_accessions InputFile

Text file with UniProt accessions (one per line). Use - for stdin.

required
output_csv OutputFile

Output CSV with AlphaFold IDs per UniProt accession. CSV has columns: uniprot_accession, af_id. Use - for stdout.

required
min_sequence_length MinSequenceLength | None

Minimum length of the canonical sequence.

None
max_sequence_length MaxSequenceLength | None

Maximum length of the canonical sequence.

None
limit Limit

Maximum number of AlphaFold entry identifiers to return.

10000
timeout Timeout

Maximum seconds to wait for query to complete.

1800
_ Common | None

Common CLI options.

None
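
As a rough illustration of consuming this output, the sketch below groups AlphaFold IDs per UniProt accession from a CSV with the uniprot_accession and af_id columns described above. The helper name group_af_ids and the sample rows are made up for the example and are not part of protein-quest; whether accessions without a hit appear as rows with an empty af_id is an assumption.

```python
import csv
import io
from collections import defaultdict


def group_af_ids(csv_text: str) -> dict[str, list[str]]:
    """Group af_id values by uniprot_accession (hypothetical helper)."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: an accession without a hit may have an empty af_id.
        if row["af_id"]:
            grouped[row["uniprot_accession"]].append(row["af_id"])
    return dict(grouped)


# Placeholder values only; real output depends on the query.
sample = "uniprot_accession,af_id\nP12345,P12345\nQ99999,\n"
grouped_ids = group_af_ids(sample)
print(grouped_ids)
```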

complexes(uniprot_accessions, output_csv, /, *, limit=100, timeout=1800, _=None)

Search for complexes in the Complex Portal.

Search for complexes in the Complex Portal (https://www.ebi.ac.uk/complexportal/).

The output CSV file has the following columns:

  • query_protein: UniProt accession used as query
  • complex_id: Complex Portal identifier
  • complex_url: URL to the Complex Portal entry
  • complex_title: Title of the complex
  • members: Semicolon-separated list of UniProt accessions of complex members

Parameters:

Name Type Description Default
uniprot_accessions InputFile

Text file with UniProt accessions (one per line) as query. Use - for stdin.

required
output_csv OutputFile

Output CSV file with complex results. Use - for stdout.

required
limit Limit

Maximum number of complex results to return.

100
timeout Timeout

Maximum seconds to wait for query to complete.

1800
_ Common | None

Common CLI options.

None
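
Because members is a semicolon-separated list, downstream code usually has to split it before use. The following is a minimal sketch, assuming the column names listed above; the function name complexes_containing and the sample row (including its URL) are hypothetical.

```python
import csv
import io


def complexes_containing(csv_text: str, accession: str) -> list[str]:
    """Return complex_id values whose members include the accession."""
    hits = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Split the semicolon-separated members column and drop empty items.
        members = [m.strip() for m in row["members"].split(";") if m.strip()]
        if accession in members:
            hits.append(row["complex_id"])
    return hits


# Made-up row for illustration; identifiers and URL are placeholders.
sample = (
    "query_protein,complex_id,complex_url,complex_title,members\n"
    "P00001,CPX-0001,https://example.org/CPX-0001,Example complex,P00001;P00002\n"
)
print(complexes_containing(sample, "P00002"))
```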

emdb(uniprot_accessions, output_csv, /, *, limit=10000, timeout=1800, _=None)

Search for EMDB identifiers of given UniProt accessions.

Search for Electron Microscopy Data Bank (EMDB) identifiers of given UniProt accessions in the UniProt SPARQL endpoint.

Parameters:

Name Type Description Default
uniprot_accessions InputFile

Text file with UniProt accessions (one per line). Use - for stdin.

required
output_csv OutputFile

Output CSV with EMDB IDs per UniProt accession. CSV has columns: uniprot_accession, emdb_id. Use - for stdout.

required
limit Limit

Maximum number of EMDB entry identifiers to return.

10000
timeout Timeout

Maximum seconds to wait for query to complete.

1800
_ Common | None

Common CLI options.

None

go(term, output_csv, /, *, aspect=None, limit=100, _=None)

Search for Gene Ontology (GO) terms.

Search for Gene Ontology (GO) terms in the EBI QuickGO API.

Parameters:

Name Type Description Default
term str

GO term to search for. For example apoptosome.

required
output_csv OutputFile

Output CSV with GO term results. CSV has columns: term, id, name, aspect, definition. Use - for stdout.

required
aspect Aspect | None

Filter on aspect.

None
limit Limit

Maximum number of GO term results to return.

100
_ Common | None

Common CLI options.

None

interaction_partners(uniprot_accession, output_csv, /, *, exclude=None, limit=10000, timeout=1800, _=None)

Search for interaction partners of given UniProt accession.

Search for interaction partners of a given UniProt accession in the UniProt SPARQL endpoint and Complex Portal.

Parameters:

Name Type Description Default
uniprot_accession str

UniProt accession (for example P12345).

required
output_csv OutputFile

Output CSV with the interaction partners of the given UniProt accession. CSV has a single column: uniprot_accession. Use - for stdout.

required
exclude Annotated[list[str] | None, Parameter(negative='')]

UniProt accessions to exclude from the results. Multiple accessions can be given by repeating the --exclude option.

None
limit Limit

Maximum number of interaction partner UniProt accessions to return.

10000
timeout Timeout

Maximum seconds to wait for query to complete.

1800
_ Common | None

Common CLI options.

None

pdbe(uniprot_accessions, output_csv, /, *, limit=10000, timeout=1800, min_residues=None, max_residues=None, keep_invalid=False, top_resolution_per_uniprot_accession=None, _=None)

Search for PDB structures of given UniProt accessions.

Search for PDB structures of given UniProt accessions in the UniProt SPARQL endpoint.

Parameters:

Name Type Description Default
uniprot_accessions InputFile

Text file with UniProt accessions (one per line). Use - for stdin.

required
output_csv OutputFile

Output CSV with the following columns: uniprot_accession, pdb_id, method, resolution, uniprot_chains, chain, chain_length. uniprot_chains is the raw UniProt chain string, for example A=1-100; chain is the first chain from uniprot_chains, for example A; chain_length is the length of the chain, for example 100, or '' if it could not be determined. Use - for stdout.

required
limit Limit

Maximum number of PDB entry and UniProt accession combinations to return.

10000
timeout Timeout

Maximum seconds to wait for query to complete.

1800
min_residues MinResidues | None

Minimum number of residues required in the chain mapped to the UniProt accession.

None
max_residues MaxResidues | None

Maximum number of residues allowed in the chain mapped to the UniProt accession.

None
keep_invalid Annotated[bool, Parameter(negative='')]

Keep PDB results when chain length could not be determined.

False
top_resolution_per_uniprot_accession PositiveInt | None

Retain the top N PDB entries per UniProt accession, ranked by best (lowest) resolution first, then by highest residue count. For example use --top-resolution-per-uniprot-accession 3 to keep only the best 3 PDB entries per UniProt accession.

None
_ Common | None

Common CLI options.

None
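
The chain parsing and the top-N ranking described above can be sketched as follows. This is not protein-quest's implementation: the names PdbRow, parse_uniprot_chains, and top_by_resolution are hypothetical, the "/" separator between multiple chains is an assumption about the UniProt chain string, and the function ranks the rows of a single UniProt accession.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class PdbRow:
    """Hypothetical stand-in for one row of the output CSV."""
    pdb_id: str
    resolution: float | None
    chain_length: int | None


def parse_uniprot_chains(uniprot_chains: str) -> tuple[str, int | None]:
    """Return (first chain, chain length) for strings like 'A=1-100'."""
    chains, _, span = uniprot_chains.partition("=")
    chain = chains.split("/")[0]  # assumption: chains are separated by '/'
    match = re.fullmatch(r"(\d+)-(\d+)", span)
    if match is None:
        return chain, None  # length could not be determined
    start, end = int(match.group(1)), int(match.group(2))
    return chain, end - start + 1


def top_by_resolution(rows: list[PdbRow], n: int) -> list[PdbRow]:
    """Keep the n best rows: lowest resolution first, then longest chain."""
    def sort_key(row: PdbRow) -> tuple[float, int]:
        resolution = row.resolution if row.resolution is not None else float("inf")
        return (resolution, -(row.chain_length or 0))
    return sorted(rows, key=sort_key)[:n]


# Made-up entries for illustration.
rows = [
    PdbRow("1abc", 2.0, 100),
    PdbRow("2def", 1.5, 80),
    PdbRow("3ghi", 1.5, 120),
    PdbRow("4jkl", None, 200),
]
best_two = [row.pdb_id for row in top_by_resolution(rows, 2)]
print(parse_uniprot_chains("A=1-100"), best_two)
```

Rows without a resolution sort last in this sketch; how the real command treats them is not stated in this reference.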

structure(uniprot_accessions, output_csv, /, *, source=None, min_residues=None, max_residues=None, limit=10000, timeout=1800, raw=None, _=None)

Search for experimentally determined and predicted structures.

Search for experimentally determined and predicted structures of given UniProt accessions in the 3D Beacons Network API.

Parameters:

Name Type Description Default
uniprot_accessions InputFile

Text file with UniProt accessions (one per line). Use - for stdin.

required
output_csv OutputFile

Output CSV with the following columns: uniprot_accession, provider, model_identifier, model_url, model_format, chain, residue_count. Use - for stdout.

required
source Annotated[set[Provider | Literal['all']] | None, Parameter(negative='')]

Source of the structures to search for. Defaults to pdbe and alphafold. Multiple sources can be given by repeating the --source parameter. Use 'all' to search all sources.

None
min_residues MinResidues | None

Minimum number of residues required in the chain mapped to the UniProt accession.

None
max_residues MaxResidues | None

Maximum number of residues allowed in the chain mapped to the UniProt accession.

None
limit Limit

Maximum number of structures to return per UniProt accession per source.

10000
timeout Timeout

Maximum seconds to wait for query to complete.

1800
raw OutputFile | None

Path to write the raw 3D Beacons summaries as JSON.

None
_ Common | None

Common CLI options.

None

taxonomy(query, output_csv, /, *, field=None, limit=100, _=None)

Search for taxon information in UniProt.

Search for taxon information in UniProt. Uses https://www.uniprot.org/taxonomy?query=*.

Parameters:

Name Type Description Default
query str

Search query for the taxon. Surround multiple words with quotes.

required
output_csv OutputFile

Output CSV with taxonomy results. CSV has columns: tax_id, name, rank, parent_tax_id, parent_tax_name. Use - for stdout.

required
field SearchField | None

Field to search in. If not given then searches all fields. If "tax_id" then searches by taxon ID. If "parent" then given a parent taxon ID returns all its children.

None
limit Limit

Maximum number of results to return.

100
_ Common | None

Common CLI options.

None

uniprot(output, /, *, query=None, limit=10000, timeout=1800, _=None)

Search for UniProt accessions.

Search for UniProt accessions based on various criteria in the UniProt SPARQL endpoint.

Parameters:

Name Type Description Default
output OutputFile

Output text file for UniProt accessions (one per line). Use - for stdout.

required
query Query | None

Search query for UniProtKB. Can be given as a JSON string or as a path to a JSON file.

None
limit Limit

Maximum number of UniProt accessions to return.

10000
timeout Timeout

Maximum seconds to wait for query to complete.

1800
_ Common | None

Common CLI options.

None

uniprot_details(uniprot_accessions, output_csv, /, *, timeout=1800, batch_size=1000, _=None)

Search for UniProt details for given UniProt accessions from the UniProt SPARQL endpoint.

The output CSV file has the following columns:

  • uniprot_accession: UniProt accession.
  • uniprot_id: UniProt ID (mnemonic).
  • sequence_length: Length of the canonical sequence.
  • reviewed: Whether the entry is reviewed (Swiss-Prot) or unreviewed (TrEMBL).
  • protein_name: Recommended protein name.
  • taxon_id: NCBI Taxonomy ID of the organism.
  • taxon_name: Scientific name of the organism.

The order of the output CSV can be different from the input order.

Parameters:

Name Type Description Default
uniprot_accessions InputFile

Text file with UniProt accessions (one per line). Use - for stdin.

required
output_csv OutputFile

Output CSV with UniProt details. CSV has columns: uniprot_accession, uniprot_id, sequence_length, reviewed, protein_name, taxon_id, taxon_name. Use - for stdout.

required
timeout Timeout

Maximum seconds to wait for query to complete.

1800
batch_size BatchSize

Number of accessions to query per batch.

1000
_ Common | None

Common CLI options.

None
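
To make the batch_size parameter and the note about output order concrete, here is a minimal sketch of splitting accessions into batches and of restoring the input order afterwards. Both helpers (batched and restore_input_order) are hypothetical and only illustrate the idea; they are not taken from protein-quest.

```python
from collections.abc import Iterator


def batched(accessions: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size accessions."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(accessions), batch_size):
        yield accessions[start:start + batch_size]


def restore_input_order(
    rows: list[dict[str, str]], accessions: list[str]
) -> list[dict[str, str]]:
    """Sort CSV rows by the position of their accession in the input list."""
    position = {acc: index for index, acc in enumerate(accessions)}
    # Rows for unknown accessions are placed at the end.
    return sorted(
        rows, key=lambda row: position.get(row["uniprot_accession"], len(position))
    )


# Placeholder accessions for illustration.
query = ["P00001", "P00002", "P00003"]
batches = list(batched(query, 2))
shuffled = [{"uniprot_accession": "P00003"}, {"uniprot_accession": "P00001"}]
ordered = [row["uniprot_accession"] for row in restore_input_order(shuffled, query)]
print(batches, ordered)
```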

write_csv(path)

Context manager for writing CSV files.

Creates parent directories if they do not exist.

Yields:

Type Description

CSV writer object to write rows to.
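
A context manager with this behaviour could look roughly like the sketch below. It is an illustration under assumptions, not the library's code: the name write_csv_sketch is hypothetical, it only handles plain file paths (writing to stdout via - is left out), and the rows are placeholders.

```python
import csv
import tempfile
from collections.abc import Iterator
from contextlib import contextmanager
from pathlib import Path
from typing import Any


@contextmanager
def write_csv_sketch(path: Path) -> Iterator[Any]:
    """Yield a csv writer for path, creating parent directories first."""
    # Assumption: path is a regular file path; '-' for stdout is not handled.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as handle:
        yield csv.writer(handle)


with tempfile.TemporaryDirectory() as tmp:
    # The nested directories do not exist yet and are created on entry.
    target = Path(tmp) / "results" / "nested" / "out.csv"
    with write_csv_sketch(target) as writer:
        writer.writerow(["uniprot_accession", "af_id"])
        writer.writerow(["P12345", "P12345"])
    written = target.read_text(encoding="utf-8").splitlines()
print(written)
```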

write_taxonomy_csv(taxons, output_csv)

Write taxon information to a CSV file.

Parameters:

Name Type Description Default
taxons list[Taxon]

List of Taxon objects to write to the CSV file.

required
output_csv StdioPath

File object for the output CSV file. Can be a file path or '-' for stdout.

required
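
For orientation, writing taxon records with the columns listed for the taxonomy command (tax_id, name, rank, parent_tax_id, parent_tax_name) might be sketched like this. TaxonSketch is a hypothetical stand-in for the Taxon type, whose real fields are not shown in this reference, and write_taxons takes an open text stream rather than a StdioPath.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass
from typing import TextIO

FIELDS = ["tax_id", "name", "rank", "parent_tax_id", "parent_tax_name"]


@dataclass
class TaxonSketch:
    """Hypothetical record; field names mirror the CSV columns."""
    tax_id: str
    name: str
    rank: str
    parent_tax_id: str | None = None
    parent_tax_name: str | None = None


def write_taxons(taxons: list[TaxonSketch], out: TextIO) -> None:
    """Write a header row and one row per taxon to an open text stream."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for taxon in taxons:
        # Missing parent information is written as an empty cell.
        writer.writerow({field: getattr(taxon, field) or "" for field in FIELDS})


buffer = io.StringIO()
write_taxons([TaxonSketch("9606", "Homo sapiens", "species", "9605", "Homo")], buffer)
taxonomy_lines = buffer.getvalue().splitlines()
print(taxonomy_lines)
```

Passing sys.stdout instead of the StringIO buffer would approximate the '-' for stdout case.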