CLI Reference
Documentation for the protein-quest script.
protein-quest --help
Usage: protein-quest [-h] [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
[--version]
{search,retrieve,filter,convert,mcp} ...
Protein Quest CLI
Positional Arguments:
{search,retrieve,filter,convert,mcp}
search Search data sources
retrieve Retrieve structure files
filter Filter files
convert Convert files between formats
mcp Run Model Context Protocol (MCP) server
Options:
-h, --help show this help message and exit
--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
--version show program's version number and exit
search
protein-quest search --help
Usage: protein-quest search [-h]
{uniprot,pdbe,alphafold,emdb,go,taxonomy,interaction-partners,complexes} ...
Search various things online.
Positional Arguments:
{uniprot,pdbe,alphafold,emdb,go,taxonomy,interaction-partners,complexes}
uniprot Search UniProt accessions
pdbe Search PDBe structures of given UniProt accessions
alphafold Search AlphaFold structures of given UniProt
accessions
emdb Search Electron Microscopy Data Bank (EMDB)
identifiers of given UniProt accessions
go Search for Gene Ontology (GO) terms
taxonomy Search for taxon information in UniProt
interaction-partners
Search for interaction partners of given UniProt
accession
complexes Search for complexes in the Complex Portal
Options:
-h, --help show this help message and exit
search uniprot
protein-quest search uniprot --help
Usage: protein-quest search uniprot [-h] [--taxon-id TAXON_ID]
[--reviewed | --no-reviewed]
[--subcellular-location-uniprot
SUBCELLULAR_LOCATION_UNIPROT]
[--subcellular-location-go
SUBCELLULAR_LOCATION_GO]
[--molecular-function-go
MOLECULAR_FUNCTION_GO]
[--limit LIMIT] [--timeout TIMEOUT]
output
Search for UniProt accessions based on various criteria in the UniProt SPARQL
endpoint.
Positional Arguments:
output Output text file for UniProt accessions (one per
line). Use `-` for stdout.
Options:
-h, --help show this help message and exit
--taxon-id TAXON_ID NCBI Taxon ID, e.g. 9606 for Homo sapiens (default:
None)
--reviewed, --no-reviewed
Reviewed=swissprot, no-reviewed=trembl. Default is
uniprot=swissprot+trembl. (default: None)
--subcellular-location-uniprot SUBCELLULAR_LOCATION_UNIPROT
Subcellular location label as used by UniProt (e.g.
nucleus) (default: None)
--subcellular-location-go SUBCELLULAR_LOCATION_GO
GO term(s) for subcellular location (e.g. GO:0005634).
Can be given multiple times. (default: None)
--molecular-function-go MOLECULAR_FUNCTION_GO
GO term(s) for molecular function (e.g. GO:0003677).
Can be given multiple times. (default: None)
--limit LIMIT Maximum number of UniProt accessions to return
(default: 10000)
--timeout TIMEOUT Maximum seconds to wait for query to complete
(default: 1800)
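The options above combine into a single command line. As a minimal sketch, the Python helper below assembles an argument list using only flags shown in the help output. The function name `build_search_uniprot_command` and the output file name `uniprot_accs.txt` are hypothetical and not part of protein-quest. The helper repeats `--molecular-function-go` once per term, since the help text says the option can be given multiple times.

```python
import shlex


def build_search_uniprot_command(
    output,
    taxon_id=None,
    reviewed=None,
    molecular_function_go=(),
    limit=None,
):
    """Build an argv list for `protein-quest search uniprot` (hypothetical helper)."""
    cmd = ["protein-quest", "search", "uniprot"]
    if taxon_id is not None:
        cmd += ["--taxon-id", str(taxon_id)]
    # reviewed=None leaves both flags off, which the help text describes
    # as searching swissprot and trembl together.
    if reviewed is True:
        cmd.append("--reviewed")
    elif reviewed is False:
        cmd.append("--no-reviewed")
    # One flag occurrence per GO term.
    for term in molecular_function_go:
        cmd += ["--molecular-function-go", term]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    # The positional `output` argument goes last; `-` means stdout.
    cmd.append(output)
    return cmd


if __name__ == "__main__":
    argv = build_search_uniprot_command(
        "uniprot_accs.txt",
        taxon_id=9606,
        reviewed=True,
        molecular_function_go=["GO:0003677"],
        limit=100,
    )
    # Print a shell-ready command line; pass `argv` to subprocess.run to execute it.
    print(shlex.join(argv))
```

Building the list rather than a single string avoids quoting problems if the command is later passed to `subprocess.run`.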
search pdbe
protein-quest search pdbe --help
Usage: protein-quest search pdbe [-h] [--limit LIMIT]
[--min-residues MIN_RESIDUES]
[--max-residues MAX_RESIDUES]
[--timeout TIMEOUT]
uniprot_accs output_csv
Search for PDB structures of given UniProt accessions in the UniProt SPARQL
endpoint.
Positional Arguments:
uniprot_accs Text file with UniProt accessions (one per line). Use
`-` for stdin.
output_csv Output CSV with following columns: `uniprot_acc`,
`pdb_id`, `method`, `resolution`, `uniprot_chains`,
`chain`, `chain_length`. Where `uniprot_chains` is the
raw UniProt chain string, for example `A=1-100`, and
where `chain` is the first chain from
`uniprot_chains`, for example `A` and `chain_length`
is the length of the chain, for example `100`. Use `-`
for stdout.
Options:
-h, --help show this help message and exit
--limit LIMIT Maximum number of PDB ID and UniProt accession combinations
to return (default: 10000)
--min-residues MIN_RESIDUES
Minimum number of residues required in the chain
mapped to the UniProt accession. (default: None)
--max-residues MAX_RESIDUES
Maximum number of residues allowed in chain mapped to
the UniProt accession. (default: None)
--timeout TIMEOUT Maximum seconds to wait for query to complete
(default: 1800)
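The relationship between the `uniprot_chains`, `chain` and `chain_length` columns described above can be sketched in Python. This is an illustration of the documented example (`A=1-100` gives chain `A` and length `100`), not the tool's own implementation. The helper name is hypothetical, and two things are assumptions: that several chains are separated by `/` (as in `A/B=1-100`), and that the string holds a single residue range.

```python
def first_chain_and_length(uniprot_chains):
    """Return (first chain ID, chain length) for a string like 'A=1-100'.

    Hypothetical helper, not part of protein-quest.
    """
    chains, _, positions = uniprot_chains.partition("=")
    start, _, end = positions.partition("-")
    # Assumption: multiple chains are written as 'A/B'; keep the first one.
    first_chain = chains.split("/")[0]
    # Residue positions are inclusive, so 1-100 spans 100 residues.
    return first_chain, int(end) - int(start) + 1


if __name__ == "__main__":
    print(first_chain_and_length("A=1-100"))
```

The `--min-residues` and `--max-residues` options would then compare against a length computed this way.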
search alphafold
protein-quest search alphafold --help
Usage: protein-quest search alphafold [-h] [--limit LIMIT] [--timeout TIMEOUT]
uniprot_accs output_csv
Search for AlphaFold structures of given UniProt accessions in the UniProt
SPARQL endpoint.
Positional Arguments:
uniprot_accs Text file with UniProt accessions (one per line). Use `-`
for stdin.
output_csv Output CSV with AlphaFold IDs per UniProt accession. Use
`-` for stdout.
Options:
-h, --help show this help message and exit
--limit LIMIT Maximum number of AlphaFold entry identifiers to return
(default: 10000)
--timeout TIMEOUT Maximum seconds to wait for query to complete (default:
1800)
search emdb
protein-quest search emdb --help
Usage: protein-quest search emdb [-h] [--limit LIMIT] [--timeout TIMEOUT]
uniprot_accs output_csv
Search for Electron Microscopy Data Bank (EMDB) identifiers of given UniProt
accessions in the UniProt SPARQL endpoint.
Positional Arguments:
uniprot_accs Text file with UniProt accessions (one per line). Use `-`
for stdin.
output_csv Output CSV with EMDB IDs per UniProt accession. Use `-`
for stdout.
Options:
-h, --help show this help message and exit
--limit LIMIT Maximum number of EMDB entry identifiers to return
(default: 10000)
--timeout TIMEOUT Maximum seconds to wait for query to complete (default:
1800)
search go
protein-quest search go --help
Usage: protein-quest search go [-h]
[--aspect
{cellular_component,molecular_function,biological_process}]
[--limit LIMIT]
term output_csv
Search for Gene Ontology (GO) terms in the EBI QuickGO API.
Positional Arguments:
term GO term to search for. For example `apoptosome`.
output_csv Output CSV with GO term results. Use `-` for stdout.
Options:
-h, --help show this help message and exit
--aspect {cellular_component,molecular_function,biological_process}
Filter on aspect. (default: None)
--limit LIMIT Maximum number of GO term results to return (default:
100)
search taxonomy
protein-quest search taxonomy --help
Usage: protein-quest search taxonomy [-h]
[--field
{None,common,tax_id,parent,scientific}]
[--limit LIMIT]
query output_csv
Search for taxon information in UniProt. Uses
https://www.uniprot.org/taxonomy?query=*.
Positional Arguments:
query Search query for the taxon. Surround multiple words
with quotes (' or ").
output_csv Output CSV with taxonomy results. Use `-` for stdout.
Options:
-h, --help show this help message and exit
--field {None,common,tax_id,parent,scientific}
Field to search in. If not given then searches all
fields. If "tax_id" then searches by taxon ID. If
"parent" then given a parent taxon ID returns all its
children. For example, if the parent taxon ID is 9606
(Human), it will return Neanderthal and Denisovan.
(default: None)
--limit LIMIT Maximum number of results to return (default: 100)
search interaction-partners
protein-quest search interaction-partners --help
Usage: protein-quest search interaction-partners [-h] [--exclude EXCLUDE]
[--limit LIMIT]
[--timeout TIMEOUT]
uniprot_acc output_csv
Search for interaction partners of given UniProt accession in the UniProt
SPARQL endpoint and Complex Portal.
Positional Arguments:
uniprot_acc UniProt accession (for example P12345).
output_csv Output CSV with interaction partners per UniProt
accession. Use `-` for stdout.
Options:
-h, --help show this help message and exit
--exclude EXCLUDE UniProt accessions to exclude from the results. For
example already known interaction partners. (default:
None)
--limit LIMIT Maximum number of interaction partner UniProt accessions
to return (default: 10000)
--timeout TIMEOUT Maximum seconds to wait for query to complete (default:
1800)
search complexes
protein-quest search complexes --help
Usage: protein-quest search complexes [-h] [--limit LIMIT] [--timeout TIMEOUT]
uniprot_accs output_csv
Search for complexes in the Complex Portal.
https://www.ebi.ac.uk/complexportal/
The output CSV file has the following columns:
• query_protein: UniProt accession used as query
• complex_id: Complex Portal identifier
• complex_url: URL to the Complex Portal entry
• complex_title: Title of the complex
• members: Semicolon-separated list of UniProt accessions of complex members
Positional Arguments:
uniprot_accs Text file with UniProt accessions (one per line) as query
for searching complexes. Use `-` for stdin.
output_csv Output CSV file with complex results. Use `-` for stdout.
Options:
-h, --help show this help message and exit
--limit LIMIT Maximum number of complex results to return (default:
100)
--timeout TIMEOUT Maximum seconds to wait for query to complete (default:
1800)
retrieve
protein-quest retrieve --help
Usage: protein-quest retrieve [-h] {pdbe,alphafold,emdb} ...
Retrieve structure files from online resources.
Positional Arguments:
{pdbe,alphafold,emdb}
pdbe Retrieve PDBe gzipped mmCIF files for PDB IDs in CSV.
alphafold Retrieve AlphaFold files for IDs in CSV
emdb Retrieve Electron Microscopy Data Bank (EMDB) gzipped
3D volume files for EMDB IDs in CSV.
Options:
-h, --help show this help message and exit
retrieve pdbe
protein-quest retrieve pdbe --help
Usage: protein-quest retrieve pdbe [-h]
[--max-parallel-downloads
MAX_PARALLEL_DOWNLOADS]
[--no-cache] [--cache-dir CACHE_DIR]
[--copy-method {symlink,hardlink,copy}]
pdbe_csv output_dir
Retrieve mmCIF files from Protein Data Bank in Europe Knowledge Base (PDBe)
website for unique PDB IDs listed in a CSV file.
Positional Arguments:
pdbe_csv CSV file with `pdb_id` column. Other columns are
ignored. Use `-` for stdin.
output_dir Directory to store downloaded PDBe mmCIF files
Options:
-h, --help show this help message and exit
--max-parallel-downloads MAX_PARALLEL_DOWNLOADS
Maximum number of parallel downloads (default: 5)
--no-cache Disable caching of files to central location.
(default: False)
--cache-dir CACHE_DIR
Directory to use as cache for files. (default:
/home/runner/.cache/protein-quest)
--copy-method {symlink,hardlink,copy}
How to make target file be same file as source file.
By default uses hardlinks to save disk space. Note
that hardlinks only work within the same filesystem
and are harder to track. If you want to track cached
files easily then use 'symlink'. On Windows you need
developer mode or admin privileges to create symlinks.
(default: hardlink)
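The three `--copy-method` choices correspond to standard filesystem operations. The sketch below shows one way they could be expressed with the Python standard library. It is not taken from the protein-quest source, and the helper name `place_file` and the file names are hypothetical.

```python
import os
import shutil
import tempfile
from pathlib import Path


def place_file(source, target, method="hardlink"):
    """Make `target` available from `source` using the given method (hypothetical helper)."""
    if method == "hardlink":
        # Second directory entry for the same data; needs the same filesystem.
        os.link(source, target)
    elif method == "symlink":
        # Pointer to the source path; easy to trace back to the cache.
        os.symlink(Path(source).resolve(), target)
    elif method == "copy":
        # Independent duplicate; uses extra disk space.
        shutil.copyfile(source, target)
    else:
        raise ValueError(f"Unknown copy method: {method}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cached = Path(tmp) / "cached.cif"
        cached.write_text("data_example\n")
        linked = Path(tmp) / "linked.cif"
        place_file(cached, linked, "hardlink")
        # A hard link and its source refer to the same underlying file.
        print(os.path.samefile(cached, linked))
```

With `copy`, later changes to the cached file do not affect the target, which is the trade-off against the disk space saved by linking.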
retrieve alphafold
protein-quest retrieve alphafold --help
Usage: protein-quest retrieve alphafold [-h]
[--what-formats
{amAnnotations,amAnnotationsHg19,amAnnotationsHg38,bcif,cif,paeDoc,paeImage,pdb,summary}]
[--gzip-files]
[--max-parallel-downloads
MAX_PARALLEL_DOWNLOADS]
[--no-cache] [--cache-dir CACHE_DIR]
[--copy-method {symlink,hardlink,copy}]
alphafold_csv output_dir
Retrieve AlphaFold files from the AlphaFold Protein Structure Database.
Positional Arguments:
alphafold_csv CSV file with `af_id` column. Other columns are
ignored. Use `-` for stdin.
output_dir Directory to store downloaded AlphaFold files
Options:
-h, --help show this help message and exit
--what-formats
{amAnnotations,amAnnotationsHg19,amAnnotationsHg38,bcif,cif,paeDoc,paeImage,pdb,summary}
AlphaFold formats to retrieve. Can be specified
multiple times. Default is 'summary' and 'cif'.
(default: None)
--gzip-files Whether to gzip the downloaded files. Excludes summary
files, they are always uncompressed. (default: False)
--max-parallel-downloads MAX_PARALLEL_DOWNLOADS
Maximum number of parallel downloads (default: 5)
--no-cache Disable caching of files to central location.
(default: False)
--cache-dir CACHE_DIR
Directory to use as cache for files. (default:
/home/runner/.cache/protein-quest)
--copy-method {symlink,hardlink,copy}
How to make target file be same file as source file.
By default uses hardlinks to save disk space. Note
that hardlinks only work within the same filesystem
and are harder to track. If you want to track cached
files easily then use 'symlink'. On Windows you need
developer mode or admin privileges to create symlinks.
(default: hardlink)
retrieve emdb
protein-quest retrieve emdb --help
Usage: protein-quest retrieve emdb [-h] [--no-cache] [--cache-dir CACHE_DIR]
[--copy-method {symlink,hardlink,copy}]
emdb_csv output_dir
Retrieve volume files from Electron Microscopy Data Bank (EMDB) website for
unique EMDB IDs listed in a CSV file.
Positional Arguments:
emdb_csv CSV file with `emdb_id` column. Other columns are
ignored. Use `-` for stdin.
output_dir Directory to store downloaded EMDB volume files
Options:
-h, --help show this help message and exit
--no-cache Disable caching of files to central location.
(default: False)
--cache-dir CACHE_DIR
Directory to use as cache for files. (default:
/home/runner/.cache/protein-quest)
--copy-method {symlink,hardlink,copy}
How to make target file be same file as source file.
By default uses hardlinks to save disk space. Note
that hardlinks only work within the same filesystem
and are harder to track. If you want to track cached
files easily then use 'symlink'. On Windows you need
developer mode or admin privileges to create symlinks.
(default: hardlink)
filter
protein-quest filter --help
Usage: protein-quest filter [-h]
{confidence,chain,residue,secondary-structure} ...
Positional Arguments:
{confidence,chain,residue,secondary-structure}
confidence Filter AlphaFold mmCIF/PDB files by confidence
chain Filter on chain.
residue Filter PDB/mmCIF files by number of residues in chain
A
secondary-structure
Filter PDB/mmCIF files by secondary structure
Options:
-h, --help show this help message and exit
filter confidence
protein-quest filter confidence --help
Usage: protein-quest filter confidence [-h]
[--confidence-threshold
CONFIDENCE_THRESHOLD]
[--min-residues MIN_RESIDUES]
[--max-residues MAX_RESIDUES]
[--write-stats WRITE_STATS]
[--copy-method {symlink,hardlink,copy}]
input_dir output_dir
Filter AlphaFold mmCIF/PDB files by confidence (pLDDT). Files that pass are
written with residues below the threshold removed.
Positional Arguments:
input_dir Directory with AlphaFold mmCIF/PDB files
output_dir Directory to write filtered mmCIF/PDB files
Options:
-h, --help show this help message and exit
--confidence-threshold CONFIDENCE_THRESHOLD
pLDDT confidence threshold (0-100) (default: 70)
--min-residues MIN_RESIDUES
Minimum number of high-confidence residues a structure
should have (default: 0)
--max-residues MAX_RESIDUES
Maximum number of high-confidence residues a structure
should have (default: 10000000)
--write-stats WRITE_STATS
Write filter statistics to file. In CSV format with
`<input_file>,<residue_count>,<passed>,<output_file>`
columns. Use `-` for stdout. (default: None)
--copy-method {symlink,hardlink,copy}
How to make target file be same file as source file.
By default uses hardlinks to save disk space. Note
that hardlinks only work within the same filesystem
and are harder to track. If you want to track cached
files easily then use 'symlink'. On Windows you need
developer mode or admin privileges to create symlinks.
(default: hardlink)
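How `--confidence-threshold`, `--min-residues` and `--max-residues` interact can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: the helper name is hypothetical, a residue scoring exactly at the threshold is assumed to be kept, and both residue bounds are assumed to be inclusive.

```python
def passes_confidence_filter(
    plddts,
    threshold=70.0,
    min_residues=0,
    max_residues=10_000_000,
):
    """Return (passed, kept_indices) for a list of per-residue pLDDT scores.

    Hypothetical helper, not part of protein-quest.
    """
    # Residues below the threshold are dropped; the rest count as high-confidence.
    kept = [index for index, score in enumerate(plddts) if score >= threshold]
    # The structure passes if the high-confidence count is within the bounds.
    passed = min_residues <= len(kept) <= max_residues
    return passed, kept


if __name__ == "__main__":
    scores = [90.0, 65.5, 70.0, 40.2]
    print(passes_confidence_filter(scores, min_residues=2))
```

The defaults in the signature mirror the defaults listed in the help output above.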
filter chain
protein-quest filter chain --help
Usage: protein-quest filter chain [-h] [--scheduler-address SCHEDULER_ADDRESS]
[--copy-method {symlink,hardlink,copy}]
chains input_dir output_dir
For each input PDB/mmCIF and chain combination write a PDB/mmCIF file with
just the given chain and rename it to chain `A`. Filtering is done in parallel
using a Dask cluster.
Positional Arguments:
chains CSV file with `pdb_id` and `chain` columns. Other
columns are ignored.
input_dir Directory with PDB/mmCIF files. Expected filenames are
`{pdb_id}.cif.gz`, `{pdb_id}.cif`, `{pdb_id}.pdb.gz`
or `{pdb_id}.pdb`.
output_dir Directory to write the single-chain PDB/mmCIF files.
Output files are in same format as input files.
Options:
-h, --help show this help message and exit
--scheduler-address SCHEDULER_ADDRESS
Address of the Dask scheduler to connect to. If not
provided, will create a local cluster. If set to
`sequential` will run tasks sequentially. (default:
None)
--copy-method {symlink,hardlink,copy}
How to make target file be same file as source file.
By default uses hardlinks to save disk space. Note
that hardlinks only work within the same filesystem
and are harder to track. If you want to track cached
files easily then use 'symlink'. On Windows you need
developer mode or admin privileges to create symlinks.
(default: hardlink)
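The expected input filenames listed for `input_dir` suggest a lookup by `pdb_id`. The sketch below tries each documented pattern in turn. The helper name, the order in which patterns are tried, and the exact-case match on `pdb_id` are assumptions rather than documented behaviour.

```python
from pathlib import Path

# Filename patterns from the help text, as suffixes appended to the pdb_id.
EXPECTED_SUFFIXES = (".cif.gz", ".cif", ".pdb.gz", ".pdb")


def find_structure_file(input_dir, pdb_id):
    """Return the first existing `{pdb_id}{suffix}` file in input_dir, or None.

    Hypothetical helper, not part of protein-quest.
    """
    for suffix in EXPECTED_SUFFIXES:
        candidate = Path(input_dir) / f"{pdb_id}{suffix}"
        if candidate.exists():
            return candidate
    return None
```

A check like this can be run over the `pdb_id` column of the chains CSV before filtering, to spot structure files missing from the input directory.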
filter residue
protein-quest filter residue --help
Usage: protein-quest filter residue [-h] [--min-residues MIN_RESIDUES]
[--max-residues MAX_RESIDUES]
[--write-stats WRITE_STATS]
[--copy-method {symlink,hardlink,copy}]
input_dir output_dir
Filter PDB/mmCIF files by number of residues in chain A.
Positional Arguments:
input_dir Directory with PDB/mmCIF files (e.g., from 'filter
chain')
output_dir Directory to write filtered PDB/mmCIF files. Files are
copied without modification.
Options:
-h, --help show this help message and exit
--min-residues MIN_RESIDUES
Min residues in chain A (default: 0)
--max-residues MAX_RESIDUES
Max residues in chain A (default: 10000000)
--write-stats WRITE_STATS
Write filter statistics to file. In CSV format with
`<input_file>,<residue_count>,<passed>,<output_file>`
columns. Use `-` for stdout. (default: None)
--copy-method {symlink,hardlink,copy}
How to make target file be same file as source file.
By default uses hardlinks to save disk space. Note
that hardlinks only work within the same filesystem
and are harder to track. If you want to track cached
files easily then use 'symlink'. On Windows you need
developer mode or admin privileges to create symlinks.
(default: hardlink)
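The file written by `--write-stats` can be inspected with any CSV reader. The sketch below uses Python's `csv` module on made-up sample rows. It assumes the file has a header row whose names match the documented columns without angle brackets, and that `passed` is serialized as `True` or `False`. Neither detail is confirmed by the help text, so check a real stats file first.

```python
import csv
import io

# Made-up rows for illustration only; real files come from --write-stats.
SAMPLE_STATS = """\
input_file,residue_count,passed,output_file
in/1abc.cif,120,True,out/1abc.cif
in/2xyz.cif,12,False,
"""


def summarize_stats(text):
    """Return (total rows, input files that passed) from stats CSV text.

    Hypothetical helper, not part of protein-quest.
    """
    rows = list(csv.DictReader(io.StringIO(text)))
    passed = [
        row["input_file"]
        for row in rows
        # Assumption: booleans are written as the strings 'True'/'False'.
        if row["passed"].strip().lower() == "true"
    ]
    return len(rows), passed


if __name__ == "__main__":
    print(summarize_stats(SAMPLE_STATS))
```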
filter secondary-structure
protein-quest filter secondary-structure --help
Usage: protein-quest filter secondary-structure [-h]
[--abs-min-helix-residues
ABS_MIN_HELIX_RESIDUES]
[--abs-max-helix-residues
ABS_MAX_HELIX_RESIDUES]
[--abs-min-sheet-residues
ABS_MIN_SHEET_RESIDUES]
[--abs-max-sheet-residues
ABS_MAX_SHEET_RESIDUES]
[--ratio-min-helix-residues
RATIO_MIN_HELIX_RESIDUES]
[--ratio-max-helix-residues
RATIO_MAX_HELIX_RESIDUES]
[--ratio-min-sheet-residues
RATIO_MIN_SHEET_RESIDUES]
[--ratio-max-sheet-residues
RATIO_MAX_SHEET_RESIDUES]
[--write-stats WRITE_STATS]
[--copy-method
{symlink,hardlink,copy}]
input_dir output_dir
Filter PDB/mmCIF files by secondary structure
Positional Arguments:
input_dir Directory with PDB/mmCIF files (e.g., from 'filter
chain')
output_dir Directory to write filtered PDB/mmCIF files. Files are
copied without modification.
Options:
-h, --help show this help message and exit
--abs-min-helix-residues ABS_MIN_HELIX_RESIDUES
Min residues in helices (default: None)
--abs-max-helix-residues ABS_MAX_HELIX_RESIDUES
Max residues in helices (default: None)
--abs-min-sheet-residues ABS_MIN_SHEET_RESIDUES
Min residues in sheets (default: None)
--abs-max-sheet-residues ABS_MAX_SHEET_RESIDUES
Max residues in sheets (default: None)
--ratio-min-helix-residues RATIO_MIN_HELIX_RESIDUES
Min residues in helices (relative) (default: None)
--ratio-max-helix-residues RATIO_MAX_HELIX_RESIDUES
Max residues in helices (relative) (default: None)
--ratio-min-sheet-residues RATIO_MIN_SHEET_RESIDUES
Min residues in sheets (relative) (default: None)
--ratio-max-sheet-residues RATIO_MAX_SHEET_RESIDUES
Max residues in sheets (relative) (default: None)
--write-stats WRITE_STATS
Write filter statistics to file. In CSV format with
columns:
`<input_file>,<nr_residues>,<nr_helix_residues>,<nr_sheet_residues>,<helix_ratio>,<sheet_ratio>,<passed>,<output_file>`.
Use `-` for stdout. (default: None)
--copy-method {symlink,hardlink,copy}
How to make target file be same file as source file.
By default uses hardlinks to save disk space. Note
that hardlinks only work within the same filesystem
and are harder to track. If you want to track cached
files easily then use 'symlink'. On Windows you need
developer mode or admin privileges to create symlinks.
(default: hardlink)
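The absolute and relative limits above can be read as range checks on residue counts and ratios. The following sketch shows one plausible reading. It assumes that a ratio is the helix or sheet residue count divided by the total residue count, that bounds are inclusive, and that every limit that is set must hold for a file to pass. The function names are hypothetical and none of this is taken from the protein-quest source.

```python
def within(value, minimum, maximum):
    """True if value lies in [minimum, maximum]; a bound of None is not checked."""
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    return True


def passes_secondary_structure(
    nr_residues,
    nr_helix_residues,
    nr_sheet_residues,
    abs_helix=(None, None),
    abs_sheet=(None, None),
    ratio_helix=(None, None),
    ratio_sheet=(None, None),
):
    """Check (min, max) limits on helix and sheet content (hypothetical helper)."""
    # Assumption: ratios are relative to the total number of residues.
    helix_ratio = nr_helix_residues / nr_residues if nr_residues else 0.0
    sheet_ratio = nr_sheet_residues / nr_residues if nr_residues else 0.0
    return (
        within(nr_helix_residues, *abs_helix)
        and within(nr_sheet_residues, *abs_sheet)
        and within(helix_ratio, *ratio_helix)
        and within(sheet_ratio, *ratio_sheet)
    )


if __name__ == "__main__":
    # 40 of 100 residues in helices, at least 30% helix required.
    print(passes_secondary_structure(100, 40, 10, ratio_helix=(0.3, None)))
```

Each `(min, max)` pair stands in for the matching `--abs-min-*`/`--abs-max-*` or `--ratio-min-*`/`--ratio-max-*` option pair.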
convert
protein-quest convert --help
Usage: protein-quest convert [-h] {structures,uniprot} ...
Positional Arguments:
{structures,uniprot}
structures Convert structure files between formats
uniprot Convert structure files to list of UniProt accessions.
Options:
-h, --help show this help message and exit
convert structures
protein-quest convert structures --help
Usage: protein-quest convert structures [-h] [--output-dir OUTPUT_DIR]
[--format {cif}]
[--copy-method {symlink,hardlink,copy}]
input_dir
Positional Arguments:
input_dir Directory with structure files. Supported extensions
are {'.ent.gz', '.bcif.gz', '.pdb', '.pdb.gz',
'.bcif', '.cif', '.ent', '.cif.gz'}
Options:
-h, --help show this help message and exit
--output-dir OUTPUT_DIR
Directory to write converted structure files. If not
given, files are written to `input_dir`. (default:
None)
--format {cif} Output format to convert to. (default: cif)
--copy-method {symlink,hardlink,copy}
How to make target file be same file as source file.
By default uses hardlinks to save disk space. Note
that hardlinks only work within the same filesystem
and are harder to track. If you want to track cached
files easily then use 'symlink'. On Windows you need
developer mode or admin privileges to create symlinks.
(default: hardlink)
convert uniprot
protein-quest convert uniprot --help
Usage: protein-quest convert uniprot [-h] [--grouped] input_dir output
Convert structure files to list of UniProt accessions. UniProt accessions are
read from database reference of each structure.
Positional Arguments:
input_dir Directory with structure files. Supported extensions are
{'.ent.gz', '.bcif.gz', '.pdb', '.pdb.gz', '.bcif', '.cif',
'.ent', '.cif.gz'}
output Output text file with UniProt accessions (one per line). Use '-'
for stdout.
Options:
-h, --help show this help message and exit
--grouped Whether to group accessions by structure file. If set output
changes to `<structure_file1>,<acc1>\n<structure_file1>,<acc2>`
format. (default: False)
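The difference between the default and `--grouped` output formats can be sketched in Python. The helper name and its input shape (a mapping from structure file name to accessions) are hypothetical. Removing duplicates in the ungrouped output is an assumption, since the help text does not say whether repeated accessions are collapsed.

```python
def format_accessions(accessions_by_file, grouped=False):
    """Render accessions as text (hypothetical helper, not part of protein-quest).

    accessions_by_file maps a structure file name to its UniProt accessions.
    """
    if grouped:
        # One `<structure_file>,<acc>` line per file and accession pair.
        lines = [
            f"{structure_file},{acc}"
            for structure_file, accs in accessions_by_file.items()
            for acc in accs
        ]
    else:
        # One accession per line. Assumption: duplicates are dropped,
        # keeping the order in which accessions were first seen.
        seen = dict.fromkeys(
            acc for accs in accessions_by_file.values() for acc in accs
        )
        lines = list(seen)
    return "\n".join(lines)


if __name__ == "__main__":
    example = {"file1.cif": ["acc1", "acc2"], "file2.cif": ["acc1"]}
    print(format_accessions(example, grouped=True))
```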
mcp
protein-quest mcp --help
Usage: protein-quest mcp [-h] [--transport {stdio,http,streamable-http}]
[--host HOST] [--port PORT]
Run Model Context Protocol (MCP) server. Can be used by agentic LLMs like
Claude Sonnet 4 as a set of tools.
Options:
-h, --help show this help message and exit
--transport {stdio,http,streamable-http}
Transport protocol to use (default: stdio)
--host HOST Host to bind the server to (default: 127.0.0.1)
--port PORT Port to bind the server to (default: 8000)