CLI Reference
Documentation for the protein-quest command-line script.
protein-quest --help
Usage: protein-quest [-h] [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
[--version]
{search,retrieve,filter,mcp} ...
Protein Quest CLI
Positional Arguments:
{search,retrieve,filter,mcp}
search Search data sources
retrieve Retrieve structure files
filter Filter files
mcp Run Model Context Protocol (MCP) server
Options:
-h, --help show this help message and exit
--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
--version show program's version number and exit
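The subcommands are meant to be chained: the output file of one step is the input of the next. The sketch below only builds and prints the command lines for one possible PDBe workflow; it does not run anything. The helper name `pdbe_pipeline_commands`, the file and directory names, and the chosen option values are illustrative assumptions. The subcommands, flags and argument order are taken from the help texts on this page.

```python
import shlex


def pdbe_pipeline_commands(workdir="work", taxon_id="9606"):
    """Return the command lines of a search -> retrieve -> filter workflow.

    Hypothetical helper: all paths below are made-up example names.
    """
    accs = f"{workdir}/uniprot_accs.txt"      # one UniProt accession per line
    pdbe_csv = f"{workdir}/pdbe.csv"          # has `pdb_id` and `chain` columns
    downloads = f"{workdir}/downloads"        # downloaded mmCIF files
    single_chain = f"{workdir}/single_chain"  # one chain per file, renamed to A
    filtered = f"{workdir}/filtered"          # files that pass the residue filter
    return [
        ["protein-quest", "search", "uniprot",
         "--taxon-id", taxon_id, "--reviewed", accs],
        ["protein-quest", "search", "pdbe", accs, pdbe_csv],
        ["protein-quest", "retrieve", "pdbe", pdbe_csv, downloads],
        ["protein-quest", "filter", "chain", pdbe_csv, downloads, single_chain],
        ["protein-quest", "filter", "residue",
         "--min-residues", "100", single_chain, filtered],
    ]


# Print the commands in a form that can be pasted into a shell.
for command in pdbe_pipeline_commands():
    print(shlex.join(command))
```

Each list could also be passed to `subprocess.run(command, check=True)`. Whether the output directories must exist beforehand is not stated in the help text.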
search
protein-quest search --help
Usage: protein-quest search [-h] {uniprot,pdbe,alphafold,go} ...
Search online data sources.
Positional Arguments:
{uniprot,pdbe,alphafold,go}
uniprot Search UniProt accessions
pdbe Search PDBe structures of given UniProt accessions
alphafold Search AlphaFold structures of given UniProt
accessions
go Search for Gene Ontology (GO) terms
Options:
-h, --help show this help message and exit
search uniprot
protein-quest search uniprot --help
Usage: protein-quest search uniprot [-h] [--taxon-id TAXON_ID]
[--reviewed | --no-reviewed]
[--subcellular-location-uniprot
SUBCELLULAR_LOCATION_UNIPROT]
[--subcellular-location-go
SUBCELLULAR_LOCATION_GO]
[--molecular-function-go
MOLECULAR_FUNCTION_GO]
[--limit LIMIT] [--timeout TIMEOUT]
output
Search for UniProt accessions based on various criteria in the UniProt SPARQL
endpoint.
Positional Arguments:
output Output text file for UniProt accessions (one per
line). Use `-` for stdout.
Options:
-h, --help show this help message and exit
--taxon-id TAXON_ID NCBI Taxon ID, e.g. 9606 for Homo sapiens (default:
None)
--reviewed, --no-reviewed
--reviewed searches Swiss-Prot only, --no-reviewed searches TrEMBL
only. By default both are searched. (default: None)
--subcellular-location-uniprot SUBCELLULAR_LOCATION_UNIPROT
Subcellular location label as used by UniProt (e.g.
nucleus) (default: None)
--subcellular-location-go SUBCELLULAR_LOCATION_GO
GO term(s) for subcellular location (e.g. GO:0005634).
Can be given multiple times. (default: None)
--molecular-function-go MOLECULAR_FUNCTION_GO
GO term(s) for molecular function (e.g. GO:0003677).
Can be given multiple times. (default: None)
--limit LIMIT Maximum number of UniProt accessions to return
(default: 10000)
--timeout TIMEOUT Maximum seconds to wait for query to complete
(default: 1800)
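Because the two GO options can be repeated, assembling the argument list in a script takes a little care. A minimal sketch, assuming a hypothetical helper named `build_search_uniprot_args`; only the flags shown in the help text above are used, and options left at `None` are omitted so the tool's own defaults apply.

```python
from typing import List, Optional


def build_search_uniprot_args(
    output: str,
    taxon_id: Optional[str] = None,
    reviewed: Optional[bool] = None,
    subcellular_location_uniprot: Optional[str] = None,
    subcellular_location_go: Optional[List[str]] = None,
    molecular_function_go: Optional[List[str]] = None,
    limit: Optional[int] = None,
    timeout: Optional[int] = None,
) -> List[str]:
    """Build an argument list for `protein-quest search uniprot`.

    Hypothetical helper, not part of protein-quest itself.
    """
    args = ["protein-quest", "search", "uniprot"]
    if taxon_id is not None:
        args += ["--taxon-id", str(taxon_id)]
    # reviewed=None means: pass neither flag, keep the tool's default.
    if reviewed is True:
        args.append("--reviewed")
    elif reviewed is False:
        args.append("--no-reviewed")
    if subcellular_location_uniprot is not None:
        args += ["--subcellular-location-uniprot", subcellular_location_uniprot]
    # Repeatable options: emit the flag once per GO term.
    for term in subcellular_location_go or []:
        args += ["--subcellular-location-go", term]
    for term in molecular_function_go or []:
        args += ["--molecular-function-go", term]
    if limit is not None:
        args += ["--limit", str(limit)]
    if timeout is not None:
        args += ["--timeout", str(timeout)]
    args.append(output)  # positional output file, `-` for stdout
    return args
```

The returned list can be handed to `subprocess.run(args, check=True)` without any shell quoting.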
search pdbe
protein-quest search pdbe --help
Usage: protein-quest search pdbe [-h] [--limit LIMIT] [--timeout TIMEOUT]
uniprot_accs output_csv
Search for PDB structures of given UniProt accessions in the UniProt SPARQL
endpoint.
Positional Arguments:
uniprot_accs Text file with UniProt accessions (one per line). Use `-`
for stdin.
output_csv Output CSV with `uniprot_acc`, `pdb_id`, `method`,
`resolution`, `uniprot_chains`, `chain` columns, where
`uniprot_chains` is the raw UniProt chain string, for
example `A=1-100`, and `chain` is the first chain
from `uniprot_chains`, for example `A`. Use `-` for
stdout.
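The relation between the `uniprot_chains` and `chain` columns can be illustrated with a short sketch. Only the form `A=1-100` is confirmed by the help text. That several chains are separated by `/` (for example `A/B=1-100`) and several groups by `,` are assumptions here, and `first_chain` is a hypothetical function that may differ from what protein-quest does internally.

```python
def first_chain(uniprot_chains: str) -> str:
    """Return the first chain ID from a raw UniProt chain string.

    Assumed format: `<chains>=<start>-<end>`, with chains separated
    by `/` and groups separated by `,`.
    """
    # Look only at the first group, e.g. "A/B=1-100".
    first_group = uniprot_chains.split(",")[0].strip()
    chains, separator, _residue_range = first_group.partition("=")
    if not separator or not chains:
        raise ValueError(f"Unexpected chain string: {uniprot_chains!r}")
    # Take the first chain ID of the group.
    return chains.split("/")[0].strip()
```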
Options:
-h, --help show this help message and exit
--limit LIMIT Maximum number of PDB/UniProt accession combinations to
return (default: 10000)
--timeout TIMEOUT Maximum seconds to wait for query to complete (default:
1800)
search alphafold
protein-quest search alphafold --help
Usage: protein-quest search alphafold [-h] [--limit LIMIT] [--timeout TIMEOUT]
uniprot_accs output_csv
Search for AlphaFold structures of given UniProt accessions in the UniProt
SPARQL endpoint.
Positional Arguments:
uniprot_accs Text file with UniProt accessions (one per line). Use `-`
for stdin.
output_csv Output CSV with AlphaFold IDs per UniProt accession. Use
`-` for stdout.
Options:
-h, --help show this help message and exit
--limit LIMIT Maximum number of AlphaFold entry identifiers to return
(default: 10000)
--timeout TIMEOUT Maximum seconds to wait for query to complete (default:
1800)
search go
protein-quest search go --help
Usage: protein-quest search go [-h]
[--aspect
{molecular_function,biological_process,cellular_component}]
[--limit LIMIT]
term output_csv
Search for Gene Ontology (GO) terms in the EBI QuickGO API.
Positional Arguments:
term GO term to search for. For example `apoptosome`.
output_csv Output CSV with GO term results. Use `-` for stdout.
Options:
-h, --help show this help message and exit
--aspect {molecular_function,biological_process,cellular_component}
Filter on aspect. (default: None)
--limit LIMIT Maximum number of GO term results to return (default:
100)
retrieve
protein-quest retrieve --help
Usage: protein-quest retrieve [-h] {pdbe,alphafold} ...
Retrieve structure files from online resources.
Positional Arguments:
{pdbe,alphafold}
pdbe Retrieve PDBe gzipped mmCIF files for PDB IDs in CSV.
alphafold Retrieve AlphaFold files for IDs in CSV
Options:
-h, --help show this help message and exit
retrieve pdbe
protein-quest retrieve pdbe --help
Usage: protein-quest retrieve pdbe [-h]
[--max-parallel-downloads
MAX_PARALLEL_DOWNLOADS]
pdbe_csv output_dir
Retrieve mmCIF files from Protein Data Bank in Europe Knowledge Base (PDBe)
website for unique PDB IDs listed in a CSV file.
Positional Arguments:
pdbe_csv CSV file with `pdb_id` column. Other columns are
ignored. Use `-` for stdin.
output_dir Directory to store downloaded PDBe mmCIF files
Options:
-h, --help show this help message and exit
--max-parallel-downloads MAX_PARALLEL_DOWNLOADS
Maximum number of parallel downloads (default: 5)
retrieve alphafold
protein-quest retrieve alphafold --help
Usage: protein-quest retrieve alphafold [-h]
[--what-af-formats
{amAnnotations,amAnnotationsHg19,amAnnotationsHg38,bcif,cif,paeDoc,paeImage,pdb}
]
[--max-parallel-downloads
MAX_PARALLEL_DOWNLOADS]
alphafold_csv output_dir
Retrieve AlphaFold files from the AlphaFold Protein Structure Database.
Positional Arguments:
alphafold_csv CSV file with `af_id` column. Other columns are
ignored. Use `-` for stdin.
output_dir Directory to store downloaded AlphaFold files
Options:
-h, --help show this help message and exit
--what-af-formats
{amAnnotations,amAnnotationsHg19,amAnnotationsHg38,bcif,cif,paeDoc,paeImage,pdb}
AlphaFold formats to retrieve. Can be specified
multiple times. Default is 'pdb'. Summary is always
downloaded as `<entryId>.json`. (default: None)
--max-parallel-downloads MAX_PARALLEL_DOWNLOADS
Maximum number of parallel downloads (default: 5)
filter
protein-quest filter --help
Usage: protein-quest filter [-h] {confidence,chain,residue} ...
Positional Arguments:
{confidence,chain,residue}
confidence Filter AlphaFold mmCIF/PDB files by confidence
chain Filter on chain.
residue Filter PDB/mmCIF files by number of residues in chain
A
Options:
-h, --help show this help message and exit
filter confidence
protein-quest filter confidence --help
Usage: protein-quest filter confidence [-h]
[--confidence-threshold
CONFIDENCE_THRESHOLD]
[--min-residues MIN_RESIDUES]
[--max-residues MAX_RESIDUES]
[--write-stats WRITE_STATS]
input_dir output_dir
Filter AlphaFold mmCIF/PDB files by confidence (pLDDT). Files that pass are
written with residues below the threshold removed.
Positional Arguments:
input_dir Directory with AlphaFold mmCIF/PDB files
output_dir Directory to write filtered mmCIF/PDB files
Options:
-h, --help show this help message and exit
--confidence-threshold CONFIDENCE_THRESHOLD
pLDDT confidence threshold (0-100) (default: 70)
--min-residues MIN_RESIDUES
Minimum number of high-confidence residues a structure
should have (default: 0)
--max-residues MAX_RESIDUES
Maximum number of high-confidence residues a structure
should have (default: 10000000)
--write-stats WRITE_STATS
Write filter statistics to file. In CSV format with
`<input_file>,<residue_count>,<passed>,<output_file>`
columns. Use `-` for stdout. (default: None)
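How the three numeric options interact can be sketched as follows. This is an illustration of the described behaviour, not the actual implementation. It assumes a residue counts as high-confidence when its pLDDT is at or above the threshold (consistent with "residues below the threshold removed") and that the min/max bounds are inclusive. `passes_confidence_filter` is a hypothetical name.

```python
from typing import Iterable, Tuple


def passes_confidence_filter(
    plddts: Iterable[float],
    confidence_threshold: float = 70,
    min_residues: int = 0,
    max_residues: int = 10_000_000,
) -> Tuple[int, bool]:
    """Return (high-confidence residue count, whether the structure passes).

    `plddts` holds one pLDDT value (0-100) per residue.
    """
    # Residues below the threshold would be removed from the output file.
    kept = sum(1 for plddt in plddts if plddt >= confidence_threshold)
    # Assumed: both bounds are inclusive.
    return kept, min_residues <= kept <= max_residues
```

With the defaults, a structure whose per-residue values are `[90, 69.9, 70, 30]` keeps two residues and passes; with `min_residues=3` it would be rejected.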
filter chain
protein-quest filter chain --help
Usage: protein-quest filter chain [-h] [--scheduler-address SCHEDULER_ADDRESS]
chains input_dir output_dir
For each input PDB/mmCIF and chain combination write a PDB/mmCIF file with
just the given chain and rename it to chain `A`. Filtering is done in parallel
using a Dask cluster.
Positional Arguments:
chains CSV file with `pdb_id` and `chain` columns. Other
columns are ignored.
input_dir Directory with PDB/mmCIF files. Expected filenames are
`{pdb_id}.cif.gz`, `{pdb_id}.cif`, `{pdb_id}.pdb.gz`
or `{pdb_id}.pdb`.
output_dir Directory to write the single-chain PDB/mmCIF files.
Output files are in same format as input files.
Options:
-h, --help show this help message and exit
--scheduler-address SCHEDULER_ADDRESS
Address of the Dask scheduler to connect to. If not
provided, will create a local cluster. (default: None)
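The expected filenames in `input_dir` can be checked before starting a run. A minimal sketch, assuming a hypothetical helper `find_structure_file`; the order in which the four suffixes are tried and the exact letter case of `pdb_id` in the filename are assumptions, since the help text does not specify them.

```python
from pathlib import Path
from typing import Optional

# Suffixes listed in the help text for `input_dir`.
SUFFIXES = (".cif.gz", ".cif", ".pdb.gz", ".pdb")


def find_structure_file(input_dir: Path, pdb_id: str) -> Optional[Path]:
    """Return the first existing `{pdb_id}{suffix}` file, or None."""
    for suffix in SUFFIXES:
        candidate = input_dir / f"{pdb_id}{suffix}"
        if candidate.is_file():
            return candidate
    return None
```

Running this over the `pdb_id` column of the chains CSV would show which rows have no matching file.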
filter residue
protein-quest filter residue --help
Usage: protein-quest filter residue [-h] [--min-residues MIN_RESIDUES]
[--max-residues MAX_RESIDUES]
[--write-stats WRITE_STATS]
input_dir output_dir
Filter PDB/mmCIF files by number of residues in chain A.
Positional Arguments:
input_dir Directory with PDB/mmCIF files (e.g., from 'filter
chain')
output_dir Directory to write filtered PDB/mmCIF files. Files are
copied without modification.
Options:
-h, --help show this help message and exit
--min-residues MIN_RESIDUES
Min residues in chain A (default: 0)
--max-residues MAX_RESIDUES
Max residues in chain A (default: 10000000)
--write-stats WRITE_STATS
Write filter statistics to file. In CSV format with
`<input_file>,<residue_count>,<passed>,<output_file>`
columns. Use `-` for stdout. (default: None)
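The statistics file can be post-processed with the standard `csv` module. The sketch below assumes the file starts with a header row naming the columns `input_file`, `residue_count`, `passed` and `output_file`, and that `passed` is written as `True` or `False`. Neither detail is confirmed by the help text, and `passed_outputs` is a hypothetical helper.

```python
import csv
import io
from typing import List, TextIO


def passed_outputs(stats_csv: TextIO) -> List[str]:
    """Return the `output_file` values of all rows marked as passed."""
    reader = csv.DictReader(stats_csv)
    return [
        row["output_file"]
        for row in reader
        # Assumed serialisation of the boolean column.
        if row["passed"].strip().lower() == "true"
    ]


# Example with in-memory data instead of a real stats file.
example = io.StringIO(
    "input_file,residue_count,passed,output_file\n"
    "in/a.cif,120,True,out/a.cif\n"
    "in/b.cif,12,False,\n"
)
print(passed_outputs(example))
```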
mcp
protein-quest mcp --help
Usage: protein-quest mcp [-h] [--transport {stdio,http,streamable-http}]
[--host HOST] [--port PORT]
Run Model Context Protocol (MCP) server. Can be used by agentic LLMs like
Claude Sonnet 4 as a set of tools.
Options:
-h, --help show this help message and exit
--transport {stdio,http,streamable-http}
Transport protocol to use (default: stdio)
--host HOST Host to bind the server to (default: 127.0.0.1)
--port PORT Port to bind the server to (default: 8000)