CLI Reference

Documentation for the protein-quest script.

protein-quest --help


Usage: protein-quest [-h] [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
                     [--version]
                     {search,retrieve,filter,mcp} ...

Protein Quest CLI

Positional Arguments:
  {search,retrieve,filter,mcp}
    search              Search data sources
    retrieve            Retrieve structure files
    filter              Filter files
    mcp                 Run Model Context Protocol (MCP) server

Options:
  -h, --help            show this help message and exit
  --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
  --version             show program's version number and exit


protein-quest search --help

Usage: protein-quest search [-h] {uniprot,pdbe,alphafold,go} ...

Search various things online.

Positional Arguments:
  {uniprot,pdbe,alphafold,go}
    uniprot             Search UniProt accessions
    pdbe                Search PDBe structures of given UniProt accessions
    alphafold           Search AlphaFold structures of given UniProt
                        accessions
    go                  Search for Gene Ontology (GO) terms

Options:
  -h, --help            show this help message and exit


search uniprot

protein-quest search uniprot --help

Usage: protein-quest search uniprot [-h] [--taxon-id TAXON_ID]
                                    [--reviewed | --no-reviewed]
                                    [--subcellular-location-uniprot SUBCELLULAR_LOCATION_UNIPROT]
                                    [--subcellular-location-go SUBCELLULAR_LOCATION_GO]
                                    [--molecular-function-go MOLECULAR_FUNCTION_GO]
                                    [--limit LIMIT] [--timeout TIMEOUT]
                                    output

Search for UniProt accessions based on various criteria in the UniProt SPARQL
endpoint.

Positional Arguments:
  output                Output text file for UniProt accessions (one per
                        line). Use `-` for stdout.

Options:
  -h, --help            show this help message and exit
  --taxon-id TAXON_ID   NCBI Taxon ID, e.g. 9606 for Homo sapiens (default:
                        None)
  --reviewed, --no-reviewed
                        --reviewed restricts to Swiss-Prot (reviewed) entries,
                        --no-reviewed to TrEMBL (unreviewed) entries. By
                        default both are searched. (default: None)
  --subcellular-location-uniprot SUBCELLULAR_LOCATION_UNIPROT
                        Subcellular location label as used by UniProt (e.g.
                        nucleus) (default: None)
  --subcellular-location-go SUBCELLULAR_LOCATION_GO
                        GO term(s) for subcellular location (e.g. GO:0005634).
                        Can be given multiple times. (default: None)
  --molecular-function-go MOLECULAR_FUNCTION_GO
                        GO term(s) for molecular function (e.g. GO:0003677).
                        Can be given multiple times. (default: None)
  --limit LIMIT         Maximum number of UniProt accessions to return
                        (default: 10000)
  --timeout TIMEOUT     Maximum seconds to wait for query to complete
                        (default: 1800)
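
The GO options above can be repeated, which is easy to get wrong when the command is assembled from a script. The following is a minimal Python sketch, not part of protein-quest itself, that builds an argument list using only the flags documented above. The helper name `build_search_uniprot_args` is hypothetical, and the sketch assumes that "can be given multiple times" means repeating the flag once per term.

```python
from typing import List, Optional, Sequence


def build_search_uniprot_args(
    output: str,
    taxon_id: Optional[str] = None,
    reviewed: Optional[bool] = None,
    subcellular_location_go: Sequence[str] = (),
    molecular_function_go: Sequence[str] = (),
    limit: Optional[int] = None,
) -> List[str]:
    """Assemble an argv list for `protein-quest search uniprot` (hypothetical helper)."""
    args = ["protein-quest", "search", "uniprot"]
    if taxon_id is not None:
        args += ["--taxon-id", taxon_id]
    if reviewed is not None:
        # True -> --reviewed, False -> --no-reviewed, None -> flag omitted
        args.append("--reviewed" if reviewed else "--no-reviewed")
    # Assumption: each GO term is passed by repeating the flag.
    for term in subcellular_location_go:
        args += ["--subcellular-location-go", term]
    for term in molecular_function_go:
        args += ["--molecular-function-go", term]
    if limit is not None:
        args += ["--limit", str(limit)]
    args.append(output)  # positional output file, `-` for stdout
    return args


argv = build_search_uniprot_args(
    "-",
    taxon_id="9606",
    reviewed=True,
    subcellular_location_go=["GO:0005634"],
    molecular_function_go=["GO:0003677"],
    limit=100,
)
print(" ".join(argv))
```

The resulting list can be handed to `subprocess.run(argv, check=True)` on a machine where protein-quest is installed.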


search pdbe

protein-quest search pdbe --help

Usage: protein-quest search pdbe [-h] [--limit LIMIT] [--timeout TIMEOUT]
                                 uniprot_accs output_csv

Search for PDB structures of given UniProt accessions in the UniProt SPARQL
endpoint.

Positional Arguments:
  uniprot_accs       Text file with UniProt accessions (one per line). Use `-`
                     for stdin.
  output_csv         Output CSV with `uniprot_acc`, `pdb_id`, `method`,
                     `resolution`, `uniprot_chains`, `chain` columns, where
                     `uniprot_chains` is the raw UniProt chain string, for
                     example `A=1-100`, and `chain` is the first chain
                     from `uniprot_chains`, for example `A`. Use `-` for
                     stdout.

Options:
  -h, --help         show this help message and exit
  --limit LIMIT      Maximum number of PDB ID and UniProt accession
                     combinations to return (default: 10000)
  --timeout TIMEOUT  Maximum seconds to wait for query to complete (default:
                     1800)
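
To illustrate how the `chain` column relates to `uniprot_chains`, here is a small Python sketch that derives the first chain from a raw chain string. It is not the tool's actual implementation. Beyond the documented `A=1-100` example, it assumes that several chains sharing a range are separated by `/` (as in `B/D=5-120`) and that multiple ranges are separated by commas. The function name `first_chain` is hypothetical.

```python
def first_chain(uniprot_chains: str) -> str:
    """Return the first chain identifier from a UniProt chain string.

    Assumed formats (only the first is documented above):
      "A=1-100"            -> "A"
      "B/D=5-120"          -> "B"
      "A=1-100, C=150-300" -> "A"
    """
    first_segment = uniprot_chains.split(",")[0]  # keep the first range only
    chain_ids = first_segment.split("=")[0]       # drop the residue range
    return chain_ids.split("/")[0].strip()        # first of the listed chains


print(first_chain("A=1-100"))
print(first_chain("B/D=5-120"))
```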


search alphafold

protein-quest search alphafold --help

Usage: protein-quest search alphafold [-h] [--limit LIMIT] [--timeout TIMEOUT]
                                      uniprot_accs output_csv

Search for AlphaFold structures of given UniProt accessions in the UniProt
SPARQL endpoint.

Positional Arguments:
  uniprot_accs       Text file with UniProt accessions (one per line). Use `-`
                     for stdin.
  output_csv         Output CSV with AlphaFold IDs per UniProt accession. Use
                     `-` for stdout.

Options:
  -h, --help         show this help message and exit
  --limit LIMIT      Maximum number of AlphaFold entry identifiers to return
                     (default: 10000)
  --timeout TIMEOUT  Maximum seconds to wait for query to complete (default:
                     1800)


search go

protein-quest search go --help

Usage: protein-quest search go [-h]
                               [--aspect {molecular_function,biological_process,cellular_component}]
                               [--limit LIMIT]
                               term output_csv

Search for Gene Ontology (GO) terms in the EBI QuickGO API.

Positional Arguments:
  term                  GO term to search for. For example `apoptosome`.
  output_csv            Output CSV with GO term results. Use `-` for stdout.

Options:
  -h, --help            show this help message and exit
  --aspect {molecular_function,biological_process,cellular_component}
                        Filter on aspect. (default: None)
  --limit LIMIT         Maximum number of GO term results to return (default:
                        100)


retrieve

protein-quest retrieve --help

Usage: protein-quest retrieve [-h] {pdbe,alphafold} ...

Retrieve structure files from online resources.

Positional Arguments:
  {pdbe,alphafold}
    pdbe            Retrieve PDBe gzipped mmCIF files for PDB IDs in CSV.
    alphafold       Retrieve AlphaFold files for IDs in CSV

Options:
  -h, --help        show this help message and exit


retrieve pdbe

protein-quest retrieve pdbe --help

Usage: protein-quest retrieve pdbe [-h]
                                   [--max-parallel-downloads MAX_PARALLEL_DOWNLOADS]
                                   pdbe_csv output_dir

Retrieve mmCIF files from Protein Data Bank in Europe Knowledge Base (PDBe)
website for unique PDB IDs listed in a CSV file.

Positional Arguments:
  pdbe_csv              CSV file with `pdb_id` column. Other columns are
                        ignored. Use `-` for stdin.
  output_dir            Directory to store downloaded PDBe mmCIF files

Options:
  -h, --help            show this help message and exit
  --max-parallel-downloads MAX_PARALLEL_DOWNLOADS
                        Maximum number of parallel downloads (default: 5)
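
Because only the `pdb_id` column is read and each unique ID is downloaded once, the CSV written by `search pdbe` can be passed in unchanged. The Python sketch below shows one way to collect the unique IDs from such a file. It is an illustration, not the tool's code. The row values are invented placeholders, and whether the real command preserves first-seen order is an assumption.

```python
import csv
import io
from typing import List


def unique_pdb_ids(csv_text: str) -> List[str]:
    """Collect unique `pdb_id` values in first-seen order, ignoring other columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = {}  # dicts keep insertion order, so this acts as an ordered set
    for row in reader:
        seen.setdefault(row["pdb_id"], None)
    return list(seen)


# Invented placeholder rows in the column layout documented for `search pdbe`.
example_csv = """uniprot_acc,pdb_id,method,resolution,uniprot_chains,chain
P00001,1ABC,placeholder_method,2.0,A=1-100,A
P00002,1ABC,placeholder_method,2.0,B=1-80,B
P00002,2XYZ,placeholder_method,3.1,A/C=10-200,A
"""

pdb_ids = unique_pdb_ids(example_csv)
print(pdb_ids)
```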


retrieve alphafold

protein-quest retrieve alphafold --help

Usage: protein-quest retrieve alphafold [-h]
                                        [--what-af-formats {amAnnotations,amAnnotationsHg19,amAnnotationsHg38,bcif,cif,paeDoc,paeImage,pdb}]
                                        [--max-parallel-downloads MAX_PARALLEL_DOWNLOADS]
                                        alphafold_csv output_dir

Retrieve AlphaFold files from the AlphaFold Protein Structure Database.

Positional Arguments:
  alphafold_csv         CSV file with `af_id` column. Other columns are
                        ignored. Use `-` for stdin.
  output_dir            Directory to store downloaded AlphaFold files

Options:
  -h, --help            show this help message and exit
  --what-af-formats {amAnnotations,amAnnotationsHg19,amAnnotationsHg38,bcif,cif,paeDoc,paeImage,pdb}
                        AlphaFold formats to retrieve. Can be specified
                        multiple times. Default is 'pdb'. Summary is always
                        downloaded as `<entryId>.json`. (default: None)
  --max-parallel-downloads MAX_PARALLEL_DOWNLOADS
                        Maximum number of parallel downloads (default: 5)


filter

protein-quest filter --help

Usage: protein-quest filter [-h] {confidence,chain,residue} ...

Positional Arguments:
  {confidence,chain,residue}
    confidence          Filter AlphaFold mmCIF/PDB files by confidence
    chain               Filter on chain.
    residue             Filter PDB/mmCIF files by number of residues in chain
                        A

Options:
  -h, --help            show this help message and exit


filter confidence

protein-quest filter confidence --help

Usage: protein-quest filter confidence [-h]
                                       [--confidence-threshold CONFIDENCE_THRESHOLD]
                                       [--min-residues MIN_RESIDUES]
                                       [--max-residues MAX_RESIDUES]
                                       [--write-stats WRITE_STATS]
                                       input_dir output_dir

Filter AlphaFold mmCIF/PDB files by confidence (pLDDT). Files that pass are
written with residues below the threshold removed.

Positional Arguments:
  input_dir             Directory with AlphaFold mmCIF/PDB files
  output_dir            Directory to write filtered mmCIF/PDB files

Options:
  -h, --help            show this help message and exit
  --confidence-threshold CONFIDENCE_THRESHOLD
                        pLDDT confidence threshold (0-100) (default: 70)
  --min-residues MIN_RESIDUES
                        Minimum number of high-confidence residues a structure
                        should have (default: 0)
  --max-residues MAX_RESIDUES
                        Maximum number of high-confidence residues a structure
                        should have (default: 10000000)
  --write-stats WRITE_STATS
                        Write filter statistics to file. In CSV format with
                        `<input_file>,<residue_count>,<passed>,<output_file>`
                        columns. Use `-` for stdout. (default: None)
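
The interaction between the threshold and the residue bounds can be sketched in a few lines of Python. This is a simplified model of the behaviour described above, not the tool's implementation: it works on a plain list of per-residue pLDDT scores rather than structure files. It assumes that a residue scoring exactly at the threshold is kept and that both residue bounds are inclusive. The function name `filter_by_confidence` is hypothetical. Defaults mirror the documented ones.

```python
from typing import List, Sequence, Tuple


def filter_by_confidence(
    plddt: Sequence[float],
    threshold: float = 70.0,
    min_residues: int = 0,
    max_residues: int = 10_000_000,
) -> Tuple[List[int], bool]:
    """Return indices of high-confidence residues and whether the structure passes."""
    # Assumption: "below threshold" is strict, so a score equal to it is kept.
    kept = [i for i, score in enumerate(plddt) if score >= threshold]
    # Assumption: both bounds are inclusive.
    passed = min_residues <= len(kept) <= max_residues
    return kept, passed


scores = [95.2, 68.0, 70.0, 45.5, 88.1]  # made-up per-residue pLDDT values
kept, passed = filter_by_confidence(scores)
print(kept, passed)

# Requiring at least four high-confidence residues rejects the same structure.
_, passed_strict = filter_by_confidence(scores, min_residues=4)
print(passed_strict)
```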


filter chain

protein-quest filter chain --help

Usage: protein-quest filter chain [-h] [--scheduler-address SCHEDULER_ADDRESS]
                                  chains input_dir output_dir

For each input PDB/mmCIF and chain combination write a PDB/mmCIF file with
just the given chain and rename it to chain `A`. Filtering is done in parallel
using a Dask cluster.

Positional Arguments:
  chains                CSV file with `pdb_id` and `chain` columns. Other
                        columns are ignored.
  input_dir             Directory with PDB/mmCIF files. Expected filenames are
                        `{pdb_id}.cif.gz`, `{pdb_id}.cif`, `{pdb_id}.pdb.gz`
                        or `{pdb_id}.pdb`.
  output_dir            Directory to write the single-chain PDB/mmCIF files.
                        Output files are in same format as input files.

Options:
  -h, --help            show this help message and exit
  --scheduler-address SCHEDULER_ADDRESS
                        Address of the Dask scheduler to connect to. If not
                        provided, will create a local cluster. (default: None)
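
The expected input filenames can be illustrated with a short Python sketch that looks up the file for a given `pdb_id` among a set of filenames. It is not the tool's code. It assumes the suffixes are tried in the order they are listed above and that matching is case-sensitive. The function name `find_structure_file` is hypothetical.

```python
from typing import Iterable, Optional

# Suffixes from the `input_dir` description; trying them in this order is an assumption.
SUFFIXES = (".cif.gz", ".cif", ".pdb.gz", ".pdb")


def find_structure_file(pdb_id: str, filenames: Iterable[str]) -> Optional[str]:
    """Return the first filename matching `{pdb_id}{suffix}`, or None if absent."""
    available = set(filenames)
    for suffix in SUFFIXES:
        candidate = f"{pdb_id}{suffix}"
        if candidate in available:
            return candidate
    return None


files = ["1abc.cif.gz", "2xyz.pdb", "notes.txt"]  # made-up directory listing
print(find_structure_file("1abc", files))
print(find_structure_file("2xyz", files))
print(find_structure_file("3def", files))
```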


filter residue

protein-quest filter residue --help

Usage: protein-quest filter residue [-h] [--min-residues MIN_RESIDUES]
                                    [--max-residues MAX_RESIDUES]
                                    [--write-stats WRITE_STATS]
                                    input_dir output_dir

Filter PDB/mmCIF files by number of residues in chain A.

Positional Arguments:
  input_dir             Directory with PDB/mmCIF files (e.g., from 'filter
                        chain')
  output_dir            Directory to write filtered PDB/mmCIF files. Files are
                        copied without modification.

Options:
  -h, --help            show this help message and exit
  --min-residues MIN_RESIDUES
                        Min residues in chain A (default: 0)
  --max-residues MAX_RESIDUES
                        Max residues in chain A (default: 10000000)
  --write-stats WRITE_STATS
                        Write filter statistics to file. In CSV format with
                        `<input_file>,<residue_count>,<passed>,<output_file>`
                        columns. Use `-` for stdout. (default: None)


mcp

protein-quest mcp --help

Usage: protein-quest mcp [-h] [--transport {stdio,http,streamable-http}]
                         [--host HOST] [--port PORT]

Run Model Context Protocol (MCP) server. Can be used by agentic LLMs like
Claude Sonnet 4 as a set of tools.

Options:
  -h, --help            show this help message and exit
  --transport {stdio,http,streamable-http}
                        Transport protocol to use (default: stdio)
  --host HOST           Host to bind the server to (default: 127.0.0.1)
  --port PORT           Port to bind the server to (default: 8000)