CLI Reference

This section documents all available CLI commands.

protein-quest COMMAND

Protein Quest CLI

Commands:

protein-quest --install-completion

protein-quest --install-completion [OPTIONS]

Install shell completion for this application.

This command generates and installs the completion script to the appropriate location for your shell. After installation, you may need to restart your shell or source your shell configuration file.

Parameters:

  • --shell: Shell type for completion. If not specified, attempts to auto-detect current shell. [choices: zsh, bash, fish]
  • --output, -o: Output path for the completion script. If not specified, uses shell-specific default.

Search data sources

protein-quest search uniprot

protein-quest search uniprot [OPTIONS] OUTPUT

Search for UniProt accessions.

Search the UniProt SPARQL endpoint for UniProt accessions that match the given criteria.

Arguments:

  • OUTPUT: Output text file for UniProt accessions (one per line). Use - for stdout. [required]

Parameters:

  • --taxon-id: NCBI Taxon ID to filter results by organism (for example 9606 for human).
  • --reviewed, --no-reviewed: Filter results by review status. Use --reviewed to keep only reviewed entries or --no-reviewed to keep only unreviewed entries.
  • --subcellular-location-uniprot: Subcellular location in UniProt format (for example "nucleus").
  • --subcellular-location-go: Subcellular location in GO format. Can be a single GO term (for example, ["GO:0005634"]) or a collection of GO terms (for example, ["GO:0005634", "GO:0005737"]), which are searched with OR logic.
  • --molecular-function-go: Molecular function in GO format. Can be a single GO term (for example, ["GO:0003674"]) or a collection of GO terms (for example, ["GO:0003674", "GO:0008150"]), which are searched with OR logic.
  • --min-sequence-length: Minimum length of the canonical sequence.
  • --max-sequence-length: Maximum length of the canonical sequence.
  • --limit: Maximum number of UniProt accessions to return. [default: 10000]
  • --timeout: Maximum seconds to wait for query to complete. [default: 1800]

Common:

  • --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
  • --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
  • --prov: Whether to write provenance information about the command execution to ro-crate-metadata.json file. [default: False]
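
When the command is driven from a script, the options above map directly onto an argument list. The following is a minimal sketch and not part of protein-quest itself: build_search_uniprot_cmd is a hypothetical helper and uniprot_accs.txt is an example file name. Only flags documented above are used.

```python
import shlex


def build_search_uniprot_cmd(output, taxon_id=None, reviewed=None,
                             min_len=None, max_len=None, limit=None):
    """Assemble an argv list for `protein-quest search uniprot` (hypothetical helper)."""
    cmd = ["protein-quest", "search", "uniprot"]
    if taxon_id is not None:
        cmd += ["--taxon-id", str(taxon_id)]
    if reviewed is not None:
        # True maps to --reviewed, False to --no-reviewed
        cmd.append("--reviewed" if reviewed else "--no-reviewed")
    if min_len is not None:
        cmd += ["--min-sequence-length", str(min_len)]
    if max_len is not None:
        cmd += ["--max-sequence-length", str(max_len)]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    cmd.append(output)  # the OUTPUT argument comes last
    return cmd


cmd = build_search_uniprot_cmd("uniprot_accs.txt", taxon_id=9606, reviewed=True, limit=100)
print(shlex.join(cmd))
```

Once protein-quest is installed, the list can be passed to subprocess.run(cmd, check=True).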

protein-quest search pdbe

protein-quest search pdbe [OPTIONS] UNIPROT_ACCESSIONS OUTPUT_CSV

Search for PDB structures of given UniProt accessions.

Search for PDB structures of given UniProt accessions in the UniProt SPARQL endpoint.

Arguments:

  • UNIPROT_ACCESSIONS: Text file with UniProt accessions (one per line). Use - for stdin. [required]
  • OUTPUT_CSV: Output CSV with the following columns: uniprot_accession, pdb_id, method, resolution, uniprot_chains, chain, chain_length. Here uniprot_chains is the raw UniProt chain string (for example A=1-100), chain is the first chain from uniprot_chains (for example A), and chain_length is the length of that chain (for example 100), or '' if it could not be determined. Use - for stdout. [required]
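
The relation between uniprot_chains, chain and chain_length can be sketched as follows. This is an illustration only, not the tool's own parser: first_chain_and_length is a hypothetical helper, and the handling of '/'-joined chains and comma-separated entries is an assumption that goes beyond the A=1-100 example given above.

```python
def first_chain_and_length(uniprot_chains):
    """Return (chain, length) for a raw chain string such as 'A=1-100'.

    The length is None when the residue range cannot be parsed.
    Assumption: several chains may be joined with '/' and several
    entries separated by ','; only the first chain of the first entry is used.
    """
    first_entry = uniprot_chains.split(",")[0].strip()
    chains, _, residue_range = first_entry.partition("=")
    chain = chains.split("/")[0].strip()
    try:
        start, end = (int(part) for part in residue_range.split("-"))
    except ValueError:
        # Covers missing ranges, empty bounds and malformed ranges
        return chain, None
    return chain, end - start + 1


print(first_chain_and_length("A=1-100"))
```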

Parameters:

  • --limit: Maximum number of PDB ID and UniProt accession combinations to return. [default: 10000]
  • --timeout: Maximum seconds to wait for query to complete. [default: 1800]
  • --min-residues: Minimum number of residues required in the chain mapped to the UniProt accession.
  • --max-residues: Maximum number of residues allowed in the chain mapped to the UniProt accession.
  • --keep-invalid: Keep PDB results when chain length could not be determined. [default: False]
  • --top-resolution-per-uniprot-accession: Retain the top N PDB entries per UniProt accession, ranked by best (lowest) resolution first, then by highest residue count. For example use --top-resolution-per-uniprot-accession 3 to keep only the best 3 PDB entries per UniProt accession.
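
The ranking described for --top-resolution-per-uniprot-accession can be sketched on rows that were already read from the output CSV. This is a sketch under stated assumptions: top_resolution_per_accession is a hypothetical helper, chain_length is used as the residue count, values are already numeric, and the PDB IDs and the accession Q00001 are made up.

```python
from collections import defaultdict


def top_resolution_per_accession(rows, top):
    """Keep the best `top` rows per UniProt accession.

    Rows are ranked by lowest resolution first, then by highest residue count.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["uniprot_accession"]].append(row)
    kept = []
    for accession_rows in grouped.values():
        accession_rows.sort(key=lambda r: (r["resolution"], -r["chain_length"]))
        kept.extend(accession_rows[:top])
    return kept


rows = [
    {"uniprot_accession": "P12345", "pdb_id": "1abc", "resolution": 2.5, "chain_length": 100},
    {"uniprot_accession": "P12345", "pdb_id": "2abc", "resolution": 1.8, "chain_length": 90},
    {"uniprot_accession": "P12345", "pdb_id": "3abc", "resolution": 1.8, "chain_length": 120},
    {"uniprot_accession": "Q00001", "pdb_id": "4abc", "resolution": 3.0, "chain_length": 50},
]
best = top_resolution_per_accession(rows, top=2)
print([row["pdb_id"] for row in best])
```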

Common:

  • --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
  • --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
  • --prov: Whether to write provenance information about the command execution to ro-crate-metadata.json file. [default: False]

protein-quest search alphafold

protein-quest search alphafold [OPTIONS] UNIPROT_ACCESSIONS OUTPUT_CSV

Search for AlphaFold structures of given UniProt accessions.

Search for AlphaFold structures of given UniProt accessions in the UniProt SPARQL endpoint.

Arguments:

  • UNIPROT_ACCESSIONS: Text file with UniProt accessions (one per line). Use - for stdin. [required]
  • OUTPUT_CSV: Output CSV with AlphaFold IDs per UniProt accession. CSV has columns: uniprot_accession, af_id. Use - for stdout. [required]

Parameters:

  • --min-sequence-length: Minimum length of the canonical sequence.
  • --max-sequence-length: Maximum length of the canonical sequence.
  • --limit: Maximum number of AlphaFold entry identifiers to return. [default: 10000]
  • --timeout: Maximum seconds to wait for query to complete. [default: 1800]

Common:

  • --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
  • --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
  • --prov: Whether to write provenance information about the command execution to ro-crate-metadata.json file. [default: False]

protein-quest search structure

protein-quest search structure [OPTIONS] UNIPROT_ACCESSIONS OUTPUT_CSV

Search for experimentally determined and predicted structures.

Search for experimentally determined and predicted structures of given UniProt accessions in the 3D Beacons Network API.

Arguments:

  • UNIPROT_ACCESSIONS: Text file with UniProt accessions (one per line). Use - for stdin. [required]
  • OUTPUT_CSV: Output CSV with the following columns: uniprot_accession, provider, model_identifier, model_url, model_format, chain, residue_count. Use - for stdout. [required]

Parameters:

  • --source: Source of the structures to search for. Defaults to pdbe and alphafold. Multiple sources can be given by repeating the --source parameter. Use 'all' to search all sources. [choices: pdbe, ped, swissmodel, alphafold, sasbdb, alphafill, hegelab, modelarchive, isoformio, levylab, all]
  • --min-residues: Minimum number of residues required in the chain mapped to the UniProt accession.
  • --max-residues: Maximum number of residues allowed in the chain mapped to the UniProt accession.
  • --limit: Maximum number of structures per UniProt accession per source to return. [default: 10000]
  • --timeout: Maximum seconds to wait for query to complete. [default: 1800]
  • --raw: Path to write raw 3D Beacons summaries as JSON.

Common:

  • --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
  • --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
  • --prov: Whether to write provenance information about the command execution to ro-crate-metadata.json file. [default: False]

protein-quest search emdb

protein-quest search emdb [OPTIONS] UNIPROT_ACCESSIONS OUTPUT_CSV

Search for EMDB identifiers of given UniProt accessions.

Search for Electron Microscopy Data Bank (EMDB) identifiers of given UniProt accessions in the UniProt SPARQL endpoint.

Arguments:

  • UNIPROT_ACCESSIONS: Text file with UniProt accessions (one per line). Use - for stdin. [required]
  • OUTPUT_CSV: Output CSV with EMDB IDs per UniProt accession. CSV has columns: uniprot_accession, emdb_id. Use - for stdout. [required]

Parameters:

  • --limit: Maximum number of EMDB entry identifiers to return. [default: 10000]
  • --timeout: Maximum seconds to wait for query to complete. [default: 1800]

Common:

  • --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
  • --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
  • --prov: Whether to write provenance information about the command execution to ro-crate-metadata.json file. [default: False]

protein-quest search go

protein-quest search go [OPTIONS] TERM OUTPUT_CSV

Search for Gene Ontology (GO) terms.

Search for Gene Ontology (GO) terms in the EBI QuickGO API.

Arguments:

  • TERM: GO term to search for. For example apoptosome. [required]
  • OUTPUT_CSV: Output CSV with GO term results. CSV has columns: term, id, name, aspect, definition. Use - for stdout. [required]

Parameters:

  • --aspect: Filter on aspect. [choices: cellular_component, biological_process, molecular_function]
  • --limit: Maximum number of GO term results to return. [default: 100]

Common:

  • --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
  • --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
  • --prov: Whether to write provenance information about the command execution to ro-crate-metadata.json file. [default: False]

protein-quest search taxonomy

protein-quest search taxonomy [OPTIONS] QUERY OUTPUT_CSV

Search for taxon information in UniProt.

Search for taxon information in UniProt. Uses https://www.uniprot.org/taxonomy?query=*.

Arguments:

  • QUERY: Search query for the taxon. Surround multiple words with quotes. [required]
  • OUTPUT_CSV: Output CSV with taxonomy results. CSV has columns: tax_id, name, rank, parent_tax_id, parent_tax_name. Use - for stdout. [required]

Parameters:

  • --field: Field to search in. If not given then searches all fields. If "tax_id" then searches by taxon ID. If "parent" then given a parent taxon ID returns all its children. [choices: tax_id, scientific, common, parent]
  • --limit: Maximum number of results to return. [default: 100]

Common:

  • --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
  • --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
  • --prov: Whether to write provenance information about the command execution to ro-crate-metadata.json file. [default: False]

protein-quest search interaction-partners

protein-quest search interaction-partners [OPTIONS] UNIPROT_ACCESSION OUTPUT_CSV

Search for interaction partners of given UniProt accession.

Search for interaction partners of a given UniProt accession in the UniProt SPARQL endpoint and Complex Portal.

Arguments:

  • UNIPROT_ACCESSION: UniProt accession (for example P12345). [required]
  • OUTPUT_CSV: Output CSV with interaction partners per UniProt accession. CSV has columns: uniprot_accession. Use - for stdout. [required]

Parameters:

  • --exclude: UniProt accessions to exclude from the results. Multiple accessions can be given by repeating the --exclude option.
  • --limit: Maximum number of interaction partner UniProt accessions to return. [default: 10000]
  • --timeout: Maximum seconds to wait for query to complete. [default: 1800]

Common:

  • --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
  • --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
  • --prov: Whether to write provenance information about the command execution to ro-crate-metadata.json file. [default: False]

protein-quest search complexes

protein-quest search complexes [OPTIONS] UNIPROT_ACCESSIONS OUTPUT_CSV

Search for complexes in the Complex Portal.

Search for complexes in the Complex Portal (https://www.ebi.ac.uk/complexportal/).

The output CSV file has the following columns:

  • query_protein: UniProt accession used as query
  • complex_id: Complex Portal identifier
  • complex_url: URL to the Complex Portal entry
  • complex_title: Title of the complex
  • members: Semicolon-separated list of UniProt accessions of complex members
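
Because members is a semicolon-separated field inside a CSV cell, it needs a second split after the CSV is parsed. A minimal sketch, assuming the column names listed above; members_by_complex is a hypothetical helper and the sample row (complex ID, URL, title, accession Q00001) is made up.

```python
import csv
import io


def members_by_complex(csv_text):
    """Map each complex_id to the list of its member UniProt accessions."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["complex_id"]: [m.strip() for m in row["members"].split(";") if m.strip()]
        for row in reader
    }


sample = (
    "query_protein,complex_id,complex_url,complex_title,members\n"
    "P12345,CPX-0000,https://example.org/CPX-0000,Example complex,P12345;Q00001\n"
)
print(members_by_complex(sample))
```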

Arguments:

  • UNIPROT_ACCESSIONS: Text file with UniProt accessions (one per line) as query. Use - for stdin. [required]
  • OUTPUT_CSV: Output CSV file with complex results. Use - for stdout. [required]

Parameters:

  • --limit: Maximum number of complex results to return. [default: 100]
  • --timeout: Maximum seconds to wait for query to complete. [default: 1800]

Common:

  • --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
  • --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
  • --prov: Whether to write provenance information about the command execution to ro-crate-metadata.json file. [default: False]

protein-quest search uniprot-details

protein-quest search uniprot-details [OPTIONS] UNIPROT_ACCESSIONS OUTPUT_CSV

Search for UniProt details for given UniProt accessions from the UniProt SPARQL endpoint.

The output CSV file has the following columns:

  • uniprot_accession: UniProt accession.
  • uniprot_id: UniProt ID (mnemonic).
  • sequence_length: Length of the canonical sequence.
  • reviewed: Whether the entry is reviewed (Swiss-Prot) or unreviewed (TrEMBL).
  • protein_name: Recommended protein name.
  • taxon_id: NCBI Taxonomy ID of the organism.
  • taxon_name: Scientific name of the organism.

The order of the output CSV can be different from the input order.
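
If downstream steps rely on the input order, the rows can be reordered after the fact. A minimal sketch, assuming the column names listed above; reorder_like_input is a hypothetical helper and all sample values except the column names are made up.

```python
import csv
import io


def reorder_like_input(accessions, details_csv_text):
    """Return the CSV rows in the order of `accessions`, skipping accessions without a row."""
    reader = csv.DictReader(io.StringIO(details_csv_text))
    by_accession = {row["uniprot_accession"]: row for row in reader}
    return [by_accession[acc] for acc in accessions if acc in by_accession]


details = (
    "uniprot_accession,uniprot_id,sequence_length,reviewed,protein_name,taxon_id,taxon_name\n"
    "Q00001,EXAMPLE2_HUMAN,200,True,Example protein 2,9606,Homo sapiens\n"
    "P12345,EXAMPLE1_HUMAN,100,True,Example protein 1,9606,Homo sapiens\n"
)
ordered = reorder_like_input(["P12345", "Q00001", "A00000"], details)
print([row["uniprot_accession"] for row in ordered])
```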

Arguments:

  • UNIPROT_ACCESSIONS: Text file with UniProt accessions (one per line). Use - for stdin. [required]
  • OUTPUT_CSV: Output CSV with UniProt details. CSV has columns: uniprot_accession, uniprot_id, sequence_length, reviewed, protein_name, taxon_id, taxon_name. Use - for stdout. [required]

Parameters:

  • --timeout: Maximum seconds to wait for query to complete. [default: 1800]
  • --batch-size: Number of accessions to query per batch. [default: 1000]

Common:

  • --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
  • --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
  • --prov: Whether to write provenance information about the command execution to ro-crate-metadata.json file. [default: False]

protein-quest retrieve

Retrieve structure files

protein-quest retrieve pdbe

protein-quest retrieve pdbe [OPTIONS] PDBE_CSV OUTPUT_DIR

Retrieve mmCIF files from PDBe for PDB IDs in CSV.

Retrieve mmCIF files from Protein Data Bank in Europe Knowledge Base (PDBe) website for unique PDB IDs listed in a CSV file.

Arguments:

  • PDBE_CSV: CSV file with a pdb_id column, or with model_provider and model_identifier columns. When using model_provider, only rows with model_provider == 'pdbe' are used. Single-column CSV files are also accepted, and the first row is treated as an ID. Use - for stdin. [required]
  • OUTPUT_DIR: Directory to store downloaded PDBe mmCIF files. [required]

Parameters:

  • --max-parallel-downloads: Maximum number of parallel downloads. [default: 5]

Cache:

  • --no-cache: Disable caching of files to central location. [default: False]
  • --cache-dir: Directory to use as cache for files. [default: /home/runner/.cache/protein-quest]
  • --copy-method: How to make the target file match the source file. Defaults to hardlinks, which save disk space but only work within the same filesystem and are harder to track. Use 'symlink' to track cached files easily; on Windows, creating symlinks requires developer mode or admin privileges. [choices: copy, symlink, hardlink] [default: hardlink]
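
The three --copy-method choices correspond to standard filesystem operations. The sketch below uses only the Python standard library to illustrate the trade-off. It is not the tool's own implementation, and place_file and the file names are hypothetical.

```python
import os
import shutil
import tempfile
from pathlib import Path


def place_file(source, target, copy_method="hardlink"):
    """Make `target` available from `source` using one of the three methods."""
    source, target = Path(source), Path(target)
    if copy_method == "hardlink":
        os.link(source, target)  # same inode, no extra disk space, same filesystem only
    elif copy_method == "symlink":
        target.symlink_to(source.resolve())  # visible pointer back to the cached file
    elif copy_method == "copy":
        shutil.copyfile(source, target)  # independent duplicate
    else:
        raise ValueError(f"Unknown copy method: {copy_method}")


with tempfile.TemporaryDirectory() as tmp:
    cached = Path(tmp) / "cached.cif"
    cached.write_text("data_example\n")
    linked = Path(tmp) / "linked.cif"
    place_file(cached, linked, "hardlink")
    # A hardlink shares its inode with the source file
    same_inode = os.stat(cached).st_ino == os.stat(linked).st_ino
    print(same_inode)
```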

Common:

  • --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
  • --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
  • --prov: Whether to write provenance information about the command execution to ro-crate-metadata.json file. [default: False]

protein-quest retrieve alphafold

protein-quest retrieve alphafold [OPTIONS] ALPHAFOLD_CSV OUTPUT_DIR

Retrieve AlphaFold files for IDs in CSV.

Retrieve AlphaFold files from the AlphaFold Protein Structure Database.

Arguments:

  • ALPHAFOLD_CSV: CSV file with an af_id column, or with model_provider and model_identifier columns. When using model_provider, only rows with model_provider == 'alphafold' are used. Single-column CSV files are also accepted, and the first row is treated as an ID. Use - for stdin. [required]
  • OUTPUT_DIR: Directory to store downloaded AlphaFold files. [required]

Parameters:

  • --format: Formats to retrieve. Defaults to [cif]. Repeat parameter for multiple formats, for example --format cif --format pdb. [choices: summary, bcif, cif, pdb, paeDoc, amAnnotations, amAnnotationsHg19, amAnnotationsHg38, msa, plddtDoc]
  • --db-version: AlphaFold database version.
  • --gzip-files: Gzip downloaded files. [default: False]
  • --all-isoforms: Return all isoforms. [default: False]
  • --max-parallel-downloads: Maximum number of parallel downloads. [default: 5]

Cache:

  • --no-cache: Disable caching of files to central location. [default: False]
  • --cache-dir: Directory to use as cache for files. [default: /home/runner/.cache/protein-quest]
  • --copy-method: How to make the target file match the source file. Defaults to hardlinks, which save disk space but only work within the same filesystem and are harder to track. Use 'symlink' to track cached files easily; on Windows, creating symlinks requires developer mode or admin privileges. [choices: copy, symlink, hardlink] [default: hardlink]

Common:

  • --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
  • --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
  • --prov: Whether to write provenance information about the command execution to ro-crate-metadata.json file. [default: False]

protein-quest retrieve emdb

protein-quest retrieve emdb [OPTIONS] EMDB_CSV OUTPUT_DIR

Retrieve EMDB volume files for EMDB IDs in CSV.

Retrieve volume files from Electron Microscopy Data Bank (EMDB) website for unique EMDB IDs listed in a CSV file.

Arguments:

  • EMDB_CSV: CSV file with emdb_id column. Other columns are ignored. Single-column CSV files are also accepted, and the first row is treated as an ID. Use - for stdin. [required]
  • OUTPUT_DIR: Directory to store downloaded EMDB volume files. [required]

Cache:

  • --no-cache: Disable caching of files to central location. [default: False]
  • --cache-dir: Directory to use as cache for files. [default: /home/runner/.cache/protein-quest]
  • --copy-method: How to make the target file match the source file. Defaults to hardlinks, which save disk space but only work within the same filesystem and are harder to track. Use 'symlink' to track cached files easily; on Windows, creating symlinks requires developer mode or admin privileges. [choices: copy, symlink, hardlink] [default: hardlink]

Common:

  • --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
  • --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
  • --prov: Whether to write provenance information about the command execution to ro-crate-metadata.json file. [default: False]

protein-quest retrieve structure

protein-quest retrieve structure [OPTIONS] STRUCTURES_CSV OUTPUT_DIR

Retrieve structure files from search structure CSV output.

Retrieve structure files from model URLs listed in search structure CSV output.

Arguments:

  • STRUCTURES_CSV: CSV file with provider, model_identifier, model_url, and model_format columns. Use - for stdin. [required]
  • OUTPUT_DIR: Directory to store retrieved structure files. [required]

Parameters:

  • --raw: Download in native format from CSV. [default: False]
  • --max-parallel-downloads: Maximum number of parallel downloads. [default: 5]

Cache:

  • --no-cache: Disable caching of files to central location. [default: False]
  • --cache-dir: Directory to use as cache for files. [default: /home/runner/.cache/protein-quest]
  • --copy-method: How to make the target file match the source file. Defaults to hardlinks, which save disk space but only work within the same filesystem and are harder to track. Use 'symlink' to track cached files easily; on Windows, creating symlinks requires developer mode or admin privileges. [choices: copy, symlink, hardlink] [default: hardlink]

Common:

  • --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
  • --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
  • --prov: Whether to write provenance information about the command execution to ro-crate-metadata.json file. [default: False]

protein-quest filter

Filter files

protein-quest filter confidence

protein-quest filter confidence [OPTIONS] INPUT_DIR OUTPUT_DIR

Filter AlphaFold mmCIF/PDB files by confidence (pLDDT).

Filter AlphaFold mmCIF/PDB files by confidence (pLDDT). Files that pass are written with the residues below the threshold removed.

Arguments:

  • INPUT_DIR: Directory with AlphaFold mmCIF/PDB files. [required]
  • OUTPUT_DIR: Directory to write filtered mmCIF/PDB files. [required]

Parameters:

  • --confidence: The confidence threshold for filtering residues. Residues with a pLDDT (b-factor) above this value are considered high confidence. [default: 70.0]
  • --min-residues: The minimum number of high-confidence residues required to keep the structure. [default: 0]
  • --max-residues: The maximum number of high-confidence residues allowed to keep the structure. [default: 10000000]
  • --write-stats: Write filter statistics to file. In CSV format with <input_file>,<residue_count>,<passed>,<output_file> columns. Use - for stdout.
  • --scheduler-address: Address of the Dask scheduler to connect to. If not provided, will create a local cluster. If set to sequential will run tasks sequentially.
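
How --confidence, --min-residues and --max-residues interact can be sketched on a plain list of per-residue pLDDT values. This is an illustration, not the tool's implementation: filter_by_confidence is a hypothetical helper, the strict "above" comparison follows the wording of --confidence, the inclusive residue bounds are an assumption, and the pLDDT values are made up.

```python
def filter_by_confidence(plddts, confidence=70.0, min_residues=0, max_residues=10_000_000):
    """Return (indices of high-confidence residues, whether the structure passes)."""
    # Residues with a pLDDT above the threshold count as high confidence
    kept = [index for index, plddt in enumerate(plddts) if plddt > confidence]
    # The structure passes when the high-confidence count lies within the bounds
    passed = min_residues <= len(kept) <= max_residues
    return kept, passed


plddts = [95.2, 68.0, 71.5, 40.1, 88.8]
print(filter_by_confidence(plddts, confidence=70.0, min_residues=3))
```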

Cache:

  • --no-cache: Disable caching of files to central location. [default: False]
  • --cache-dir: Directory to use as cache for files. [default: /home/runner/.cache/protein-quest]
  • --copy-method: How to make the target file match the source file. Defaults to hardlinks, which save disk space but only work within the same filesystem and are harder to track. Use 'symlink' to track cached files easily; on Windows, creating symlinks requires developer mode or admin privileges. [choices: copy, symlink, hardlink] [default: hardlink]

Common:

  • --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
  • --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
  • --prov: Whether to write provenance information about the command execution to ro-crate-metadata.json file. [default: False]

protein-quest filter chain

protein-quest filter chain [OPTIONS] CHAINS INPUT_DIR OUTPUT_DIR

Filter on chain.

For each input PDB/mmCIF file and chain combination, write a PDB/mmCIF file that contains only the given chain, renamed to chain A. Filtering is done in parallel using a Dask cluster.

Arguments:

  • CHAINS: CSV file with pdb_id and chain columns. Other columns are ignored. [required]
  • INPUT_DIR: Directory with PDB/mmCIF files. Expected filenames are {pdb_id}.cif.gz, {pdb_id}.cif, {pdb_id}.pdb.gz or {pdb_id}.pdb. [required]
  • OUTPUT_DIR: Directory to write the single-chain PDB/mmCIF files. Output files are in same format as input files. [required]
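
Before running the filter, it can help to check that every pdb_id in the CHAINS file has a matching input file. A minimal sketch, assuming the four filename patterns listed above; locate_structure_file is a hypothetical helper, the order in which extensions are tried is an assumption, and the IDs are made up.

```python
import tempfile
from pathlib import Path

# Extensions from the INPUT_DIR description; the lookup order is an assumption
EXTENSIONS = (".cif.gz", ".cif", ".pdb.gz", ".pdb")


def locate_structure_file(input_dir, pdb_id):
    """Return the first existing {pdb_id}{extension} file in input_dir, or None."""
    for extension in EXTENSIONS:
        candidate = Path(input_dir) / f"{pdb_id}{extension}"
        if candidate.exists():
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "1abc.pdb").write_text("")
    (Path(tmp) / "1abc.cif").write_text("")
    found_name = locate_structure_file(tmp, "1abc").name
    missing = locate_structure_file(tmp, "9zzz")
    print(found_name, missing)
```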

Parameters:

  • --scheduler-address: Address of the Dask scheduler to connect to. If not provided, will create a local cluster. If set to sequential will run tasks sequentially.

Cache:

  • --no-cache: Disable caching of files to central location. [default: False]
  • --cache-dir: Directory to use as cache for files. [default: /home/runner/.cache/protein-quest]
  • --copy-method: How to make the target file match the source file. Defaults to hardlinks, which save disk space but only work within the same filesystem and are harder to track. Use 'symlink' to track cached files easily; on Windows, creating symlinks requires developer mode or admin privileges. [choices: copy, symlink, hardlink] [default: hardlink]

Common:

  • --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
  • --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
  • --prov: Whether to write provenance information about the command execution to ro-crate-metadata.json file. [default: False]

protein-quest filter residue

protein-quest filter residue [OPTIONS] INPUT_DIR OUTPUT_DIR

Filter PDB/mmCIF files by number of residues in chain A.

Arguments:

  • INPUT_DIR: Directory with PDB/mmCIF files (for example from 'filter chain'). [required]
  • OUTPUT_DIR: Directory to write filtered PDB/mmCIF files. Files are copied without modification. [required]

Parameters:

  • --min-residues: Min residues in chain A. [default: 0]
  • --max-residues: Max residues in chain A. [default: 10000000]
  • --write-stats: Write filter statistics to file. In CSV format with <input_file>,<residue_count>,<passed>,<output_file> columns. Use - for stdout.

Cache:

  • --no-cache: Disable caching of files to central location. [default: False]
  • --cache-dir: Directory to use as cache for files. [default: /home/runner/.cache/protein-quest]
  • --copy-method: How to make the target file match the source file. Defaults to hardlinks, which save disk space but only work within the same filesystem and are harder to track. Use 'symlink' to track cached files easily; on Windows, creating symlinks requires developer mode or admin privileges. [choices: copy, symlink, hardlink] [default: hardlink]

Common:

  • --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
  • --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
  • --prov: Whether to write provenance information about the command execution to ro-crate-metadata.json file. [default: False]

protein-quest filter resolution

protein-quest filter resolution [OPTIONS] INPUT_DIR OUTPUT_DIR

Filter structure files by best resolution.

AlphaFold structures are preferred over non-AlphaFold structures. Structures with a lower resolution value (that is, a better resolution) are preferred. If the resolution is the same, structures with more residues are preferred. Structures with a missing resolution are the least preferred.
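
These preferences can be expressed as a single sort key. The sketch below is one possible reading of the rules and not the tool's own code: rank_key and top_n are hypothetical helpers, the field names mirror the --write-stats columns, the exact precedence between the AlphaFold rule and the resolution rule is an assumption, and the file names are made up.

```python
import math


def rank_key(entry):
    """Sort key: AlphaFold first, then lowest resolution, then most residues."""
    resolution = entry["resolution"]
    return (
        0 if entry["is_alphafold"] else 1,
        math.inf if resolution is None else resolution,  # missing resolution sorts last
        -entry["total_residue_count"],
    )


def top_n(entries, top):
    """Return the `top` best entries according to rank_key."""
    return sorted(entries, key=rank_key)[:top]


entries = [
    {"input_file": "a.cif", "resolution": 2.0, "total_residue_count": 100, "is_alphafold": False},
    {"input_file": "b.cif", "resolution": 1.5, "total_residue_count": 80, "is_alphafold": False},
    {"input_file": "c.cif", "resolution": None, "total_residue_count": 300, "is_alphafold": False},
    {"input_file": "af.cif", "resolution": None, "total_residue_count": 250, "is_alphafold": True},
]
print([entry["input_file"] for entry in top_n(entries, top=3)])
```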

Arguments:

  • INPUT_DIR: Directory with structure files. [required]
  • OUTPUT_DIR: Directory to write the selected structure files. [required]

Parameters:

  • --group-by: Pass the top-N structures with the best resolution per UniProt accession. Structures without a UniProt accession are never passed. Mutually exclusive with --no-group-by. [choices: uniprot_accession] [default: uniprot_accession]
  • --no-group-by: Disable grouping and use a global top-N ranking across all files. Mutually exclusive with --group-by. [default: False]
  • --top: Maximum number of files to keep. [default: 1000]
  • --write-stats: Write filter statistics to file. In CSV format. For --group-by=uniprot_accession columns are: <input_file>,<uniprot_accession>,<resolution>,<total_residue_count>,<is_alphafold>,<passed>,<output_file>. For --no-group-by columns are: <input_file>,<resolution>,<total_residue_count>,<is_alphafold>,<passed>,<output_file>. Use - for stdout.

Cache:

  • --no-cache: Disable caching of files to central location. [default: False]
  • --cache-dir: Directory to use as cache for files. [default: /home/runner/.cache/protein-quest]
  • --copy-method: How to make the target file match the source file. Defaults to hardlinks, which save disk space but only work within the same filesystem and are harder to track. Use 'symlink' to track cached files easily; on Windows, creating symlinks requires developer mode or admin privileges. [choices: copy, symlink, hardlink] [default: hardlink]

Common:

  • --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
  • --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
  • --prov: Whether to write provenance information about the command execution to ro-crate-metadata.json file. [default: False]

protein-quest filter secondary-structure

protein-quest filter secondary-structure [OPTIONS] INPUT_DIR OUTPUT_DIR

Filter PDB/mmCIF files by secondary structure.

Arguments:

  • INPUT_DIR: Directory with PDB/mmCIF files. [required]
  • OUTPUT_DIR: Directory to write filtered PDB/mmCIF files. Files are copied without modification. [required]

Parameters:

  • --abs-min-helix-residues: Minimum number of residues in helices (absolute).
  • --abs-max-helix-residues: Maximum number of residues in helices (absolute).
  • --abs-min-sheet-residues: Minimum number of residues in sheets (absolute).
  • --abs-max-sheet-residues: Maximum number of residues in sheets (absolute).
  • --ratio-min-helix-residues: Minimum helix residue ratio (fraction from 0 to 1).
  • --ratio-max-helix-residues: Maximum helix residue ratio (fraction from 0 to 1).
  • --ratio-min-sheet-residues: Minimum sheet residue ratio (fraction from 0 to 1).
  • --ratio-max-sheet-residues: Maximum sheet residue ratio (fraction from 0 to 1).
  • --write-stats: Write filter statistics to file. In CSV format with columns: <input_file>,<nr_residues>,<nr_helix_residues>,<nr_sheet_residues>, <helix_ratio>,<sheet_ratio>,<passed>,<output_file>. Use - for stdout.
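
The absolute and ratio thresholds can be combined as in the sketch below. It is an illustration under assumptions, not the tool's implementation: within and passes_secondary_structure are hypothetical helpers, bounds are treated as inclusive, each bound is optional, and the residue counts are made up.

```python
def within(value, minimum=None, maximum=None):
    """True when value lies inside the optional inclusive bounds."""
    return (minimum is None or value >= minimum) and (maximum is None or value <= maximum)


def passes_secondary_structure(nr_residues, nr_helix_residues, nr_sheet_residues, *,
                               abs_helix=(None, None), abs_sheet=(None, None),
                               ratio_helix=(None, None), ratio_sheet=(None, None)):
    """Check absolute counts and ratios; each argument is a (min, max) pair."""
    helix_ratio = nr_helix_residues / nr_residues if nr_residues else 0.0
    sheet_ratio = nr_sheet_residues / nr_residues if nr_residues else 0.0
    return all([
        within(nr_helix_residues, *abs_helix),
        within(nr_sheet_residues, *abs_sheet),
        within(helix_ratio, *ratio_helix),
        within(sheet_ratio, *ratio_sheet),
    ])


# 200 residues, 120 in helices (ratio 0.6), 20 in sheets (ratio 0.1)
print(passes_secondary_structure(200, 120, 20, ratio_helix=(0.5, None)))
```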

Cache:

  • --no-cache: Disable caching of files to central location. [default: False]
  • --cache-dir: Directory to use as cache for files. [default: /home/runner/.cache/protein-quest]
  • --copy-method: How to make the target file match the source file. Defaults to hardlinks, which save disk space but only work within the same filesystem and are harder to track. Use 'symlink' to track cached files easily; on Windows, creating symlinks requires developer mode or admin privileges. [choices: copy, symlink, hardlink] [default: hardlink]

Common:

  • --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
  • --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
  • --prov: Whether to write provenance information about the command execution to ro-crate-metadata.json file. [default: False]

protein-quest convert

Convert files between formats

protein-quest convert uniprot

protein-quest convert uniprot [OPTIONS] INPUT_DIR OUTPUT

Convert structure files to list of UniProt accessions.

UniProt accessions are read from database reference of each structure.

Arguments:

  • INPUT_DIR: Directory with structure files. Supported extensions are .cif, .cif.gz, .pdb, .pdb.gz. [required]
  • OUTPUT: Output text file with UniProt accessions (one per line). Use '-' for stdout. [required]

Parameters:

  • --grouped: Whether to group accessions by structure file. If set output changes to <structure_file1>,<acc1>\n<structure_file1>,<acc2> format. [default: False]
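
The --grouped output can be read back into a mapping from structure file to accessions. A minimal sketch, assuming one <structure_file>,<accession> pair per line and no header row; parse_grouped is a hypothetical helper and the file names and the accession Q00001 are made up.

```python
from collections import defaultdict


def parse_grouped(text):
    """Map each structure file to the list of UniProt accessions found in it."""
    grouped = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the last comma so commas in file names are tolerated
        structure_file, accession = line.rsplit(",", 1)
        grouped[structure_file].append(accession)
    return dict(grouped)


grouped_text = "1abc.cif,P12345\n1abc.cif,Q00001\n2xyz.cif,P12345\n"
print(parse_grouped(grouped_text))
```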

Common:

  • --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
  • --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
  • --prov: Whether to write provenance information about the command execution to ro-crate-metadata.json file. [default: False]

protein-quest convert structures

protein-quest convert structures [OPTIONS] INPUT_DIR

Convert structure files between formats.

Arguments:

  • INPUT_DIR: Directory with structure files. Supported extensions are .pdb, .pdb.gz, .ent, .ent.gz, .cif, .cif.gz, .bcif, .bcif.gz. [required]

Parameters:

  • --output-dir: Directory to write converted structure files. If not given, files are written to input_dir.
  • --output-format: Output format for converted files. Supported values are .cif and .cif.gz. [choices: .cif, .cif.gz] [default: .cif]

Cache:

  • --no-cache: Disable caching of files to central location. [default: False]
  • --cache-dir: Directory to use as cache for files. [default: /home/runner/.cache/protein-quest]
  • --copy-method: How to make the target file match the source file. Defaults to hardlinks, which save disk space but only work within the same filesystem and are harder to track. Use 'symlink' to track cached files easily; on Windows, creating symlinks requires developer mode or admin privileges. [choices: copy, symlink, hardlink] [default: hardlink]

Common:

  • --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
  • --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
  • --prov: Whether to write provenance information about the command execution to ro-crate-metadata.json file. [default: False]

protein-quest mcp

protein-quest mcp [OPTIONS]

Run the Model Context Protocol (MCP) server.

Parameters:

  • --transport: Transport protocol to use. [choices: stdio, http, sse, streamable-http] [default: stdio]
  • --host: Host to bind the server to. [default: 127.0.0.1]
  • --port: Port to bind the server to. [default: 8000]

Common:

  • --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
  • --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
  • --prov: Whether to write provenance information about the command execution to ro-crate-metadata.json file. [default: False]