CLI Reference
This section documents all available CLI commands.
protein-quest COMMAND
Protein Quest CLI
Commands:
- convert: Convert files between formats
- filter: Filter files
- mcp: Run Model Context Protocol (MCP) server
- retrieve: Retrieve structure files
- search: Search data sources
- --install-completion: Install shell completion for this application.
protein-quest --install-completion
protein-quest --install-completion [OPTIONS]
Install shell completion for this application.
This command generates and installs the completion script to the appropriate location for your shell. After installation, you may need to restart your shell or source your shell configuration file.
Parameters:
- --shell: Shell type for completion. If not specified, attempts to auto-detect the current shell. [choices: zsh, bash, fish]
- --output, -o: Output path for the completion script. If not specified, uses the shell-specific default.
protein-quest search
Search data sources
protein-quest search uniprot
protein-quest search uniprot [OPTIONS] OUTPUT
Search for UniProt accessions.
Search the UniProt SPARQL endpoint for UniProt accessions that match the given criteria.
Arguments:
- OUTPUT: Output text file for UniProt accessions (one per line). Use '-' for stdout. [required]
Parameters:
- --taxon-id: NCBI Taxon ID to filter results by organism (for example 9606 for human).
- --reviewed, --no-reviewed: Whether to filter results by reviewed status (True for reviewed, False for unreviewed).
- --subcellular-location-uniprot: Subcellular location in UniProt format (for example "nucleus").
- --subcellular-location-go: Subcellular location in GO format. Can be a single GO term (for example, ["GO:0005634"]) or a collection of GO terms (for example, ["GO:0005634", "GO:0005737"]), which are searched with OR logic.
- --molecular-function-go: Molecular function in GO format. Can be a single GO term (for example, ["GO:0003674"]) or a collection of GO terms (for example, ["GO:0003674", "GO:0008150"]), which are searched with OR logic.
- --min-sequence-length: Minimum length of the canonical sequence.
- --max-sequence-length: Maximum length of the canonical sequence.
- --limit: Maximum number of UniProt accessions to return. [default: 10000]
- --timeout: Maximum seconds to wait for query to complete. [default: 1800]
Common:
- --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
- --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
- --prov: Whether to write provenance information about the command execution to the ro-crate-metadata.json file. [default: False]
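To illustrate how the search uniprot options combine, here is a minimal Python sketch that assembles a command line from the flags documented above. The helper name build_search_uniprot_argv and the output file name are hypothetical. The sketch only builds and prints the argument list; it does not run protein-quest.

```python
import shlex


def build_search_uniprot_argv(output, taxon_id=None, reviewed=None,
                              subcellular_location=None, limit=None):
    """Build an argv list for `protein-quest search uniprot` (hypothetical helper)."""
    argv = ["protein-quest", "search", "uniprot"]
    if taxon_id is not None:
        argv += ["--taxon-id", str(taxon_id)]
    if reviewed is not None:
        # The option is a flag pair: --reviewed / --no-reviewed.
        argv.append("--reviewed" if reviewed else "--no-reviewed")
    if subcellular_location is not None:
        argv += ["--subcellular-location-uniprot", subcellular_location]
    if limit is not None:
        argv += ["--limit", str(limit)]
    argv.append(output)  # OUTPUT positional; '-' would mean stdout
    return argv


argv = build_search_uniprot_argv(
    "uniprot_accs.txt", taxon_id=9606, reviewed=True,
    subcellular_location="nucleus", limit=100,
)
print(shlex.join(argv))
```

The resulting list could be handed to subprocess.run on a machine where protein-quest is installed.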
protein-quest search pdbe
protein-quest search pdbe [OPTIONS] UNIPROT_ACCESSIONS OUTPUT_CSV
Search for PDB structures of given UniProt accessions.
Search for PDB structures of given UniProt accessions in the UniProt SPARQL endpoint.
Arguments:
- UNIPROT_ACCESSIONS: Text file with UniProt accessions (one per line). Use '-' for stdin. [required]
- OUTPUT_CSV: Output CSV with the following columns: uniprot_accession, pdb_id, method, resolution, uniprot_chains, chain, chain_length. Here uniprot_chains is the raw UniProt chain string (for example A=1-100), chain is the first chain from uniprot_chains (for example A), and chain_length is the length of the chain (for example 100), or '' if it could not be determined. Use '-' for stdout. [required]
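The relationship between uniprot_chains, chain and chain_length can be sketched in Python as follows. This is an illustration, not the tool's implementation: the function name is hypothetical, the length is assumed to be end - start + 1 (which matches the A=1-100 example), and the handling of "/"-separated chain names and ","-separated segments is an assumption about the UniProt string format.

```python
def parse_uniprot_chains(uniprot_chains):
    """Return (first_chain, chain_length) for a string such as 'A=1-100'.

    chain_length is None when the residue range cannot be parsed.
    """
    # Assumption: multiple segments are comma separated; only the first is used.
    first_segment = uniprot_chains.split(",")[0].strip()
    chains, _, residue_range = first_segment.partition("=")
    # Assumption: several chains sharing a range are written as 'A/B'.
    first_chain = chains.split("/")[0].strip()
    try:
        start, end = (int(part) for part in residue_range.split("-"))
    except ValueError:
        return first_chain, None
    return first_chain, end - start + 1


print(parse_uniprot_chains("A=1-100"))
```

A row whose length comes back as None corresponds to the empty chain_length case, which is what --keep-invalid controls.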
Parameters:
- --limit: Maximum number of PDB and UniProt accession combinations to return. [default: 10000]
- --timeout: Maximum seconds to wait for query to complete. [default: 1800]
- --min-residues: Minimum number of residues required in the chain mapped to the UniProt accession.
- --max-residues: Maximum number of residues allowed in the chain mapped to the UniProt accession.
- --keep-invalid: Keep PDB results when chain length could not be determined. [default: False]
- --top-resolution-per-uniprot-accession: Retain the top N PDB entries per UniProt accession, ranked by best (lowest) resolution first, then by highest residue count. For example, use --top-resolution-per-uniprot-accession 3 to keep only the best 3 PDB entries per UniProt accession.
Common:
- --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
- --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
- --prov: Whether to write provenance information about the command execution to the ro-crate-metadata.json file. [default: False]
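The ranking described for --top-resolution-per-uniprot-accession (lowest resolution first, then highest residue count) could look like the following Python sketch. It is a simplified illustration under the assumption that every row has a numeric resolution and chain_length; the function name, PDB IDs and the accession Q99999 are made up for the example.

```python
from collections import defaultdict


def top_resolution_per_accession(rows, top_n):
    """Keep the best `top_n` rows per uniprot_accession.

    Rows are dicts with uniprot_accession, pdb_id, resolution and chain_length.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["uniprot_accession"]].append(row)
    kept = []
    for accession_rows in grouped.values():
        # Lower resolution value is better; ties go to the longer chain.
        ranked = sorted(
            accession_rows,
            key=lambda r: (r["resolution"], -r["chain_length"]),
        )
        kept.extend(ranked[:top_n])
    return kept


rows = [
    {"uniprot_accession": "P12345", "pdb_id": "1AAA", "resolution": 2.5, "chain_length": 100},
    {"uniprot_accession": "P12345", "pdb_id": "1BBB", "resolution": 1.8, "chain_length": 90},
    {"uniprot_accession": "P12345", "pdb_id": "1CCC", "resolution": 1.8, "chain_length": 120},
    {"uniprot_accession": "Q99999", "pdb_id": "2DDD", "resolution": 3.0, "chain_length": 50},
]
best = [row["pdb_id"] for row in top_resolution_per_accession(rows, top_n=2)]
print(best)
```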
protein-quest search alphafold
protein-quest search alphafold [OPTIONS] UNIPROT_ACCESSIONS OUTPUT_CSV
Search for AlphaFold structures of given UniProt accessions.
Search for AlphaFold structures of given UniProt accessions in the UniProt SPARQL endpoint.
Arguments:
- UNIPROT_ACCESSIONS: Text file with UniProt accessions (one per line). Use '-' for stdin. [required]
- OUTPUT_CSV: Output CSV with AlphaFold IDs per UniProt accession. CSV has columns: uniprot_accession, af_id. Use '-' for stdout. [required]
Parameters:
- --min-sequence-length: Minimum length of the canonical sequence.
- --max-sequence-length: Maximum length of the canonical sequence.
- --limit: Maximum number of AlphaFold entry identifiers to return. [default: 10000]
- --timeout: Maximum seconds to wait for query to complete. [default: 1800]
Common:
- --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
- --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
- --prov: Whether to write provenance information about the command execution to the ro-crate-metadata.json file. [default: False]
protein-quest search structure
protein-quest search structure [OPTIONS] UNIPROT_ACCESSIONS OUTPUT_CSV
Search for experimentally determined and predicted structures.
Search for experimentally determined and predicted structures of given UniProt accessions in the 3D Beacons Network API.
Arguments:
- UNIPROT_ACCESSIONS: Text file with UniProt accessions (one per line). Use '-' for stdin. [required]
- OUTPUT_CSV: Output CSV with the following columns: uniprot_accession, provider, model_identifier, model_url, model_format, chain, residue_count. Use '-' for stdout. [required]
Parameters:
- --source: Source of the structures to search for. Default: pdbe and alphafold. Multiple sources can be given by repeating the --source parameter. Use 'all' to search all sources. [choices: pdbe, ped, swissmodel, alphafold, sasbdb, alphafill, hegelab, modelarchive, isoformio, levylab, all]
- --min-residues: Minimum number of residues required in the chain mapped to the UniProt accession.
- --max-residues: Maximum number of residues allowed in the chain mapped to the UniProt accession.
- --limit: Maximum number of structures per UniProt accession per source to return. [default: 10000]
- --timeout: Maximum seconds to wait for query to complete. [default: 1800]
- --raw: Path to write raw 3D Beacons summaries as JSON.
Common:
- --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
- --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
- --prov: Whether to write provenance information about the command execution to the ro-crate-metadata.json file. [default: False]
protein-quest search emdb
protein-quest search emdb [OPTIONS] UNIPROT_ACCESSIONS OUTPUT_CSV
Search for EMDB identifiers of given UniProt accessions.
Search for Electron Microscopy Data Bank (EMDB) identifiers of given UniProt accessions in the UniProt SPARQL endpoint.
Arguments:
- UNIPROT_ACCESSIONS: Text file with UniProt accessions (one per line). Use '-' for stdin. [required]
- OUTPUT_CSV: Output CSV with EMDB IDs per UniProt accession. CSV has columns: uniprot_accession, emdb_id. Use '-' for stdout. [required]
Parameters:
- --limit: Maximum number of EMDB entry identifiers to return. [default: 10000]
- --timeout: Maximum seconds to wait for query to complete. [default: 1800]
Common:
- --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
- --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
- --prov: Whether to write provenance information about the command execution to the ro-crate-metadata.json file. [default: False]
protein-quest search go
protein-quest search go [OPTIONS] TERM OUTPUT_CSV
Search for Gene Ontology (GO) terms.
Search for Gene Ontology (GO) terms in the EBI QuickGO API.
Arguments:
- TERM: GO term to search for. For example apoptosome. [required]
- OUTPUT_CSV: Output CSV with GO term results. CSV has columns: term, id, name, aspect, definition. Use '-' for stdout. [required]
Parameters:
- --aspect: Filter on aspect. [choices: cellular_component, biological_process, molecular_function]
- --limit: Maximum number of GO term results to return. [default: 100]
Common:
- --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
- --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
- --prov: Whether to write provenance information about the command execution to the ro-crate-metadata.json file. [default: False]
protein-quest search taxonomy
protein-quest search taxonomy [OPTIONS] QUERY OUTPUT_CSV
Search for taxon information in UniProt.
Search for taxon information in UniProt. Uses https://www.uniprot.org/taxonomy?query=*.
Arguments:
- QUERY: Search query for the taxon. Surround multiple words with quotes. [required]
- OUTPUT_CSV: Output CSV with taxonomy results. CSV has columns: tax_id, name, rank, parent_tax_id, parent_tax_name. Use '-' for stdout. [required]
Parameters:
- --field: Field to search in. If not given, all fields are searched. If "tax_id", searches by taxon ID. If "parent", the query is a parent taxon ID and all its children are returned. [choices: tax_id, scientific, common, parent]
- --limit: Maximum number of results to return. [default: 100]
Common:
- --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
- --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
- --prov: Whether to write provenance information about the command execution to the ro-crate-metadata.json file. [default: False]
protein-quest search interaction-partners
protein-quest search interaction-partners [OPTIONS] UNIPROT_ACCESSION OUTPUT_CSV
Search for interaction partners of given UniProt accession.
Search for interaction partners of given UniProt accession in the UniProt SPARQL endpoint and Complex Portal.
Arguments:
- UNIPROT_ACCESSION: UniProt accession (for example P12345). [required]
- OUTPUT_CSV: Output CSV with interaction partners per UniProt accession. CSV has columns: uniprot_accession. Use '-' for stdout. [required]
Parameters:
- --exclude: UniProt accessions to exclude from the results. Multiple accessions can be given by repeating the --exclude option.
- --limit: Maximum number of interaction partner UniProt accessions to return. [default: 10000]
- --timeout: Maximum seconds to wait for query to complete. [default: 1800]
Common:
- --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
- --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
- --prov: Whether to write provenance information about the command execution to the ro-crate-metadata.json file. [default: False]
protein-quest search complexes
protein-quest search complexes [OPTIONS] UNIPROT_ACCESSIONS OUTPUT_CSV
Search for complexes in the Complex Portal.
Search for complexes in the Complex Portal (https://www.ebi.ac.uk/complexportal/).
The output CSV file has the following columns:
- query_protein: UniProt accession used as query
- complex_id: Complex Portal identifier
- complex_url: URL to the Complex Portal entry
- complex_title: Title of the complex
- members: Semicolon-separated list of UniProt accessions of complex members
Arguments:
- UNIPROT_ACCESSIONS: Text file with UniProt accessions (one per line) as query. Use '-' for stdin. [required]
- OUTPUT_CSV: Output CSV file with complex results. Use '-' for stdout. [required]
Parameters:
- --limit: Maximum number of complex results to return. [default: 100]
- --timeout: Maximum seconds to wait for query to complete. [default: 1800]
Common:
- --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
- --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
- --prov: Whether to write provenance information about the command execution to the ro-crate-metadata.json file. [default: False]
protein-quest search uniprot-details
protein-quest search uniprot-details [OPTIONS] UNIPROT_ACCESSIONS OUTPUT_CSV
Search for UniProt details for given UniProt accessions from the UniProt SPARQL endpoint.
The output CSV file has the following columns:
- uniprot_accession: UniProt accession.
- uniprot_id: UniProt ID (mnemonic).
- sequence_length: Length of the canonical sequence.
- reviewed: Whether the entry is reviewed (Swiss-Prot) or unreviewed (TrEMBL).
- protein_name: Recommended protein name.
- taxon_id: NCBI Taxonomy ID of the organism.
- taxon_name: Scientific name of the organism.
The order of the output CSV can be different from the input order.
Arguments:
- UNIPROT_ACCESSIONS: Text file with UniProt accessions (one per line). Use '-' for stdin. [required]
- OUTPUT_CSV: Output CSV with UniProt details. CSV has columns: uniprot_accession, uniprot_id, sequence_length, reviewed, protein_name, taxon_id, taxon_name. Use '-' for stdout. [required]
Parameters:
- --timeout: Maximum seconds to wait for query to complete. [default: 1800]
- --batch-size: Number of accessions to query per batch. [default: 1000]
Common:
- --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
- --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
- --prov: Whether to write provenance information about the command execution to the ro-crate-metadata.json file. [default: False]
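Because accessions are queried in batches and the output order may differ from the input order, a caller may want to restore the input order afterwards. The following Python sketch shows one way to split accessions into batches and to reorder result rows. Both function names are hypothetical and the accessions other than P12345 are made up; this is not the tool's own code.

```python
def batched(accessions, batch_size=1000):
    """Yield consecutive slices of at most `batch_size` accessions."""
    for start in range(0, len(accessions), batch_size):
        yield accessions[start:start + batch_size]


def reorder_like_input(accessions, rows):
    """Return rows sorted in the order the accessions were given.

    Accessions without a matching row are skipped.
    """
    by_accession = {row["uniprot_accession"]: row for row in rows}
    return [by_accession[acc] for acc in accessions if acc in by_accession]


accessions = ["P12345", "Q99999", "A0A000"]
batches = list(batched(accessions, batch_size=2))
rows = [
    {"uniprot_accession": "Q99999", "sequence_length": 210},
    {"uniprot_accession": "P12345", "sequence_length": 430},
]
ordered = reorder_like_input(accessions, rows)
print(batches, [row["uniprot_accession"] for row in ordered])
```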
protein-quest retrieve
Retrieve structure files
protein-quest retrieve pdbe
protein-quest retrieve pdbe [OPTIONS] PDBE_CSV OUTPUT_DIR
Retrieve mmCIF files from PDBe for PDB IDs in CSV.
Retrieve mmCIF files from Protein Data Bank in Europe Knowledge Base (PDBe) website for unique PDB IDs listed in a CSV file.
Arguments:
- PDBE_CSV: CSV file with a pdb_id column, or with model_provider and model_identifier columns. When using model_provider, only rows with model_provider == 'pdbe' are used. Single-column CSV files are also accepted, and the first row is treated as an ID. Use '-' for stdin. [required]
- OUTPUT_DIR: Directory to store downloaded PDBe mmCIF files. [required]
Parameters:
--max-parallel-downloads: Maximum number of parallel downloads. [default: 5]
Cache:
- --no-cache: Disable caching of files to a central location. [default: False]
- --cache-dir: Directory to use as cache for files. [default: /home/runner/.cache/protein-quest]
- --copy-method: How to make the target file the same as the source file. By default uses hardlinks to save disk space. Note that hardlinks only work within the same filesystem and are harder to track. If you want to track cached files easily, use 'symlink'. On Windows you need developer mode or admin privileges to create symlinks. [choices: copy, symlink, hardlink] [default: hardlink]
Common:
- --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
- --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
- --prov: Whether to write provenance information about the command execution to the ro-crate-metadata.json file. [default: False]
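The three --copy-method choices correspond to standard filesystem operations. The Python sketch below shows what each one does to a cached file, using only the standard library. The function name and file names are hypothetical, and this is an illustration of the concepts rather than the tool's caching code. The symlink step may fail on Windows without the privileges mentioned above.

```python
import os
import shutil
import tempfile
from pathlib import Path


def place_cached_file(source, target, copy_method="hardlink"):
    """Make `target` provide the content of `source` using the given method."""
    source, target = Path(source), Path(target)
    if copy_method == "hardlink":
        os.link(source, target)  # same inode; only works within one filesystem
    elif copy_method == "symlink":
        os.symlink(source.resolve(), target)  # visible pointer back to the cache
    elif copy_method == "copy":
        shutil.copy2(source, target)  # independent copy; uses extra disk space
    else:
        raise ValueError(f"unknown copy method: {copy_method}")


with tempfile.TemporaryDirectory() as tmp:
    cached = Path(tmp) / "cache" / "1abc.cif"
    cached.parent.mkdir()
    cached.write_text("data_1ABC\n")
    out = Path(tmp) / "out"
    out.mkdir()
    place_cached_file(cached, out / "hard.cif", "hardlink")
    place_cached_file(cached, out / "sym.cif", "symlink")
    place_cached_file(cached, out / "copy.cif", "copy")
    hardlink_shares_inode = (out / "hard.cif").stat().st_ino == cached.stat().st_ino
    symlink_is_link = (out / "sym.cif").is_symlink()
    copy_is_separate = (out / "copy.cif").stat().st_ino != cached.stat().st_ino

print(hardlink_shares_inode, symlink_is_link, copy_is_separate)
```

A hardlink is indistinguishable from a regular file in a directory listing, which is why the option text calls hardlinks harder to track than symlinks.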
protein-quest retrieve alphafold
protein-quest retrieve alphafold [OPTIONS] ALPHAFOLD_CSV OUTPUT_DIR
Retrieve AlphaFold files for IDs in CSV.
Retrieve AlphaFold files from the AlphaFold Protein Structure Database.
Arguments:
- ALPHAFOLD_CSV: CSV file with an af_id column, or with model_provider and model_identifier columns. When using model_provider, only rows with model_provider == 'alphafold' are used. Single-column CSV files are also accepted, and the first row is treated as an ID. Use '-' for stdin. [required]
- OUTPUT_DIR: Directory to store downloaded AlphaFold files. [required]
Parameters:
- --format: Formats to retrieve. Defaults to [cif]. Repeat the parameter for multiple formats, for example --format cif --format pdb. [choices: summary, bcif, cif, pdb, paeDoc, amAnnotations, amAnnotationsHg19, amAnnotationsHg38, msa, plddtDoc]
- --db-version: AlphaFold database version.
- --gzip-files: Gzip downloaded files. [default: False]
- --all-isoforms: Return all isoforms. [default: False]
- --max-parallel-downloads: Maximum number of parallel downloads. [default: 5]
Cache:
- --no-cache: Disable caching of files to a central location. [default: False]
- --cache-dir: Directory to use as cache for files. [default: /home/runner/.cache/protein-quest]
- --copy-method: How to make the target file the same as the source file. By default uses hardlinks to save disk space. Note that hardlinks only work within the same filesystem and are harder to track. If you want to track cached files easily, use 'symlink'. On Windows you need developer mode or admin privileges to create symlinks. [choices: copy, symlink, hardlink] [default: hardlink]
Common:
- --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
- --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
- --prov: Whether to write provenance information about the command execution to the ro-crate-metadata.json file. [default: False]
protein-quest retrieve emdb
protein-quest retrieve emdb [OPTIONS] EMDB_CSV OUTPUT_DIR
Retrieve EMDB volume files for EMDB IDs in CSV.
Retrieve volume files from Electron Microscopy Data Bank (EMDB) website for unique EMDB IDs listed in a CSV file.
Arguments:
- EMDB_CSV: CSV file with an emdb_id column. Other columns are ignored. Single-column CSV files are also accepted, and the first row is treated as an ID. Use '-' for stdin. [required]
- OUTPUT_DIR: Directory to store downloaded EMDB volume files. [required]
Cache:
- --no-cache: Disable caching of files to a central location. [default: False]
- --cache-dir: Directory to use as cache for files. [default: /home/runner/.cache/protein-quest]
- --copy-method: How to make the target file the same as the source file. By default uses hardlinks to save disk space. Note that hardlinks only work within the same filesystem and are harder to track. If you want to track cached files easily, use 'symlink'. On Windows you need developer mode or admin privileges to create symlinks. [choices: copy, symlink, hardlink] [default: hardlink]
Common:
- --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
- --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
- --prov: Whether to write provenance information about the command execution to the ro-crate-metadata.json file. [default: False]
protein-quest retrieve structure
protein-quest retrieve structure [OPTIONS] STRUCTURES_CSV OUTPUT_DIR
Retrieve structure files from search structure CSV output.
Retrieve structure files from model URLs listed in search structure CSV output.
Arguments:
- STRUCTURES_CSV: CSV file with provider, model_identifier, model_url, and model_format columns. Use '-' for stdin. [required]
- OUTPUT_DIR: Directory to store retrieved structure files. [required]
Parameters:
- --raw: Download in native format from CSV. [default: False]
- --max-parallel-downloads: Maximum number of parallel downloads. [default: 5]
Cache:
- --no-cache: Disable caching of files to a central location. [default: False]
- --cache-dir: Directory to use as cache for files. [default: /home/runner/.cache/protein-quest]
- --copy-method: How to make the target file the same as the source file. By default uses hardlinks to save disk space. Note that hardlinks only work within the same filesystem and are harder to track. If you want to track cached files easily, use 'symlink'. On Windows you need developer mode or admin privileges to create symlinks. [choices: copy, symlink, hardlink] [default: hardlink]
Common:
- --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
- --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
- --prov: Whether to write provenance information about the command execution to the ro-crate-metadata.json file. [default: False]
protein-quest filter
Filter files
protein-quest filter confidence
protein-quest filter confidence [OPTIONS] INPUT_DIR OUTPUT_DIR
Filter AlphaFold mmCIF/PDB files by confidence (pLDDT).
Filter AlphaFold mmCIF/PDB files by confidence (pLDDT). Files that pass are written with the residues below the threshold removed.
Arguments:
- INPUT_DIR: Directory with AlphaFold mmCIF/PDB files. [required]
- OUTPUT_DIR: Directory to write filtered mmCIF/PDB files. [required]
Parameters:
- --confidence: The confidence threshold for filtering residues. Residues with a pLDDT (b-factor) above this value are considered high confidence. [default: 70.0]
- --min-residues: The minimum number of high-confidence residues required to keep the structure. [default: 0]
- --max-residues: The maximum number of high-confidence residues allowed for the structure to be kept. [default: 10000000]
- --write-stats: Write filter statistics to file, in CSV format with <input_file>, <residue_count>, <passed>, <output_file> columns. Use '-' for stdout.
- --scheduler-address: Address of the Dask scheduler to connect to. If not provided, a local cluster is created. If set to sequential, tasks run sequentially.
Cache:
- --no-cache: Disable caching of files to a central location. [default: False]
- --cache-dir: Directory to use as cache for files. [default: /home/runner/.cache/protein-quest]
- --copy-method: How to make the target file the same as the source file. By default uses hardlinks to save disk space. Note that hardlinks only work within the same filesystem and are harder to track. If you want to track cached files easily, use 'symlink'. On Windows you need developer mode or admin privileges to create symlinks. [choices: copy, symlink, hardlink] [default: hardlink]
Common:
- --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
- --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
- --prov: Whether to write provenance information about the command execution to the ro-crate-metadata.json file. [default: False]
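The interplay of --confidence, --min-residues and --max-residues can be sketched in Python on a plain list of per-residue pLDDT values. This is a simplified illustration, not the tool's implementation: the function name is hypothetical, the comparison is assumed to be strictly "above" the threshold, and both residue bounds are assumed to be inclusive.

```python
def filter_by_confidence(plddts, confidence=70.0, min_residues=0,
                         max_residues=10_000_000):
    """Return (passed, kept_indices) for a list of per-residue pLDDT values."""
    # Assumption: a residue exactly at the threshold is not high confidence.
    kept = [index for index, plddt in enumerate(plddts) if plddt > confidence]
    # Assumption: both bounds are inclusive.
    passed = min_residues <= len(kept) <= max_residues
    return passed, kept


plddts = [95.2, 88.0, 69.9, 42.1, 71.5]
passed, kept = filter_by_confidence(plddts, confidence=70.0, min_residues=3)
print(passed, kept)
```

With min_residues=4 the same structure would be rejected, because only three residues are above the threshold.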
protein-quest filter chain
protein-quest filter chain [OPTIONS] CHAINS INPUT_DIR OUTPUT_DIR
Filter on chain.
For each input PDB/mmCIF file and chain combination, write a PDB/mmCIF file that contains only the given chain, renamed to chain A. Filtering is done in parallel using a Dask cluster.
Arguments:
- CHAINS: CSV file with pdb_id and chain columns. Other columns are ignored. [required]
- INPUT_DIR: Directory with PDB/mmCIF files. Expected filenames are {pdb_id}.cif.gz, {pdb_id}.cif, {pdb_id}.pdb.gz or {pdb_id}.pdb. [required]
- OUTPUT_DIR: Directory to write the single-chain PDB/mmCIF files. Output files are in the same format as input files. [required]
Parameters:
- --scheduler-address: Address of the Dask scheduler to connect to. If not provided, a local cluster is created. If set to sequential, tasks run sequentially.
Cache:
- --no-cache: Disable caching of files to a central location. [default: False]
- --cache-dir: Directory to use as cache for files. [default: /home/runner/.cache/protein-quest]
- --copy-method: How to make the target file the same as the source file. By default uses hardlinks to save disk space. Note that hardlinks only work within the same filesystem and are harder to track. If you want to track cached files easily, use 'symlink'. On Windows you need developer mode or admin privileges to create symlinks. [choices: copy, symlink, hardlink] [default: hardlink]
Common:
- --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
- --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
- --prov: Whether to write provenance information about the command execution to the ro-crate-metadata.json file. [default: False]
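The expected filenames for INPUT_DIR imply a lookup from each pdb_id in the CHAINS file to a structure file. A minimal Python sketch of such a lookup follows. The function name is hypothetical, and the order in which the extensions are tried is an assumption taken from the order in which they are listed above.

```python
import tempfile
from pathlib import Path

# Extensions from the INPUT_DIR description; the precedence is an assumption.
EXTENSIONS = (".cif.gz", ".cif", ".pdb.gz", ".pdb")


def find_structure_file(input_dir, pdb_id):
    """Return the first existing {pdb_id}{extension} path, or None."""
    for extension in EXTENSIONS:
        candidate = Path(input_dir) / f"{pdb_id}{extension}"
        if candidate.exists():
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "1abc.pdb").write_text("END\n")
    (Path(tmp) / "1abc.cif").write_text("data_1ABC\n")
    found = find_structure_file(tmp, "1abc").name
    missing = find_structure_file(tmp, "9zzz")

print(found, missing)
```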
protein-quest filter residue
protein-quest filter residue [OPTIONS] INPUT_DIR OUTPUT_DIR
Filter PDB/mmCIF files by number of residues in chain A.
Arguments:
- INPUT_DIR: Directory with PDB/mmCIF files (for example from 'filter chain'). [required]
- OUTPUT_DIR: Directory to write filtered PDB/mmCIF files. Files are copied without modification. [required]
Parameters:
- --min-residues: Minimum number of residues in chain A. [default: 0]
- --max-residues: Maximum number of residues in chain A. [default: 10000000]
- --write-stats: Write filter statistics to file, in CSV format with <input_file>, <residue_count>, <passed>, <output_file> columns. Use '-' for stdout.
Cache:
- --no-cache: Disable caching of files to a central location. [default: False]
- --cache-dir: Directory to use as cache for files. [default: /home/runner/.cache/protein-quest]
- --copy-method: How to make the target file the same as the source file. By default uses hardlinks to save disk space. Note that hardlinks only work within the same filesystem and are harder to track. If you want to track cached files easily, use 'symlink'. On Windows you need developer mode or admin privileges to create symlinks. [choices: copy, symlink, hardlink] [default: hardlink]
Common:
- --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
- --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
- --prov: Whether to write provenance information about the command execution to the ro-crate-metadata.json file. [default: False]
protein-quest filter resolution
protein-quest filter resolution [OPTIONS] INPUT_DIR OUTPUT_DIR
Filter structure files by best resolution.
AlphaFold structures are preferred over non-AlphaFold structures. Structures with a lower resolution value (that is, a better resolution) are preferred. If the resolution is the same, structures with more residues are preferred. Structures with a missing resolution are considered undesirable.
Arguments:
- INPUT_DIR: Directory with structure files. [required]
- OUTPUT_DIR: Directory to write the selected structure files. [required]
Parameters:
- --group-by: Pass top-N structures with best resolution per UniProt accession. Structures without a UniProt accession are never passed. Mutually exclusive with no_group_by. [choices: uniprot_accession] [default: uniprot_accession]
- --no-group-by: Disable grouping and use global top-N ranking across all files. Mutually exclusive with group_by. [default: False]
- --top: Maximum number of files to keep. [default: 1000]
- --write-stats: Write filter statistics to file, in CSV format. For --group-by=uniprot_accession the columns are: <input_file>, <uniprot_accession>, <resolution>, <total_residue_count>, <is_alphafold>, <passed>, <output_file>. For --no-group-by the columns are: <input_file>, <resolution>, <total_residue_count>, <is_alphafold>, <passed>, <output_file>. Use '-' for stdout.
Cache:
- --no-cache: Disable caching of files to a central location. [default: False]
- --cache-dir: Directory to use as cache for files. [default: /home/runner/.cache/protein-quest]
- --copy-method: How to make the target file the same as the source file. By default uses hardlinks to save disk space. Note that hardlinks only work within the same filesystem and are harder to track. If you want to track cached files easily, use 'symlink'. On Windows you need developer mode or admin privileges to create symlinks. [choices: copy, symlink, hardlink] [default: hardlink]
Common:
- --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
- --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
- --prov: Whether to write provenance information about the command execution to the ro-crate-metadata.json file. [default: False]
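The preference rules for filter resolution can be expressed as a sort key. The Python sketch below ranks structures globally, as with --no-group-by, and keeps the top N. It is one possible reading of the rules, not the tool's implementation: the precedence of the criteria (AlphaFold first, then missing resolution last, then resolution, then residue count) is an assumption, and the function and file names are hypothetical.

```python
def rank_key(structure):
    """Sort key where a smaller tuple means a more preferred structure."""
    missing = structure["resolution"] is None
    return (
        not structure["is_alphafold"],                 # AlphaFold first
        missing,                                       # missing resolution last
        0.0 if missing else structure["resolution"],   # lower value is better
        -structure["total_residue_count"],             # more residues is better
    )


def top_structures(structures, top):
    """Return the `top` most preferred structures."""
    return sorted(structures, key=rank_key)[:top]


structures = [
    {"input_file": "a.cif", "is_alphafold": False, "resolution": 2.0, "total_residue_count": 100},
    {"input_file": "b.cif", "is_alphafold": False, "resolution": 1.5, "total_residue_count": 80},
    {"input_file": "c.cif", "is_alphafold": True, "resolution": None, "total_residue_count": 300},
    {"input_file": "d.cif", "is_alphafold": False, "resolution": None, "total_residue_count": 500},
    {"input_file": "e.cif", "is_alphafold": False, "resolution": 1.5, "total_residue_count": 120},
]
selected = [s["input_file"] for s in top_structures(structures, top=3)]
print(selected)
```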
protein-quest filter secondary-structure
protein-quest filter secondary-structure [OPTIONS] INPUT_DIR OUTPUT_DIR
Filter PDB/mmCIF files by secondary structure.
Arguments:
- INPUT_DIR: Directory with PDB/mmCIF files. [required]
- OUTPUT_DIR: Directory to write filtered PDB/mmCIF files. Files are copied without modification. [required]
Parameters:
- --abs-min-helix-residues: Minimum number of residues in helices (absolute).
- --abs-max-helix-residues: Maximum number of residues in helices (absolute).
- --abs-min-sheet-residues: Minimum number of residues in sheets (absolute).
- --abs-max-sheet-residues: Maximum number of residues in sheets (absolute).
- --ratio-min-helix-residues: Minimum helix residue ratio (fraction from 0 to 1).
- --ratio-max-helix-residues: Maximum helix residue ratio (fraction from 0 to 1).
- --ratio-min-sheet-residues: Minimum sheet residue ratio (fraction from 0 to 1).
- --ratio-max-sheet-residues: Maximum sheet residue ratio (fraction from 0 to 1).
- --write-stats: Write filter statistics to file, in CSV format with columns: <input_file>, <nr_residues>, <nr_helix_residues>, <nr_sheet_residues>, <helix_ratio>, <sheet_ratio>, <passed>, <output_file>. Use '-' for stdout.
Cache:
- --no-cache: Disable caching of files to a central location. [default: False]
- --cache-dir: Directory to use as cache for files. [default: /home/runner/.cache/protein-quest]
- --copy-method: How to make the target file the same as the source file. By default uses hardlinks to save disk space. Note that hardlinks only work within the same filesystem and are harder to track. If you want to track cached files easily, use 'symlink'. On Windows you need developer mode or admin privileges to create symlinks. [choices: copy, symlink, hardlink] [default: hardlink]
Common:
- --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
- --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
- --prov: Whether to write provenance information about the command execution to the ro-crate-metadata.json file. [default: False]
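The absolute and ratio thresholds can be combined as in the following Python sketch, which works on the counts reported in the statistics columns. It is an illustration under stated assumptions, not the tool's code: the function names are hypothetical, unset bounds are treated as "no limit", bounds are inclusive, all given conditions must hold, and a ratio is the helix or sheet residue count divided by nr_residues.

```python
def within(value, minimum=None, maximum=None):
    """True when value lies inside the optional inclusive bounds."""
    return (minimum is None or value >= minimum) and (maximum is None or value <= maximum)


def passes_secondary_structure(nr_residues, nr_helix_residues, nr_sheet_residues,
                               abs_helix=(None, None), abs_sheet=(None, None),
                               ratio_helix=(None, None), ratio_sheet=(None, None)):
    """Check (min, max) pairs for absolute counts and ratios of helix and sheet."""
    helix_ratio = nr_helix_residues / nr_residues if nr_residues else 0.0
    sheet_ratio = nr_sheet_residues / nr_residues if nr_residues else 0.0
    return (
        within(nr_helix_residues, *abs_helix)
        and within(nr_sheet_residues, *abs_sheet)
        and within(helix_ratio, *ratio_helix)
        and within(sheet_ratio, *ratio_sheet)
    )


# 200 residues, 120 in helices (ratio 0.6) and 20 in sheets (ratio 0.1).
mostly_helical = passes_secondary_structure(200, 120, 20, ratio_helix=(0.5, None))
enough_sheet = passes_secondary_structure(200, 120, 20, ratio_sheet=(0.2, None))
print(mostly_helical, enough_sheet)
```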
protein-quest convert
Convert files between formats
protein-quest convert uniprot
protein-quest convert uniprot [OPTIONS] INPUT_DIR OUTPUT
Convert structure files to list of UniProt accessions.
UniProt accessions are read from the database references of each structure.
Arguments:
- INPUT_DIR: Directory with structure files. Supported extensions are .cif, .cif.gz, .pdb, .pdb.gz. [required]
- OUTPUT: Output text file with UniProt accessions (one per line). Use '-' for stdout. [required]
Parameters:
- --grouped: Whether to group accessions by structure file. If set, the output changes to <structure_file1>,<acc1>\n<structure_file1>,<acc2> format. [default: False]
Common:
- --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
- --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
- --prov: Whether to write provenance information about the command execution to the ro-crate-metadata.json file. [default: False]
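Output written with --grouped can be read back into a mapping from structure file to accessions. The Python sketch below assumes the output has no header line and exactly two comma-separated fields per line, as suggested by the format shown for --grouped. The function name, file names and the accession Q99999 are made up for the example.

```python
import csv
import io
from collections import defaultdict


def parse_grouped_output(text):
    """Map each structure file to the list of UniProt accessions found in it."""
    grouped = defaultdict(list)
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        structure_file, accession = row
        grouped[structure_file].append(accession)
    return dict(grouped)


sample = "1abc.cif,P12345\n1abc.cif,Q99999\n2xyz.pdb,P12345\n"
by_file = parse_grouped_output(sample)
print(by_file)
```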
protein-quest convert structures
protein-quest convert structures [OPTIONS] INPUT_DIR
Convert structure files between formats.
Arguments:
- INPUT_DIR: Directory with structure files. Supported extensions are .pdb, .pdb.gz, .ent, .ent.gz, .cif, .cif.gz, .bcif, .bcif.gz. [required]
Parameters:
- --output-dir: Directory to write converted structure files. If not given, files are written to input_dir.
- --output-format: Output format for converted files. Supported values are .cif and .cif.gz. [choices: .cif, .cif.gz] [default: .cif]
Cache:
- --no-cache: Disable caching of files to a central location. [default: False]
- --cache-dir: Directory to use as cache for files. [default: /home/runner/.cache/protein-quest]
- --copy-method: How to make the target file the same as the source file. By default uses hardlinks to save disk space. Note that hardlinks only work within the same filesystem and are harder to track. If you want to track cached files easily, use 'symlink'. On Windows you need developer mode or admin privileges to create symlinks. [choices: copy, symlink, hardlink] [default: hardlink]
Common:
- --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
- --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
- --prov: Whether to write provenance information about the command execution to the ro-crate-metadata.json file. [default: False]
protein-quest mcp
protein-quest mcp [OPTIONS]
Run Model Context Protocol (MCP) server
Parameters:
- --transport: Transport protocol to use. [choices: stdio, http, sse, streamable-http] [default: stdio]
- --host: Host to bind the server to. [default: 127.0.0.1]
- --port: Port to bind the server to. [default: 8000]
Common:
- --verbose, -v: Increase verbosity (use multiple times for more detail). [default: 0]
- --quiet, -q: Decrease verbosity (use multiple times for less output). [default: 0]
- --prov: Whether to write provenance information about the command execution to the ro-crate-metadata.json file. [default: False]