
CLI Reference

Documentation for the protein-detective script.

protein-detective --help


Usage: protein-detective [-h]
                         [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
                         [--version]
                         {search,retrieve,filter,import-structures,powerfit} ...

Protein Detective CLI

Positional Arguments:
  {search,retrieve,filter,import-structures,powerfit}
    search              Search UniProt for structures
    retrieve            Retrieve structures
    filter              Filter structures
    import-structures   Import structures from a directory into the session
    powerfit            PowerFit related commands

Options:
  -h, --help            show this help message and exit
  --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
  --version             show program's version number and exit
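The help text does not say how `--log-level` takes effect. Here is a minimal sketch that assumes the value is passed to Python's standard `logging` module. The parser and the `WARNING` default are illustrative assumptions, not protein-detective's code:

```python
import argparse
import logging

# Hypothetical parser mirroring the documented --log-level choices.
parser = argparse.ArgumentParser(prog="protein-detective")
parser.add_argument(
    "--log-level",
    choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
    default="WARNING",  # assumption: the real default is not documented here
)

args = parser.parse_args(["--log-level", "DEBUG"])
# getattr maps the level name to logging's numeric constant ("DEBUG" -> 10).
level = getattr(logging, args.log_level)
logging.basicConfig(level=level)
```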


search

protein-detective search --help

Usage: protein-detective search [-h] [--taxon-id TAXON_ID]
                                [--reviewed | --no-reviewed]
                                [--subcellular-location-uniprot SUBCELLULAR_LOCATION_UNIPROT]
                                [--subcellular-location-go SUBCELLULAR_LOCATION_GO]
                                [--molecular-function-go MOLECULAR_FUNCTION_GO]
                                [--min-sequence-length MIN_SEQUENCE_LENGTH]
                                [--max-sequence-length MAX_SEQUENCE_LENGTH]
                                [--interaction-partner-seed INTERACTION_PARTNER_SEED]
                                [--interaction-partner-exclude INTERACTION_PARTNER_EXCLUDE]
                                [--min-residues MIN_RESIDUES]
                                [--max-residues MAX_RESIDUES] [--limit LIMIT]
                                session_dir

Positional Arguments:
  session_dir           Session directory to store results

Options:
  -h, --help            show this help message and exit
  --taxon-id TAXON_ID   NCBI Taxon ID
  --reviewed, --no-reviewed
                        --reviewed restricts results to Swiss-Prot and
                        --no-reviewed to TrEMBL. Default is all of UniProt
                        (Swiss-Prot + TrEMBL).
  --subcellular-location-uniprot SUBCELLULAR_LOCATION_UNIPROT
                        Subcellular location (UniProt)
  --subcellular-location-go SUBCELLULAR_LOCATION_GO
                        Subcellular location (GO term, e.g. GO:0005737). Can
                        be specified multiple times.
  --molecular-function-go MOLECULAR_FUNCTION_GO
                        Molecular function (GO term, e.g. GO:0003677). Can be
                        specified multiple times.
  --min-sequence-length MIN_SEQUENCE_LENGTH
                        Minimum length of the canonical sequence.
  --max-sequence-length MAX_SEQUENCE_LENGTH
                        Maximum length of the canonical sequence.
  --interaction-partner-seed INTERACTION_PARTNER_SEED
                        UniProt ID to use as interaction partner seed. The
                        search will be expanded to include the structure
                        identifiers of the found interaction partners. Can be
                        specified multiple times.
  --interaction-partner-exclude INTERACTION_PARTNER_EXCLUDE
                        UniProt ID to exclude from the found interaction
                        partners. Can be specified multiple times.
  --min-residues MIN_RESIDUES
                        Minimum number of residues required in the chain
                        mapped to the UniProt accession.
  --max-residues MAX_RESIDUES
                        Maximum number of residues allowed in the chain mapped
                        to the UniProt accession.
  --limit LIMIT         Limit the number of results.
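The `--reviewed`/`--no-reviewed` pair has three states. The third is the default, which searches both Swiss-Prot and TrEMBL. This minimal sketch assumes the pair is built with Python's `argparse.BooleanOptionalAction` (Python 3.9+). The parser and the `database_for` helper are hypothetical:

```python
import argparse
from typing import Optional

# Hypothetical parser: BooleanOptionalAction creates both --reviewed and
# --no-reviewed; default=None keeps a third "flag not given" state.
parser = argparse.ArgumentParser(prog="protein-detective search")
parser.add_argument("--reviewed", action=argparse.BooleanOptionalAction, default=None)


def database_for(reviewed: Optional[bool]) -> str:
    """Map the tri-state flag to the UniProt subset described in the help text."""
    if reviewed is None:
        return "swissprot+trembl"  # default: all of UniProt
    return "swissprot" if reviewed else "trembl"


for argv in ([], ["--reviewed"], ["--no-reviewed"]):
    print(argv, "->", database_for(parser.parse_args(argv).reviewed))
```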


retrieve

protein-detective retrieve --help

Usage: protein-detective retrieve [-h] [--what {alphafold,pdbe}]
                                  [--what-af-formats
                                   {amAnnotations,amAnnotationsHg19,amAnnotationsHg38,bcif,cif,msa,paeDoc,pdb,plddtDoc,summary}]
                                  session_dir

Positional Arguments:
  session_dir           Session directory to store results

Options:
  -h, --help            show this help message and exit
  --what {alphafold,pdbe}
                        What to retrieve. Can be specified multiple times.
                        Default is pdbe and alphafold.
  --what-af-formats {amAnnotations,amAnnotationsHg19,amAnnotationsHg38,bcif,cif,msa,paeDoc,pdb,plddtDoc,summary}
                        AlphaFold formats to retrieve. Can be specified
                        multiple times. Default is 'cif'.
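Options marked as repeatable accumulate into a list. This minimal sketch shows how the documented defaults can coexist with repeated flags, assuming standard `argparse` with `action="append"`. With a non-empty list as `default`, argparse would append the command-line values to the defaults instead of replacing them, so the sketch uses `None` and fills in the defaults afterwards. The parser and the `resolve` helper are hypothetical, not protein-detective's code:

```python
import argparse

AF_FORMATS = [
    "amAnnotations", "amAnnotationsHg19", "amAnnotationsHg38", "bcif", "cif",
    "msa", "paeDoc", "pdb", "plddtDoc", "summary",
]

# Hypothetical parser: each repetition of a flag appends one value.
parser = argparse.ArgumentParser(prog="protein-detective retrieve")
parser.add_argument("--what", action="append", choices=["alphafold", "pdbe"], default=None)
parser.add_argument("--what-af-formats", action="append", choices=AF_FORMATS, default=None)


def resolve(argv):
    """Apply the documented defaults only when a flag was not given at all."""
    args = parser.parse_args(argv)
    what = set(args.what or ["pdbe", "alphafold"])
    af_formats = set(args.what_af_formats or ["cif"])
    return what, af_formats
```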


filter

protein-detective filter --help

Usage: protein-detective filter [-h]
                                [--confidence-threshold CONFIDENCE_THRESHOLD]
                                [--min-residues MIN_RESIDUES]
                                [--max-residues MAX_RESIDUES]
                                [--abs-min-helix-residues ABS_MIN_HELIX_RESIDUES]
                                [--abs-max-helix-residues ABS_MAX_HELIX_RESIDUES]
                                [--abs-min-sheet-residues ABS_MIN_SHEET_RESIDUES]
                                [--abs-max-sheet-residues ABS_MAX_SHEET_RESIDUES]
                                [--ratio-min-helix-residues RATIO_MIN_HELIX_RESIDUES]
                                [--ratio-max-helix-residues RATIO_MAX_HELIX_RESIDUES]
                                [--ratio-min-sheet-residues RATIO_MIN_SHEET_RESIDUES]
                                [--ratio-max-sheet-residues RATIO_MAX_SHEET_RESIDUES]
                                [--scheduler-address SCHEDULER_ADDRESS]
                                session_dir

Filter structures based on:

- Confidence (pLDDT) threshold, for AlphaFold structures. New files are
  written with the low-confidence residues (below the threshold) removed.
- Number of residues in chain A.
- Number of residues in secondary structure elements (helices and sheets).

For PDBe structures, the chain of the UniProt protein is written as chain A.

For determining the fraction or number of secondary structure elements, see
the following notebook:
https://www.bonvinlab.org/protein-detective/SSE_elements.html
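This minimal sketch applies the confidence and residue-count criteria to a simplified in-memory chain. pLDDT is a per-residue score, and AlphaFold model files store it in the B-factor field. The following are all assumptions for illustration: the `Residue` type, the helper names, inclusive bounds, and counting residues after trimming.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Residue:
    number: int
    name: str
    plddt: float  # per-residue confidence, 0-100


def trim_low_confidence(residues: List[Residue], threshold: float = 70.0) -> List[Residue]:
    """Drop residues whose pLDDT is below the threshold (documented default 70.0)."""
    return [r for r in residues if r.plddt >= threshold]


def residue_count_ok(
    residues: List[Residue],
    min_residues: Optional[int] = None,
    max_residues: Optional[int] = None,
) -> bool:
    """Check --min-residues/--max-residues style bounds; None means no bound."""
    n = len(residues)
    if min_residues is not None and n < min_residues:
        return False
    if max_residues is not None and n > max_residues:
        return False
    return True


chain_a = [Residue(1, "MET", 45.2), Residue(2, "ALA", 88.0), Residue(3, "GLY", 70.0)]
trimmed = trim_low_confidence(chain_a)
print([r.number for r in trimmed])  # [2, 3]
print(residue_count_ok(trimmed, min_residues=2))  # True
```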

Positional Arguments:
  session_dir           Session directory to store results

Options:
  -h, --help            show this help message and exit
  --confidence-threshold CONFIDENCE_THRESHOLD
                        pLDDT confidence threshold (0-100) for AlphaFold
                        structures. Default is 70.0.
  --min-residues MIN_RESIDUES
                        Minimum number of residues in chain A
  --max-residues MAX_RESIDUES
                        Maximum number of residues in chain A
  --abs-min-helix-residues ABS_MIN_HELIX_RESIDUES
                        Minimum number of residues in helices
  --abs-max-helix-residues ABS_MAX_HELIX_RESIDUES
                        Maximum number of residues in helices
  --abs-min-sheet-residues ABS_MIN_SHEET_RESIDUES
                        Minimum number of residues in sheets
  --abs-max-sheet-residues ABS_MAX_SHEET_RESIDUES
                        Maximum number of residues in sheets
  --ratio-min-helix-residues RATIO_MIN_HELIX_RESIDUES
                        Minimum fraction of all residues that are in helices
  --ratio-max-helix-residues RATIO_MAX_HELIX_RESIDUES
                        Maximum fraction of all residues that are in helices
  --ratio-min-sheet-residues RATIO_MIN_SHEET_RESIDUES
                        Minimum fraction of all residues that are in sheets
  --ratio-max-sheet-residues RATIO_MAX_SHEET_RESIDUES
                        Maximum fraction of all residues that are in sheets
  --scheduler-address SCHEDULER_ADDRESS
                        Address of the Dask scheduler to connect to. If not
                        provided, a local cluster is created.
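The absolute and ratio options bound the same per-residue counts. This minimal sketch assumes a DSSP-style string with one code per residue, where only `H` counts as helix and only `E` as sheet. It also assumes inclusive bounds, with `None` meaning "not set". `sse_filter` is a hypothetical helper:

```python
from typing import Optional


def in_range(value: float, lo: Optional[float], hi: Optional[float]) -> bool:
    """Inclusive range check; None means that side is unbounded."""
    return (lo is None or value >= lo) and (hi is None or value <= hi)


def sse_filter(
    dssp: str,
    *,
    abs_min_helix=None, abs_max_helix=None,
    abs_min_sheet=None, abs_max_sheet=None,
    ratio_min_helix=None, ratio_max_helix=None,
    ratio_min_sheet=None, ratio_max_sheet=None,
) -> bool:
    # Assumption: 'H' = helix residue, 'E' = sheet residue, anything else = neither.
    total = len(dssp)
    helix = dssp.count("H")
    sheet = dssp.count("E")
    helix_ratio = helix / total if total else 0.0
    sheet_ratio = sheet / total if total else 0.0
    return (
        in_range(helix, abs_min_helix, abs_max_helix)
        and in_range(sheet, abs_min_sheet, abs_max_sheet)
        and in_range(helix_ratio, ratio_min_helix, ratio_max_helix)
        and in_range(sheet_ratio, ratio_min_sheet, ratio_max_sheet)
    )


ss = "CCHHHHHHCCEEEECC"  # 16 residues: 6 helix, 4 sheet
print(sse_filter(ss, abs_min_helix=5, ratio_max_sheet=0.3))  # True (4/16 = 0.25)
```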


import-structures

protein-detective import-structures --help

Usage: protein-detective import-structures [-h]
                                           [--copy-method {symlink,copy,hardlink}]
                                           [--strict]
                                           structures_dir session_dir

Import structures from a directory into the session.

The directory should contain structure files in PDB or mmCIF format.

This can be used to import structures obtained from other sources,
or to re-import structures after filtering with external tools.

Positional Arguments:
  structures_dir        Directory containing structure files to import
  session_dir           Session directory to store results

Options:
  -h, --help            show this help message and exit
  --copy-method {symlink,copy,hardlink}
                        Method to use for importing files. Default is
                        'hardlink'. If 'copy', files will be copied. If
                        'symlink', symbolic links will be created. If
                        'hardlink', hard links will be created (unavailable on
                        Windows).
  --strict              Raise an error if structure files do not meet expected
                        criteria (single chain A, single UniProt accession).
                        Without this flag, files that do not meet these
                        criteria are skipped with a warning.
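The three `--copy-method` values correspond to standard file-system operations. This minimal sketch uses Python's `shutil.copy2`, `os.symlink`, and `os.link`. The `import_file` helper is hypothetical, and it skips the chain and UniProt checks that `--strict` controls:

```python
import os
import shutil
from pathlib import Path


def import_file(src: Path, dest_dir: Path, copy_method: str = "hardlink") -> Path:
    """Place src in dest_dir using one of the documented copy methods."""
    dest = dest_dir / src.name
    if copy_method == "copy":
        shutil.copy2(src, dest)  # independent copy, including file metadata
    elif copy_method == "symlink":
        os.symlink(src.resolve(), dest)  # link that points back at the original path
    elif copy_method == "hardlink":
        os.link(src, dest)  # second directory entry for the same file data
    else:
        raise ValueError(f"unknown copy method: {copy_method!r}")
    return dest
```

Hard links only work when the source and the session directory are on the same file system. A symlinked structure breaks if the original file is moved or deleted.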


powerfit

protein-detective powerfit --help

Usage: protein-detective powerfit [-h]
                                  {commands,run,report,fit-models,list-runs,list-lcc} ...

Positional Arguments:
  {commands,run,report,fit-models,list-runs,list-lcc}
    commands            Generate PowerFit commands for PDB files in the
                        session directory
    run                 Run PowerFit on PDB files in the session directory
    report              Generate a report of the best PowerFit solutions.
    fit-models          Fit models based on PowerFit solutions
    list-runs           List all PowerFit runs in the session directory
    list-lcc            List local cross-correlation (lcc.mrc) files for
                        PowerFit runs

Options:
  -h, --help            show this help message and exit


powerfit commands

protein-detective powerfit commands --help

Usage: protein-detective powerfit commands [-h] [-a <float>] [-nl] [-ncw]
                                           [-nr] [-rr <float>] [-nt]
                                           [-tc <float>] [-p <int>]
                                           [--batch-size <int>] [-g [GPU]]
                                           [--gpu-backend {opencl,cuda}]
                                           [--output OUTPUT]
                                           target resolution session_dir

Positional Arguments:
  target                Target density map to fit the model in. Data should
                        be in either CCP4 or MRC format.
  resolution            Resolution of the map in ångström
  session_dir           Session directory for input and output

Options:
  -h, --help            show this help message and exit
  -a, --angle <float>   Rotational sampling interval in degrees. Decreasing
                        this number by a factor of 2 results in approximately
                        8 times more rotations sampled.
  -nl, --no-laplace     Do not apply the Laplace pre-filter to the density
                        data.
  -ncw, --no-core-weighted
                        Do not use core-weighted local cross-correlation
                        score.
  -nr, --no-resampling  Do not resample the density map.
  -rr, --resampling-rate <float>
                        Resampling rate compared to Nyquist.
  -nt, --no-trimming    Do not trim the density map.
  -tc, --trimming-cutoff <float>
                        Intensity cutoff to which the map will be trimmed.
                        Default is 10 percent of maximum intensity.
  -p, --nproc <int>     Number of processors used during search. The number
                        will be capped at the total number of available
                        processors on your machine.
  --batch-size <int>    GPU batch size to use. Use 0 to disable batching
                        entirely, or a positive integer to force a specific
                        batch size. Applies to GPU backends (CUDA/OpenCL).
                        Setting it too high can cause out-of-memory errors.
  -g, --gpu [GPU]       Off-load the intensive calculations to the GPU.
                        Optionally specify number of workers per GPU (default:
                        1).
  --gpu-backend {opencl,cuda}
                        GPU backend to target when generating PowerFit
                        commands.
  --output OUTPUT       Output file for powerfit commands. If set to '-'
                        (default) will print to stdout.
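`-a/--angle` sets the spacing between sampled orientations, and orientation space is three-dimensional. The number of rotations therefore scales roughly with the inverse cube of the angle. A small worked sketch (`rotation_count_factor` is a hypothetical helper):

```python
def rotation_count_factor(old_angle: float, new_angle: float) -> float:
    """Approximate change in the number of sampled rotations when the
    rotational sampling interval changes from old_angle to new_angle (degrees).

    Rotations span a 3D space, so the count scales roughly with (1 / angle) ** 3.
    """
    return (old_angle / new_angle) ** 3


print(rotation_count_factor(10.0, 5.0))   # 8.0: halving the angle
print(rotation_count_factor(10.0, 20.0))  # 0.125: doubling the angle
```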


powerfit run

protein-detective powerfit run --help

Usage: protein-detective powerfit run [-h] [-a <float>] [-nl] [-ncw] [-nr]
                                      [-rr <float>] [-nt] [-tc <float>]
                                      [-p <int>] [--batch-size <int>]
                                      [-g [GPU]] [--gpu-backend {opencl,cuda}]
                                      [--scheduler-address SCHEDULER_ADDRESS]
                                      target resolution session_dir

Run PowerFit on PDB files in the session directory and store results.

Positional Arguments:
  target                Target density map to fit the model in. Data should
                        be in either CCP4 or MRC format.
  resolution            Resolution of the map in ångström
  session_dir           Session directory containing PDB files

Options:
  -h, --help            show this help message and exit
  -a, --angle <float>   Rotational sampling interval in degrees. Decreasing
                        this number by a factor of 2 results in approximately
                        8 times more rotations sampled.
  -nl, --no-laplace     Do not apply the Laplace pre-filter to the density
                        data.
  -ncw, --no-core-weighted
                        Do not use core-weighted local cross-correlation
                        score.
  -nr, --no-resampling  Do not resample the density map.
  -rr, --resampling-rate <float>
                        Resampling rate compared to Nyquist.
  -nt, --no-trimming    Do not trim the density map.
  -tc, --trimming-cutoff <float>
                        Intensity cutoff to which the map will be trimmed.
                        Default is 10 percent of maximum intensity.
  -p, --nproc <int>     Number of processors used during search. The number
                        will be capped at the total number of available
                        processors on your machine.
  --batch-size <int>    GPU batch size to use. Use 0 to disable batching
                        entirely, or a positive integer to force a specific
                        batch size. Applies to GPU backends (CUDA/OpenCL).
                        Setting it too high can cause out-of-memory errors.
  -g, --gpu [GPU]       Off-load the intensive calculations to the GPU.
                        Optionally specify number of workers per GPU (default:
                        1).
  --gpu-backend {opencl,cuda}
                        GPU backend to target when running PowerFit.
  --scheduler-address SCHEDULER_ADDRESS
                        Address of the Dask scheduler to connect to. If not
                        provided, a local cluster is created.
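Trimming crops the map to the region whose intensity reaches the cutoff, which defaults to 10 percent of the maximum. This simplified sketch works on a nested-list grid. It assumes trimming means "the smallest box that keeps every voxel at or above the cutoff". It ignores any padding margin PowerFit itself may add, and it assumes a positive maximum intensity. `trim_bounds` is hypothetical:

```python
from typing import List, Tuple

Grid = List[List[List[float]]]  # indexed as grid[z][y][x]


def trim_bounds(grid: Grid, cutoff_fraction: float = 0.10) -> Tuple[Tuple[int, int], ...]:
    """Inclusive (min, max) index ranges along z, y, x that enclose every
    voxel with intensity >= cutoff_fraction * maximum intensity."""
    peak = max(v for plane in grid for row in plane for v in row)
    cutoff = cutoff_fraction * peak  # e.g. 10 percent of the maximum
    hits = [
        (z, y, x)
        for z, plane in enumerate(grid)
        for y, row in enumerate(plane)
        for x, v in enumerate(row)
        if v >= cutoff
    ]
    # zip(*hits) groups all z indices, then all y, then all x.
    return tuple((min(axis), max(axis)) for axis in zip(*hits))


density = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
density[1][1][1] = 1.0
density[1][1][2] = 0.5
density[0][1][1] = 0.05  # below 10 percent of the peak, so trimmed away
print(trim_bounds(density))  # ((1, 1), (1, 1), (1, 2))
```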


powerfit report

protein-detective powerfit report --help

Usage: protein-detective powerfit report [-h]
                                         [--powerfit_run_id POWERFIT_RUN_ID]
                                         [--top TOP] [--output OUTPUT]
                                         session_dir

Positional Arguments:
  session_dir           Session directory containing PowerFit results

Options:
  -h, --help            show this help message and exit
  --powerfit_run_id POWERFIT_RUN_ID
                        ID of the PowerFit run to report on
  --top TOP             Number of top solutions to report
  --output OUTPUT       Output file for solutions table. If set to '-'
                        (default) will print to stdout.
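This minimal sketch covers the two steps behind `report`. It picks the `--top` solutions, then writes the table either to a file or, for `-`, to stdout. The solution records, their `lcc` field, the file names, and the helper names are all hypothetical. The actual report columns are not documented here:

```python
import csv
import heapq
import sys
from typing import Dict, List


def top_solutions(solutions: List[Dict], top: int) -> List[Dict]:
    """Return the `top` solutions with the highest LCC score, best first."""
    return heapq.nlargest(top, solutions, key=lambda s: s["lcc"])


def write_table(rows: List[Dict], output: str = "-") -> None:
    """Write rows as tab-separated values; '-' means stdout."""
    if not rows:
        return
    fieldnames = list(rows[0])

    def emit(fh) -> None:
        writer = csv.DictWriter(fh, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)

    if output == "-":
        emit(sys.stdout)  # stdout is left open
    else:
        with open(output, "w", newline="", encoding="utf-8") as fh:
            emit(fh)


# Hypothetical solutions for two models.
solutions = [
    {"structure": "model_a.cif", "rank": 1, "lcc": 0.71},
    {"structure": "model_b.cif", "rank": 1, "lcc": 0.83},
    {"structure": "model_a.cif", "rank": 2, "lcc": 0.64},
]
write_table(top_solutions(solutions, top=2))
```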


powerfit fit-models

protein-detective powerfit fit-models --help

Usage: protein-detective powerfit fit-models [-h]
                                             [--powerfit_run_id POWERFIT_RUN_ID]
                                             [--top TOP] [--output OUTPUT]
                                             session_dir

Positional Arguments:
  session_dir           Session directory containing PowerFit results

Options:
  -h, --help            show this help message and exit
  --powerfit_run_id POWERFIT_RUN_ID
                        ID of the PowerFit run to fit models for. If not
                        provided, all runs are used.
  --top TOP             Number of top solutions to fit models for
  --output OUTPUT       Output file for fitted model table. If set to '-'
                        (default) will print to stdout.
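Fitting a model from a PowerFit solution amounts to a rigid-body transform of the model coordinates, x' = R x + t. This minimal sketch works under that assumption. The help text does not document how protein-detective derives R and t from a solution (for example, whether it first centers the model on its center of mass). `apply_rigid_transform` is hypothetical:

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def apply_rigid_transform(
    coords: Sequence[Vec3],
    rotation: Sequence[Sequence[float]],
    translation: Vec3,
) -> List[Vec3]:
    """Return R @ x + t for every coordinate x (R is a 3x3 rotation matrix)."""
    moved = []
    for x in coords:
        rx = tuple(sum(rotation[i][j] * x[j] for j in range(3)) for i in range(3))
        moved.append(tuple(rx[i] + translation[i] for i in range(3)))
    return moved


# 90-degree rotation about z, then a shift of +10 along x.
R = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
t = (10.0, 0.0, 0.0)
print(apply_rigid_transform([(1.0, 0.0, 0.0)], R, t))  # [(10.0, 1.0, 0.0)]
```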


powerfit list-runs

protein-detective powerfit list-runs --help

Usage: protein-detective powerfit list-runs [-h] session_dir

Positional Arguments:
  session_dir  Session directory containing PowerFit results

Options:
  -h, --help   show this help message and exit


powerfit list-lcc

protein-detective powerfit list-lcc --help

Usage: protein-detective powerfit list-lcc [-h] session_dir

Positional Arguments:
  session_dir  Session directory containing PowerFit results

Options:
  -h, --help   show this help message and exit