CLI Reference

Documentation for the protein-detective script.

protein-detective --help


Usage: protein-detective [-h]
                         [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
                         [--version]
                         {search,retrieve,filter,powerfit} ...

Protein Detective CLI

Positional Arguments:
  {search,retrieve,filter,powerfit}
    search              Search UniProt for structures
    retrieve            Retrieve structures
    filter              Filter structures
    powerfit            PowerFit related commands

Options:
  -h, --help            show this help message and exit
  --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
  --version             show program's version number and exit


search

protein-detective search --help

Usage: protein-detective search [-h] [--taxon-id TAXON_ID]
                                [--reviewed | --no-reviewed]
                                [--subcellular-location-uniprot SUBCELLULAR_LOCATION_UNIPROT]
                                [--subcellular-location-go SUBCELLULAR_LOCATION_GO]
                                [--molecular-function-go MOLECULAR_FUNCTION_GO]
                                [--min-sequence-length MIN_SEQUENCE_LENGTH]
                                [--max-sequence-length MAX_SEQUENCE_LENGTH]
                                [--interaction-partner-seed INTERACTION_PARTNER_SEED]
                                [--interaction-partner-exclude INTERACTION_PARTNER_EXCLUDE]
                                [--min-residues MIN_RESIDUES]
                                [--max-residues MAX_RESIDUES] [--limit LIMIT]
                                session_dir

Positional Arguments:
  session_dir           Session directory to store results

Options:
  -h, --help            show this help message and exit
  --taxon-id TAXON_ID   NCBI Taxon ID
  --reviewed, --no-reviewed
                        --reviewed searches Swiss-Prot only, --no-reviewed
                        searches TrEMBL only. Default is all of UniProt
                        (Swiss-Prot + TrEMBL).
  --subcellular-location-uniprot SUBCELLULAR_LOCATION_UNIPROT
                        Subcellular location (UniProt)
  --subcellular-location-go SUBCELLULAR_LOCATION_GO
                        Subcellular location (GO term, e.g. GO:0005737). Can
                        be specified multiple times.
  --molecular-function-go MOLECULAR_FUNCTION_GO
                        Molecular function (GO term, e.g. GO:0003677). Can be
                        specified multiple times.
  --min-sequence-length MIN_SEQUENCE_LENGTH
                        Minimum length of the canonical sequence.
  --max-sequence-length MAX_SEQUENCE_LENGTH
                        Maximum length of the canonical sequence.
  --interaction-partner-seed INTERACTION_PARTNER_SEED
                        UniProt ID to use as interaction partner seed. The
                        search will be expanded to include structure
                        identifiers of the found interaction partners. Can be
                        specified multiple times.
  --interaction-partner-exclude INTERACTION_PARTNER_EXCLUDE
                        UniProt ID to exclude from the found interaction
                        partners. Can be specified multiple times.
  --min-residues MIN_RESIDUES
                        Minimum number of residues required in the chain
                        mapped to the UniProt accession.
  --max-residues MAX_RESIDUES
                        Maximum number of residues allowed in the chain
                        mapped to the UniProt accession.
  --limit LIMIT         Limit number of results
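
Several search options can be given more than once. As a minimal sketch of how such a call could be assembled from a script, the hypothetical helper below (`search_command` is not part of protein-detective) repeats a flag once per value. The session directory name, taxon ID, and GO terms are illustrative assumptions; only the flag names come from the help text above.

```python
import shlex


def search_command(session_dir, taxon_id=None, reviewed=None,
                   subcellular_location_go=(), molecular_function_go=(),
                   limit=None):
    """Build an argument list for `protein-detective search` (hypothetical helper)."""
    cmd = ["protein-detective", "search"]
    if taxon_id is not None:
        cmd += ["--taxon-id", str(taxon_id)]
    if reviewed is not None:
        # True selects --reviewed, False selects --no-reviewed, None keeps the default.
        cmd.append("--reviewed" if reviewed else "--no-reviewed")
    for term in subcellular_location_go:
        cmd += ["--subcellular-location-go", term]  # flag repeated once per GO term
    for term in molecular_function_go:
        cmd += ["--molecular-function-go", term]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    cmd.append(session_dir)  # positional argument goes last
    return cmd


cmd = search_command(
    "session1",                      # assumed session directory name
    taxon_id=9606,                   # NCBI taxon ID for human
    reviewed=True,
    subcellular_location_go=["GO:0005737", "GO:0005829"],
    molecular_function_go=["GO:0003677"],
    limit=50,
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)` once protein-detective is installed.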


retrieve

protein-detective retrieve --help

Usage: protein-detective retrieve [-h] [--what {alphafold,pdbe}]
                                  [--what-af-formats {amAnnotations,amAnnotationsHg19,amAnnotationsHg38,bcif,cif,msaUrl,paeDoc,pdb,plddtDocUrl,summary}]
                                  session_dir

Positional Arguments:
  session_dir           Session directory to store results

Options:
  -h, --help            show this help message and exit
  --what {alphafold,pdbe}
                        What to retrieve. Can be specified multiple times.
                        Default is pdbe and alphafold.
  --what-af-formats {amAnnotations,amAnnotationsHg19,amAnnotationsHg38,bcif,cif,msaUrl,paeDoc,pdb,plddtDocUrl,summary}
                        AlphaFold formats to retrieve. Can be specified
                        multiple times. Default is 'cif'.
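
Both `--what` and `--what-af-formats` accept a fixed set of values. The sketch below checks values against the choices listed above before building the argument list. `retrieve_command` is a hypothetical helper and `session1` is an assumed directory name; neither is provided by protein-detective.

```python
import shlex

# Choices copied from the help text above.
WHAT_CHOICES = ("alphafold", "pdbe")
AF_FORMAT_CHOICES = (
    "amAnnotations", "amAnnotationsHg19", "amAnnotationsHg38", "bcif", "cif",
    "msaUrl", "paeDoc", "pdb", "plddtDocUrl", "summary",
)


def retrieve_command(session_dir, what=(), af_formats=()):
    """Build an argument list for `protein-detective retrieve` (hypothetical helper)."""
    cmd = ["protein-detective", "retrieve"]
    for source in what:
        if source not in WHAT_CHOICES:
            raise ValueError(f"unknown --what value: {source!r}")
        cmd += ["--what", source]
    for fmt in af_formats:
        if fmt not in AF_FORMAT_CHOICES:
            raise ValueError(f"unknown --what-af-formats value: {fmt!r}")
        cmd += ["--what-af-formats", fmt]
    cmd.append(session_dir)
    return cmd


# Retrieve only AlphaFold entries, as mmCIF plus the predicted aligned error document.
cmd = retrieve_command("session1", what=["alphafold"], af_formats=["cif", "paeDoc"])
print(shlex.join(cmd))
```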


filter

protein-detective filter --help

Usage: protein-detective filter [-h]
                                [--confidence-threshold CONFIDENCE_THRESHOLD]
                                [--min-residues MIN_RESIDUES]
                                [--max-residues MAX_RESIDUES]
                                [--abs-min-helix-residues ABS_MIN_HELIX_RESIDUES]
                                [--abs-max-helix-residues ABS_MAX_HELIX_RESIDUES]
                                [--abs-min-sheet-residues ABS_MIN_SHEET_RESIDUES]
                                [--abs-max-sheet-residues ABS_MAX_SHEET_RESIDUES]
                                [--ratio-min-helix-residues RATIO_MIN_HELIX_RESIDUES]
                                [--ratio-max-helix-residues RATIO_MAX_HELIX_RESIDUES]
                                [--ratio-min-sheet-residues RATIO_MIN_SHEET_RESIDUES]
                                [--ratio-max-sheet-residues RATIO_MAX_SHEET_RESIDUES]
                                [--scheduler-address SCHEDULER_ADDRESS]
                                session_dir

Filter structures based on:

- Confidence (pLDDT) threshold, for AlphaFold structures. New files are written
  with the low-confidence residues (below the threshold) removed.
- Number of residues in chain A. For PDBe structures, the chain of the UniProt
  protein is written as chain A.
- Number of residues in secondary structure (helices and sheets). To determine
  the fraction or number of secondary structure elements, see the following
  notebook: https://www.bonvinlab.org/protein-detective/SSE_elements.html

Positional Arguments:
  session_dir           Session directory to store results

Options:
  -h, --help            show this help message and exit
  --confidence-threshold CONFIDENCE_THRESHOLD
                        pLDDT confidence threshold (0-100) for AlphaFold
                        structures. Default is 70.0.
  --min-residues MIN_RESIDUES
                        Minimum number of residues in chain A
  --max-residues MAX_RESIDUES
                        Maximum number of residues in chain A
  --abs-min-helix-residues ABS_MIN_HELIX_RESIDUES
                        Minimum number of residues in helices
  --abs-max-helix-residues ABS_MAX_HELIX_RESIDUES
                        Maximum number of residues in helices
  --abs-min-sheet-residues ABS_MIN_SHEET_RESIDUES
                        Minimum number of residues in sheets
  --abs-max-sheet-residues ABS_MAX_SHEET_RESIDUES
                        Maximum number of residues in sheets
  --ratio-min-helix-residues RATIO_MIN_HELIX_RESIDUES
                        Minimum fraction of residues in helices (relative to
                        the total number of residues)
  --ratio-max-helix-residues RATIO_MAX_HELIX_RESIDUES
                        Maximum fraction of residues in helices (relative to
                        the total number of residues)
  --ratio-min-sheet-residues RATIO_MIN_SHEET_RESIDUES
                        Minimum fraction of residues in sheets (relative to
                        the total number of residues)
  --ratio-max-sheet-residues RATIO_MAX_SHEET_RESIDUES
                        Maximum fraction of residues in sheets (relative to
                        the total number of residues)
  --scheduler-address SCHEDULER_ADDRESS
                        Address of the Dask scheduler to connect to. If not
                        provided, will create a local cluster.
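
To make the confidence and residue-count criteria concrete, here is a conceptual sketch of how they might combine for a single AlphaFold chain. It is not the tool's implementation: `passes_filter` is a hypothetical function, the pLDDT values are made up, and treating the residue bounds as inclusive is an assumption.

```python
def passes_filter(plddt_per_residue, confidence_threshold=70.0,
                  min_residues=None, max_residues=None):
    """Return True if enough confident residues remain (illustrative only)."""
    # Residues below the threshold are dropped, mirroring the description above.
    kept = [p for p in plddt_per_residue if p >= confidence_threshold]
    n_kept = len(kept)
    # Assumption: both bounds are inclusive.
    if min_residues is not None and n_kept < min_residues:
        return False
    if max_residues is not None and n_kept > max_residues:
        return False
    return True


plddt = [92.1, 88.4, 71.0, 65.3, 40.2]  # made-up per-residue scores
print(passes_filter(plddt, min_residues=3))  # three residues are >= 70.0
print(passes_filter(plddt, min_residues=4))  # only three remain, so this fails
```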


powerfit

protein-detective powerfit --help

Usage: protein-detective powerfit [-h]
                                  {commands,run,report,fit-models,list-runs,list-lcc} ...

Positional Arguments:
  {commands,run,report,fit-models,list-runs,list-lcc}
    commands            Generate PowerFit commands for PDB files in the
                        session directory
    run                 Run PowerFit on PDB files in the session directory
    report              Generate a report of the best PowerFit solutions.
    fit-models          Fit models based on PowerFit solutions
    list-runs           List all PowerFit runs in the session directory
    list-lcc            List local cross-correlation (lcc.mrc) files for
                        PowerFit runs

Options:
  -h, --help            show this help message and exit


powerfit commands

protein-detective powerfit commands --help

Usage: protein-detective powerfit commands [-h] [-a <float>] [-nl] [-ncw]
                                           [-nr] [-rr <float>] [-nt]
                                           [-tc <float>] [-p <int>] [-g [GPU]]
                                           [--output OUTPUT]
                                           target resolution session_dir

Positional Arguments:
  target                Target density map to fit the model in. Data should
                        either be in CCP4 or MRC format
  resolution            Resolution of map in angstrom
  session_dir           Session directory for input and output

Options:
  -h, --help            show this help message and exit
  -a, --angle <float>   Rotational sampling density in degrees. Decreasing
                        this number by a factor of 2 results in approximately
                        8 times more rotations sampled.
  -nl, --no-laplace     Do not apply the Laplace pre-filter to the density
                        data.
  -ncw, --no-core-weighted
                        Do not use core-weighted local cross-correlation
                        score.
  -nr, --no-resampling  Do not resample the density map.
  -rr, --resampling-rate <float>
                        Resampling rate compared to Nyquist.
  -nt, --no-trimming    Do not trim the density map.
  -tc, --trimming-cutoff <float>
                        Intensity cutoff to which the map will be trimmed.
                        Default is 10 percent of maximum intensity.
  -p, --nproc <int>     Number of processors used during search. The number
                        will be capped at the total number of available
                        processors on your machine.
  -g, --gpu [GPU]       Off-load the intensive calculations to the GPU.
                        Optionally specify number of workers per GPU (default:
                        1).
  --output OUTPUT       Output file for powerfit commands. If set to '-'
                        (default) will print to stdout.
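
The factor of 8 mentioned for `--angle` reflects cubic scaling: rotations have three degrees of freedom, so the number of sampled orientations grows roughly with the inverse cube of the angular step, and a finer step means more rotations. The sketch below only illustrates that rule of thumb. `relative_rotation_count` is a hypothetical helper, and the exact counts PowerFit uses may differ.

```python
def relative_rotation_count(reference_angle, new_angle):
    """Approximate ratio of rotations sampled at new_angle versus reference_angle.

    Assumes the count scales with (1 / angle) ** 3, a rule of thumb only.
    """
    return (reference_angle / new_angle) ** 3


print(relative_rotation_count(10.0, 5.0))   # 8.0: halving the step, ~8x the rotations
print(relative_rotation_count(10.0, 20.0))  # 0.125: doubling the step, ~1/8 the rotations
```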


powerfit run

protein-detective powerfit run --help

Usage: protein-detective powerfit run [-h] [-a <float>] [-nl] [-ncw] [-nr]
                                      [-rr <float>] [-nt] [-tc <float>]
                                      [-p <int>] [-g [GPU]]
                                      [--scheduler-address SCHEDULER_ADDRESS]
                                      target resolution session_dir

Run PowerFit on PDB files in the session directory and store results.

Positional Arguments:
  target                Target density map to fit the model in. Data should
                        either be in CCP4 or MRC format
  resolution            Resolution of map in angstrom
  session_dir           Session directory containing PDB files

Options:
  -h, --help            show this help message and exit
  -a, --angle <float>   Rotational sampling density in degrees. Decreasing
                        this number by a factor of 2 results in approximately
                        8 times more rotations sampled.
  -nl, --no-laplace     Do not apply the Laplace pre-filter to the density
                        data.
  -ncw, --no-core-weighted
                        Do not use core-weighted local cross-correlation
                        score.
  -nr, --no-resampling  Do not resample the density map.
  -rr, --resampling-rate <float>
                        Resampling rate compared to Nyquist.
  -nt, --no-trimming    Do not trim the density map.
  -tc, --trimming-cutoff <float>
                        Intensity cutoff to which the map will be trimmed.
                        Default is 10 percent of maximum intensity.
  -p, --nproc <int>     Number of processors used during search. The number
                        will be capped at the total number of available
                        processors on your machine.
  -g, --gpu [GPU]       Off-load the intensive calculations to the GPU.
                        Optionally specify number of workers per GPU (default:
                        1).
  --scheduler-address SCHEDULER_ADDRESS
                        Address of the Dask scheduler to connect to. If not
                        provided, will create a local cluster.


powerfit report

protein-detective powerfit report --help

Usage: protein-detective powerfit report [-h]
                                         [--powerfit_run_id POWERFIT_RUN_ID]
                                         [--top TOP] [--output OUTPUT]
                                         session_dir

Positional Arguments:
  session_dir           Session directory containing PowerFit results

Options:
  -h, --help            show this help message and exit
  --powerfit_run_id POWERFIT_RUN_ID
                        ID of the PowerFit run to report on
  --top TOP             Number of top solutions to report
  --output OUTPUT       Output file for solutions table. If set to '-'
                        (default) will print to stdout.
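
The `--output` option treats `-` as standard output. For readers who want to follow the same convention in their own wrapper scripts, a common way to implement it is sketched below. This is a general idiom, not code taken from protein-detective, and `open_output` is a hypothetical name.

```python
import contextlib
import sys


@contextlib.contextmanager
def open_output(path):
    """Yield a writable text stream: stdout for '-', otherwise a new file."""
    if path == "-":
        yield sys.stdout  # stdout is left open when the block exits
    else:
        with open(path, "w", encoding="utf-8") as handle:
            yield handle  # the file is closed when the block exits


with open_output("-") as out:
    out.write("placeholder row\n")  # stands in for a real solutions table
```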


powerfit fit-models

protein-detective powerfit fit-models --help

Usage: protein-detective powerfit fit-models [-h]
                                             [--powerfit_run_id POWERFIT_RUN_ID]
                                             [--top TOP] [--output OUTPUT]
                                             session_dir

Positional Arguments:
  session_dir           Session directory containing PowerFit results

Options:
  -h, --help            show this help message and exit
  --powerfit_run_id POWERFIT_RUN_ID
                        ID of the PowerFit run to use. If not provided, all
                        runs are used.
  --top TOP             Number of top solutions to fit models for
  --output OUTPUT       Output file for fitted model table. If set to '-'
                        (default) will print to stdout.


powerfit list-runs

protein-detective powerfit list-runs --help

Usage: protein-detective powerfit list-runs [-h] session_dir

Positional Arguments:
  session_dir  Session directory containing PowerFit results

Options:
  -h, --help   show this help message and exit


powerfit list-lcc

protein-detective powerfit list-lcc --help

Usage: protein-detective powerfit list-lcc [-h] session_dir

Positional Arguments:
  session_dir  Session directory containing PowerFit results

Options:
  -h, --help   show this help message and exit
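
Taken together, the subcommands documented above share a single session directory. The sketch below lists one plausible order of calls. The order is an assumption inferred from the subcommand descriptions, and the session directory, density map file name, and resolution are placeholders. The commands are only printed, not executed.

```python
import shlex

session_dir = "session1"  # assumed session directory name
density_map = "map.mrc"   # placeholder density map (CCP4 or MRC format)
resolution = "9.5"        # placeholder map resolution in angstrom

pipeline = [
    ["protein-detective", "search", "--taxon-id", "9606", "--reviewed", session_dir],
    ["protein-detective", "retrieve", session_dir],
    ["protein-detective", "filter", "--confidence-threshold", "70", session_dir],
    ["protein-detective", "powerfit", "run", density_map, resolution, session_dir],
    ["protein-detective", "powerfit", "list-runs", session_dir],
    ["protein-detective", "powerfit", "report", "--top", "10", session_dir],
]

for cmd in pipeline:
    # Replace print with subprocess.run(cmd, check=True) to execute each step.
    print(shlex.join(cmd))
```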