CLI Reference
Documentation for the protein-detective script.
protein-detective --help
Usage: protein-detective [-h]
[--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
[--version]
{search,retrieve,filter,import-structures,powerfit} ...
Protein Detective CLI
Positional Arguments:
{search,retrieve,filter,import-structures,powerfit}
search Search UniProt for structures
retrieve Retrieve structures
filter Filter structures
import-structures Import structures from a directory into the session
powerfit PowerFit related commands
Options:
-h, --help show this help message and exit
--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
--version show program's version number and exit
search
protein-detective search --help
Usage: protein-detective search [-h] [--taxon-id TAXON_ID]
[--reviewed | --no-reviewed]
[--subcellular-location-uniprot
SUBCELLULAR_LOCATION_UNIPROT]
[--subcellular-location-go
SUBCELLULAR_LOCATION_GO]
[--molecular-function-go MOLECULAR_FUNCTION_GO]
[--min-sequence-length MIN_SEQUENCE_LENGTH]
[--max-sequence-length MAX_SEQUENCE_LENGTH]
[--interaction-partner-seed
INTERACTION_PARTNER_SEED]
[--interaction-partner-exclude
INTERACTION_PARTNER_EXCLUDE]
[--min-residues MIN_RESIDUES]
[--max-residues MAX_RESIDUES] [--limit LIMIT]
session_dir
Positional Arguments:
session_dir Session directory to store results
Options:
-h, --help show this help message and exit
--taxon-id TAXON_ID NCBI Taxon ID
--reviewed, --no-reviewed
--reviewed restricts the search to Swiss-Prot; --no-reviewed
restricts it to TrEMBL. Default is all of UniProt
(Swiss-Prot and TrEMBL).
--subcellular-location-uniprot SUBCELLULAR_LOCATION_UNIPROT
Subcellular location (UniProt)
--subcellular-location-go SUBCELLULAR_LOCATION_GO
Subcellular location (GO term, e.g. GO:0005737). Can
be specified multiple times.
--molecular-function-go MOLECULAR_FUNCTION_GO
Molecular function (GO term, e.g. GO:0003677). Can be
specified multiple times.
--min-sequence-length MIN_SEQUENCE_LENGTH
Minimum length of the canonical sequence.
--max-sequence-length MAX_SEQUENCE_LENGTH
Maximum length of the canonical sequence.
--interaction-partner-seed INTERACTION_PARTNER_SEED
UniProt ID to use as interaction partner seed. The
search will be expanded to include structure
identifiers of the found interaction partners. Can be
specified multiple times.
--interaction-partner-exclude INTERACTION_PARTNER_EXCLUDE
UniProt ID to exclude from the found interaction partners.
Can be specified multiple times.
--min-residues MIN_RESIDUES
Minimum number of residues required in the chain
mapped to the UniProt accession.
--max-residues MAX_RESIDUES
Maximum number of residues allowed in the chain mapped to
the UniProt accession.
--limit LIMIT Limit number of results
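As a minimal sketch, the search options above can be assembled into a command line from Python. The helper name `build_search_command`, the session directory `session1`, and the chosen values (taxon 9606, limit 100) are illustrative assumptions; only the flags themselves come from the help text. The command is printed, not executed.

```python
import shlex


def build_search_command(session_dir, taxon_id=None, reviewed=None,
                         subcellular_location_go=(), limit=None):
    """Hypothetical helper: assemble argv for `protein-detective search`."""
    argv = ["protein-detective", "search"]
    if taxon_id is not None:
        argv += ["--taxon-id", str(taxon_id)]
    # reviewed=None leaves the default (Swiss-Prot and TrEMBL) untouched.
    if reviewed is True:
        argv.append("--reviewed")
    elif reviewed is False:
        argv.append("--no-reviewed")
    # Options documented as "can be specified multiple times" are repeated per value.
    for term in subcellular_location_go:
        argv += ["--subcellular-location-go", term]
    if limit is not None:
        argv += ["--limit", str(limit)]
    argv.append(session_dir)
    return argv


cmd = build_search_command("session1", taxon_id=9606, reviewed=True,
                           subcellular_location_go=["GO:0005737"], limit=100)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it, provided the tool is installed.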
retrieve
protein-detective retrieve --help
Usage: protein-detective retrieve [-h] [--what {alphafold,pdbe}]
[--what-af-formats
{amAnnotations,amAnnotationsHg19,amAnnotationsHg38,bcif,cif,msa,paeDoc,pdb,plddtDoc,summary}]
session_dir
Positional Arguments:
session_dir Session directory to store results
Options:
-h, --help show this help message and exit
--what {alphafold,pdbe}
What to retrieve. Can be specified multiple times.
Default is pdbe and alphafold.
--what-af-formats
{amAnnotations,amAnnotationsHg19,amAnnotationsHg38,bcif,cif,msa,paeDoc,pdb,plddtDoc,summary}
AlphaFold formats to retrieve. Can be specified
multiple times. Default is 'cif'.
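The following sketch shows one way to build a `retrieve` invocation while checking values against the choices listed in the help text. The helper `build_retrieve_command` and the session directory name are assumptions made for illustration; the command is only printed.

```python
import shlex

# Choices copied from the help text above.
SOURCES = ("alphafold", "pdbe")
AF_FORMATS = ("amAnnotations", "amAnnotationsHg19", "amAnnotationsHg38", "bcif",
              "cif", "msa", "paeDoc", "pdb", "plddtDoc", "summary")


def build_retrieve_command(session_dir, what=(), af_formats=()):
    """Hypothetical helper: assemble argv for `protein-detective retrieve`."""
    argv = ["protein-detective", "retrieve"]
    for source in what:
        if source not in SOURCES:
            raise ValueError(f"unknown source: {source!r}")
        argv += ["--what", source]  # repeated once per source
    for fmt in af_formats:
        if fmt not in AF_FORMATS:
            raise ValueError(f"unknown AlphaFold format: {fmt!r}")
        argv += ["--what-af-formats", fmt]  # repeated once per format
    argv.append(session_dir)
    return argv


cmd = build_retrieve_command("session1", what=["alphafold"],
                             af_formats=["cif", "plddtDoc"])
print(shlex.join(cmd))
```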
filter
protein-detective filter --help
Usage: protein-detective filter [-h]
[--confidence-threshold CONFIDENCE_THRESHOLD]
[--min-residues MIN_RESIDUES]
[--max-residues MAX_RESIDUES]
[--abs-min-helix-residues
ABS_MIN_HELIX_RESIDUES]
[--abs-max-helix-residues
ABS_MAX_HELIX_RESIDUES]
[--abs-min-sheet-residues
ABS_MIN_SHEET_RESIDUES]
[--abs-max-sheet-residues
ABS_MAX_SHEET_RESIDUES]
[--ratio-min-helix-residues
RATIO_MIN_HELIX_RESIDUES]
[--ratio-max-helix-residues
RATIO_MAX_HELIX_RESIDUES]
[--ratio-min-sheet-residues
RATIO_MIN_SHEET_RESIDUES]
[--ratio-max-sheet-residues
RATIO_MAX_SHEET_RESIDUES]
[--scheduler-address SCHEDULER_ADDRESS]
session_dir
Filter structures based on:
- For PDBe structures, the chain of the UniProt protein is written as chain A.
- For AlphaFold structures, filter by confidence (pLDDT) threshold. New files are
written with low-confidence residues (below the threshold) removed.
- Number of residues in chain A.
- Number of residues in secondary structure (helices and sheets). To determine the
fraction or number of secondary structure elements, see the following notebook:
https://www.bonvinlab.org/protein-detective/SSE_elements.html
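To make the confidence and residue-count criteria concrete, here is a small sketch on toy data. It is not the tool's implementation: the `Residue` type, both function names, the inclusive bounds, and the choice to count residues after removing low-confidence ones are all assumptions.

```python
from typing import NamedTuple


class Residue(NamedTuple):
    """Hypothetical stand-in for one residue of chain A."""
    number: int
    plddt: float


def drop_low_confidence(residues, threshold=70.0):
    """Keep residues whose pLDDT is at or above the threshold (assumed inclusive)."""
    return [r for r in residues if r.plddt >= threshold]


def within_range(count, minimum=None, maximum=None):
    """True if count lies within [minimum, maximum]; None means no bound."""
    if minimum is not None and count < minimum:
        return False
    if maximum is not None and count > maximum:
        return False
    return True


chain_a = [Residue(1, 45.2), Residue(2, 71.0), Residue(3, 88.5), Residue(4, 69.9)]
kept = drop_low_confidence(chain_a, threshold=70.0)
print([r.number for r in kept])            # [2, 3]
print(within_range(len(kept), minimum=2))  # True
```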
Positional Arguments:
session_dir Session directory to store results
Options:
-h, --help show this help message and exit
--confidence-threshold CONFIDENCE_THRESHOLD
pLDDT confidence threshold (0-100) for AlphaFold
structures. Default is 70.0.
--min-residues MIN_RESIDUES
Minimum number of residues in chain A
--max-residues MAX_RESIDUES
Maximum number of residues in chain A
--abs-min-helix-residues ABS_MIN_HELIX_RESIDUES
Minimum number of residues in helices
--abs-max-helix-residues ABS_MAX_HELIX_RESIDUES
Maximum number of residues in helices
--abs-min-sheet-residues ABS_MIN_SHEET_RESIDUES
Minimum number of residues in sheets
--abs-max-sheet-residues ABS_MAX_SHEET_RESIDUES
Maximum number of residues in sheets
--ratio-min-helix-residues RATIO_MIN_HELIX_RESIDUES
Minimum number of residues in helices (fraction of total)
--ratio-max-helix-residues RATIO_MAX_HELIX_RESIDUES
Maximum number of residues in helices (fraction of total)
--ratio-min-sheet-residues RATIO_MIN_SHEET_RESIDUES
Minimum number of residues in sheets (fraction of total)
--ratio-max-sheet-residues RATIO_MAX_SHEET_RESIDUES
Maximum number of residues in sheets (fraction of total)
--scheduler-address SCHEDULER_ADDRESS
Address of the Dask scheduler to connect to. If not
provided, will create a local cluster.
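A possible way to assemble a `filter` invocation with basic range checks is sketched below. The helper name and example values are hypothetical, and treating "fraction of total" as a value between 0 and 1 is an assumption rather than something the help text states.

```python
import shlex


def build_filter_command(session_dir, confidence_threshold=None,
                         min_residues=None, ratio_min_helix_residues=None,
                         scheduler_address=None):
    """Hypothetical helper: assemble argv for `protein-detective filter`."""
    argv = ["protein-detective", "filter"]
    if confidence_threshold is not None:
        # The help text gives the pLDDT range as 0-100.
        if not 0 <= confidence_threshold <= 100:
            raise ValueError("confidence threshold must be within 0-100")
        argv += ["--confidence-threshold", str(confidence_threshold)]
    if min_residues is not None:
        argv += ["--min-residues", str(min_residues)]
    if ratio_min_helix_residues is not None:
        # Assumption: a fraction of the total is expressed as 0.0-1.0.
        if not 0 <= ratio_min_helix_residues <= 1:
            raise ValueError("ratio must be within 0-1")
        argv += ["--ratio-min-helix-residues", str(ratio_min_helix_residues)]
    if scheduler_address is not None:
        argv += ["--scheduler-address", scheduler_address]
    argv.append(session_dir)
    return argv


cmd = build_filter_command("session1", confidence_threshold=80.0,
                           min_residues=100, ratio_min_helix_residues=0.5)
print(shlex.join(cmd))
```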
import-structures
protein-detective import-structures --help
Usage: protein-detective import-structures [-h]
[--copy-method
{symlink,copy,hardlink}]
[--strict]
structures_dir session_dir
Import structures from a directory into the session.
The directory should contain structure files in PDB or mmCIF format.
This can be used to import structures obtained from other sources,
or to re-import structures after filtering with external tools.
Positional Arguments:
structures_dir Directory containing structure files to import
session_dir Session directory to store results
Options:
-h, --help show this help message and exit
--copy-method {symlink,copy,hardlink}
Method to use for importing files. Default is
'hardlink'. If 'copy', files will be copied. If
'symlink', symbolic links will be created. If
'hardlink', hard links will be created (unavailable on
Windows).
--strict Raise an error if structure files do not meet expected
criteria (single chain A, single UniProt accession).
Without this flag, files that do not meet these
criteria are skipped with a warning.
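Because hard links are documented as unavailable on Windows, a wrapper script might pick the copy method per platform. The sketch below assumes hypothetical helper names and directory names; falling back to `copy` on Windows is one reasonable choice, not a documented behaviour of the tool.

```python
import os
import shlex

COPY_METHODS = ("symlink", "copy", "hardlink")  # choices from the help text


def default_copy_method():
    """Use the documented default unless running on Windows."""
    return "copy" if os.name == "nt" else "hardlink"


def build_import_command(structures_dir, session_dir, copy_method=None, strict=False):
    """Hypothetical helper: assemble argv for `protein-detective import-structures`."""
    if copy_method is None:
        copy_method = default_copy_method()
    if copy_method not in COPY_METHODS:
        raise ValueError(f"unknown copy method: {copy_method!r}")
    argv = ["protein-detective", "import-structures", "--copy-method", copy_method]
    if strict:
        # Fail instead of skipping files that are not single chain A / single accession.
        argv.append("--strict")
    argv += [structures_dir, session_dir]
    return argv


cmd = build_import_command("my_structures", "session1", copy_method="symlink", strict=True)
print(shlex.join(cmd))
```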
powerfit
protein-detective powerfit --help
Usage: protein-detective powerfit [-h]
{commands,run,report,fit-models,list-runs,list-lcc} ...
Positional Arguments:
{commands,run,report,fit-models,list-runs,list-lcc}
commands Generate PowerFit commands for PDB files in the
session directory
run Run PowerFit on PDB files in the session directory
report Generate a report of the best PowerFit solutions.
fit-models Fit models based on PowerFit solutions
list-runs List all PowerFit runs in the session directory
list-lcc List Local Cross-Correlation (lcc.mrc) files for
PowerFit runs
Options:
-h, --help show this help message and exit
powerfit commands
protein-detective powerfit commands --help
Usage: protein-detective powerfit commands [-h] [-a <float>] [-nl] [-ncw]
[-nr] [-rr <float>] [-nt]
[-tc <float>] [-p <int>]
[--batch-size <int>] [-g [GPU]]
[--gpu-backend {opencl,cuda}]
[--output OUTPUT]
target resolution session_dir
Positional Arguments:
target Target density map to fit the model in. Data should
either be in CCP4 or MRC format
resolution Resolution of map in angstrom
session_dir Session directory for input and output
Options:
-h, --help show this help message and exit
-a, --angle <float> Rotational sampling interval in degrees. Decreasing this
number by a factor of 2 results in approximately 8
times more rotations sampled.
-nl, --no-laplace Do not apply the Laplace pre-filter to the density data.
-ncw, --no-core-weighted
Do not use core-weighted local cross-correlation
score.
-nr, --no-resampling Do not resample the density map.
-rr, --resampling-rate <float>
Resampling rate compared to Nyquist.
-nt, --no-trimming Do not trim the density map.
-tc, --trimming-cutoff <float>
Intensity cutoff to which the map will be trimmed.
Default is 10 percent of maximum intensity.
-p, --nproc <int> Number of processors used during search. The number
will be capped at the total number of available
processors on your machine.
--batch-size <int> GPU batch size to use. Use 0 to disable batching
entirely, or a positive integer to force a specific
batch size. Applies to GPU backends (CUDA/OpenCL).
Setting it too high can cause out-of-memory errors.
-g, --gpu [GPU] Off-load the intensive calculations to the GPU.
Optionally specify number of workers per GPU (default:
1).
--gpu-backend {opencl,cuda}
GPU backend to target when generating PowerFit
commands.
--output OUTPUT Output file for powerfit commands. If set to '-'
(default) will print to stdout.
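The positional order `target resolution session_dir` is easy to get wrong, so a small builder can help. This sketch uses assumed names throughout: `build_powerfit_commands`, the map file `map.mrc`, the resolution `8.0`, and the output file `powerfit_commands.txt` are placeholders.

```python
import shlex


def build_powerfit_commands(target, resolution, session_dir, angle=None,
                            no_laplace=False, nproc=None, output=None):
    """Hypothetical helper: assemble argv for `protein-detective powerfit commands`."""
    argv = ["protein-detective", "powerfit", "commands"]
    if angle is not None:
        argv += ["--angle", str(angle)]
    if no_laplace:
        argv.append("--no-laplace")
    if nproc is not None:
        argv += ["--nproc", str(nproc)]
    if output is not None:
        argv += ["--output", output]
    # Positional arguments in the order given by the usage line.
    argv += [target, str(resolution), session_dir]
    return argv


cmd = build_powerfit_commands("map.mrc", 8.0, "session1", angle=20,
                              output="powerfit_commands.txt")
print(shlex.join(cmd))
```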
powerfit run
protein-detective powerfit run --help
Usage: protein-detective powerfit run [-h] [-a <float>] [-nl] [-ncw] [-nr]
[-rr <float>] [-nt] [-tc <float>]
[-p <int>] [--batch-size <int>]
[-g [GPU]] [--gpu-backend {opencl,cuda}]
[--scheduler-address SCHEDULER_ADDRESS]
target resolution session_dir
Run PowerFit on PDB files in the session directory and store results.
Positional Arguments:
target Target density map to fit the model in. Data should
either be in CCP4 or MRC format
resolution Resolution of map in angstrom
session_dir Session directory containing PDB files
Options:
-h, --help show this help message and exit
-a, --angle <float> Rotational sampling interval in degrees. Decreasing this
number by a factor of 2 results in approximately 8
times more rotations sampled.
-nl, --no-laplace Do not apply the Laplace pre-filter to the density data.
-ncw, --no-core-weighted
Do not use core-weighted local cross-correlation
score.
-nr, --no-resampling Do not resample the density map.
-rr, --resampling-rate <float>
Resampling rate compared to Nyquist.
-nt, --no-trimming Do not trim the density map.
-tc, --trimming-cutoff <float>
Intensity cutoff to which the map will be trimmed.
Default is 10 percent of maximum intensity.
-p, --nproc <int> Number of processors used during search. The number
will be capped at the total number of available
processors on your machine.
--batch-size <int> GPU batch size to use. Use 0 to disable batching
entirely, or a positive integer to force a specific
batch size. Applies to GPU backends (CUDA/OpenCL).
Setting it too high can cause out-of-memory errors.
-g, --gpu [GPU] Off-load the intensive calculations to the GPU.
Optionally specify number of workers per GPU (default:
1).
--gpu-backend {opencl,cuda}
GPU backend to target when running PowerFit.
--scheduler-address SCHEDULER_ADDRESS
Address of the Dask scheduler to connect to. If not
provided, will create a local cluster.
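A sketch of preparing a `powerfit run` invocation follows. The cap in `capped_nproc` mirrors the documented behaviour of `--nproc` on the caller's side and is optional, since the help text says the tool caps the value itself. The helper names, file names, and the scheduler address `tcp://127.0.0.1:8786` are assumptions.

```python
import os
import shlex


def capped_nproc(requested):
    """Clamp the requested processor count to what this machine reports."""
    available = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, min(requested, available))


def build_powerfit_run(target, resolution, session_dir, nproc=None,
                       scheduler_address=None):
    """Hypothetical helper: assemble argv for `protein-detective powerfit run`."""
    argv = ["protein-detective", "powerfit", "run"]
    if nproc is not None:
        argv += ["--nproc", str(capped_nproc(nproc))]
    if scheduler_address is not None:
        # Without this option the help text says a local cluster is created.
        argv += ["--scheduler-address", scheduler_address]
    argv += [target, str(resolution), session_dir]
    return argv


cmd = build_powerfit_run("map.mrc", 8.0, "session1", nproc=4,
                         scheduler_address="tcp://127.0.0.1:8786")
print(shlex.join(cmd))
```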
powerfit report
protein-detective powerfit report --help
Usage: protein-detective powerfit report [-h]
[--powerfit_run_id POWERFIT_RUN_ID]
[--top TOP] [--output OUTPUT]
session_dir
Positional Arguments:
session_dir Session directory containing PowerFit results
Options:
-h, --help show this help message and exit
--powerfit_run_id POWERFIT_RUN_ID
ID of the PowerFit run to report on
--top TOP Number of top solutions to report
--output OUTPUT Output file for solutions table. If set to '-'
(default) will print to stdout.
powerfit fit-models
protein-detective powerfit fit-models --help
Usage: protein-detective powerfit fit-models [-h]
[--powerfit_run_id POWERFIT_RUN_ID]
[--top TOP] [--output OUTPUT]
session_dir
Positional Arguments:
session_dir Session directory containing PowerFit results
Options:
-h, --help show this help message and exit
--powerfit_run_id POWERFIT_RUN_ID
ID of the PowerFit run to report on. If not provided,
will use all runs.
--top TOP Number of top solutions to fit models for
--output OUTPUT Output file for fitted model table. If set to '-'
(default) will print to stdout.
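`report` and `fit-models` take the same options, so one builder can serve both. In this sketch the helper name, the placeholder `RUN_ID` (a real value would come from `powerfit list-runs`), and the output file name are assumptions. Note that `--powerfit_run_id` is spelled with underscores in the help text, unlike the other options.

```python
import shlex


def build_results_command(subcommand, session_dir, powerfit_run_id=None,
                          top=None, output=None):
    """Hypothetical helper for `powerfit report` and `powerfit fit-models`."""
    if subcommand not in ("report", "fit-models"):
        raise ValueError(f"unsupported subcommand: {subcommand!r}")
    argv = ["protein-detective", "powerfit", subcommand]
    if powerfit_run_id is not None:
        argv += ["--powerfit_run_id", str(powerfit_run_id)]  # underscores, as documented
    if top is not None:
        argv += ["--top", str(top)]
    if output is not None:
        argv += ["--output", output]
    argv.append(session_dir)
    return argv


report_cmd = build_results_command("report", "session1", powerfit_run_id="RUN_ID",
                                   top=10, output="solutions.csv")
fit_cmd = build_results_command("fit-models", "session1", top=3)
print(shlex.join(report_cmd))
print(shlex.join(fit_cmd))
```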
powerfit list-runs
protein-detective powerfit list-runs --help
Usage: protein-detective powerfit list-runs [-h] session_dir
Positional Arguments:
session_dir Session directory containing PowerFit results
Options:
-h, --help show this help message and exit
powerfit list-lcc
protein-detective powerfit list-lcc --help
Usage: protein-detective powerfit list-lcc [-h] session_dir
Positional Arguments:
session_dir Session directory containing PowerFit results
Options:
-h, --help show this help message and exit
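Taken together, the subcommands suggest a workflow that can be scripted. The order below (search, retrieve, filter, powerfit run, powerfit report) is an assumption inferred from the subcommand descriptions, and every value, file name, and the `run_steps` helper are placeholders. With `dry_run=True` the commands are only printed.

```python
import shlex
import subprocess


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    rendered = []
    for argv in steps:
        line = shlex.join(argv)
        rendered.append(line)
        print(line)
        if not dry_run:
            # check=True stops the workflow at the first failing step.
            subprocess.run(argv, check=True)
    return rendered


session = "session1"  # hypothetical session directory
steps = [
    ["protein-detective", "search", "--taxon-id", "9606", "--limit", "50", session],
    ["protein-detective", "retrieve", session],
    ["protein-detective", "filter", "--confidence-threshold", "70", session],
    ["protein-detective", "powerfit", "run", "map.mrc", "8.0", session],
    ["protein-detective", "powerfit", "report", "--top", "10", session],
]
lines = run_steps(steps, dry_run=True)
```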