CLI Reference
Documentation for the protein-detective script.
protein-detective --help
Usage: protein-detective [-h]
[--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
[--version]
{search,retrieve,filter,powerfit} ...
Protein Detective CLI
Positional Arguments:
{search,retrieve,filter,powerfit}
search Search UniProt for structures
retrieve Retrieve structures
filter Filter structures
powerfit PowerFit related commands
Options:
-h, --help show this help message and exit
--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
--version show program's version number and exit
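The subcommands are typically chained on a single session directory. The sketch below only builds and prints command lines; it does not run anything. The order (search, then retrieve, then filter, then powerfit run, then powerfit report) is an assumption based on the subcommand names. The session directory, map file name, and numeric values are hypothetical examples (9606 is the NCBI Taxon ID for human). Every flag used appears in this reference.

```python
import shlex

# Hypothetical example values; replace with your own.
SESSION = "mysession"
DENSITY_MAP = "emd_map.mrc"   # hypothetical CCP4/MRC density map
RESOLUTION = "4.5"            # map resolution in angstrom (example value)


def workflow_commands(session, density_map, resolution):
    """Return argument lists for an assumed search -> retrieve -> filter ->
    powerfit run -> powerfit report sequence, using only documented flags."""
    base = ["protein-detective"]
    return [
        base + ["search", "--taxon-id", "9606", "--reviewed", "--limit", "100", session],
        base + ["retrieve", session],
        base + ["filter", "--confidence-threshold", "70", session],
        base + ["powerfit", "run", density_map, resolution, session],
        base + ["powerfit", "report", "--top", "10", session],
    ]


if __name__ == "__main__":
    for argv in workflow_commands(SESSION, DENSITY_MAP, RESOLUTION):
        # shlex.join quotes arguments where needed so the line can be pasted into a shell.
        print(shlex.join(argv))
```

To execute the commands rather than print them, you could pass each argument list to `subprocess.run(argv, check=True)`.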
search
protein-detective search --help
Usage: protein-detective search [-h] [--taxon-id TAXON_ID]
[--reviewed | --no-reviewed]
[--subcellular-location-uniprot
SUBCELLULAR_LOCATION_UNIPROT]
[--subcellular-location-go
SUBCELLULAR_LOCATION_GO]
[--molecular-function-go MOLECULAR_FUNCTION_GO]
[--min-sequence-length MIN_SEQUENCE_LENGTH]
[--max-sequence-length MAX_SEQUENCE_LENGTH]
[--interaction-partner-seed
INTERACTION_PARTNER_SEED]
[--interaction-partner-exclude
INTERACTION_PARTNER_EXCLUDE]
[--min-residues MIN_RESIDUES]
[--max-residues MAX_RESIDUES] [--limit LIMIT]
session_dir
Positional Arguments:
session_dir Session directory to store results
Options:
-h, --help show this help message and exit
--taxon-id TAXON_ID NCBI Taxon ID
--reviewed, --no-reviewed
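--reviewed searches only Swiss-Prot (reviewed) entries;
--no-reviewed searches only TrEMBL (unreviewed) entries.
If neither is given, all of UniProtKB (Swiss-Prot +
TrEMBL) is searched.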
--subcellular-location-uniprot SUBCELLULAR_LOCATION_UNIPROT
Subcellular location (UniProt)
--subcellular-location-go SUBCELLULAR_LOCATION_GO
Subcellular location (GO term, e.g. GO:0005737). Can
be specified multiple times.
--molecular-function-go MOLECULAR_FUNCTION_GO
Molecular function (GO term, e.g. GO:0003677). Can be
specified multiple times.
--min-sequence-length MIN_SEQUENCE_LENGTH
Minimum length of the canonical sequence.
--max-sequence-length MAX_SEQUENCE_LENGTH
Maximum length of the canonical sequence.
--interaction-partner-seed INTERACTION_PARTNER_SEED
UniProt ID to use as an interaction-partner seed. The
search is expanded to include the structure
identifiers of the interaction partners found. Can be
specified multiple times.
--interaction-partner-exclude INTERACTION_PARTNER_EXCLUDE
UniProt ID to exclude from the interaction partners
found. Can be specified multiple times.
--min-residues MIN_RESIDUES
Minimum number of residues required in the chain
mapped to the UniProt accession.
--max-residues MAX_RESIDUES
Maximum number of residues allowed in the chain mapped
to the UniProt accession.
--limit LIMIT Limit the number of results
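Two option styles above recur throughout the CLI: the three-state `--reviewed/--no-reviewed` flag and options that "can be specified multiple times". The sketch below mirrors them with Python's argparse. That is an assumption, since the help layout only resembles argparse output. `BooleanOptionalAction` requires Python 3.9+. `uniprot_sections` is a hypothetical helper, and GO:0005634 (nucleus) is an example value.

```python
import argparse

# Sketch only: mirrors two option styles from the help text above,
# not the actual protein-detective parser.
parser = argparse.ArgumentParser(prog="search-sketch")
# Three-state flag: --reviewed -> True, --no-reviewed -> False, neither -> None.
parser.add_argument("--reviewed", action=argparse.BooleanOptionalAction, default=None)
# Repeatable option: each occurrence appends one value to a list.
parser.add_argument("--subcellular-location-go", action="append")
parser.add_argument("session_dir")


def uniprot_sections(reviewed):
    """Map the --reviewed state to the UniProtKB sections that would be searched."""
    if reviewed is None:
        return ["swissprot", "trembl"]  # default: all of UniProtKB
    return ["swissprot"] if reviewed else ["trembl"]


ns = parser.parse_args([
    "--no-reviewed",
    "--subcellular-location-go", "GO:0005737",
    "--subcellular-location-go", "GO:0005634",
    "mysession",
])
print(uniprot_sections(ns.reviewed))   # ['trembl']
print(ns.subcellular_location_go)      # ['GO:0005737', 'GO:0005634']
```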
retrieve
protein-detective retrieve --help
Usage: protein-detective retrieve [-h] [--what {alphafold,pdbe}]
[--what-af-formats
{amAnnotations,amAnnotationsHg19,amAnnotationsHg38,bcif,cif,msaUrl,paeDoc,pdb,plddtDocUrl,summary}]
session_dir
Positional Arguments:
session_dir Session directory to store results
Options:
-h, --help show this help message and exit
--what {alphafold,pdbe}
What to retrieve. Can be specified multiple times.
Default is pdbe and alphafold.
--what-af-formats
{amAnnotations,amAnnotationsHg19,amAnnotationsHg38,bcif,cif,msaUrl,paeDoc,pdb,plddtDocUrl,summary}
AlphaFold formats to retrieve. Can be specified
multiple times. Default is 'cif'.
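Both `--what` and `--what-af-formats` are repeatable options with a default. The sketch below shows that pattern in argparse, which is an assumption about how the CLI is built. The option uses `action="append"` with `default=None`, and the documented default is filled in afterwards. This avoids a documented argparse behaviour: with a non-empty default list, argparse appends command-line values to that list instead of replacing it. `resolve_what` is a hypothetical helper.

```python
import argparse

# Documented default for --what: both sources.
DEFAULT_WHAT = ["pdbe", "alphafold"]

parser = argparse.ArgumentParser(prog="retrieve-sketch")
# default=None (not DEFAULT_WHAT): argparse would otherwise append user values
# to the default list instead of replacing it.
parser.add_argument("--what", action="append", choices=["alphafold", "pdbe"], default=None)
parser.add_argument("session_dir")


def resolve_what(argv):
    """Parse argv and fall back to the documented default when --what is absent."""
    ns = parser.parse_args(argv)
    return ns.what if ns.what is not None else list(DEFAULT_WHAT)


print(resolve_what(["mysession"]))                         # ['pdbe', 'alphafold']
print(resolve_what(["--what", "alphafold", "mysession"]))  # ['alphafold']
```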
filter
protein-detective filter --help
Usage: protein-detective filter [-h]
[--confidence-threshold CONFIDENCE_THRESHOLD]
[--min-residues MIN_RESIDUES]
[--max-residues MAX_RESIDUES]
[--abs-min-helix-residues
ABS_MIN_HELIX_RESIDUES]
[--abs-max-helix-residues
ABS_MAX_HELIX_RESIDUES]
[--abs-min-sheet-residues
ABS_MIN_SHEET_RESIDUES]
[--abs-max-sheet-residues
ABS_MAX_SHEET_RESIDUES]
[--ratio-min-helix-residues
RATIO_MIN_HELIX_RESIDUES]
[--ratio-max-helix-residues
RATIO_MAX_HELIX_RESIDUES]
[--ratio-min-sheet-residues
RATIO_MIN_SHEET_RESIDUES]
[--ratio-max-sheet-residues
RATIO_MAX_SHEET_RESIDUES]
[--scheduler-address SCHEDULER_ADDRESS]
session_dir
Filter structures based on:
- Confidence (pLDDT) threshold, for AlphaFold structures
- Number of residues in chain A
- Number of residues in secondary structure (helices and sheets)
For PDBe structures, the chain of the UniProt protein is written as chain A.
For AlphaFold structures, new files are written with the low-confidence residues
(below the threshold) removed.
For determining the fraction or number of secondary structure elements, see the
following notebook:
https://www.bonvinlab.org/protein-detective/SSE_elements.html
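The confidence and residue-count criteria above can be sketched as follows. This is a minimal sketch, not protein-detective's implementation. It assumes residues are already parsed into a hypothetical `Residue` record holding the per-residue pLDDT, which AlphaFold model files store in the B-factor column. It also assumes the residue-count check runs after the low-confidence residues are removed. Both helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Residue:
    number: int
    plddt: float  # AlphaFold stores per-residue pLDDT (0-100) in the B-factor column


def filter_low_confidence(residues, threshold=70.0):
    """Keep residues whose pLDDT is at or above `threshold`; residues below it are dropped."""
    return [r for r in residues if r.plddt >= threshold]


def passes_residue_count(residues, min_residues=None, max_residues=None):
    """Hypothetical check of chain length against optional bounds (None = unchecked)."""
    n = len(residues)
    if min_residues is not None and n < min_residues:
        return False
    if max_residues is not None and n > max_residues:
        return False
    return True


chain_a = [Residue(1, 45.2), Residue(2, 71.0), Residue(3, 88.5), Residue(4, 69.9)]
kept = filter_low_confidence(chain_a)
print([r.number for r in kept])                    # [2, 3]
print(passes_residue_count(kept, min_residues=2))  # True
```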
Positional Arguments:
session_dir Session directory to store results
Options:
-h, --help show this help message and exit
--confidence-threshold CONFIDENCE_THRESHOLD
pLDDT confidence threshold (0-100) for AlphaFold
structures. Default is 70.0.
--min-residues MIN_RESIDUES
Minimum number of residues in chain A
--max-residues MAX_RESIDUES
Maximum number of residues in chain A
--abs-min-helix-residues ABS_MIN_HELIX_RESIDUES
Minimum number of residues in helices
--abs-max-helix-residues ABS_MAX_HELIX_RESIDUES
Maximum number of residues in helices
--abs-min-sheet-residues ABS_MIN_SHEET_RESIDUES
Minimum number of residues in sheets
--abs-max-sheet-residues ABS_MAX_SHEET_RESIDUES
Maximum number of residues in sheets
--ratio-min-helix-residues RATIO_MIN_HELIX_RESIDUES
Minimum fraction of residues in helices (fraction of total residues)
--ratio-max-helix-residues RATIO_MAX_HELIX_RESIDUES
Maximum fraction of residues in helices (fraction of total residues)
--ratio-min-sheet-residues RATIO_MIN_SHEET_RESIDUES
Minimum fraction of residues in sheets (fraction of total residues)
--ratio-max-sheet-residues RATIO_MAX_SHEET_RESIDUES
Maximum fraction of residues in sheets (fraction of total residues)
--scheduler-address SCHEDULER_ADDRESS
Address of the Dask scheduler to connect to. If not
provided, a local cluster is created.
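The absolute and ratio secondary-structure bounds can be pictured as independent range checks. This is a minimal sketch with several assumptions:
- The input is a per-residue secondary-structure string using DSSP-style codes (`H` for helix, `E` for strand/sheet).
- A ratio is the number of helix or sheet residues divided by all residues in the chain.
- An unset bound is simply not checked.

The function names are hypothetical.

```python
def within(value, lo=None, hi=None):
    """True if value lies in [lo, hi]; a None bound is not checked."""
    return (lo is None or value >= lo) and (hi is None or value <= hi)


def passes_sse_filter(ss, *, abs_min_helix=None, abs_max_helix=None,
                      abs_min_sheet=None, abs_max_sheet=None,
                      ratio_min_helix=None, ratio_max_helix=None,
                      ratio_min_sheet=None, ratio_max_sheet=None):
    """`ss` is a per-residue string: 'H' = helix, 'E' = sheet, anything else = other."""
    total = len(ss)
    if total == 0:
        return False  # design choice in this sketch: an empty chain never passes
    n_helix = ss.count("H")
    n_sheet = ss.count("E")
    return (
        within(n_helix, abs_min_helix, abs_max_helix)
        and within(n_sheet, abs_min_sheet, abs_max_sheet)
        and within(n_helix / total, ratio_min_helix, ratio_max_helix)
        and within(n_sheet / total, ratio_min_sheet, ratio_max_sheet)
    )


ss = "CCHHHHHHCCEEEECC"  # 16 residues: 6 helix, 4 sheet
print(passes_sse_filter(ss, abs_min_helix=5, ratio_max_sheet=0.3))  # True
```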
powerfit
protein-detective powerfit --help
Usage: protein-detective powerfit [-h]
{commands,run,report,fit-models,list-runs,list-lcc} ...
Positional Arguments:
{commands,run,report,fit-models,list-runs,list-lcc}
commands Generate PowerFit commands for PDB files in the
session directory
run Run PowerFit on PDB files in the session directory
report Generate a report of the best PowerFit solutions.
fit-models Fit models based on PowerFit solutions
list-runs List all PowerFit runs in the session directory
list-lcc List local cross-correlation (lcc.mrc) files for
PowerFit runs
Options:
-h, --help show this help message and exit
powerfit commands
protein-detective powerfit commands --help
Usage: protein-detective powerfit commands [-h] [-a <float>] [-nl] [-ncw]
[-nr] [-rr <float>] [-nt]
[-tc <float>] [-p <int>] [-g [GPU]]
[--output OUTPUT]
target resolution session_dir
Positional Arguments:
target Target density map to fit the model in. Data should
either be in CCP4 or MRC format
resolution Resolution of map in angstrom
session_dir Session directory for input and output
Options:
-h, --help show this help message and exit
-a, --angle <float> Rotational sampling density in degrees. Decreasing this
number by a factor of 2 results in approximately 8
times more rotations sampled.
-nl, --no-laplace Do not apply the Laplace pre-filter to the density data.
-ncw, --no-core-weighted
Do not use core-weighted local cross-correlation
score.
-nr, --no-resampling Do not resample the density map.
-rr, --resampling-rate <float>
Resampling rate compared to Nyquist.
-nt, --no-trimming Do not trim the density map.
-tc, --trimming-cutoff <float>
Intensity cutoff to which the map will be trimmed.
Default is 10 percent of maximum intensity.
-p, --nproc <int> Number of processors used during search. The number
will be capped at the total number of available
processors on your machine.
-g, --gpu [GPU] Off-load the intensive calculations to the GPU.
Optionally specify number of workers per GPU (default:
1).
--output OUTPUT Output file for the PowerFit commands. If set to '-'
(the default), the commands are printed to stdout.
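The `-a/--angle` and `-rr/--resampling-rate` options trade accuracy for run time. The back-of-the-envelope sketch below rests on two assumptions. First, the number of sampled rotations scales with the inverse cube of the angular step. Second, resampling at k times the Nyquist rate means a voxel spacing of resolution / (2k). Both helper functions are hypothetical and are not part of PowerFit or protein-detective.

```python
def relative_rotation_count(angle_deg, reference_angle_deg):
    """Approximate ratio of rotations sampled at `angle_deg` versus `reference_angle_deg`.
    Covering the 3-D rotation group needs roughly (1 / angle)**3 orientations."""
    return (reference_angle_deg / angle_deg) ** 3


def voxel_spacing(resolution, resampling_rate):
    """Assumed voxel spacing (angstrom) when a map of the given resolution is
    resampled at `resampling_rate` times the Nyquist rate (Nyquist spacing = d / 2)."""
    return resolution / (2.0 * resampling_rate)


print(relative_rotation_count(5.0, 10.0))  # 8.0
print(voxel_spacing(4.0, 2.0))             # 1.0
```

Under these assumptions, halving the angular step from 10 to 5 degrees multiplies the number of rotations to evaluate by about eight.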
powerfit run
protein-detective powerfit run --help
Usage: protein-detective powerfit run [-h] [-a <float>] [-nl] [-ncw] [-nr]
[-rr <float>] [-nt] [-tc <float>]
[-p <int>] [-g [GPU]]
[--scheduler-address SCHEDULER_ADDRESS]
target resolution session_dir
Run PowerFit on PDB files in the session directory and store results.
Positional Arguments:
target Target density map to fit the model in. Data should
either be in CCP4 or MRC format
resolution Resolution of map in angstrom
session_dir Session directory containing PDB files
Options:
-h, --help show this help message and exit
-a, --angle <float> Rotational sampling density in degrees. Decreasing this
number by a factor of 2 results in approximately 8
times more rotations sampled.
-nl, --no-laplace Do not apply the Laplace pre-filter to the density data.
-ncw, --no-core-weighted
Do not use core-weighted local cross-correlation
score.
-nr, --no-resampling Do not resample the density map.
-rr, --resampling-rate <float>
Resampling rate compared to Nyquist.
-nt, --no-trimming Do not trim the density map.
-tc, --trimming-cutoff <float>
Intensity cutoff to which the map will be trimmed.
Default is 10 percent of maximum intensity.
-p, --nproc <int> Number of processors used during search. The number
will be capped at the total number of available
processors on your machine.
-g, --gpu [GPU] Off-load the intensive calculations to the GPU.
Optionally specify number of workers per GPU (default:
1).
--scheduler-address SCHEDULER_ADDRESS
Address of the Dask scheduler to connect to. If not
provided, a local cluster is created.
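`-g, --gpu [GPU]` takes an optional value. The sketch below shows how such an option behaves in Python's argparse, using `nargs="?"` with `const=1`. That the CLI uses argparse is an assumption suggested by the help layout. The parser is a hypothetical mirror of the documented positionals, not the real one.

```python
import argparse

# Hypothetical mirror of "powerfit run": absent -> None, bare -g -> 1, "-g 2" -> 2.
parser = argparse.ArgumentParser(prog="powerfit-run-sketch")
parser.add_argument("target")
parser.add_argument("resolution", type=float)
parser.add_argument("session_dir")
parser.add_argument("-g", "--gpu", nargs="?", const=1, type=int, default=None)

print(parser.parse_args(["map.mrc", "4.5", "mysession"]).gpu)             # None
print(parser.parse_args(["map.mrc", "4.5", "mysession", "-g"]).gpu)       # 1
print(parser.parse_args(["map.mrc", "4.5", "mysession", "-g", "2"]).gpu)  # 2
```

With this kind of option, a bare `-g` placed directly before the positional arguments would make argparse read the map path as the GPU count and fail. Placing `-g` after the positionals, as in the example, avoids the ambiguity.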
powerfit report
protein-detective powerfit report --help
Usage: protein-detective powerfit report [-h]
[--powerfit_run_id POWERFIT_RUN_ID]
[--top TOP] [--output OUTPUT]
session_dir
Positional Arguments:
session_dir Session directory containing PowerFit results
Options:
-h, --help show this help message and exit
--powerfit_run_id POWERFIT_RUN_ID
ID of the PowerFit run to report on
--top TOP Number of top solutions to report
--output OUTPUT Output file for the solutions table. If set to '-'
(the default), the table is printed to stdout.
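`--top` and the `'-'` output convention can be sketched as follows. The solution records, their `score` key (higher is better), and the CSV columns are hypothetical, and the real report format may differ. The hypothetical `open_output` helper writes to stdout when the path is `'-'` and to a file otherwise.

```python
import csv
import sys
from contextlib import contextmanager


@contextmanager
def open_output(path):
    """Yield sys.stdout for '-', otherwise a file opened for writing."""
    if path == "-":
        yield sys.stdout  # never close stdout
    else:
        with open(path, "w", newline="") as fh:
            yield fh


def write_top_solutions(solutions, top, path="-"):
    """Hypothetical: sort solutions by 'score' (descending), write the `top` best as CSV."""
    best = sorted(solutions, key=lambda s: s["score"], reverse=True)[:top]
    with open_output(path) as fh:
        writer = csv.DictWriter(fh, fieldnames=["run", "rank", "score"])
        writer.writeheader()
        for rank, sol in enumerate(best, start=1):
            writer.writerow({"run": sol["run"], "rank": rank, "score": sol["score"]})
    return best


solutions = [
    {"run": "r1", "score": 0.61},
    {"run": "r1", "score": 0.74},
    {"run": "r2", "score": 0.52},
]
write_top_solutions(solutions, top=2)  # prints a two-row CSV table to stdout
```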
powerfit fit-models
protein-detective powerfit fit-models --help
Usage: protein-detective powerfit fit-models [-h]
[--powerfit_run_id POWERFIT_RUN_ID]
[--top TOP] [--output OUTPUT]
session_dir
Positional Arguments:
session_dir Session directory containing PowerFit results
Options:
-h, --help show this help message and exit
--powerfit_run_id POWERFIT_RUN_ID
ID of the PowerFit run to use. If not provided, all
runs are used.
--top TOP Number of top solutions to fit models for
--output OUTPUT Output file for the fitted model table. If set to '-'
(the default), the table is printed to stdout.
powerfit list-runs
protein-detective powerfit list-runs --help
Usage: protein-detective powerfit list-runs [-h] session_dir
Positional Arguments:
session_dir Session directory containing PowerFit results
Options:
-h, --help show this help message and exit
powerfit list-lcc
protein-detective powerfit list-lcc --help
Usage: protein-detective powerfit list-lcc [-h] session_dir
Positional Arguments:
session_dir Session directory containing PowerFit results
Options:
-h, --help show this help message and exit