filter

Filter subcommands for protein-quest.

chain(chains, input_dir, output_dir, /, *, scheduler_address=None, cache=None, _=None)

Filter on chain.

For each combination of input PDB/mmCIF file and chain, write a PDB/mmCIF file that contains only the given chain, renamed to chain A. Filtering runs in parallel on a Dask cluster.
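As a rough illustration of what a single filter task does, the sketch below handles one uncompressed PDB file with plain string processing: it keeps the coordinate records whose chain identifier (column 22) matches the requested chain and relabels them as chain A. This is not protein-quest's implementation. `extract_chain_as_a` is a hypothetical name, and mmCIF input (where chain IDs can be longer than one character), gzip handling, and the Dask scheduling are left out.

```python
def extract_chain_as_a(pdb_text: str, chain: str) -> str:
    """Keep only the coordinate records of `chain` and relabel them as chain A.

    Hypothetical helper: handles fixed-column PDB text with one-character
    chain IDs only (no mmCIF, no gzip).
    """
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM", "ANISOU", "TER")):
            # The chain identifier sits in column 22 (0-based index 21).
            if len(line) > 21 and line[21] == chain:
                kept.append(line[:21] + "A" + line[22:])
    kept.append("END")
    return "\n".join(kept) + "\n"
```

Working on fixed columns keeps the rest of each record (coordinates, B-factors) byte-for-byte identical, which mirrors the documented behaviour of writing output in the same format as the input.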

Parameters:

chains (InputFile, required): CSV file with pdb_id and chain columns. Other columns are ignored.

input_dir (InputDir, required): Directory with PDB/mmCIF files. Expected filenames are {pdb_id}.cif.gz, {pdb_id}.cif, {pdb_id}.pdb.gz or {pdb_id}.pdb.

output_dir (OutputDir, required): Directory to write the single-chain PDB/mmCIF files. Output files are in the same format as the input files.

scheduler_address (str | None, default None): Address of the Dask scheduler to connect to. If not provided, a local cluster is created. If set to 'sequential', tasks run sequentially.

cache (CacheParameter | None, default None): Cache options including no_cache, cache_dir, and copy_method.

_ (Common | None, default None): Common CLI options.
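The sketch below shows one way the chains CSV could be paired with files in input_dir using the expected filename patterns listed above. The order in which patterns are tried, the case-sensitive match on pdb_id, and silently skipping rows without a matching file are assumptions; `find_structure_file` and `read_chain_jobs` are hypothetical helpers, not part of protein-quest.

```python
from __future__ import annotations

import csv
from pathlib import Path

# Expected filename patterns for input_dir, tried in this (assumed) order.
PATTERNS = ("{pdb_id}.cif.gz", "{pdb_id}.cif", "{pdb_id}.pdb.gz", "{pdb_id}.pdb")


def find_structure_file(input_dir: Path, pdb_id: str) -> Path | None:
    """Return the first existing file that matches an expected name, else None."""
    for pattern in PATTERNS:
        candidate = input_dir / pattern.format(pdb_id=pdb_id)
        if candidate.exists():
            return candidate
    return None


def read_chain_jobs(chains_csv: Path, input_dir: Path) -> list[tuple[Path, str]]:
    """Pair each (pdb_id, chain) row with its input file; other columns are ignored."""
    jobs = []
    with chains_csv.open(newline="") as handle:
        for row in csv.DictReader(handle):
            path = find_structure_file(input_dir, row["pdb_id"])
            if path is not None:  # assumption: rows without a file are skipped
                jobs.append((path, row["chain"]))
    return jobs
```

Each resulting (file, chain) pair is an independent unit of work, which is what makes it natural to fan the jobs out over a Dask cluster.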

confidence(input_dir, output_dir, /, *, filters=None, write_stats=None, scheduler_address=None, cache=None, _=None)

Filter AlphaFold mmCIF/PDB files by confidence (pLDDT).

Files that pass the filter are written with residues below the confidence threshold removed.
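AlphaFold stores the per-residue pLDDT score in the B-factor field of its PDB/mmCIF files. The sketch below shows the filtering logic on a plain list of per-residue scores. The fields of ConfidenceFilterQuery are not documented here, so `ConfidenceCriteria` and its fields (a threshold plus minimum and maximum kept-residue counts) are hypothetical, as is treating a score equal to the threshold as kept.

```python
from dataclasses import dataclass


@dataclass
class ConfidenceCriteria:
    """Hypothetical stand-in for ConfidenceFilterQuery; field names are assumptions."""

    confidence: float  # pLDDT threshold; residues scoring below it are removed
    min_residues: int = 0  # minimum number of kept residues for the file to pass
    max_residues: int = 10_000_000  # maximum number of kept residues


def filter_by_plddt(plddt: list[float], criteria: ConfidenceCriteria) -> tuple[list[int], bool]:
    """Return the indices of residues that are kept and whether the file passes."""
    kept = [i for i, score in enumerate(plddt) if score >= criteria.confidence]
    passed = criteria.min_residues <= len(kept) <= criteria.max_residues
    return kept, passed
```

For example, scores [90.0, 50.0, 70.0, 30.5] with a threshold of 70 keep residues 0 and 2, so the file passes when at least two residues are required and fails when three are required.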

Parameters:

input_dir (InputDir, required): Directory with AlphaFold mmCIF/PDB files.

output_dir (OutputDir, required): Directory to write filtered mmCIF/PDB files.

filters (ConfidenceFilterQuery | None, default None): Confidence filtering criteria.

write_stats (OutputFile | None, default None): Write filter statistics to a file in CSV format with <input_file>,<residue_count>,<passed>,<output_file> columns. Use - for stdout.

scheduler_address (str | None, default None): Address of the Dask scheduler to connect to. If not provided, a local cluster is created. If set to 'sequential', tasks run sequentially.

cache (CacheParameter | None, default None): Cache options including no_cache, cache_dir, and copy_method.

_ (Common | None, default None): Common CLI options.
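The write_stats convention (a file path, or - for stdout) can be sketched with a small context manager. Whether protein-quest writes a header row, and how it encodes booleans or the output file of a file that did not pass, are assumptions here; `open_stats` and `write_confidence_stats` are hypothetical names.

```python
import csv
import sys
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def open_stats(target: str):
    """Yield a text handle: stdout for "-", otherwise a file opened for writing."""
    if target == "-":
        yield sys.stdout
    else:
        with Path(target).open("w", newline="") as handle:
            yield handle


def write_confidence_stats(target: str, rows) -> None:
    """Write (input_file, residue_count, passed, output_file) rows as CSV."""
    with open_stats(target) as handle:
        writer = csv.writer(handle)
        writer.writerow(["input_file", "residue_count", "passed", "output_file"])
        writer.writerows(rows)
```

Only the file branch closes its handle; stdout is left open so later output still works.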

residue(input_dir, output_dir, /, *, min_residues=0, max_residues=10000000, write_stats=None, cache=None, _=None)

Filter PDB/mmCIF files by number of residues in chain A.

Parameters:

input_dir (InputDir, required): Directory with PDB/mmCIF files (for example, the output of 'filter chain').

output_dir (OutputDir, required): Directory to write filtered PDB/mmCIF files. Files are copied without modification.

min_residues (MinResidues, default 0): Minimum number of residues in chain A.

max_residues (MaxResidues, default 10000000): Maximum number of residues in chain A.

write_stats (OutputFile | None, default None): Write filter statistics to a file in CSV format with <input_file>,<residue_count>,<passed>,<output_file> columns. Use - for stdout.

cache (CacheParameter | None, default None): Cache options including no_cache, cache_dir, and copy_method.

_ (Common | None, default None): Common CLI options.
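A minimal sketch of the residue-count check for PDB-format text, assuming a residue is identified by its residue number plus insertion code and that only ATOM records count (ligands and waters in HETATM records are ignored). Whether protein-quest counts residues this way, and whether the bounds are inclusive, are assumptions; the helper names are hypothetical.

```python
def count_chain_residues(pdb_text: str, chain: str = "A") -> int:
    """Count distinct residues in one chain of fixed-column PDB text."""
    seen = set()
    for line in pdb_text.splitlines():
        if line.startswith("ATOM  ") and len(line) > 26 and line[21] == chain:
            # Columns 23-26 hold the residue number, column 27 the insertion code.
            seen.add(line[22:27])
    return len(seen)


def passes_residue_filter(count: int, min_residues: int = 0, max_residues: int = 10_000_000) -> bool:
    """Inclusive bounds are an assumption of this sketch."""
    return min_residues <= count <= max_residues
```

Because files that pass are copied without modification, only the count and the pass/fail decision are needed per file.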

resolution(input_dir, output_dir, /, *, group_by='uniprot_accession', no_group_by=False, top=1000, write_stats=None, cache=None, _=None)

Filter structure files by best resolution.

AlphaFold structures are preferred over non-AlphaFold structures. Structures with a lower (numerically smaller, that is better) resolution value are preferred. If the resolution is the same, structures with more residues are preferred. Structures with a missing resolution are least preferred.

Parameters:

input_dir (InputDir, required): Directory with structure files.

output_dir (OutputDir, required): Directory to write the selected structure files.

group_by (Annotated[GroupBy, Parameter(group=_GROUP_BY)], default 'uniprot_accession'): Pass the top-N structures with the best resolution per UniProt accession. Structures without a UniProt accession are never passed. Mutually exclusive with no_group_by.

no_group_by (Annotated[bool, Parameter(name='--no-group-by', negative='', group=_GROUP_BY)], default False): Disable grouping and use a global top-N ranking across all files. Mutually exclusive with group_by.

top (PositiveInt, default 1000): Maximum number of files to keep (per UniProt accession when grouping is enabled).

write_stats (OutputFile | None, default None): Write filter statistics to a file in CSV format. For --group-by=uniprot_accession the columns are: <input_file>,<uniprot_accession>,<resolution>,<total_residue_count>,<is_alphafold>,<passed>,<output_file>. For --no-group-by the columns are: <input_file>,<resolution>,<total_residue_count>,<is_alphafold>,<passed>,<output_file>. Use - for stdout.

cache (CacheParameter | None, default None): Cache options including no_cache, cache_dir, and copy_method.

_ (Common | None, default None): Common CLI options.
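The preference rules and the group_by/no_group_by/top options can be expressed as a sort key plus a per-group or global top-N cut. This is a sketch, not protein-quest's code: `StructureInfo`, `rank_key` and `select_top` are hypothetical, the field names simply mirror the write_stats columns, and it assumes the AlphaFold preference takes precedence over the missing-resolution rule (AlphaFold models usually have no experimental resolution).

```python
from __future__ import annotations

import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StructureInfo:
    """Per-file metadata; field names mirror the write_stats columns."""

    input_file: str
    uniprot_accession: str | None
    resolution: float | None  # in Å; None when missing
    total_residue_count: int
    is_alphafold: bool


def rank_key(s: StructureInfo) -> tuple:
    """Smaller keys sort first: AlphaFold, known resolution, lower value, more residues."""
    return (
        not s.is_alphafold,
        s.resolution is None,
        s.resolution if s.resolution is not None else math.inf,
        -s.total_residue_count,
    )


def select_top(structures, top: int, group_by_accession: bool = True) -> list[str]:
    """Return the input files that pass a per-accession or global top-N selection."""
    if not group_by_accession:
        return [s.input_file for s in sorted(structures, key=rank_key)[:top]]
    groups = defaultdict(list)
    for s in structures:
        if s.uniprot_accession is not None:  # files without an accession never pass
            groups[s.uniprot_accession].append(s)
    passed = []
    for members in groups.values():
        passed.extend(s.input_file for s in sorted(members, key=rank_key)[:top])
    return passed
```

Encoding the rules as a tuple key keeps each tie-break in priority order and lets Python's tuple comparison do the rest.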

secondary_structure(input_dir, output_dir, /, *, filters=None, write_stats=None, cache=None, _=None)

Filter PDB/mmCIF files by secondary structure.

Parameters:

input_dir (InputDir, required): Directory with PDB/mmCIF files.

output_dir (OutputDir, required): Directory to write filtered PDB/mmCIF files. Files are copied without modification.

filters (SecondaryStructureFilterQuery | None, default None): Secondary structure filtering criteria.

write_stats (OutputFile | None, default None): Write filter statistics to a file in CSV format with columns: <input_file>,<nr_residues>,<nr_helix_residues>,<nr_sheet_residues>,<helix_ratio>,<sheet_ratio>,<passed>,<output_file>. Use - for stdout.

cache (CacheParameter | None, default None): Cache options including no_cache, cache_dir, and copy_method.

_ (Common | None, default None): Common CLI options.
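The statistics columns suggest that each file is summarised by residue counts and helix/sheet ratios. The sketch below computes those values from a per-residue secondary-structure string and applies ratio bounds. How protein-quest assigns secondary structure is not described here, so the DSSP-style input (H counted as helix, E as sheet, all other codes ignored) is an assumption, and `SecondaryStructureCriteria` with its fields is a hypothetical stand-in for SecondaryStructureFilterQuery.

```python
from dataclasses import dataclass


@dataclass
class SecondaryStructureCriteria:
    """Hypothetical stand-in for SecondaryStructureFilterQuery; fields are assumptions."""

    min_helix_ratio: float = 0.0
    max_helix_ratio: float = 1.0
    min_sheet_ratio: float = 0.0
    max_sheet_ratio: float = 1.0


def secondary_structure_stats(codes: str) -> dict:
    """Counts and ratios from one DSSP-style code per residue (H = helix, E = sheet)."""
    nr_residues = len(codes)
    nr_helix = codes.count("H")
    nr_sheet = codes.count("E")
    return {
        "nr_residues": nr_residues,
        "nr_helix_residues": nr_helix,
        "nr_sheet_residues": nr_sheet,
        "helix_ratio": nr_helix / nr_residues if nr_residues else 0.0,
        "sheet_ratio": nr_sheet / nr_residues if nr_residues else 0.0,
    }


def passes_secondary_structure(stats: dict, criteria: SecondaryStructureCriteria) -> bool:
    """A file passes when both ratios fall inside the (inclusive) bounds."""
    return (
        criteria.min_helix_ratio <= stats["helix_ratio"] <= criteria.max_helix_ratio
        and criteria.min_sheet_ratio <= stats["sheet_ratio"] <= criteria.max_sheet_ratio
    )
```

For "HHHHEE--CC", the helix ratio is 0.4 and the sheet ratio is 0.2, so the file passes a minimum helix ratio of 0.3 but fails a minimum sheet ratio of 0.3.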