filters

Module for filtering structure files and their contents.

ResidueFilterStatistics dataclass

Statistics for filtering files based on residue count in a specific chain.

Parameters:

input_file (Path, required): The path to the input file.
residue_count (int, required): The number of residues.
passed (bool, required): Whether the file passed the filtering criteria.
output_file (Path | None, required): The path to the output file, if passed.
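
As a rough illustration of the fields listed above, a dataclass with the same shape could look like the sketch below. This is a stand-in written for this example, not the library's source; the file names are hypothetical, and any behaviour beyond the four documented fields is not covered here.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class ResidueFilterStatistics:
    """Stand-in mirroring the documented fields (not the library's own class)."""

    input_file: Path
    residue_count: int
    passed: bool
    output_file: Path | None  # documented as the output path, if passed


# Hypothetical file names, for illustration only.
kept = ResidueFilterStatistics(Path("in/1abc.cif"), 150, True, Path("out/1abc.cif"))
dropped = ResidueFilterStatistics(Path("in/2xyz.cif"), 12, False, None)
```

In this sketch a file that did not pass simply carries `None` as its `output_file`.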

filter_files_on_chain(file2chains, output_dir, out_chain='A', scheduler_address=None, copy_method='copy')

Filter mmCIF/PDB files by chain.

Parameters:

file2chains (Collection[tuple[Path, str]], required): Which chain to keep for each PDB file. The first item of each tuple is the PDB file path, the second item is the chain ID.
output_dir (Path, required): The directory where the filtered files will be written.
out_chain (str, default 'A'): The chain ID under which the kept chain is written.
scheduler_address (str | Cluster | Literal['sequential'] | None, default None): The address of the Dask scheduler. If not provided, a local cluster is created. If set to 'sequential', tasks are run sequentially.
copy_method (CopyMethod, default 'copy'): How to copy a file when a direct copy is possible.

Returns:

list[ChainFilterStatistics]: Result of the filtering process.
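
The `file2chains` argument pairs each file with the chain to keep. A minimal sketch of building it from a mapping follows; the file names and chain IDs are hypothetical, and the final call is left as a comment because the import path of this module is not shown on this page.

```python
from pathlib import Path

# Hypothetical mapping of structure files to the chain that should be kept.
chain_by_file = {
    "structures/1abc.cif": "B",
    "structures/2xyz.pdb": "A",
}

# Collection[tuple[Path, str]]: (file path, chain ID) pairs.
file2chains = [(Path(name), chain) for name, chain in chain_by_file.items()]

# With the function imported from this module, the call would then be, e.g.:
# results = filter_files_on_chain(
#     file2chains,
#     output_dir=Path("filtered"),
#     out_chain="A",
#     scheduler_address="sequential",  # documented: run tasks sequentially
# )
```

Passing 'sequential' is likely the simplest choice for a handful of files, since the documented alternative is to create or connect to a Dask cluster.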

filter_files_on_residues(input_files, output_dir, min_residues, max_residues, chain='A', copy_method='copy')

Filter PDB/mmCIF files by the number of residues in a given chain.

Parameters:

input_files (list[Path], required): The list of input PDB/mmCIF files.
output_dir (Path, required): The directory where the filtered files will be written.
min_residues (int, required): The minimum number of residues in the chain.
max_residues (int, required): The maximum number of residues in the chain.
chain (str, default 'A'): The chain whose residues are counted.
copy_method (CopyMethod, default 'copy'): How to copy files that passed to the output directory.

Yields:

Generator[ResidueFilterStatistics]: Objects containing information about the filtering process for each input file.
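
Because the function yields one statistics object per input file, results can be processed as they arrive. The sketch below uses a stand-in generator, `fake_filter`, and a stand-in `Stats` class (both hypothetical) so that it runs without the library. It assumes both bounds are inclusive, which this page does not state.

```python
from __future__ import annotations

from collections.abc import Iterator
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Stats:
    """Stand-in with the same fields as ResidueFilterStatistics."""

    input_file: Path
    residue_count: int
    passed: bool
    output_file: Path | None


def fake_filter(counts: dict[Path, int], output_dir: Path,
                min_residues: int, max_residues: int) -> Iterator[Stats]:
    """Hypothetical stand-in: yields one Stats per file; no files are copied."""
    for path, count in counts.items():
        # Assumption: both bounds are inclusive.
        passed = min_residues <= count <= max_residues
        yield Stats(path, count, passed, output_dir / path.name if passed else None)


# Hypothetical residue counts per file.
counts = {Path("a.cif"): 80, Path("b.cif"): 450, Path("c.cif"): 1200}
stats = list(fake_filter(counts, Path("out"), min_residues=100, max_residues=1000))

# Split the results into kept output paths and rejected inputs.
kept = [s.output_file for s in stats if s.passed]
rejected = [(s.input_file.name, s.residue_count) for s in stats if not s.passed]
```

The same consumption pattern should apply to the real generator: iterate once, then branch on `passed` to collect output paths or report rejected files.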