filters

FilterStat dataclass

Statistics for filtering files based on residue count in a specific chain.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| input_file | `Path` | The path to the input file. | required |
| residue_count | `int` | The number of residues in the chain. | required |
| passed | `bool` | Whether the file passed the filtering criteria. | required |
| output_file | `Path \| None` | The path to the output file if the file passed, otherwise None. | required |
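
As a rough illustration of how these records might be consumed, the sketch below uses a stand-in dataclass that mirrors the documented fields. It is not the library's own definition, and `summarize` is a hypothetical helper invented for this example; the file names are made up.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class FilterStat:
    """Stand-in mirroring the documented fields (not the library's class)."""

    input_file: Path
    residue_count: int
    passed: bool
    output_file: Path | None


def summarize(stats):
    """Hypothetical helper: split records into passed and rejected lists."""
    passed = [s for s in stats if s.passed]
    rejected = [s for s in stats if not s.passed]
    return passed, rejected


# Made-up records, for illustration only.
stats = [
    FilterStat(Path("a.cif"), 120, True, Path("out/a.cif")),
    FilterStat(Path("b.cif"), 12, False, None),
]
passed, rejected = summarize(stats)
print(f"{len(passed)} passed, {len(rejected)} rejected")
```

Checking `passed` rather than `output_file is not None` keeps the intent explicit, although the two are expected to agree given the field descriptions.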

filter_files_on_chain(input_dir, id2chains, output_dir, scheduler_address=None, out_chain='A')

Filter mmCIF/PDB files by chain, keeping one chain per PDB ID.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| input_dir | `Path` | The directory containing the input mmCIF/PDB files. | required |
| id2chains | `dict[str, str]` | Which chain to keep for each PDB ID. Key is the PDB ID, value is the chain ID. | required |
| output_dir | `Path` | The directory where the filtered files will be written. | required |
| scheduler_address | `str \| Cluster \| None` | The address of the Dask scheduler. | `None` |
| out_chain | `str` | The chain ID under which the kept chain is written. | `'A'` |

Returns:

| Type | Description |
| --- | --- |
| `list[tuple[str, str, Path \| None]]` | A list of tuples containing the PDB ID, chain ID, and path to the filtered file. The last tuple item is None if something went wrong, for example if the chain is not present. |
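
Because the last tuple item can be None, callers will usually want to separate successes from failures. The following is a minimal sketch that works on a hand-written list in the documented return shape. It does not call `filter_files_on_chain`; `split_results`, the PDB IDs, and the paths are all invented for the example.

```python
from __future__ import annotations

from pathlib import Path

# Made-up data in the documented shape: (PDB ID, chain ID, path or None).
results: list[tuple[str, str, Path | None]] = [
    ("1ABC", "B", Path("filtered/1abc.cif")),
    ("2XYZ", "C", None),  # e.g. chain C was not present in the file
]


def split_results(results):
    """Hypothetical helper: map PDB ID to written file, and list the failures."""
    written = {pdb_id: path for pdb_id, _chain, path in results if path is not None}
    failed = [(pdb_id, chain) for pdb_id, chain, path in results if path is None]
    return written, failed


written, failed = split_results(results)
for pdb_id, chain in failed:
    print(f"No output for {pdb_id} chain {chain}")
```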

filter_files_on_residues(input_files, output_dir, min_residues, max_residues, chain='A')

Filter PDB/mmCIF files by number of residues in given chain.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| input_files | `list[Path]` | The list of input PDB/mmCIF files. | required |
| output_dir | `Path` | The directory where the filtered files will be written. | required |
| min_residues | `int` | The minimum number of residues in chain. | required |
| max_residues | `int` | The maximum number of residues in chain. | required |
| chain | `str` | The chain to count residues of. | `'A'` |

Yields:

| Type | Description |
| --- | --- |
| `Generator[FilterStat]` | FilterStat objects containing information about the filtering process for each input file. |
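
The function yields results lazily, so nothing happens until the generator is iterated. The sketch below imitates that pattern with a stand-in generator that takes precomputed residue counts instead of parsing structure files. `filter_on_counts` and the stand-in `FilterStat` are hypothetical, no files are written, and treating both bounds as inclusive is an assumption, since the documentation above does not say how the bounds are compared.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


@dataclass
class FilterStat:
    """Stand-in mirroring the documented fields (not the library's class)."""

    input_file: Path
    residue_count: int
    passed: bool
    output_file: Path | None


def filter_on_counts(
    counts: dict[Path, int],
    output_dir: Path,
    min_residues: int,
    max_residues: int,
) -> Iterator[FilterStat]:
    """Hypothetical stand-in: yield one FilterStat per input file.

    Assumes inclusive bounds; only computes the would-be output path.
    """
    for input_file, residue_count in counts.items():
        passed = min_residues <= residue_count <= max_residues
        output_file = output_dir / input_file.name if passed else None
        yield FilterStat(input_file, residue_count, passed, output_file)


# Made-up residue counts, for illustration only.
counts = {Path("a.cif"): 150, Path("b.cif"): 20}
stats = list(filter_on_counts(counts, Path("out"), min_residues=100, max_residues=200))
for stat in stats:
    status = "kept" if stat.passed else "dropped"
    print(f"{stat.input_file}: {stat.residue_count} residues, {status}")
```

Collecting the generator into a list, as done here, is convenient for reporting; iterating it directly avoids holding every record in memory for large inputs.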