io

ProteinPdbRow dataclass

Information about a PDB entry and its relation to a UniProt entry.

Parameters:

Name Type Description Default
id str

The PDB ID of the entry.

required
uniprot_chains str

The UniProt chains associated with the PDB entry.

required
uniprot_acc str

The UniProt accession number associated with the PDB entry.

required
mmcif_file Path | None

The path to the mmCIF file for the PDB entry, or None if not retrieved yet.

required

SingleChainQuery dataclass

Query for writing single chain PDB files.

Parameters:

Name Type Description Default
min_residues int

Minimum number of residues that must be in the chain.

required
max_residues int

Maximum number of residues allowed in the chain.

required

SingleChainResult dataclass

Result of writing a single chain PDB file.

Parameters:

Name Type Description Default
uniprot_acc str

The UniProt accession.

required
pdb_id str

The PDB ID of the entry.

required
output_file Path | None

The path to the output PDB file containing only the first chain (renamed to A) that belongs to the given UniProt accession. Only set when passed is True; otherwise None.

required
nr_residues int

The number of residues in the chain that was written.

required
passed bool

Whether the chain passed the residue-count filter.

required
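
The relationship between passed, nr_residues, and output_file described above can be sketched roughly as follows. This is a minimal illustration, not the library's code: the two dataclasses are stand-ins that mirror the documented fields, `build_result` is a hypothetical helper, the accession and PDB ID are made-up placeholders, and both residue bounds are assumed to be inclusive.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class SingleChainQuery:
    """Stand-in mirroring the documented fields."""
    min_residues: int
    max_residues: int


@dataclass
class SingleChainResult:
    """Stand-in mirroring the documented fields."""
    uniprot_acc: str
    pdb_id: str
    output_file: Path | None
    nr_residues: int
    passed: bool


def build_result(uniprot_acc: str, pdb_id: str, nr_residues: int,
                 query: SingleChainQuery, candidate_file: Path) -> SingleChainResult:
    """Hypothetical helper: output_file is only set when the filter passes."""
    # Assumption: both bounds are inclusive.
    passed = query.min_residues <= nr_residues <= query.max_residues
    return SingleChainResult(
        uniprot_acc=uniprot_acc,
        pdb_id=pdb_id,
        output_file=candidate_file if passed else None,
        nr_residues=nr_residues,
        passed=passed,
    )


query = SingleChainQuery(min_residues=100, max_residues=200)
ok = build_result("P12345", "1ABC", 120, query, Path("P12345_1ABC.pdb"))
too_short = build_result("P12345", "1ABC", 50, query, Path("P12345_1ABC.pdb"))
print(ok.passed, ok.output_file)
print(too_short.passed, too_short.output_file)
```

Callers that only need the written files can therefore check passed before touching output_file.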

filter_and_write_single_chain_pdb_file(mmcif_file, chain2keep, output_file, min_residues, max_residues, out_chain='A')

Saves a specific protein chain from an mmCIF file to a new PDB file.

Parameters:

Name Type Description Default
mmcif_file Path | str

Path to the input mmCIF file.

required
chain2keep str

Chain to keep.

required
output_file Path | str

Path to the output PDB file.

required
min_residues int

Minimum number of residues in the chain to write.

required
max_residues int

Maximum number of residues in the chain to write.

required
out_chain str

Chain identifier for the saved chain in the output file.

'A'

Returns:

Type Description
tuple[bool, int]

A tuple containing a boolean indicating whether the chain is in the residue range and the file was written successfully, and the number of residues in the chain.

first_chain_from_uniprot_chains(uniprot_chains)

Extracts the first chain identifier from a UniProt chains string.

The UniProt chains string is formatted (with EBNF notation) as follows:

chain_group(=range)?(,chain_group(=range)?)*

where

chain_group := chain_id(/chain_id)*
chain_id := [A-Za-z]+
range := start-end
start, end := integer

Parameters:

Name Type Description Default
uniprot_chains str

A string representing UniProt chains. For example "B/D=1-81".

required

Returns: The first chain identifier from the UniProt chain string. For example "B".
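
To make the grammar above concrete, here is a minimal sketch of how the first chain identifier could be extracted with a regular expression. The function name `first_chain` and the error handling are assumptions made for illustration; the library's actual implementation may differ.

```python
import re

# One chain_group with an optional range, following the grammar above.
# The first capture group is the first chain_id of the group.
_CHAIN_GROUP = re.compile(r"^([A-Za-z]+)(?:/[A-Za-z]+)*(?:=\d+-\d+)?$")


def first_chain(uniprot_chains: str) -> str:
    """Hypothetical re-implementation: first chain_id of the first chain_group."""
    first_group = uniprot_chains.split(",")[0].strip()
    match = _CHAIN_GROUP.match(first_group)
    if match is None:
        raise ValueError(f"Unrecognized UniProt chains string: {uniprot_chains!r}")
    return match.group(1)


print(first_chain("B/D=1-81"))          # B
print(first_chain("A=1-100,C/E=5-50"))  # A
```

Only the first comma-separated group is inspected, so later groups and all residue ranges are ignored.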

nr_residues_in_chain(file, chain='A')

Returns the number of residues in a specific chain from an mmCIF/PDB file.

Parameters:

Name Type Description Default
file Path | str

Path to the input mmCIF/PDB file.

required
chain str

Chain whose residues are counted.

'A'

Returns:

Type Description
int

The number of residues in the specified chain.

write_single_chain_pdb_file(proteinpdb, session_dir, single_chain_dir, query)

Process a ProteinPdbRow to write a single chain PDB file if possible, returning the result.

Parameters:

Name Type Description Default
proteinpdb ProteinPdbRow

A ProteinPdbRow object.

required
session_dir Path

The directory where the session files are stored.

required
single_chain_dir Path

The directory where the single chain PDB files will be saved.

required
query SingleChainQuery

The query containing the minimum and maximum number of residues.

required

Returns:

Type Description
SingleChainResult

Result object containing the output file path and whether the chain passed the residue filter.

write_single_chain_pdb_files(proteinpdbs, session_dir, single_chain_dir, query)

Writes single chain PDB files from the provided protein PDB rows.

Parameters:

Name Type Description Default
proteinpdbs list[ProteinPdbRow]

A list of ProteinPdbRow objects.

required
session_dir Path

The directory where the session files are stored.

required
single_chain_dir Path

The directory where the single chain PDB files will be saved.

required
query SingleChainQuery

The query containing the minimum and maximum number of residues.

required

Yields:

Type Description
Generator[SingleChainResult]

SingleChainResult objects containing the output file path and whether the chain passed the residue filter.
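
Because results are yielded one at a time, callers can filter them while iterating instead of waiting for a full list. The sketch below shows that consumption pattern with stand-ins: `Result` is a reduced, hypothetical version of SingleChainResult, `fake_results` takes the place of write_single_chain_pdb_files, and the PDB IDs and file names are made up.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Result:
    """Hypothetical stand-in with a subset of the SingleChainResult fields."""
    pdb_id: str
    output_file: Path | None
    passed: bool


def fake_results() -> Iterator[Result]:
    """Stand-in generator yielding made-up results one at a time."""
    yield Result("1ABC", Path("1ABC_A.pdb"), True)
    yield Result("2XYZ", None, False)


def written_files(results: Iterable[Result]) -> list[Path]:
    """Collect output files of results that passed the residue filter."""
    return [r.output_file for r in results if r.passed and r.output_file is not None]


print(written_files(fake_results()))
```

Note that a generator can be consumed only once; wrap it in list(...) first if the results are needed more than once.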