io

ProteinPdbRow dataclass

Information about a PDB entry and its relation to a UniProt entry.

Parameters:

Name Type Description Default
id str

The PDB ID of the entry.

required
uniprot_chains str

The UniProt chains associated with the PDB entry.

required
uniprot_acc str

The UniProt accession number associated with the PDB entry.

required
mmcif_file Path | None

The path to the mmCIF file for the PDB entry, or None if not retrieved yet.

required
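
As a rough illustration of the fields listed above, a dataclass with the same shape could look like the sketch below. The class name `ProteinPdbRowSketch` and the example values are hypothetical stand-ins, not the library's own definition, which may add options or behaviour not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class ProteinPdbRowSketch:
    """Hypothetical stand-in mirroring the documented fields of ProteinPdbRow."""

    id: str                   # PDB ID of the entry
    uniprot_chains: str       # e.g. "B/D=1-81"
    uniprot_acc: str          # UniProt accession
    mmcif_file: Path | None   # None until the mmCIF file has been retrieved


# Example values are made up for illustration only.
row = ProteinPdbRowSketch(
    id="1abc",
    uniprot_chains="B/D=1-81",
    uniprot_acc="P12345",
    mmcif_file=None,
)
```

All four fields are documented as required, so the sketch gives none of them a default value.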

SingleChainResult dataclass

Result of writing a single chain PDB file.

Parameters:

Name Type Description Default
uniprot_acc str

The UniProt accession.

required
pdb_id str

The PDB ID of the entry.

required
output_file Path

The path to the output PDB file containing only the first chain (renamed to A) that belongs to the given UniProt accession.

required

first_chain_from_uniprot_chains(uniprot_chains)

Extracts the first chain identifier from a UniProt chains string.

The UniProt chains string is formatted (with EBNF notation) as follows:

chain_group(=range)?(,chain_group(=range)?)*

where

chain_group := chain_id(/chain_id)*
chain_id := [A-Za-z]+
range := start-end
start, end := integer

Parameters:

Name Type Description Default
uniprot_chains str

A string representing UniProt chains, for example "B/D=1-81".

required

Returns: The first chain identifier from the UniProt chain string. For example "B".
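
The grammar above can be turned into a small parser. The following is a minimal sketch, assuming a regular expression is an acceptable way to read the first chain group; the function name `first_chain_sketch` is hypothetical and this is not necessarily how `first_chain_from_uniprot_chains` is implemented.

```python
import re

# First chain_group, optionally followed by "=start-end", then "," or end of string.
# The capture group holds the first chain_id of that group.
_FIRST_GROUP = re.compile(r"^([A-Za-z]+)(?:/[A-Za-z]+)*(?:=\d+-\d+)?(?:,|$)")


def first_chain_sketch(uniprot_chains: str) -> str:
    """Return the first chain identifier of a UniProt chains string."""
    match = _FIRST_GROUP.match(uniprot_chains)
    if match is None:
        # Raising on malformed input is an assumption of this sketch.
        raise ValueError(f"Unrecognised UniProt chains string: {uniprot_chains!r}")
    return match.group(1)


first = first_chain_sketch("B/D=1-81")  # "B"
```

Anchoring the pattern at the start of the string means only the first chain group is inspected; later groups after a comma are ignored.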

write_single_chain_pdb_file(mmcif_file, chain2keep, output_file, out_chain='A')

Saves a specific protein chain from an mmCIF file to a new PDB file.

Parameters:

Name Type Description Default
mmcif_file Path | str

Path to the input mmCIF file.

required
chain2keep str

Chain to keep.

required
output_file Path | str

Path to the output PDB file.

required
out_chain str

Chain identifier for the saved chain in the output file.

'A'

write_single_chain_pdb_files(proteinpdbs, session_dir, single_chain_dir)

Writes single chain PDB files from the provided protein PDB rows.

Parameters:

Name Type Description Default
proteinpdbs list[ProteinPdbRow]

A list of ProteinPdbRow objects.

required
session_dir Path

The directory where the session files are stored.

required
single_chain_dir Path

The directory where the single chain PDB files will be saved.

required

Yields:

Type Description
Generator[SingleChainResult]

SingleChainResult objects containing the UniProt accession, PDB ID, and output file path.
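
Because the function is a generator, results are produced lazily, one per processed row. The sketch below illustrates that pattern only. `RowSketch`, `ResultSketch`, `plan_single_chain_outputs`, the output file naming, and the choice to skip rows without an mmCIF file are all assumptions; the sketch leaves out `session_dir` and does not write any files.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator
from dataclasses import dataclass
from pathlib import Path


@dataclass
class RowSketch:
    """Hypothetical stand-in for ProteinPdbRow."""
    id: str
    uniprot_chains: str
    uniprot_acc: str
    mmcif_file: Path | None


@dataclass
class ResultSketch:
    """Hypothetical stand-in for SingleChainResult."""
    uniprot_acc: str
    pdb_id: str
    output_file: Path


def plan_single_chain_outputs(
    rows: Iterable[RowSketch], single_chain_dir: Path
) -> Iterator[ResultSketch]:
    """Lazily yield one result per row that already has an mmCIF file."""
    for row in rows:
        if row.mmcif_file is None:
            # Assumption: rows that were not retrieved yet are skipped.
            continue
        # Hypothetical naming scheme; the real file names may differ.
        output_file = single_chain_dir / f"{row.uniprot_acc}_{row.id}.pdb"
        # A real implementation would write the selected chain to output_file here.
        yield ResultSketch(row.uniprot_acc, row.id, output_file)


rows = [
    RowSketch("1abc", "B/D=1-81", "P12345", Path("1abc.cif")),
    RowSketch("2xyz", "A=1-10", "P12345", None),
]
results = list(plan_single_chain_outputs(rows, Path("out")))
```

Wrapping the call in `list(...)` forces all rows to be processed; iterating with a `for` loop instead lets the caller handle each result as soon as it is available.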