io
ProteinPdbRow
dataclass
Information about a PDB entry and its relation to a UniProt entry.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id | str | The PDB ID of the entry. | required |
uniprot_chains | str | The UniProt chains associated with the PDB entry. | required |
uniprot_acc | str | The UniProt accession number associated with the PDB entry. | required |
mmcif_file | Path \| None | The path to the mmCIF file for the PDB entry, or None if not retrieved yet. | required |
SingleChainQuery
dataclass
SingleChainResult
dataclass
Result of writing a single chain PDB file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
uniprot_acc | str | The UniProt accession. | required |
pdb_id | str | The PDB ID of the entry. | required |
output_file | Path \| None | The path to the output PDB file containing only the first chain (renamed to A) that belongs to the given UniProt accession. Only set when passed is True. | required |
nr_residues | int | The number of residues in the chain that was written. | required |
passed | bool | Whether the chain passed the residue-count filter. | required |
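As a rough illustration of the fields above, the following sketch mirrors them in a stand-alone dataclass. The class name `SingleChainResultSketch`, the `__post_init__` check, and the example values are assumptions made for illustration, not the library's own code; the check only encodes the documented rule that `output_file` is set when `passed` is True.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SingleChainResultSketch:
    """Hypothetical stand-in that mirrors the documented fields."""

    uniprot_acc: str
    pdb_id: str
    output_file: Path | None
    nr_residues: int
    passed: bool

    def __post_init__(self) -> None:
        # Assumption: reject an output file on a result that did not pass.
        if self.output_file is not None and not self.passed:
            raise ValueError("output_file must be None when passed is False")


# Made-up accession, PDB IDs, and file name, used only as example values.
ok = SingleChainResultSketch("P12345", "1abc", Path("1abc_A.pdb"), 120, True)
rejected = SingleChainResultSketch("P12345", "2xyz", None, 12, False)
```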
filter_and_write_single_chain_pdb_file(mmcif_file, chain2keep, output_file, min_residues, max_residues, out_chain='A')
Saves a specific protein chain from an mmCIF file to a new PDB file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mmcif_file | Path \| str | Path to the input mmCIF file. | required |
chain2keep | str | Chain to keep. | required |
output_file | Path \| str | Path to the output PDB file. | required |
min_residues | int | Minimum number of residues in the chain to write. | required |
max_residues | int | Maximum number of residues in the chain to write. | required |
out_chain | str | Chain identifier for the saved chain in the output file. | 'A' |
Returns:
Type | Description |
---|---|
tuple[bool, int] | A tuple containing a boolean indicating whether the chain is in the residue range and the file was written successfully, and the number of residues in the chain. |
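The shape of the return value can be illustrated with a minimal sketch of the residue-range check alone. The helper name `check_residue_range` is hypothetical, and treating both bounds as inclusive is an assumption, since the description does not say; the sketch does no file reading or writing.

```python
def check_residue_range(
    nr_residues: int, min_residues: int, max_residues: int
) -> tuple[bool, int]:
    """Hypothetical helper: report whether a chain length is within range."""
    # Assumption: both bounds are inclusive; the source does not specify this.
    passed = min_residues <= nr_residues <= max_residues
    return passed, nr_residues


# Unpack the tuple in the same order as the documented return value.
passed, nr_residues = check_residue_range(150, min_residues=100, max_residues=200)
```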
first_chain_from_uniprot_chains(uniprot_chains)
Extracts the first chain identifier from a UniProt chains string.
The UniProt chains string is formatted (with EBNF notation) as follows:
chain_group(=range)?(,chain_group(=range)?)*
where
chain_group := chain_id(/chain_id)*
chain_id := [A-Za-z]+
range := start-end
start, end := integer
Parameters:
Name | Type | Description | Default |
---|---|---|---|
uniprot_chains | str | A string representing UniProt chains, for example "B/D=1-81". | required |
Returns: The first chain identifier from the UniProt chain string. For example "B".
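The grammar above can be sketched as a single regular expression that matches only the first chain group. This is a hypothetical re-implementation for illustration (the name `first_chain_sketch` and the choice to raise `ValueError` on malformed input are assumptions), not the function's actual source.

```python
import re

# First chain group, optionally followed by "=start-end", then "," or end of string.
_FIRST_GROUP = re.compile(r"^([A-Za-z]+)(?:/[A-Za-z]+)*(?:=\d+-\d+)?(?:,|$)")


def first_chain_sketch(uniprot_chains: str) -> str:
    """Hypothetical helper: return the first chain identifier."""
    match = _FIRST_GROUP.match(uniprot_chains)
    if match is None:
        # Assumption: malformed input is reported with a ValueError.
        raise ValueError(f"Unrecognised UniProt chains string: {uniprot_chains!r}")
    return match.group(1)


print(first_chain_sketch("B/D=1-81"))  # B
```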
nr_residues_in_chain(file, chain='A')
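The source gives no description for this function; judging by its name, it counts the residues of one chain in a structure file. As a loose, self-contained illustration of that idea, the sketch below counts distinct residues in PDB-format text using the fixed-width ATOM record columns. The helper name `count_residues_in_pdb_text`, the text-based input, and the decision to ignore HETATM records are all assumptions.

```python
def count_residues_in_pdb_text(pdb_text: str, chain: str = "A") -> int:
    """Hypothetical helper: count distinct residues of one chain in PDB text."""
    residues = set()
    for line in pdb_text.splitlines():
        # Assumption: only ATOM records count; HETATM records are skipped.
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        # PDB columns: chain ID is 22, residue number 23-26, insertion code 27.
        if line[21] == chain:
            residues.add((line[22:26].strip(), line[26]))
    return len(residues)
```

Keying on both the residue number and the insertion code keeps residues such as 52 and 52A distinct.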
write_single_chain_pdb_file(proteinpdb, session_dir, single_chain_dir, query)
Process a ProteinPdbRow to write a single chain PDB file if possible, returning the result.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
proteinpdb | ProteinPdbRow | A ProteinPdbRow object. | required |
session_dir | Path | The directory where the session files are stored. | required |
single_chain_dir | Path | The directory where the single chain PDB files will be saved. | required |
query | SingleChainQuery | The query containing the minimum and maximum number of residues. | required |
Returns:
Type | Description |
---|---|
SingleChainResult | Result object containing the output file path and whether the chain passed the residue filter. |
write_single_chain_pdb_files(proteinpdbs, session_dir, single_chain_dir, query)
Writes single chain PDB files from the provided protein PDB rows.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
proteinpdbs | list[ProteinPdbRow] | A list of ProteinPdbRow objects. | required |
session_dir | Path | The directory where the session files are stored. | required |
single_chain_dir | Path | The directory where the single chain PDB files will be saved. | required |
query | SingleChainQuery | The query containing the minimum and maximum number of residues. | required |
Yields:
Type | Description |
---|---|
Generator[SingleChainResult] | SingleChainResult objects containing the output file path and whether the chain passed the residue filter. |
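Because the function yields its results, nothing is produced until the caller iterates. The sketch below illustrates that pattern with stand-in data; `process_lazily`, the `(pdb_id, nr_residues)` tuples, and the 100 to 200 residue range are hypothetical, and the real function may process rows differently (for example in parallel).

```python
from collections.abc import Callable, Iterable, Iterator
from typing import TypeVar

Row = TypeVar("Row")
Result = TypeVar("Result")


def process_lazily(
    rows: Iterable[Row], process_one: Callable[[Row], Result]
) -> Iterator[Result]:
    """Hypothetical helper: yield one result per row, computed on demand."""
    for row in rows:
        yield process_one(row)


# Stand-in rows: (pdb_id, nr_residues) pairs instead of ProteinPdbRow objects.
rows = [("1abc", 150), ("2xyz", 12)]
results = process_lazily(rows, lambda row: (row[0], 100 <= row[1] <= 200))

# Iterating drives the work; keep only the entries that passed the filter.
passed_ids = [pdb_id for pdb_id, passed in results if passed]
print(passed_ids)  # ['1abc']
```

A generator can be consumed only once, so collect the results into a list if they are needed again.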