io

Module for structure file input/output.

StructureFileExtensions = Literal['.pdb', '.pdb.gz', '.ent', '.ent.gz', '.cif', '.cif.gz', '.bcif', '.bcif.gz'] module-attribute

Type of supported structure file extensions.

valid_structure_file_extensions = set(get_args(StructureFileExtensions)) module-attribute

Set of valid structure file extensions.
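The two attributes above can be reproduced with the standard library alone. The sketch below rebuilds them as documented and adds a hypothetical helper, `has_valid_extension`, which is not part of the module and is shown only to illustrate how the set might be used.

```python
from typing import Literal, get_args

# Reproduced from the attribute definitions documented above.
StructureFileExtensions = Literal[
    ".pdb", ".pdb.gz", ".ent", ".ent.gz", ".cif", ".cif.gz", ".bcif", ".bcif.gz"
]
valid_structure_file_extensions = set(get_args(StructureFileExtensions))


def has_valid_extension(filename: str) -> bool:
    """Hypothetical helper: True if filename ends with a supported extension."""
    return any(filename.endswith(ext) for ext in valid_structure_file_extensions)
```

Deriving the set from the `Literal` keeps the static type and the runtime check in sync, since both come from a single list of extensions.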

bcif2cif(bcif_file)

Convert a binary CIF (bcif) file to a CIF string.

Parameters:

Name Type Description Default
bcif_file Path

Path to the binary CIF file.

required

Returns:

Type Description
str

A string containing the CIF representation of the structure.

bcif2structure(bcif_file)

Read a binary CIF (bcif) file and return a gemmi Structure object.

Reading bcif is slower than reading other formats because gemmi cannot read bcif files directly. The file is first converted to a CIF string with the mmcif package, and gemmi then parses that string.

Parameters:

Name Type Description Default
bcif_file Path

Path to the binary CIF file.

required

Returns:

Type Description
Structure

A gemmi Structure object representing the structure in the bcif file.

bcifgz2structure(bcif_gz_file)

Read a binary CIF (bcif) gzipped file and return a gemmi Structure object.

Reading bcif.gz is slower than reading other formats because gemmi cannot read bcif files directly. The file is first gunzipped to a temporary location, then converted to a CIF string with the mmcif package, and finally parsed by gemmi.

Parameters:

Name Type Description Default
bcif_gz_file Path

Path to the binary CIF gzipped file.

required

Returns:

Type Description
Structure

A gemmi Structure object representing the structure in the bcif.gz file.
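The gunzip-to-a-temporary-location step described above might look roughly like the following sketch. It uses only the standard library. The name `read_via_temp_gunzip` and its `reader` callback are assumptions made for illustration: in the real function the callback's role would be played by the bcif-to-Structure conversion, which is not reproduced here.

```python
import gzip
import shutil
import tempfile
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def read_via_temp_gunzip(gz_file: Path, reader: Callable[[Path], T]) -> T:
    """Gunzip gz_file into a temporary directory and pass the result to reader.

    Hypothetical sketch: `reader` stands in for whatever parses the
    uncompressed file (for example a bcif reader).
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Drop the trailing ".gz" so the reader sees the inner extension.
        plain = Path(tmp) / gz_file.name.removesuffix(".gz")
        with gzip.open(gz_file, "rb") as src, plain.open("wb") as dst:
            shutil.copyfileobj(src, dst)
        # The reader must finish before the temporary directory is removed.
        return reader(plain)
```

The temporary file is deleted when the `with` block exits, so the reader has to return a fully loaded object rather than a lazy handle to the file.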

convert_to_cif_file(input_file, output_dir, copy_method)

Convert a single structure file to .cif format.

Parameters:

Name Type Description Default
input_file Path

The structure file to convert. See StructureFileExtensions for supported extensions.

required
output_dir Path

Directory to save the converted .cif file.

required
copy_method CopyMethod

How to copy the input file to the output location when it needs no changes.

required

Returns:

Type Description
Path

Path to the converted .cif file.

convert_to_cif_files(input_files, output_dir, copy_method)

Convert structure files to .cif format.

Parameters:

Name Type Description Default
input_files Iterable[Path]

Iterable of structure files to convert.

required
output_dir Path

Directory to save the converted .cif files.

required
copy_method CopyMethod

How to copy an input file to the output location when it needs no changes.

required

Yields:

Type Description
Generator[tuple[Path, Path]]

Tuples of (input file, output file), one per input file.

glob_structure_files(input_dir)

Glob for structure files in a directory.

Uses StructureFileExtensions as valid extensions. Does not search recursively.

Parameters:

Name Type Description Default
input_dir Path

The input directory to search for structure files.

required

Yields:

Type Description
Generator[Path]

Paths to the found structure files.
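A non-recursive glob over the supported extensions could be sketched as below. This is a standard-library approximation under the assumption that one glob pattern is issued per extension. The name `glob_structure_files_sketch` is hypothetical, and the real function may order or deduplicate results differently.

```python
from collections.abc import Iterator
from pathlib import Path

# Extensions as listed in StructureFileExtensions.
EXTENSIONS = (
    ".pdb", ".pdb.gz", ".ent", ".ent.gz", ".cif", ".cif.gz", ".bcif", ".bcif.gz"
)


def glob_structure_files_sketch(input_dir: Path) -> Iterator[Path]:
    """Yield structure files directly inside input_dir (no subdirectories)."""
    for ext in EXTENSIONS:
        # "*<ext>" has no "**" component, so only the top level is searched.
        yield from input_dir.glob(f"*{ext}")
```

Because the function is a generator, no directory listing happens until the result is iterated, for example with `sorted(glob_structure_files_sketch(some_dir))`.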

gunzip_file(gz_file, output_file=None, keep_original=True)

Unzip a .gz file.

Parameters:

Name Type Description Default
gz_file Path

Path to the .gz file.

required
output_file Path | None

Optional path to the output unzipped file. If None, the .gz suffix is removed from gz_file.

None
keep_original bool

Whether to keep the original .gz file. Default is True.

True

Returns:

Type Description
Path

Path to the unzipped file.

Raises:

Type Description
ValueError

If output_file is None and gz_file does not end with .gz.
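The documented behaviour, including the default output name and the `ValueError`, can be approximated with `gzip` and `shutil`. The function below is a minimal sketch named `gunzip_sketch` to make clear it is not the library's implementation. It assumes the parameters behave exactly as described in the table above.

```python
from __future__ import annotations

import gzip
import shutil
from pathlib import Path


def gunzip_sketch(
    gz_file: Path, output_file: Path | None = None, keep_original: bool = True
) -> Path:
    """Decompress gz_file and return the path of the uncompressed file."""
    if output_file is None:
        if gz_file.suffix != ".gz":
            raise ValueError(f"{gz_file} does not end with .gz")
        # "1abc.cif.gz" -> "1abc.cif"
        output_file = gz_file.with_suffix("")
    with gzip.open(gz_file, "rb") as src, output_file.open("wb") as dst:
        # Stream in chunks so large files are not loaded into memory at once.
        shutil.copyfileobj(src, dst)
    if not keep_original:
        gz_file.unlink()
    return output_file
```

Calling `gunzip_sketch(Path("1abc.cif.gz"))` would write `1abc.cif` next to the archive and leave the archive in place, since `keep_original` defaults to `True`.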

locate_structure_file(root, pdb_id)

Locate a structure file for a given PDB ID in the specified directory.

Uses StructureFileExtensions as potential extensions. Also tries different casing of the PDB ID.

Parameters:

Name Type Description Default
root Path

The root directory to search in.

required
pdb_id str

The PDB ID to locate.

required

Returns:

Type Description
Path

The path to the located structure file.

Raises:

Type Description
FileNotFoundError

If no structure file is found for the given PDB ID.
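One way to read the lookup described above is as a search over every combination of ID casing and extension. The sketch below assumes files are named `<pdb_id><extension>` and that the casings tried are as-given, lower and upper. Both are assumptions, as is the name `locate_structure_file_sketch`. The real function may try other naming patterns as well.

```python
from pathlib import Path

# Extensions as listed in StructureFileExtensions.
EXTENSIONS = (
    ".pdb", ".pdb.gz", ".ent", ".ent.gz", ".cif", ".cif.gz", ".bcif", ".bcif.gz"
)


def locate_structure_file_sketch(root: Path, pdb_id: str) -> Path:
    """Return the first existing <pdb_id><ext> file in root, trying several casings."""
    for candidate_id in (pdb_id, pdb_id.lower(), pdb_id.upper()):
        for ext in EXTENSIONS:
            candidate = root / f"{candidate_id}{ext}"
            if candidate.exists():
                return candidate
    raise FileNotFoundError(f"No structure file found for {pdb_id} in {root}")
```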

read_structure(file)

Read a structure from a file.

Parameters:

Name Type Description Default
file Path

Path to the input structure file. See StructureFileExtensions for supported extensions.

required

Returns:

Type Description
Structure

A gemmi Structure object representing the structure in the file.

split_name_and_extension(name)

Split a filename into its name and extension.

.gz is considered part of the extension if present.

Examples:

Some example usages.

>>> from protein_quest.pdbe.io import split_name_and_extension
>>> split_name_and_extension("1234.pdb")
('1234', '.pdb')
>>> split_name_and_extension("1234.pdb.gz")
('1234', '.pdb.gz')

Parameters:

Name Type Description Default
name str

The filename to split.

required

Returns:

Type Description
tuple[str, str]

A tuple containing the name and the extension.
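The splitting rule, with `.gz` folded into the extension, can be sketched in a few lines. This is an illustrative reimplementation named `split_name_and_extension_sketch`, not the library's code. How the real function treats names without any extension is not documented here, so that branch is an assumption.

```python
def split_name_and_extension_sketch(name: str) -> tuple[str, str]:
    """Split a filename into (name, extension), keeping .gz with the extension."""
    gz = ""
    if name.endswith(".gz"):
        gz = ".gz"
        name = name[: -len(".gz")]
    stem, dot, suffix = name.rpartition(".")
    if dot and stem:
        return stem, f".{suffix}{gz}"
    # Assumption: a name without an inner extension keeps only the .gz part, if any.
    return name, gz
```

A plain `pathlib.Path(name).suffix` would return only `.gz` for `1234.pdb.gz`, which is why the compressed suffix is handled separately.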

structure2bcif(structure, bcif_file)

Write a gemmi Structure object to a binary CIF (bcif) file.

Writing bcif is slower than writing other formats because gemmi cannot write bcif files directly. The structure is first converted to a CIF string with gemmi, and that string is then converted to bcif with the mmcif package.

Parameters:

Name Type Description Default
structure Structure

The gemmi Structure object to write.

required
bcif_file Path

Path to the output binary CIF file.

required

structure2bcifgz(structure, bcif_gz_file)

Write a gemmi Structure object to a binary CIF gzipped (bcif.gz) file.

Writing bcif.gz is slower than writing other formats because gemmi cannot write bcif files directly. The structure is first converted to a CIF string with gemmi, that string is converted to bcif with the mmcif package, and the resulting bcif file is gzipped.

Parameters:

Name Type Description Default
structure Structure

The gemmi Structure object to write.

required
bcif_gz_file Path

Path to the output binary CIF gzipped file.

required

write_structure(structure, path)

Write a gemmi structure to a file.

Parameters:

Name Type Description Default
structure Structure

The gemmi structure to write.

required
path Path

The file path to write the structure to. The format depends on the file extension. See StructureFileExtensions for supported extensions.

required

Raises:

Type Description
ValueError

If the file extension is not supported.
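Choosing the output format from the extension could work along the lines of the sketch below. It only resolves a path to a format label and a gzip flag, and leaves the actual writing out. The function `pick_format` and its return shape are hypothetical. The mapping of `.ent` to the PDB format reflects the usual convention for legacy PDB entry files.

```python
from pathlib import Path

# Inner extension -> format label; ".ent" files use the PDB format.
FORMATS = {".pdb": "pdb", ".ent": "pdb", ".cif": "cif", ".bcif": "bcif"}


def pick_format(path: Path) -> tuple[str, bool]:
    """Return (format, gzipped) for path, or raise ValueError if unsupported."""
    name = path.name
    gzipped = name.endswith(".gz")
    if gzipped:
        name = name[: -len(".gz")]
    suffix = Path(name).suffix
    if suffix not in FORMATS:
        raise ValueError(f"Unsupported file extension in {path.name}")
    return FORMATS[suffix], gzipped
```

A writer built on this would branch on the returned label and wrap the output in gzip when the flag is set.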