pdb-tools

Logo

A swiss army knife for editing PDB files.

View the Project on GitHub haddocking/pdb-tools

A swiss army knife for manipulating and editing PDB files.

Installation Instructions

pdb-tools are available on PyPi and can be installed though pip. This is the recommended way as it makes updating/uninstalling rather simple:

pip install pdb-tools

Because we use semantic versioning in the development of pdb-tools, every bugfix or new feature results in a new version of the software that is automatically published on PyPI. As such, there is no difference between the code on github and the latest version you can install with pip. To update your installation to the latest version of the code run:

pip install --upgrade pdb-tools

What can I do with them?

The purpose of each tool should be obvious from its name. In any case, here is a list of all the tools in the suite and their function. All tools share the same command-line interface. Below are a couple of examples to get you started. If you want to check out more examples of how to use the tools and their applications, or have any cool examples of your own, check out the cookbook.

Note: On Windows the tools will have the .exe extension.

What can’t I do with them?

Operations that involve coordinates or numerical calculations are usually not in the scope of pdb-tools. Use a proper library for that, it will be much faster and scalable. Also, although we provide mmCIF<->PDB converters, we do not support large mmCIF files with more than 99999 atoms, or 9999 residues in a single chain. Our tools will complain if you try using them on such a molecule.

About

Manipulating PDB files is often painful. Extracting a particular chain or set of residues, renumbering residues, splitting or merging models and chains, or just ensuring the file is conforming to the PDB specifications are examples of tasks that can be done using any decent parsing library or graphical interface. These, however, almost always require 1) scripting knowledge, 2) time, and 3) installing one or more programs.

pdb-tools were designed to be a swiss army knife for the PDB format. The philosophy of the scripts is simple: one script, one task. If you want to do two things, pipe the scripts together. Requests for new scripts will be taken into consideration - use the Issues button or write them yourself and create a Pull Request.

Looking for the other pdb-tools?

The Harms lab maintains a set of tools also called pdbtools, which perform a slightly different set of functions. You can find them here.

Citation

We finally decided to write up a small publication describing the tools. If you used them in a project that is going to be published, please cite us:

Rodrigues JPGLM, Teixeira JMC, Trellet M and Bonvin AMJJ.
pdb-tools: a swiss army knife for molecular structures. 
F1000Research 2018, 7:1961 (https://doi.org/10.12688/f1000research.17456.1) 

If you use a reference manager that supports BibTex, use this record:

@Article{ 10.12688/f1000research.17456.1,
AUTHOR = { Rodrigues, JPGLM and Teixeira, JMC and Trellet, M and Bonvin, AMJJ},
TITLE = {pdb-tools: a swiss army knife for molecular structures [version 1; peer review: 2 approved]
},
JOURNAL = {F1000Research},
VOLUME = {7},
YEAR = {2018},
NUMBER = {1961},
DOI = {10.12688/f1000research.17456.1}
}

Requirements

pdb-tools should run on Python 2.7+ and Python 3.x. We test on Python 2.7, 3.6, and 3.7. There are no dependencies.

Installing from Source

Download the zip archive or clone the repository with git. We recommend the git approach since it makes updating the tools extremely simple.

# To download
git clone https://github.com/haddocking/pdb-tools
cd pdb-tools

# To update
git pull origin master

# To install
python setup.py install

Contributing

If you want to contribute to the development of pdb-tools, provide a bug fix, or a new tools, read our CONTRIBUTING instructions here.

License

pdb-tools are open-source and licensed under the Apache License, version 2.0. For details, see the LICENSE file.

List of Tools

pdb_b

Modifies the temperature factor column of a PDB file (default 10.0).

Usage: python pdb_b.py -<bfactor> <pdb file> Example: python pdb_b.py -10.0 1CTF.pdb
pdb_chain

Modifies the chain identifier column of a PDB file (default is an empty chain).

Usage: python pdb_chain.py -<chain id> <pdb file> Example: python pdb_chain.py -C 1CTF.pdb
pdb_chainbows

Renames chain identifiers sequentially, based on TER records.

Since HETATM records are not separated by TER records and usually come together at the end of the PDB file, this script will attempt to reassign their chain identifiers based on the changes it made to ATOM lines. This might lead to bad output in certain corner cases. Usage: python pdb_chainbows.py <pdb file> Example: python pdb_chainbows.py 1CTF.pdb
pdb_chainxseg

Swaps the segment identifier for the chain identifier.

Usage: python pdb_chainxseg.py <pdb file> Example: python pdb_chainxseg.py 1CTF.pdb
pdb_chkensemble

Checks all models in a multi-model PDB file have the same composition.

Composition is defined as same atoms/residues/chains. Usage: python pdb_chkensemble.py <pdb file> Example: python pdb_chkensemble.py 1CTF.pdb
pdb_delchain

Deletes all atoms matching specific chains in the PDB file.

Usage: python pdb_delchain.py -<option> <pdb file> Example: python pdb_delchain.py -A 1CTF.pdb # removes chain A from PDB file python pdb_delchain.py -A,B 1CTF.pdb # removes chains A and B from PDB file
pdb_delelem

Deletes all atoms matching the given element in the PDB file.

Elements are read from the element column. Usage: python pdb_delelem.py -<option> <pdb file> Example: python pdb_delelem.py -H 1CTF.pdb # deletes all protons python pdb_delelem.py -N 1CTF.pdb # deletes all nitrogens python pdb_delelem.py -H,N 1CTF.pdb # deletes all protons and nitrogens
pdb_delhetatm

Removes all HETATM records in the PDB file.

Usage: python pdb_delhetatm.py <pdb file> Example: python pdb_delhetatm.py 1CTF.pdb
pdb_delinsertion

Deletes insertion codes in a PDB file.

Deleting an insertion code shifts the residue numbering of downstream residues. Allows for picking specific residues to delete insertion codes for. Usage: python pdb_delinsertion.py [-<option>] <pdb file> Example: python pdb_delinsertion.py 1CTF.pdb # delete ALL insertion codes python pdb_delinsertion.py -A9,B12 1CTF.pdb # deletes ins. codes for res # 9 of chain A and 12 of chain B.
pdb_delres

Deletes a range of residues from a PDB file.

The range option has three components: start, end, and step. Start and end are optional and if ommitted the range will start at the first residue or end at the last, respectively. The step option can only be used if both start and end are provided. Note that the start and end values of the range are purely numerical, while the range actually refers to every N-th residue, regardless of their sequence number. Usage: python pdb_delres.py -[resid]:[resid]:[step] <pdb file> Example: python pdb_delres.py -1:10 1CTF.pdb # Deletes residues 1 to 10 python pdb_delres.py -1: 1CTF.pdb # Deletes residues 1 to END python pdb_delres.py -:5 1CTF.pdb # Deletes residues from START to 5. python pdb_delres.py -::5 1CTF.pdb # Deletes every 5th residue python pdb_delres.py -1:10:5 1CTF.pdb # Deletes every 5th residue from 1 to 10
pdb_delresname

Removes all residues matching the given name in the PDB file.

Residues names are matched *without* taking into consideration spaces. Usage: python pdb_delresname.py -<option> <pdb file> Example: python pdb_delresname.py -ALA 1CTF.pdb # removes only Alanines python pdb_delresname.py -ASP,GLU 1CTF.pdb # removes (-) charged residues
pdb_element

Assigns the elements in the PDB file from atom names.

Usage: python pdb_element.py <pdb file> Example: python pdb_element.py 1CTF.pdb
pdb_fetch

Downloads a structure in PDB format from the RCSB website.

Allows downloading the (first) biological structure if selected. Usage: python pdb_fetch.py [-biounit] <pdb code> Example: python pdb_fetch.py 1brs # downloads unit cell, all 6 chains python pdb_fetch.py -biounit 1brs # downloads biounit, 2 chains
pdb_fixinsert

Fixes insertion codes in a PDB file.

Works by deleting an insertion code and shifting the residue numbering of downstream residues. Allows for picking specific residues to delete insertion codes for. Usage: python pdb_fixinsert.py [-<option>] <pdb file> Example: python pdb_fixinsert.py 1CTF.pdb # delete ALL insertion codes python pdb_fixinsert.py -A9,B12 1CTF.pdb # deletes ins. codes for res # 9 of chain A and 12 of chain B.
pdb_fromcif

Rudimentarily converts a mmCIF file to the PDB format.

Will not convert if the file does not 'fit' in PDB format, e.g. too many chains, residues, or atoms. Will convert only the coordinate section. Usage: python pdb_fromcif.py <pdb file> Example: python pdb_fromcif.py 1CTF.pdb
pdb_gap

Finds gaps between consecutive protein residues in the PDB.

Detects gaps both by a distance criterion or discontinuous residue numbering. Only applies to protein residues. Usage: python pdb_gap.py <pdb file> Example: python pdb_gap.py 1CTF.pdb
pdb_head

Returns the first N coordinate (ATOM/HETATM) lines of the file.

Usage: python pdb_head.py -<num> <pdb file> Example: python pdb_head.py -100 1CTF.pdb # first 100 ATOM/HETATM lines of the file
pdb_intersect

Returns a new PDB file only with atoms in common to all input PDB files.

Atoms are judged equal is their name, altloc, res. name, res. num, insertion code and chain fields are the same. Coordinates are taken from the first input file. Keeps matching TER/ANISOU records. Usage: python pdb_intersect.py <pdb file> <pdb file> Example: python pdb_intersect.py 1XYZ.pdb 1ABC.pdb
pdb_keepcoord

Removes all non-coordinate records from the file.

Keeps only MODEL, ENDMDL, END, ATOM, HETATM, CONECT. Usage: python pdb_keepcoord.py <pdb file> Example: python pdb_keepcoord.py 1CTF.pdb
pdb_merge

Merges several PDB files into one.

The contents are not sorted and no lines are deleted (e.g. END, TER statements) so we recommend piping the results through `pdb_tidy.py`. Usage: python pdb_merge.py <pdb file> <pdb file> Example: python pdb_merge.py 1ABC.pdb 1XYZ.pdb
pdb_mkensemble

Merges several PDB files into one multi-model (ensemble) file.

Strips all HEADER information and adds REMARK statements with the provenance of each conformer. Usage: python pdb_mkensemble.py <pdb file> <pdb file> Example: python pdb_mkensemble.py 1ABC.pdb 1XYZ.pdb
pdb_occ

Modifies the occupancy column of a PDB file (default 1.0).

Usage: python pdb_occ.py -<occupancy> <pdb file> Example: python pdb_occ.py -1.0 1CTF.pdb
pdb_reatom

Renumbers atom serials in the PDB file starting from a given value (default 1).

Usage: python pdb_reatom.py -<number> <pdb file> Example: python pdb_reatom.py -10 1CTF.pdb # renumbers from 10 python pdb_reatom.py --1 1CTF.pdb # renumbers from -1
pdb_reres

Renumbers the residues of the PDB file starting from a given number (default 1).

Usage: python pdb_reres.py -<number> <pdb file> Example: python pdb_reres.py -10 1CTF.pdb # renumbers from 10 python pdb_reres.py --1 1CTF.pdb # renumbers from -1
pdb_rplchain

Performs in-place replacement of a chain identifier by another.

Usage: python pdb_rplchain.py -<from>:<to> <pdb file> Example: python pdb_rplchain.py -A:B 1CTF.pdb # Replaces chain A for chain B
pdb_rplresname

Performs in-place replacement of a residue name by another.

Affects all residues with that name. Usage: python pdb_rplresname.py -<from>:<to> <pdb file> Example: python pdb_rplresname.py -HIP:HIS 1CTF.pdb # changes all HIP residues to HIS
pdb_seg

Modifies the segment identifier column of a PDB file (default is an empty segment).

Usage: python pdb_seg.py -<segment id> <pdb file> Example: python pdb_seg.py -C 1CTF.pdb
pdb_segxchain

Swaps the chain identifier by the segment identifier.

If the segment identifier is longer than one character, the script will truncate it. Does not ensure unique chain IDs. Usage: python pdb_segxchain.py <pdb file> Example: python pdb_segxchain.py 1CTF.pdb
pdb_selaltloc

Selects altloc labels for the entire PDB file.

By default, selects the label with the highest occupancy value for each atom, but the user can define a specific altloc label to select. Selecting by highest occupancy removes all altloc labels for all atoms. If the user provides an option (e.g. -A), only atoms with conformers with an altloc A are processed by the script. If you select -A and an atom has conformers with altlocs B and C, both B and C will be kept in the output. Usage: python pdb_selaltloc.py [-<option>] <pdb file> Example: python pdb_selaltloc.py 1CTF.pdb # picks locations with highest occupancy python pdb_selaltloc.py -A 1CTF.pdb # picks alternate locations labelled 'A'
pdb_selatom

Selects all atoms matching the given name in the PDB file.

Atom names are matched *without* taking into consideration spaces, so ' CA ' (alpha carbon) and 'CA ' (calcium) will both be kept if -CA is passed. Usage: python pdb_selatom.py -<option> <pdb file> Example: python pdb_selatom.py -CA 1CTF.pdb # keeps only alpha-carbon atoms python pdb_selatom.py -CA,C,N,O 1CTF.pdb # keeps only backbone atoms
pdb_selchain

Extracts one or more chains from a PDB file.

Usage: python pdb_selchain.py -<chain id> <pdb file> Example: python pdb_selchain.py -C 1CTF.pdb # selects chain C python pdb_selchain.py -A,C 1CTF.pdb # selects chains A and C
pdb_selelem

Selects all atoms that match the given element(s) in the PDB file.

Elements are read from the element column. Usage: python pdb_selelem.py -<option> <pdb file> Example: python pdb_selelem.py -H 1CTF.pdb # selects all protons python pdb_selelem.py -N 1CTF.pdb # selects all nitrogens python pdb_selelem.py -H,N 1CTF.pdb # selects all protons and nitrogens
pdb_selhetatm

Selects all HETATM records in the PDB file.

Usage: python pdb_selhetatm.py <pdb file> Example: python pdb_selhetatm.py 1CTF.pdb
pdb_selres

Selects residues by their index, piecewise or in a range.

The range option has three components: start, end, and step. Start and end are optional and if ommitted the range will start at the first residue or end at the last, respectively. Usage: python pdb_selres.py -[resid]:[resid]:[step] <pdb file> Example: python pdb_selres.py -1,2,4,6 1CTF.pdb # Extracts residues 1, 2, 4 and 6 python pdb_selres.py -1:10 1CTF.pdb # Extracts residues 1 to 10 python pdb_selres.py -1:10,20:30 1CTF.pdb # Extracts residues 1 to 10 and 20 to 30 python pdb_selres.py -1: 1CTF.pdb # Extracts residues 1 to END python pdb_selres.py -:5 1CTF.pdb # Extracts residues from START to 5. python pdb_selres.py -::5 1CTF.pdb # Extracts every 5th residue python pdb_selres.py -1:10:5 1CTF.pdb # Extracts every 5th residue from 1 to 10
pdb_selresname

Selects all residues matching the given name in the PDB file.

Residues names are matched *without* taking into consideration spaces. Usage: python pdb_selresname.py -<option> <pdb file> Example: python pdb_selresname.py -ALA 1CTF.pdb # keeps only Alanines python pdb_selresname.py -ASP,GLU 1CTF.pdb # keeps (-) charged residues
pdb_selseg

Selects all atoms matching the given segment identifier.

Usage: python pdb_selseg.py -<segment id> <pdb file> Example: python pdb_selseg.py -C 1CTF.pdb # selects segment C python pdb_selseg.py -C,D 1CTF.pdb # selects segments C and D
pdb_shiftres

Shifts the residue numbers in the PDB file by a constant value.

Usage: python pdb_shiftres.py -<number> <pdb file> Example: python pdb_shiftres.py -10 1CTF.pdb # adds 10 to the original numbering python pdb_shiftres.py --5 1CTF.pdb # subtracts 5 from the original numbering
pdb_sort

Sorts the ATOM/HETATM/ANISOU/CONECT records in a PDB file.

Atoms are always sorted by their serial number, meaning the original ordering of the atoms within each residue are not changed. Alternate locations are sorted by default. Residues are sorted according to their residue sequence number and then by their insertion code (if any). Chains are sorted by their chain identifier. Finally, the file is sorted by all keys, and the records are placed in the following order: - ATOM/ANISOU, intercalated if the latter exist - HETATM - CONECT, sorted by the serial number of the central (first) atom MASTER, TER, END statements are removed. Headers (HEADER, REMARK, etc) are kept and placed first. Does NOT support multi-model files. Use pdb_splitmodel, then pdb_sort on each model, and then pdb_mkensemble. Usage: python pdb_sort.py -<option> <pdb file> Example: python pdb_sort.py 1CTF.pdb # sorts by chain and residues python pdb_sort.py -C 1CTF.pdb # sorts by chain (A, B, C ...) only python pdb_sort.py -R 1CTF.pdb # sorts by residue number/icode only
pdb_splitchain

Splits a PDB file into several, each containing one chain.

Usage: python pdb_splitchain.py <pdb file> Example: python pdb_splitchain.py 1CTF.pdb
pdb_splitmodel

Splits a PDB file into several, each containing one MODEL.

Usage: python pdb_splitmodel.py <pdb file> Example: python pdb_splitmodel.py 1CTF.pdb
pdb_splitseg

Splits a PDB file into several, each containing one segment.

Usage: python pdb_splitseg.py <pdb file> Example: python pdb_splitseg.py 1CTF.pdb
pdb_tidy

Modifies the file to adhere (as much as possible) to the format specifications.

Expects a sorted file - REMARK/ATOM/HETATM/END - so use pdb_sort in case you are not sure. This includes: - Adding TER statements after chain breaks/changes - Truncating/Padding all lines to 80 characters - Adds END statement at the end of the file Will remove all original TER/END statements from the file. Usage: python pdb_tidy.py [-strict] <pdb file> Example: python pdb_tidy.py 1CTF.pdb python pdb_tidy.py -strict 1CTF.pdb # does not add TER on chain breaks
pdb_tocif

Rudimentarily converts the PDB file to mmCIF format.

Will convert only the coordinate section. Usage: python pdb_tocif.py <pdb file> Example: python pdb_tocif.py 1CTF.pdb
pdb_tofasta

Extracts the residue sequence in a PDB file to FASTA format.

Canonical amino acids and nucleotides are represented by their one-letter code while all others are represented by 'X'. The -multi option splits the different chains into different records in the FASTA file. Usage: python pdb_tofasta.py [-multi] <pdb file> Example: python pdb_tofasta.py 1CTF.pdb
pdb_uniqname

Renames atoms sequentially (C1, C2, O1, ...) for each HETATM residue.

Relies on an element column being present (see pdb_element). Usage: python pdb_uniqname.py <pdb file> Example: python pdb_uniqname.py 1CTF.pdb
pdb_validate

Validates the PDB file ATOM/HETATM lines according to the format specifications.

Does not catch all the errors though... people are creative! Usage: python pdb_validate.py <pdb file> Example: python pdb_validate.py 1CTF.pdb
pdb_wc

Summarizes the contents of a PDB file, like the wc command in UNIX.

By default, this tool produces a general summary, but you can use several options to produce focused but more detailed summaries: [m] - no. of models. [c] - no. of chains (plus per-model if multi-model file). [r] - no. of residues (plus per-model if multi-model file). [a] - no. of atoms (plus per-model if multi-model file). [h] - no. of HETATM (plus per-model if multi-model file). [o] - presence of disordered atoms (altloc). [i] - presence of insertion codes. Usage: python pdb_wc.py [-<option>] <pdb file> Example: python pdb_wc.py 1CTF.pdb