Computational Structural Biology group focusing on dissecting, understanding and predicting biomolecular interactions at the molecular level.

# PowerFit Tutorial

## Introduction

PowerFit is a software application developed to fit atomic-resolution structures of biomolecules into cryo-electron microscopy (cryo-EM) density maps. It is open-source and available for download on GitHub.

This tutorial will show you how to use PowerFit by applying it to an E. coli ribosome case. To follow this tutorial you need, in addition to PowerFit, the UCSF Chimera visualization software, a popular tool in the cryo-electron microscopy community for its volume visualization capabilities. We will further discuss the limits of rigid body fitting, and how HADDOCK can alleviate some of the shortcomings. We provide the data necessary to run this tutorial here. If you are following one of our workshops, where we use a Virtual Machine, then all the required software and data should already be installed.

The PowerFit and HADDOCK software are described in

Throughout the tutorial, colored text will be used to refer to questions or instructions, Linux and/or Chimera commands.

The case we will be investigating is a complex between the 30S maturing E. coli ribosome and KsgA, a methyltransferase. There are models available for the E. coli ribosome and KsgA, and a cryo-EM density map of around 13 Å resolution (EMD-2017).

## Setup

If you are using one of our pre-packed VM images, the data should be directly available in the image. We prepared a folder that contains the cryo-EM density map file in CCP4 format and the starting models of the ribosome and KsgA. The ribosome has already been properly fitted in the density.

If you are running this tutorial on your own, make sure the required software (UCSF Chimera and PowerFit) is installed, and download the tutorial data from our GitHub data repository here or clone it from the command line.

## Inspecting the data

Let us first inspect the data we have available, namely the cryo-EM density map and the structures we will attempt to fit.

Using Chimera, we can easily visualize and inspect the density and models, mostly through a few mouse clicks.

In the Volume Viewer window, the middle slider bar controls the value at which the isosurface of the density is shown. At high values, the envelope will shrink, while lower values might even display the noise in the map. We will first make the density transparent, to see the fitted structure inside:

• Within the Volume Viewer click on the gray box next to Color, which opens the Color Editor window.
• In there, check the Opacity box. An extra slider bar appears in the box called A, for the alpha channel.
• Set the alpha channel value to around 0.6.

Notice that the density becomes transparent, providing a better view of the fit of the ribosome model. On closer inspection, you can also discern a region of the density that is not accounted for by the ribosome structure alone; this is the binding location of KsgA. Although you could try to place the crystal structure in that region manually, finding the correct orientation is not straightforward. PowerFit can help here as it attempts to find the best fit automatically and exhaustively, based on an objective score.

## Rigid body fitting

PowerFit is a rigid body fitting software that quickly calculates the cross-correlation, a common measure of the goodness-of-fit, between the atomic structure and the density map. It performs a systematic 6-dimensional scan of the three translational and three rotational degrees of freedom. In short, PowerFit will try to fit the structure in many orientations at every position on the map and calculate a cross-correlation score for each of them.
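For reference, a cross-correlation score of this kind is commonly written in the normalized form below. This is a generic sketch rather than the exact expression PowerFit evaluates; the symbols are notation introduced here, with $\rho^{\mathrm{map}}_i$ the experimental density at grid point $i$, $\rho^{\mathrm{model}}_i$ the density simulated from the atomic structure in a given position and orientation, and bars denoting averages over the grid points being compared.

$$
\mathrm{CC} = \frac{\sum_{i}\left(\rho^{\mathrm{map}}_{i} - \bar{\rho}^{\mathrm{map}}\right)\left(\rho^{\mathrm{model}}_{i} - \bar{\rho}^{\mathrm{model}}\right)}{\sqrt{\sum_{i}\left(\rho^{\mathrm{map}}_{i} - \bar{\rho}^{\mathrm{map}}\right)^{2}\,\sum_{i}\left(\rho^{\mathrm{model}}_{i} - \bar{\rho}^{\mathrm{model}}\right)^{2}}}
$$

In this form the score ranges from -1 to 1, with higher values indicating better agreement between the model and the map.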

While performing the search, PowerFit will update you on its progress. The example case in this tutorial should run in about 10 minutes. If the ETA on your screen is substantially lower, your computer might be fast enough to allow a finer rotational sampling interval of 10°.

While the calculation is running, open a second terminal window (or tab) and type powerfit --help to have a look at the several features and options of PowerFit and what each flag of the previous command means.

PowerFit requires three arguments: a high-resolution atomic structure of the biomolecule to be fitted (KsgA.pdb), a target cryo-EM density map to fit the structure in (ribosome-KsgA.map), and the resolution, in ångstrom, of the density map (13).

The -a (or --angle) option specifies the rotational sampling interval in degrees, i.e. how tightly the three rotational degrees of freedom will be sampled. Lower values will cause PowerFit to perform a finer search, at the expense of computational time. The default value is 10°, but it can be lowered to 5° for more sensitive searches, or raised to 20° if time is an issue or if there aren’t sufficient computational resources. For the sake of time in this tutorial, we set the sampling interval to this latter coarser value. The -d option specifies where the results will be stored while the -p option specifies the number of processors that PowerFit can use during the search, to leverage available CPU resources.

Finally, the -l flag applies a Laplace pre-filter on the density data, which increases the cross-correlation sensitivity by enhancing edges in the density. In this example scenario, all other options are left at their default values but feel free to explore them.
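Putting the arguments and options described above together, the invocation might look like the sketch below. The file names, resolution, angle and output directory come from this tutorial; the order of the positional arguments (map, resolution, structure) and the processor count of 2 are assumptions, so check them against the output of powerfit --help before running.

```shell
# Inputs described in the text above.
MAP=ribosome-KsgA.map    # target cryo-EM density map
RESOLUTION=13            # map resolution in angstrom
STRUCTURE=KsgA.pdb       # atomic structure to fit
NPROC=2                  # assumption: adjust to the cores you have available

# -a 20: coarse 20 degree rotational sampling, -l: Laplace pre-filter,
# -d: output directory, -p: number of processors.
CMD="powerfit $MAP $RESOLUTION $STRUCTURE -a 20 -p $NPROC -l -d run-KsgA"
echo "$CMD"

# Run the search only if PowerFit is installed and the input files are present.
if command -v powerfit >/dev/null 2>&1 && [ -f "$MAP" ] && [ -f "$STRUCTURE" ]; then
    $CMD
fi
```

Keeping the inputs in variables makes it easy to rerun the search with a finer angle or a different number of processors.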

## Analyzing the results

After the search, PowerFit creates a run-KsgA directory containing the following files:

• fit_N.pdb: the best N fits, judged by the cross-correlation score.
• solutions.out: all the non-redundant solutions found, ordered by their correlation score. The first column shows the rank, column 2 the correlation score, columns 3 and 4 the Fisher z-score and the number of standard deviations; columns 5 to 7 are the x, y and z coordinates of the center of the chain; columns 8 to 16 are the nine rotation matrix values.
• lcc.mrc: a cross-correlation map showing, at each grid position, the highest cross-correlation score found during the search, thus showing the most likely location of the center of mass of the structure.
• powerfit.log: a log file of the calculation, including the input parameters with date and timing information.

Make the density map transparent again, by adjusting the alpha channel value to 0.6. The values of the lcc.mrc slider bar correspond to the cross-correlation score found. In this way, you can selectively visualize regions of high or low cross-correlation values: i.e., pushing the slider to the right (higher cutoff) shows only regions of the grid with high cross-correlation scores.

As you can see, PowerFit found quite a few local optima, one of which stands out (if the rotational sampling was fine enough). Further, the 10 best-ranked solutions are centered on regions corresponding to local cross-correlation maxima.

To view each fitted solution individually, in the main panel, go to Favorites → Model Panel to open the Model Panel window. The window shows each model that Chimera has processed and its associated color. To show or hide a specific model you can click the box in the S column.

You now have combined the ribosome structure with the rigid-body fit of KsgA calculated by PowerFit, yielding an initial model of the complex. Mutagenesis experiments performed on this complex indicate three charged residues of KsgA - R221, R222, and K223 - that are of special importance for the interaction.

In the same session of Chimera where you have your chosen fitted KsgA structure, go to Favorites → Command Line. A command line is now present below the main viewing window. In the command line of Chimera, type the following instructions to center your view on these residues and highlight their interactions:

Chimera also includes a tool to locally optimize the fit of a rigid structure against a given density map, which can be an additional help on top of the PowerFit calculations. Make the main display window active by clicking on it, then go to Tools → Volume Data → Fit in Map. In the newly opened Fit in Map window, select the best-fitted structure of PowerFit (fit_?.pdb) as Fit model and the original density map (ribosome-KsgA.map) as the map. Press Fit to start the optimization.

Does the Chimera local fit optimization tool improve the results of PowerFit?

The scoring function used by Chimera to estimate the quality of the fit makes our model worse, increasing the number of clashes between the ribosomal RNA and KsgA. Click Undo in the Fit in Map window to undo the optimization.

Next, we will try to optimize the fit using the cross-correlation that Chimera provides. Click Options, check the Use map simulated from atoms, resolution box and fill in 13 for the resolution. Select the correlation radio button and uncheck the Use only data above contour level from first map box. Press Fit.

Does this second strategy improve the quality of the fit? If not, undo it again.

The obvious limitation of rigid-body fitting is that it cannot account for any conformational changes the structures might undergo. Further, the low resolution of this particular density map does not allow the identification of side-chain atoms. The quality of the models fitted by PowerFit is, therefore, limited.

Given the availability of both the cryo-EM density map and the mutagenesis experiments, we can integrate both in HADDOCK and benefit from its semi-flexible refinement protocols to improve the stereochemistry of our model. To use cryo-EM data, HADDOCK requires the map and also the approximate position of each chain, as given by its center of mass. This information is provided directly by PowerFit, in the solutions.out file, columns 5 to 7 (x, y, z coordinates).
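One way to pull these coordinates out of solutions.out is sketched below with awk. The helper name top_centroids is hypothetical, and the sketch assumes whitespace-separated columns laid out as described in this tutorial (rank in column 1, x, y and z in columns 5 to 7); any line whose first field is not an integer rank, such as a header, is skipped.

```shell
# Print "rank x y z" for the N best-ranked solutions (default: 1).
# Usage: top_centroids <solutions file> [N]
top_centroids() {
    awk -v n="${2:-1}" '
        $1 !~ /^[0-9]+$/ || NF < 7 { next }    # skip header, blank or short lines
        { print $1, $5, $6, $7; if (++seen >= n) exit }
    ' "$1"
}

# Show the centroid of the top-ranked fit if the PowerFit output is present.
if [ -f run-KsgA/solutions.out ]; then
    top_centroids run-KsgA/solutions.out 1
fi
```

Passing a larger N, for example top_centroids run-KsgA/solutions.out 10, lists the centroids of the ten best solutions for comparison with the maxima seen in lcc.mrc.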

Unfortunately, running HADDOCK is out of the scope of this tutorial as it requires a significant amount of time. Therefore, we provide the best-ranked HADDOCK model, generated by combining the cryo-EM map, the PowerFit centroid positions, and the mutagenesis data, in the tutorial data folder.

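Finally, to make the impact of HADDOCK more quantitative, we will make a distance histogram of the contacts between the ribosome and KsgA. First, combine the ribosome with your preferred fitted model, replacing the ? in the command below with the number of the fit you chose (left as is, the shell treats ? as a wildcard and concatenates every matching fit_N.pdb file).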

cat ribosome.pdb run-KsgA/fit_?.pdb > ribosome-KsgA.pdb

To calculate all the contacts within a 5.0 Å cutoff distance, we make use of a standard tool (contact-chainID) that is shipped with HADDOCK.

./contact-chainID ribosome-KsgA.pdb 5.0 > ribosome-KsgA.contacts

Now we can generate the histogram and visualize it with xmgrace.
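A minimal way to bin the contact distances is sketched below with awk. The helper name distance_histogram and the 0.5 Å bin width are choices made for this illustration, and the sketch assumes that each line of the contacts file ends with the distance in Å; verify this against your own ribosome-KsgA.contacts file first.

```shell
# Write "bin_start count" lines for the distances in the last column.
# Usage: distance_histogram <contacts file> [bin width in angstrom]
distance_histogram() {
    awk -v w="${2:-0.5}" '
        /^#/ || NF == 0 { next }                  # skip comments and blank lines
        { bin = int($NF / w); count[bin]++; if (bin > max) max = bin }
        END { for (b = 0; b <= max; b++) printf "%.2f %d\n", b * w, count[b] + 0 }
    ' "$1"
}

# Build the histogram for the PowerFit model if the contacts file exists.
if [ -f ribosome-KsgA.contacts ]; then
    distance_histogram ribosome-KsgA.contacts > ribosome-KsgA.hist
fi
```

The resulting two-column file can then be opened with xmgrace ribosome-KsgA.hist. Empty bins are written with a count of 0 so that histograms from different models line up on the same axis.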

For the HADDOCK model we already combined the ribosome and KsgA (HADDOCK-ribosome-KsgA.pdb).

The combination of cryo-EM and mutagenesis data, a physics-based force field, and a semi-flexible refinement protocol improves the quality of the resulting models. In this tutorial, we showed you how to use PowerFit to fit high-resolution structures to a cryo-EM density map and how to interpret the results. Further, we also showed how integrative modeling using HADDOCK can improve the stereochemistry of the models, in particular if done in combination with additional experimental data, such as mutagenesis.

Thank you for following this tutorial. If you have any questions or suggestions, feel free to contact us via email or by submitting an issue in the appropriate GitHub repository.