Analysis of a gromacs simulation

The simulation is running now (or finished) and we can start analysing the results. Let us first see which kind of files have been written by the simulation (mdrun):

ls -lrt
We see the following files: The first analysis step during or after a simulation is usually a visual inspection of the trajectory. For this we will use the gromacs trajectory viewer ngmx. Other possibilities would be VMD , Pymol, or gOpenmol.

ngmx -s md.tpr -f traj.xtc
Select a group of your interest ("system" might be good for a start), and then, under "Display", click "Animate". Now a animation player button appears near the bottom of the screen, which can be used to view the trajectory. If you wish to filter different simulated components, choose a different group under "Display" and "Filter". We can see that the protein and its surroundings undergo thermal fluctuations, but overall, the protein structure is rather stable, as would be expected on such timescales. For a more quantitative analysis on the protein fluctuations, we can view how fast and how far the protein deviates from the starting (experimental) structure:

g_rms -s md.tpr -f traj.xtc
When prompted for groups to be analysed, type "1 1 1". g_rms has written a file called "rmsd.xvg", which can be viewed with:

xmgr rmsd.xvg
We see the Root Mean Square Deviation (rmsd) from the starting structure, averaged over all protein atoms, as a function of time. The deviation of 0.1 nm is within the expected range for a protein like the B1 domain at a temperature of 300K at such timescales. If we now wish to see if the fluctuations are equally distributed over the protein, or if some residues are more flexible than others, we can type:

g_rmsf -s md.tpr -f traj.xtc -oq
Select group "3" (C-alpha). The result can be viewed with:
xmgr rmsf.xvg
We can see that mainly two regions in the protein show a large flexibility: around residue 11 and residue 48. To see where these residues are located in the protein, type:

rasmol bfac.pdb
Under "Colours", select "Temperature". The protein backbone is now shown with the flexibility encoded in the colour. The red (orange, green) regions are relatively flexible and the blue regions are relatively rigid. It can be seen that the alpha-helix and beta-sheet are relatively stable, whereas the loops are more flexible.

Repeat the same rasmol command for the x-ray structure (1PGB.pdb).

Question: How do the flexibilities in the simulation compare to the temperature factors in the crystallographic structure?
(note that the correspondence cannot be expected to be 100%, as the protein flexibility in different in the crystal than in solution (as in the simulation). Additionally, not only flexibility is encoded in the crystallographic B-factor, but also experimental error or model inaccuracy)

The simulation not only yields information on the structural properties of the simulation, but also on the energetics. With the program g_energy the energies written by mdrun can be analysed:

g_energy -f ener.edr
Select "9" and end your selection with "0", View the result with:

xmgr energy.xvg
As can be seen, the total potential energy initially rises rapidly after which it relaxes again.

Question: can you think of an explanation for this behaviour?

Please repeat the energy analysis for a number of different energy terms to obtain an impression of their behaviour.

We continue with a number of more specific analysis, the first of which is an analysis of the secondary structure (alpha-helix, beta-sheet) of the protein during the simulation:

do_dssp -s md.tpr -f traj.xtc
select group "1" (protein), and convert the output to PostScript with:
xpm2ps -f ss.xpm
and view the result with:
xpsview plot.eps
if "xpsview" is not installed on your computer, try "gv", "ghostview" or "gs". As can be seen, the secondary structure is rather stable during the simulation, which is an important validation check of the simulation procedure (and force-field) used. The next thing to analyse is the change in the overall size (or gyration radius) of the protein:

g_gyrate -s md.tpr -f traj.xtc
(again, select group "1" for the protein)
xmgr gyrate.xvg
The analysis shows that the gyration radius fluctuates around a stable value and does not show any significant drift. Another important check concerns the behaviour of the protein surface:
g_sas -s md.tpr -f traj.xtc
(again, select group "1" for the protein)
xmgr -nxy area.xvg
We now see three curves together: black for the hydrophobic accessible surface, red for the hydrophilic accessible surface, and green for the total accessible surface.

Question: Is the total (solvent-accessible) surface constant? Are any hydrophobic groups exposed during the simulation?



OPTIONAL

If time allows, continue here with an "Essential Dynamics" or "Principal Components" analysis of the protein motions.

The analyses so far concentrated mostly on global dynamical properties of the protein. In the animation, we saw that the dynamics of the protein are rather complex and for the researcher it is not trivial to grasp the types of motion that occur. Therefore, it is useful to split the dynamics into different modes of motion and analyse these modes individually. In principle, in a system of N atoms, 3N-6 of such modes of internal fluctuations exist (six degrees of freedom are required to describe the external rotation and translation of the system). Since our protein contains 855 atoms, more than 2,500 of such modes exist. Therefore, such a splitting into individual modes is only meaningful if the fluctuations are not equally spread over all the modes, but rather, when there are a few modes where most fluctuations are concentrated, and many that hardly contribute to the fluctuations.

Fortunately, a method exists to derive exactly those modes of motion that contribute most to the overall dynamics. This method is called "Essential Dynamics" or "Principal Components" analysis. The theory behind the method will be explained in a later lecture, here we concentrate on how the analysis works in practice. The first part of the analysis is started with:

g_covar -s md.tpr -f traj.xtc
Select group "4" (protein backbone) both for the fit and for the analysis. We have now optimised the modes such that the total amount of fluctuation is concentrated as well as possible in a small number of modes, whereas the majority of modes hardly contribute to the total fluctuation. To see the division of the mean square fluctuations over the modes:
xmgr eigenval.xvg
As can be seen, indeed only a few modes of collective fluctuation contribute significantly to the total fluctuation! To view what to what type of fluctuation such modes correspond, we can filter the trajectory along for example the first (most dominant) and visualise the result:
g_anaeig -extr extr_ev1.pdb -first 1 -last 1 -nframes 10 -s md.tpr
rasmol extr_ev1.pdb
This mode (or first eigenvector) thus corresponds to a collective motion containing contributions of all atoms. To see a dynamic animation of the main collective mode from a longer simulation, click here . To appreciate the contrast, here is the raw, unfiltered trajectory. The collective mode with the second-highest fluctuation can be found here .

If we are interested in the time evolution of these collective coordinates during the simulation, we can project the trajectory onto these coordinates:

g_anaeig -proj -s md.tpr -f traj.xtc
xmgr proj.xvg
showing that the first mode is not only the one with the largest (mean) amplitude, but also that most others tend to describe fluctuations with higher frequencies.


OK, that was all. Hope you enjoyed it! Please address any comments or questions to:

Bert de Groot
bgroot@gwdg.de