Analysis of a gromacs simulation
The simulation is running now (or finished) and we can start analysing the
results. Let us first see which kind of files have been written by the
simulation (mdrun):
ls -lrt
We see the following files:
- traj.xtc - the trajectory to be used for analyses
- traj.trr - the trajectory to be used for a restart in case of a crash
- ener.edr - energies
- md.log - a LOG file of mdrun
- md.gro - the final coordinates of the simulation
The first
analysis step during or after a simulation is usually a visual inspection of the
trajectory. For this we will use the gromacs trajectory viewer ngmx. Other
possibilities would be VMD ,
Pymol, or gOpenmol.
ngmx -s md.tpr -f traj.xtc
Select a group of your interest ("system"
might be good for a start), and then, under "Display", click "Animate". Now a
animation player button appears near the bottom of the screen, which can be used
to view the trajectory. If you wish to filter different simulated components,
choose a different group under "Display" and "Filter". We can see that the
protein and its surroundings undergo thermal fluctuations, but overall, the
protein structure is rather stable, as would be expected on such timescales. For
a more quantitative analysis on the protein fluctuations, we can view how fast
and how far the protein deviates from the starting (experimental) structure:
g_rms -s md.tpr -f traj.xtc
When prompted for groups to be analysed,
type "1 1 1". g_rms has written a file called "rmsd.xvg", which can be viewed
with:
xmgr rmsd.xvg
We see the Root Mean Square Deviation (rmsd) from the
starting structure, averaged over all protein atoms, as a function of time. The
deviation of 0.1 nm is within the expected range for a protein like the B1
domain at a temperature of 300K at such timescales. If we now wish to see if the
fluctuations are equally distributed over the protein, or if some residues are
more flexible than others, we can type:
g_rmsf -s md.tpr -f traj.xtc -oq
Select group "3" (C-alpha). The result
can be viewed with:
xmgr rmsf.xvg
We can see that mainly two regions in the protein show a
large flexibility: around residue 11 and residue 48. To see where these residues
are located in the protein, type:
rasmol bfac.pdb
Under "Colours", select "Temperature". The protein
backbone is now shown with the flexibility encoded in the colour. The red
(orange, green) regions are relatively flexible and the blue regions are
relatively rigid. It can be seen that the alpha-helix and beta-sheet are
relatively stable, whereas the loops are more flexible.
Repeat the same rasmol command for the x-ray structure (1PGB.pdb).
Question: How do the flexibilities in the
simulation compare to the temperature factors in the crystallographic
structure?
(note that the correspondence cannot be expected to be 100%, as
the protein flexibility in different in the crystal than in solution (as in the
simulation). Additionally, not only flexibility is encoded in the
crystallographic B-factor, but also experimental error or model inaccuracy)
The simulation not only yields information on the structural properties of
the simulation, but also on the energetics. With the program
g_energy the energies written by mdrun can be analysed:
g_energy -f ener.edr
Select "9" and end your selection with "0", View
the result with:
xmgr energy.xvg
As can be seen, the total potential energy initially
rises rapidly after which it relaxes again.
Question: can you think of an explanation for
this behaviour?
Please repeat the energy analysis for a number of different energy terms to
obtain an impression of their behaviour.
We continue with a number of more specific analysis, the first of which is an
analysis of the secondary structure (alpha-helix, beta-sheet) of the protein
during the simulation:
do_dssp -s md.tpr -f traj.xtc
select group "1" (protein), and convert
the output to PostScript with:
xpm2ps -f ss.xpm
and view the result with:
xpsview plot.eps
if "xpsview" is not installed on your computer, try
"gv", "ghostview" or "gs". As can be seen, the secondary structure is rather
stable during the simulation, which is an important validation check of the
simulation procedure (and force-field) used. The next thing to analyse is the
change in the overall size (or gyration radius) of the protein:
g_gyrate -s md.tpr -f traj.xtc
(again, select group "1" for the protein)
xmgr gyrate.xvg
The analysis shows that the gyration radius fluctuates
around a stable value and does not show any significant drift. Another important
check concerns the behaviour of the protein surface:
g_sas -s md.tpr -f traj.xtc
(again, select group "1" for the protein)
xmgr -nxy area.xvg
We now see three curves together: black for the
hydrophobic accessible surface, red for the hydrophilic accessible surface, and
green for the total accessible surface.
Question: Is the total (solvent-accessible)
surface constant? Are any hydrophobic groups exposed during the simulation?
OPTIONAL
If time allows, continue here with an "Essential
Dynamics" or "Principal Components" analysis of the protein motions.
The analyses so far concentrated mostly on global dynamical properties of the
protein. In the animation, we saw that the dynamics of the protein are rather
complex and for the researcher it is not trivial to grasp the types of motion
that occur. Therefore, it is useful to split the dynamics into different modes
of motion and analyse these modes individually. In principle, in a system of N
atoms, 3N-6 of such modes of internal fluctuations exist (six degrees of freedom
are required to describe the external rotation and translation of the system).
Since our protein contains 855 atoms, more than 2,500 of such modes exist.
Therefore, such a splitting into individual modes is only meaningful if the
fluctuations are not equally spread over all the modes, but rather, when there
are a few modes where most fluctuations are concentrated, and many that hardly
contribute to the fluctuations.
Fortunately, a method exists to derive exactly those modes of motion that
contribute most to the overall dynamics. This method is called "Essential
Dynamics" or "Principal Components" analysis. The theory behind the method will
be explained in a later lecture, here we concentrate on how the analysis works
in practice. The first part of the analysis is started with:
g_covar -s md.tpr -f traj.xtc
Select group "4" (protein backbone) both
for the fit and for the analysis. We have now optimised the modes such that the
total amount of fluctuation is concentrated as well as possible in a small
number of modes, whereas the majority of modes hardly contribute to the total
fluctuation. To see the division of the mean square fluctuations over the modes:
xmgr eigenval.xvg
As can be seen, indeed only a few modes of collective
fluctuation contribute significantly to the total fluctuation! To view what to
what type of fluctuation such modes correspond, we can filter the trajectory
along for example the first (most dominant) and visualise the result:
g_anaeig -extr extr_ev1.pdb -first 1 -last 1 -nframes 10 -s md.tpr
rasmol extr_ev1.pdb
This mode (or first eigenvector) thus corresponds
to a collective motion containing contributions of all atoms. To see a dynamic
animation of the main collective mode from a longer simulation, click here . To appreciate the contrast, here is the raw, unfiltered trajectory. The
collective mode with the second-highest fluctuation can be found here .
If we are interested in the time evolution of these collective coordinates
during the simulation, we can project the trajectory onto these coordinates:
g_anaeig -proj -s md.tpr -f traj.xtc
xmgr proj.xvg
showing that the first mode is not only the one with the
largest (mean) amplitude, but also that most others tend to describe
fluctuations with higher frequencies.
OK, that was all. Hope you enjoyed it! Please address any comments or
questions to:
Bert de Groot
bgroot@gwdg.de