Phil Biggin phil@biop.ox.ac.uk
In the second we will start from a position where the simulation has actually been run and we need to analyse our trajectory. Actually we will look at two trajectories of two different membrane proteins. We will first look at some simple analysis to see whether or not our system is in equilibrium and from there look at some time-dependent properties.
The practical is both open-ended and in the analysis part is fairly independent of previous steps which means you can opt out of certain exercises if you wish. However I would recommend proceeding sequentially through all of it. Finally, I assume very little knowledge of gromacs and VMD so if you are familiar with those packages already you will find things very easy.
If you are worried you have done something wrong or are not sure what the actual result should look like, you can look at the results directory within each section which has example output.
WARNING:This practical makes use of 10ns simulations with coordinates taken every 5ps. This means that the resulting trajectory files are quite large. If you find that your computer is strugging especially with the visualization sections, then you should refer to Appendix I - Trajectory manipulations to learn how to circumvent some of these issues in order to still make use of the practical.
We will be using the molecular dynamics simulation package GROMACS. Extensive help is available online there both in html and as complete downloadable pdfs of the manual. The manual is worth downloading if you wish to understand some of the implementations of the algorithms and if you need further background into the subject of molecular dynamics.
NOTE: This trajectories and the practical are all based on using gromacs version 3.1.4. If you are using an older version then some things probably will not work. The VMD "directions" are based on the latest version (1.8.2).
Now, let us unpack the practical archive:-
% tar -xvf practical.tarThis should then expand various directories and files for use in the practical. Specifically it will create a directory called practical. You should change directory to this directory and work from there as the commands that follow assume you are starting from that location. Next, make sure you have the gromacs environment setup.
% grompp -hshould display some running information for the grompp program. If it does not contact the practical demonstrator. The -h option applies to all gromacs programs, so if you want to more about a certain program this will tell you about which options are valid. Indeed, the practical below is really only a route to show you what sorts of things are possible. You should explore the options available at each stage by giving the -h option.
% cd ompa/insertion
First of all lets take a quick look at our protein for this practical, ompa, to which we have added the polar hydrogens already. Either:-
% vmd data/ompa_only.pdb
or:-
% rasmol data/ompa_only.pdb
If you draw the protein as a cartoon (in VMD - click on graphics tab and then select representations. In the new window that pops up, click on the drawing method tab and select cartoon), you should be able to see that ompa, is an eight-stranded beta barrel protein. If you have axis displayed (Display-->axis-->lowerleft in VMD) you will see that in this case the axis of the barrel is aligned with the z-axis. That is fortunate for us, as it is quite common that the z-axis is also the vector of the membrane normal (although obviously this is not compulsory). As mentioned in the lecture this alignment with respect to the box can be performed in a variety of ways ranging from a monte-carlo energy search to alignment by eye. VMD allows you to do a visual alignment very easily. (Mouse-->Move-Molecule and then left mouse button while the cursor is on the protein. Doing the same but also pressing shift while holding down left mouse will rotate the molecule).
Now to set up our protein in a box of lipid we have to place our protein in the middle of the bilayer. There are several ways to do this, but if you have a reasonably symmetric system and bilayer the process is very straightforward. In gromacs, the periodic box information is held within the CRYST line of the pdb file. In the case of ompa_only.pdb it is as follows:-
CRYST1 75.349 55.669 73.176 90.00 90.00 90.00 P 1 1This says that the box has dimensions of x,y,z of 75.349, 55.669, 73.176 and the angles between the vertices are 90 degrees. The lipid bilayer system also has a box specified. We will use data/dmpc_288.pdb. If you enter the following:
% head -10 data/dmpc_288.pdb
You will see the line:-
CRYST1 98.675 98.675 71.709 90.00 90.00 90.00 P 1 1We need to place our protein molecule within this box. We do this in three steps.
VMD is perfectly suited for these steps:
First load up the lipid into VMD (File-->New Nolecule) Click on data/ directory in the selection box and select dmpc_288.pdb. Then in the Molecule File Browser window hit the load button. This will load an equilibrated lipid bilayer complete with waters into the viewer. Next we need to load in out protein. In the "Molecule File Browser" window. Click on the green section next to "Load files for" and select "New Molecule", then hit browse. You should still be in the data directory. Select ompa_only.pdb. Hit the "Load" button. You should now see on the screen both molecules. The protein will not be in the center of the bilayer, and depending on your opinion may not quite be in the correct position with respect to the bilayer normal. The next stage is thus to move the protein to the correct position which we can do reasonably well by eye as follows.
Close the "Molecule File Browser". Select "Graphics-->Representations" from the main VMD menu. Select dmpc_288.pdb under "Selected Molecule". Then in the "Selected Atoms" window replace "all" with "not water". You should see the waters that were around the lipid bilayer disappear from view. This will help us move the protein into position. Next select "ompa_only.pdb" under the "Selected Molecule" section, and then under "Drawing Method" select "VDW". You should see the protein representation change to a series of vdw spheres. Now we can being to move our protein into position. Under the main vmd menu select "Mouse-->Move-->Molecule" (or the shortcut is to simply press 8 on the keyboard). Left-click the protein molecule and drag it into the desired position.
WARNING: It will save a lot of time if you do not move the lipid at all. If you do, the easiest thing is to start over again.
Once you have the protein in the correct position with respect to this bilayer, we need to write out the new coordinates of the protein. In the main vmd menu make sure that the ompa_only.pdb (ID number 2) is highlighted yellow. The select "File-->Save Coordinates". Under "Selected Molecule" choose ompa_only.pdb and under "File Type:" select pdb. Then hit "save". Rename it something new for example ompa_only_moved". We now have our coordinates in the correct position but our perodic box is not the same as the bilayer. To do this the simplest thing is simply to copy and paste the CRYST line from dmpc_288.pdb into your new protein pdb; ompa_only_moved.pdb (or whatever you called it). Do this with your favourite editor, say nedit or xemacs and call the new file ompa_in_box.pdb. Once you have done that, the final step is simply to combine the two files and delete overlapping lipids from where the protein sits. Fortunately, gromacs can do all this for you in one easy step:-
% genbox -cp ompa_in_box.pdb -cs data/dmpc_288.pdb -o ompa_in_dmpc_288.pdbThis effectively "solvates" the protein with the complete lipid bilayer. You can check this out by looking at the resulting pdb in vmd. You should be able to see an obvious gap between the protein and the lipid. This gap will shrink as the lipid starts to pack properly around the protein as demonstrated in the lecture.
This is the start position for your simulation. You may want to add counter-ions now to neutralize the system ( necessary if you are using PME for your electrostatics). This system would now have to be thoroughly equilibrated which will require of the order of at least 3ns (and anything up to 20ns). We will discuss what defines a well equilibrated system later in this tutorial. Of course, you won't have time to run this simulation now, and that concludes the first part of the practical. In the next part of the practical we will discuss how to analyse some simulations.
% cd practical/ompa/analysis/ % vmdWhen it has finished placing all the windows on the screen. Click on "File" in the VMD main menu window and select "new molecule". The Molecule File Browser window should appear. Click on "browse" then select setup and finally select start.pdb. Click "OK" and then click "Load". It should load up the starting coordinates into the main window. Then click browse in Molecule File Browser window and select "Up on directory". Select the data directory and then select ompa.xtc. Select "OK" and then hit "load". The trajectory should start loading into the main VMD window. Although things will be moving, you can see that its quite difficult to visualize the individual components. That is one of the problems with simulating such large a complicated systems. VMD makes it quite easy to look at individual components of a system. For example, let us consider the protein only. On the main menu, left click on graphics and select "representations". A new menu will appear (Graphical Representations). In the box entitled "selected atoms" type protein and hit enter. Only those atoms that form part of the protein are now selected. Various other selections and drawing methods will help to visualize different aspects of the simulation. More help on the selections is available here.
Exit vmd and restart it from within the practical/kcsa/analysis. Have a look at the other trajectory. The initial coodinates are in practical/kcsa/analysis/setup/start.pdb and the corresponding trajectory is in practical/kcsa/analysis/data/kcsa.xtc. Try to figure out a good way to visualize the protein motion.
Now that we are sure the simulation is not doing anything ridiculous, we can start to ask questions about the simulation. The first thing to establish is whether the simulation is stable. In smaller systems this might mean the same as in equilibrium, but that it is probably fair to say that in membrane proteins, this is a little difficult to achieve with respect to the whole system. So what are some measures of stability? and what can we use to test the reliability of the simulation?
% g_energy -f ompa/analysis/data/ompa.edr -s ompa/anslysis/setup/ompa_no_shuffling.tpr -o ompa_temperature.xvgThe program will then present you with a large table of all the values recorded in the energy (.edr) file. We want to examine temperature so type 14, press enter and then 0 and press enter again. The program will then analyse the temperature and present some statistics of the analysis at the end. We can look at the fluctuations in more detail by using xmgrace (or xmgr).
% xmgrace ompa_temperature.xvgwill display the temperature fluctuations.
Another set of properties that is quite useful to examine is the various energetic contributions to the energy. The total energy should be constant. but the various contributions can change and this can sometimes indicate something interesting or strange happening in your simulation. Let us look at some energetic properties of the ompa simulation.
% g_energy -s ompa/analysis/setup/ompa.tpr -f ompa/analysis/data/ompa.edr -o ompa_energies.xvgWe shall select the total energy (13), the coulombic short range interactions between the protein and the DMPC lipid (49) and between the protein and the non-crystallographic waters (59). Once you have made those selections, look at the resulting graph with xmgrace again.
% xmgrace -nxy ompa_energies.xvg
Now we will consider the lipid bilayer.
Firstly we could consider whether the lipid molecules are behaving as expected. One easy to analyse property is the mass densities of various components of the system. We will take the last half of the simulation. For a system of this size, we would expect things to have settled down by 5ns, but you should also check how this profile changes/varies with time for yourselves.
% g_density -f ompa/analysis/data/ompa.xtc -n ompa/analysis/data/ompa.ndx -s ompa/analysis/setup/ompa.tpr -o ompa_densities.xvg -sl 50 -b 5000
We want to look at the partial density of the lipid headgroups, lipid tails, protein and the water, so when the program asks how many groups you want to consider enter 4 and then select group 1 (protein), 22(headgroups), 24(tails) and 25(HOH_SOL - the crystallographic waters and the solvation waters). The resulting file will be called density.xvg. Examine this file by opening it with xmgr or xmgrace. (File--> read sets-->format is X,Y1, Y2 etc..)
(Note: It is more common for mass densities to be quoted as g/cm-3).
Now that we are happy with the lipids in the membrane, we shall turn our attention to some simple protein properties. First let us consider the root mean square deviation (RMSD). We shall use the tool g_rms
% g_rms -s ompa/analysis/setup/start.pdb -f ompa/analysis/data/ompa.xtc -n ompa/analysis/setup/index.ndx -o ompa_rms.xvg
We will use the Cα coordinates to do the fitting to, so select group 3 when prompted. It will then ask how many groups we want to consider. Enter 3, as we want to look at all protein atoms, all Cα and the mainchain RMSD. Have a look at the resulting graph with xmgr or xmgrace.
Sometimes it is more useful to examine the rmsd of the core regions of the protein. In the case of ompa, the beta-barrel can be considered the core structural region and the loops would be expected to show higher mobility. To do this we need to make use of an index file which in gromacs has the .ndx extension. In the index file you can specify several sets of atoms for future analysis. You can do this easily yourself with the make_ndx program but for now we'll use one we made earlier..
% g_rms -s kcsa/analysis/setup/kcsa.tpr -f kcsa/analysis/data/kcsa.xtc -o kcsa_rmsd.xvg
Examine the file with xmgrace.
A simlar property that is particularly useful is the root mean square fluctuation (RMSF). This can be related back to the b-factor value which is given in X-ray crystallography and hence provides an easy way to compare your simulation back to experiment. Let us first obtain the rmsf for ompa according to the last 8ns of the simulation:
% g_rmsf -s ompa/analysis/setup/start.pdb -f ompa/analysis/data/ompa.xtc -n ompa/analysis/setup/ompa.ndx -b 2000 -oq ompa_bfac.pdb -o ompa_rmsf.xvgThis will produce a pdb file (ompa_bfac.pdb) that has b-factors in the 10th column. We can extract this data ready for plotting in xmgrace as followings:-
% more ompa_bfac.pdb | grep CA | awk '{print $5,$10}' > ompa_bfac_ca.dat
Look at this data in xmgrace.
We can compare this simulation data directly with experimental b-factors from the high resolution structure of ompa. Read in the following ascii data file into xmgrace:-
ompa/analysis/data/1qjp_bfactors.dat
Now let us move onto something a little more specific. KcsA is an ion channel that is selective for the potassium ion. Although we are still a long way off being able to simulate the length of time that would enable us to pedict conductance (that would require many averages of a stochastic process), we are able to observe individual ion movements and the response of the protein in response to those movements. Let us first of all plot the motion of the ions in the pore, for this we can simply use g_traj
% g_traj -f kcsa/analysis/data/kcsa.xtc -s kcsa/analysis/setup/kcsa.tpr -n kcsa/analysis/setup/kcsa_index.ndx -ox ions_kcsa.xvg -nox -noy -com
When the program asks for how many groups, type 3 and then enter the numbers of the 3 potassium ions (18, 19 and 20). When it finishes, examine the result with xmgrace.
It may appear that the ions are moving, but let us consider where they are in relation to the rest of the protein. In KcsA, the so called selectivity filter has very distinct ion binding sites delimited by the rings of backbone oxygen atoms. This figure shows the location of the sites and how they have been referred to in the literature. It is of interest to see whether the ions stay within these binding sites. To do this we need to plot the positions of the oxygens describing these binding sites
% g_traj -f kcsa/analysis/data/kcsa.xtc -s kcsa/analysis/setup/start.pdb -n kcsa/analysis/setup/kcsa_index.ndx -ox site_oxygens_kcsa.xvg -nox -noy -com
Enter 6 groups, 27 through 32 to give the oxygens z-coordinates. The program will automatically take the position of all four oxygens in each group.
You should try and confirm these results by looking at the trajectory in VMD. Remember that you might have to reduce the size of the trajectory (see Appendix I)
Now that we have gone through some simple analysis, you might like to think of other things to look at. For example, can you analyse the average phosphate-water interaction distance?
The texts recommended here are the same as those mentioned in the lecture:-
All of these manipulations can be accomplished very easily with the trjconv program. For example
trjconv -f kcsa.xtc -s kcsa.tpr -b 5000 -o kcsa_last_5ns_only.xtcwill write out the last five nanoseconds from this 10ns trajectory.
To write out less frequently, use the skip option. For example to write out the 10ns trajectory but
every 20ps instead of every 5:
trjconv -f kcsa.xtc -s kcsa.tpr -o kcsa_less_sample.xtc -skip 4This option is probably the best one as it gives you visualization across the whole trajectory (although obviously you should be aware that you might miss some very short timescale motions).
The program will prompt you selections so writing out just protein for example is very straightforward.
If your selection is not there, then you should include an index file. You can add your selection to the index file, by using the make_ndx program.
| ls -lrt | provides a "long" list of all files in the current directory in reverse order of time. |
| cd dir | change directory to the directory 'dir' |
| pwd | print the current working directory on the screen |
| rm file | delete (remove) 'file' |
| mv file newfile | rename file to newfile |
| cat file | print the contents of file to the screen |
| more file | print the contents of file to the screen but with more navigation possible. |
I also thank the Wellcome Trust and the Oxford Supercomputing Centre.