EMBO course, Shanghai, 11-23 September 2004.

Gromacs tutorial for membrane protein simulations.

Phil Biggin phil@biop.ox.ac.uk

Table of Contents.

  1. About this practical
  2. Practicalities
  3. Gromacs specific
  4. Membrane Proteins- Insertion and Setup
  5. Membrane Proteins- Analysis
  6. Further Reading
  7. Appendix I - Trajectory manipulations
  8. Appendix II - Useful Software
  9. Appendix III - Basic Unix Commands
  10. Acknowledgements

  1. About this Practical.
  2. This practical is designed to demonstrate what sorts of questions and problems occur when simulating proteins that are embedded within a lipid bilayer. Ideally, we would include setting the system up, running the simulation, and then analysing it, but that is rather time-consuming as we would need a rather long time to equilibrate the system (too long for one afternoon's practical that's for sure!). We will thus start from effectively two positions. In the first I will illustrate a potential protocal for inserting your membrane protein. You should be aware that this is not the only method (as I mentioned in the lecture), and it does require a larger amount of equilibration time than some of the other methdods, but that is actually becoming less of an issue as computer power increases.

    In the second we will start from a position where the simulation has actually been run and we need to analyse our trajectory. Actually we will look at two trajectories of two different membrane proteins. We will first look at some simple analysis to see whether or not our system is in equilibrium and from there look at some time-dependent properties.

    The practical is both open-ended and in the analysis part is fairly independent of previous steps which means you can opt out of certain exercises if you wish. However I would recommend proceeding sequentially through all of it. Finally, I assume very little knowledge of gromacs and VMD so if you are familiar with those packages already you will find things very easy.

  3. Practicalities.
  4. In this practical session, it is assumed that the user is reasonably familiar with some basic Linux/Unix command line tools. See the Appendix II - Basic Unix Commands if you need some help in this respect. In the following the % symbol is used to indicate the command line prompt. Where indicated enter everything after this symbol. For visualization we will use the freely available molecular graphics package, VMD, or you might prefer to use the package that comes with gromacs, called ngmx, but this is less powerful and not as pretty as VMD.

    If you are worried you have done something wrong or are not sure what the actual result should look like, you can look at the results directory within each section which has example output.

    WARNING:This practical makes use of 10ns simulations with coordinates taken every 5ps. This means that the resulting trajectory files are quite large. If you find that your computer is strugging especially with the visualization sections, then you should refer to Appendix I - Trajectory manipulations to learn how to circumvent some of these issues in order to still make use of the practical.

    We will be using the molecular dynamics simulation package GROMACS. Extensive help is available online there both in html and as complete downloadable pdfs of the manual. The manual is worth downloading if you wish to understand some of the implementations of the algorithms and if you need further background into the subject of molecular dynamics.

    NOTE: This trajectories and the practical are all based on using gromacs version 3.1.4. If you are using an older version then some things probably will not work. The VMD "directions" are based on the latest version (1.8.2).

    Now, let us unpack the practical archive:-

    % tar -xvf practical.tar
    
    This should then expand various directories and files for use in the practical. Specifically it will create a directory called practical. You should change directory to this directory and work from there as the commands that follow assume you are starting from that location. Next, make sure you have the gromacs environment setup.
    % grompp -h 
    
    should display some running information for the grompp program. If it does not contact the practical demonstrator. The -h option applies to all gromacs programs, so if you want to more about a certain program this will tell you about which options are valid. Indeed, the practical below is really only a route to show you what sorts of things are possible. You should explore the options available at each stage by giving the -h option.

  5. Specifics to gromacs.
  6. Membrane Proteins. Insertion and Setup
  7. First, we will go through a possible protocal to generate some initial coordinates to start your simulation. To do this you simply need your protein prepared to include all the missing residues and polar hydrogens, and of course a box of lipid molecules. You should work in the directory ompa/insertion for this part of the practical.
    % cd ompa/insertion
    

    First of all lets take a quick look at our protein for this practical, ompa, to which we have added the polar hydrogens already. Either:-

    % vmd data/ompa_only.pdb
    

    or:-

    % rasmol data/ompa_only.pdb
    

    If you draw the protein as a cartoon (in VMD - click on graphics tab and then select representations. In the new window that pops up, click on the drawing method tab and select cartoon), you should be able to see that ompa, is an eight-stranded beta barrel protein. If you have axis displayed (Display-->axis-->lowerleft in VMD) you will see that in this case the axis of the barrel is aligned with the z-axis. That is fortunate for us, as it is quite common that the z-axis is also the vector of the membrane normal (although obviously this is not compulsory). As mentioned in the lecture this alignment with respect to the box can be performed in a variety of ways ranging from a monte-carlo energy search to alignment by eye. VMD allows you to do a visual alignment very easily. (Mouse-->Move-Molecule and then left mouse button while the cursor is on the protein. Doing the same but also pressing shift while holding down left mouse will rotate the molecule).

    Now to set up our protein in a box of lipid we have to place our protein in the middle of the bilayer. There are several ways to do this, but if you have a reasonably symmetric system and bilayer the process is very straightforward. In gromacs, the periodic box information is held within the CRYST line of the pdb file. In the case of ompa_only.pdb it is as follows:-

    CRYST1   75.349   55.669   73.176  90.00  90.00  90.00 P 1           1
    
    This says that the box has dimensions of x,y,z of 75.349, 55.669, 73.176 and the angles between the vertices are 90 degrees. The lipid bilayer system also has a box specified. We will use data/dmpc_288.pdb. If you enter the following:

    % head -10 data/dmpc_288.pdb 
    

    You will see the line:-

    CRYST1   98.675   98.675   71.709  90.00  90.00  90.00 P 1           1
    
    We need to place our protein molecule within this box. We do this in three steps.

    1. Align protein with lipid bilayer so it is the same coordinate space as the lipid.
    2. Transfer new protein coordinates so they are within the same box as the lipid
    3. Delete the overlapping lipids.

    VMD is perfectly suited for these steps:

    First load up the lipid into VMD (File-->New Nolecule) Click on data/ directory in the selection box and select dmpc_288.pdb. Then in the Molecule File Browser window hit the load button. This will load an equilibrated lipid bilayer complete with waters into the viewer. Next we need to load in out protein. In the "Molecule File Browser" window. Click on the green section next to "Load files for" and select "New Molecule", then hit browse. You should still be in the data directory. Select ompa_only.pdb. Hit the "Load" button. You should now see on the screen both molecules. The protein will not be in the center of the bilayer, and depending on your opinion may not quite be in the correct position with respect to the bilayer normal. The next stage is thus to move the protein to the correct position which we can do reasonably well by eye as follows.

    Close the "Molecule File Browser". Select "Graphics-->Representations" from the main VMD menu. Select dmpc_288.pdb under "Selected Molecule". Then in the "Selected Atoms" window replace "all" with "not water". You should see the waters that were around the lipid bilayer disappear from view. This will help us move the protein into position. Next select "ompa_only.pdb" under the "Selected Molecule" section, and then under "Drawing Method" select "VDW". You should see the protein representation change to a series of vdw spheres. Now we can being to move our protein into position. Under the main vmd menu select "Mouse-->Move-->Molecule" (or the shortcut is to simply press 8 on the keyboard). Left-click the protein molecule and drag it into the desired position.

    WARNING: It will save a lot of time if you do not move the lipid at all. If you do, the easiest thing is to start over again.

    Once you have the protein in the correct position with respect to this bilayer, we need to write out the new coordinates of the protein. In the main vmd menu make sure that the ompa_only.pdb (ID number 2) is highlighted yellow. The select "File-->Save Coordinates". Under "Selected Molecule" choose ompa_only.pdb and under "File Type:" select pdb. Then hit "save". Rename it something new for example ompa_only_moved". We now have our coordinates in the correct position but our perodic box is not the same as the bilayer. To do this the simplest thing is simply to copy and paste the CRYST line from dmpc_288.pdb into your new protein pdb; ompa_only_moved.pdb (or whatever you called it). Do this with your favourite editor, say nedit or xemacs and call the new file ompa_in_box.pdb. Once you have done that, the final step is simply to combine the two files and delete overlapping lipids from where the protein sits. Fortunately, gromacs can do all this for you in one easy step:-

    % genbox -cp ompa_in_box.pdb -cs data/dmpc_288.pdb -o ompa_in_dmpc_288.pdb
    
    This effectively "solvates" the protein with the complete lipid bilayer. You can check this out by looking at the resulting pdb in vmd. You should be able to see an obvious gap between the protein and the lipid. This gap will shrink as the lipid starts to pack properly around the protein as demonstrated in the lecture.

    This is the start position for your simulation. You may want to add counter-ions now to neutralize the system ( necessary if you are using PME for your electrostatics). This system would now have to be thoroughly equilibrated which will require of the order of at least 3ns (and anything up to 20ns). We will discuss what defines a well equilibrated system later in this tutorial. Of course, you won't have time to run this simulation now, and that concludes the first part of the practical. In the next part of the practical we will discuss how to analyse some simulations.

  8. Membrane Proteins. Analysis
  9. We are going to look at two simulations of different types of membrane protein that have already been performed, one is a potassium channel and one is an outer-membrane protein. The simplest and easiest type of analysis you should always do is to look at it with your eyes! Your eye will tell if you something strange is happening immediately. A numerical analysis may not. Let us look at the two simulations using VMD. First change directory to where you unpacked the practical archive and change directory as follows:-
    % cd practical/ompa/analysis/
    
    % vmd
    
    
    When it has finished placing all the windows on the screen. Click on "File" in the VMD main menu window and select "new molecule". The Molecule File Browser window should appear. Click on "browse" then select setup and finally select start.pdb. Click "OK" and then click "Load". It should load up the starting coordinates into the main window. Then click browse in Molecule File Browser window and select "Up on directory". Select the data directory and then select ompa.xtc. Select "OK" and then hit "load". The trajectory should start loading into the main VMD window. Although things will be moving, you can see that its quite difficult to visualize the individual components. That is one of the problems with simulating such large a complicated systems. VMD makes it quite easy to look at individual components of a system. For example, let us consider the protein only. On the main menu, left click on graphics and select "representations". A new menu will appear (Graphical Representations). In the box entitled "selected atoms" type protein and hit enter. Only those atoms that form part of the protein are now selected. Various other selections and drawing methods will help to visualize different aspects of the simulation. More help on the selections is available here.

    Exit vmd and restart it from within the practical/kcsa/analysis. Have a look at the other trajectory. The initial coodinates are in practical/kcsa/analysis/setup/start.pdb and the corresponding trajectory is in practical/kcsa/analysis/data/kcsa.xtc. Try to figure out a good way to visualize the protein motion.

    Now that we are sure the simulation is not doing anything ridiculous, we can start to ask questions about the simulation. The first thing to establish is whether the simulation is stable. In smaller systems this might mean the same as in equilibrium, but that it is probably fair to say that in membrane proteins, this is a little difficult to achieve with respect to the whole system. So what are some measures of stability? and what can we use to test the reliability of the simulation?

    1. Analysis of System Properties
    2. There are various so-called ensembles that are used for membrane proteins, probably the most common is a system where the number of particles, the pressure and the temperature are held constant. This is usually achieved by means of a heat-bath. Nevertheless, it is usually a good idea to check these as a function of time through the trajectory just to make sure nothing unexpected happened. First let us check the temperature of the OMPA simulation. In the following start from within the practical directory.
      % g_energy -f ompa/analysis/data/ompa.edr -s ompa/anslysis/setup/ompa_no_shuffling.tpr -o ompa_temperature.xvg
      
      
      The program will then present you with a large table of all the values recorded in the energy (.edr) file. We want to examine temperature so type 14, press enter and then 0 and press enter again. The program will then analyse the temperature and present some statistics of the analysis at the end. We can look at the fluctuations in more detail by using xmgrace (or xmgr).
      % xmgrace ompa_temperature.xvg
      
      will display the temperature fluctuations.

      Another set of properties that is quite useful to examine is the various energetic contributions to the energy. The total energy should be constant. but the various contributions can change and this can sometimes indicate something interesting or strange happening in your simulation. Let us look at some energetic properties of the ompa simulation.

      %  g_energy -s ompa/analysis/setup/ompa.tpr -f ompa/analysis/data/ompa.edr -o ompa_energies.xvg
      
      We shall select the total energy (13), the coulombic short range interactions between the protein and the DMPC lipid (49) and between the protein and the non-crystallographic waters (59). Once you have made those selections, look at the resulting graph with xmgrace again.
      % xmgrace -nxy ompa_energies.xvg 
      
      Q. Is the total energy stable in this simulation?

      Q. What might the short-range coulombic interactions between the protein and the DMPC and protein and water indicate?

      Now we will consider the lipid bilayer.

    3. Analysis of the Lipid Bilayer
    4. Firstly we could consider whether the lipid molecules are behaving as expected. One easy to analyse property is the mass densities of various components of the system. We will take the last half of the simulation. For a system of this size, we would expect things to have settled down by 5ns, but you should also check how this profile changes/varies with time for yourselves.

      % g_density -f ompa/analysis/data/ompa.xtc -n ompa/analysis/data/ompa.ndx -s ompa/analysis/setup/ompa.tpr -o ompa_densities.xvg -sl 50 -b 5000
      

      We want to look at the partial density of the lipid headgroups, lipid tails, protein and the water, so when the program asks how many groups you want to consider enter 4 and then select group 1 (protein), 22(headgroups), 24(tails) and 25(HOH_SOL - the crystallographic waters and the solvation waters). The resulting file will be called density.xvg. Examine this file by opening it with xmgr or xmgrace. (File--> read sets-->format is X,Y1, Y2 etc..)

      Q. What does this information tell us about the bilayer? Is there anything unsual about the profile?

      (Note: It is more common for mass densities to be quoted as g/cm-3).

      Q. What other properties might be useful to obtain in order to make useful comparisons back to experiments?
    5. Analysis of Protein
    6. RMSD of OMPA

      Now that we are happy with the lipids in the membrane, we shall turn our attention to some simple protein properties. First let us consider the root mean square deviation (RMSD). We shall use the tool g_rms

      % g_rms -s ompa/analysis/setup/start.pdb -f ompa/analysis/data/ompa.xtc -n ompa/analysis/setup/index.ndx -o ompa_rms.xvg
      

      We will use the Cα coordinates to do the fitting to, so select group 3 when prompted. It will then ask how many groups we want to consider. Enter 3, as we want to look at all protein atoms, all Cα and the mainchain RMSD. Have a look at the resulting graph with xmgr or xmgrace.

      Q. What does this tell you about the stability of the protein? Is it in a state of equilibrium and if so why and at what time?

      Q. Can you think of a situation where this approach might not be a very good indication of stability?

      Sometimes it is more useful to examine the rmsd of the core regions of the protein. In the case of ompa, the beta-barrel can be considered the core structural region and the loops would be expected to show higher mobility. To do this we need to make use of an index file which in gromacs has the .ndx extension. In the index file you can specify several sets of atoms for future analysis. You can do this easily yourself with the make_ndx program but for now we'll use one we made earlier..

      RMSD of KcsA
      
      % g_rms -s kcsa/analysis/setup/kcsa.tpr -f kcsa/analysis/data/kcsa.xtc -o kcsa_rmsd.xvg
      
      

      Examine the file with xmgrace.

      Q. How does this result compare with your answers for ompa?

      Q. What do these results tell you about membrane proteins?

      RMSF

      A simlar property that is particularly useful is the root mean square fluctuation (RMSF). This can be related back to the b-factor value which is given in X-ray crystallography and hence provides an easy way to compare your simulation back to experiment. Let us first obtain the rmsf for ompa according to the last 8ns of the simulation:

      % g_rmsf -s ompa/analysis/setup/start.pdb -f ompa/analysis/data/ompa.xtc -n ompa/analysis/setup/ompa.ndx -b 2000 -oq ompa_bfac.pdb -o ompa_rmsf.xvg
      
      
      This will produce a pdb file (ompa_bfac.pdb) that has b-factors in the 10th column. We can extract this data ready for plotting in xmgrace as followings:-

      % more ompa_bfac.pdb | grep CA | awk '{print $5,$10}' > ompa_bfac_ca.dat
      
      Look at this data in xmgrace.

      Q. Can you identify structural regions alone from this plot and does that fit in with the structure??

      We can compare this simulation data directly with experimental b-factors from the high resolution structure of ompa. Read in the following ascii data file into xmgrace:-

      ompa/analysis/data/1qjp_bfactors.dat

      Q. Does the simulation agree with experiment?

      Q. Where does it not agree and can you suggest reasons why it does not?
      Ion movements and flexiblity

      Now let us move onto something a little more specific. KcsA is an ion channel that is selective for the potassium ion. Although we are still a long way off being able to simulate the length of time that would enable us to pedict conductance (that would require many averages of a stochastic process), we are able to observe individual ion movements and the response of the protein in response to those movements. Let us first of all plot the motion of the ions in the pore, for this we can simply use g_traj

      % g_traj -f kcsa/analysis/data/kcsa.xtc -s kcsa/analysis/setup/kcsa.tpr -n kcsa/analysis/setup/kcsa_index.ndx -ox ions_kcsa.xvg -nox -noy -com
      

      When the program asks for how many groups, type 3 and then enter the numbers of the 3 potassium ions (18, 19 and 20). When it finishes, examine the result with xmgrace.

      Q. What happens to the ions?

      It may appear that the ions are moving, but let us consider where they are in relation to the rest of the protein. In KcsA, the so called selectivity filter has very distinct ion binding sites delimited by the rings of backbone oxygen atoms. This figure shows the location of the sites and how they have been referred to in the literature. It is of interest to see whether the ions stay within these binding sites. To do this we need to plot the positions of the oxygens describing these binding sites

       % g_traj -f kcsa/analysis/data/kcsa.xtc -s kcsa/analysis/setup/start.pdb -n kcsa/analysis/setup/kcsa_index.ndx -ox site_oxygens_kcsa.xvg -nox -noy -com
      

      Enter 6 groups, 27 through 32 to give the oxygens z-coordinates. The program will automatically take the position of all four oxygens in each group.

      Q. What really happens to the ions?

      Q. Do you think we should be able to observe ion permeation?

      You should try and confirm these results by looking at the trajectory in VMD. Remember that you might have to reduce the size of the trajectory (see Appendix I)

    7. On your own...
    8. Now that we have gone through some simple analysis, you might like to think of other things to look at. For example, can you analyse the average phosphate-water interaction distance?

    That concludes this practical. If you have any comments and/or suggestions then please feel free to email phil@biop.ox.ac.uk. If you have finished the above exercise try to see if you can work out how to plot a distance between two Cα atoms of two residues from the ompa simulation as a function of time.

  10. Further Reading.
  11. The texts recommended here are the same as those mentioned in the lecture:-

  12. Appendix I - Trajectory Manipulations.
  13. If your computer is less powerful than a pentium 4 with 1Gb of memory, then you will certainly have problems visualizing the trajectories in VMD. This is a problem for most membrane simulations, where the capacity to simulate long timescales has exceeded the visualization capability. There are however a number of ways to circumvent this and still be able to follow the practical:-

    All of these manipulations can be accomplished very easily with the trjconv program. For example

    trjconv -f kcsa.xtc -s kcsa.tpr -b 5000 -o kcsa_last_5ns_only.xtc
    will write out the last five nanoseconds from this 10ns trajectory.

    To write out less frequently, use the skip option. For example to write out the 10ns trajectory but every 20ps instead of every 5:

    trjconv -f kcsa.xtc -s kcsa.tpr -o kcsa_less_sample.xtc -skip 4
    
    This option is probably the best one as it gives you visualization across the whole trajectory (although obviously you should be aware that you might miss some very short timescale motions).

    The program will prompt you selections so writing out just protein for example is very straightforward.

    If your selection is not there, then you should include an index file. You can add your selection to the index file, by using the make_ndx program.

  14. Appendix II - Useful Packages.
  15. Appendix III - Basic Unix/Linux Commands.
  16. ls -lrtprovides a "long" list of all files in the current directory in reverse order of time.
    cd dir change directory to the directory 'dir'
    pwd print the current working directory on the screen
    rm file delete (remove) 'file'
    mv file newfile rename file to newfile
    cat file print the contents of file to the screen
    more file print the contents of file to the screen but with more navigation possible.

  17. Acknowledgements.
  18. I must acknowledge Peter Bond for the kind "loan" of these trajectories and supporting files. I also thank Jorge Pikunic for lipid parameter discussions

    I also thank the Wellcome Trust and the Oxford Supercomputing Centre.