Introduction to gromacs

In the lectures we learned what MD simulations are and why they are interesting, now it is time to set up a simulation.

For this, we will use the GROMACS simulation package.

For this course, we will simulate the dynamics of a small, typical protein domain: the B1 domain of protein G. B1 is one of the domains of protein G, a member of an important class of proteins, which form IgG binding receptors on the surface of certain Staphylococcal and Streptococcal strains. These proteins allow the pathogenic bacterium to evade the host immune response by coating the invading bacteria with host antibodies, thereby contributing significantly to the pathogenicity of these bacteria (click here for further background information on this protein).

Before a simulation can be started, an initial structure of the protein is required. Fortunately, the structure of the B1 domain of protein G has been solved experimentally, both by x-ray crystallography and NMR. Experimentally solved protein structures are collected and distributed by the Protein Data Bank (PDB). Please open this link in a new browser window and enter "protein G B1" in the search field. Several entries in the PDB should match this query. We will choose the x-ray structure with the highest resolution (entry 1PGB) for this study. To download the structure, click on the link "1PGB", and then on "download". On the download page, select "download" with compression "none" and format "PDB". When prompted, select "save to disk", and save the file to the local hard disk. On the unix prompt, type:

rasmol 1PGB.pdb
to visualise the structure. Please note that the commands in bold print can be easily transferred to the command prompt with cut-and-paste. We now see a so-called wireframe representation of the protein structure: atoms (with different colors for the different chemical elements: grey for carbon; red for oxygen and blue for nitrogen) are not shown directly, but the bonds between atoms are shown as lines. Under "display", also try other representations such as "sticks", "spacefill", "ball & stick" and "cartoons". Exit rasmol under "file" -> "exit".

We will now prepare the protein structure to be simulated in gromacs. Although we now have a starting structure for our protein, one might have noticed that hydrogen atoms (white) are still missing from the structure. This is because hydrogen atoms contain too few electrons to be observed by x-ray crystallography at moderate resolutions. Also, gromacs requires a molecular description (or topology) of the molecules to be simulated before we can start, containing information on e.g. which atoms are covalently bonded and other physical information. Both the generation of hydrogen atoms and writing of the topology can be done with the gromacs program pdb2gmx:

pdb2gmx -f 1PGB.pdb -o conf.pdb
when prompted for the force-field to be used, choose "3" (OPLS-AA/L all-atom force field). View the result with:

rasmol conf.pdb
The topology file written by pdb2gmx is called "topol.top". Have a look at the contents of the file using:

more topol.top
you will see a list of all the atoms (with masses, charges), followed by bonds (covalent bonds connecting the atoms), angles, dihedral angles etc. Near the very end of the topology (in the "[molecules]" section) there is a summary of the simulation system, including the protein and 24 crystallographic water molecules. (quit the "more" program by pressing "q")

The topology file contains all the physical information about all interactions between the atoms of the protein (bonds, angles, torsion angles, Lennard-Jones interactions and electrostatic interactions).

The next step in setting up the simulation system is to solvate the protein in a water bo, to mimick a physiological environment. For that, we first need to define a simulation box. In this case we will generate a rectangular box with the box-edges at least 7 Angstroms apart from the protein surface:

editconf -f conf.pdb -o box.pdb -d 0.7
(note that gromacs uses units of nanometers). View the result with

rasmol box.pdb
and, in rasmol, type:

unitcell true
Now we fill the simulation box with SPC water using genbox:

genbox -cp box.pdb -cs spc216 -o water.pdb -p topol.top
Again, view the output (water.pdb) with rasmol. Now the simulation system is almost ready. Before we can start the dynamics, we must perform an energy minimisation, to alleviate any bad contacts (atoms overlapping such that a significant repulsion would result, causing numerical problems in the simulation) that might be present in the system. For this we need a parameter file, specifying which type of minimisation should be carried out, the number of steps, etc. For your convenience a file called "em.mdp" has already been prepared and can be downloaded from here. View the file with "more" to see its contents. We use the gromacs preprocessor to prepare our energy minimisation:
grompp -f em.mdp -c water.pdb -p topol.top -o em.tpr
This collects all the information from em.mdp, the coordinates from water.pdb and the topology from topol.top, checks if the contents are consistent and writes a unified output file: em.tpr, which will be used to carry out the minimisation:
mdrun -v -s em.tpr -c em.pdb
The output shows that already the initial energy was rather low, so in this case there were hardly any bad contacts. Having a look at "em.pdb" shows that the structure hardly changed during minimisation.


The careful user may have noticed that grompp gave a warning:

System has non-zero total charge: -4.

Before we continue with the dynamics, we should neutralise this net charge of the simulation system. This is to prevent artefacts that would arise as a side effect caused by the periodic boundary conditions used in the simulation. A net charge would result in an electrostatic repulsion between neighbouring periodic images. Therefore, 4 sodium ions will be added to the system:


genion -s em.tpr -o ions.pdb -np 4
Type "12" to select the water group (SOL) from which 4 water molecules will be replaced by sodium ions. The output (ions.pdb) can be checked with rasmol. To better see the ions, type (in rasmol):
select na
cpk
Since we now changed the topology of the system (4 water molecules were replaced by sodium ions), we have to manually adapt the topology:
xedit topol.top
browse towards the end of the file, and change the number of SOL (water) molecules (from 2183 to 2179). Then, add a line with "NA+ 4", and do a "save" followed by "quit". Just to be on the safe side, we repeat the energy minimisation, now with the ions included (remember to (re)run grompp to create a new run input file whenever changes to the topology, or coordinates have been made):
grompp -f em.mdp -c ions.pdb -p topol.top -o em.tpr
mdrun -v -s em.tpr -c em.pdb
Now we have all that is required to start the dynamics. Again, a parameter file has been prepared for the simulation, and can be dowloaded here. Please browse through the file "md.mdp" (using "more") to get an idea of the simulation parameters. The gromacs online manual describes all parameters in detail here. Please don't worry in this stage about all individual parameters, we've chosen common values typical for protein simulations. Again, we use the gromacs preprocessor to prepare the simulation:

grompp -f md.mdp -c em.pdb -p topol.top -o md.tpr
and start the simulation!
mdrun -v -s md.tpr -c md.pdb
The simulation is running now, and depending on the speed and load of the computer, the simulation will run for a number of minutes. Until the simulation is finished, relax, lean back or drink a coffee, before clicking here to continue with the analysis of the simulation.