1. IntroductionOne of the principal tools in the theoretical study of biological molecules is the method of molecular dynamics simulations (MD). This computational method calculates the time dependent behavior of a molecular system. MD simulations have provided detailed information on the fluctuations and conformational changes of proteins and nucleic acids. These methods are now routinely used to investigate the structure, dynamics and thermodynamics of biological molecules and their complexes. They are also used in the determination of structures from xray crystallography and from NMR experiments.
The goal of this course is to provide an overview of the theoretical foundations of classical molecular dynamics simulations, to discuss some practical aspects of the method and to provide several specific applications within the framework of the CHARMM program. Although the applications will be presented in the framework of the CHARMM program, the concepts are general and applied by a number of different molecular dynamics simulation programs. The CHARMM program is a research program developed at Harvard University for the energy minimization and dynamics simulation of proteins, nucleic acids and lipids in vacuum, solution or crystal environments (Harvard CHARMM Web Page http://yuri.harvard.edu/). Section I of this course will focus on the fundamental theory followed by a brief discussion of classical mechanics. In section II, the potential energy function and some related topics will be presented. Section III will discuss some practical aspects of molecular dynamics simulations and some basic analysis. The remaining sections will present the CHARMM program and provide some tutorials to introduce the user to the program. This course will concentrate on the classical simulation methods (i.e., the most common) that have contributed significantly to our understanding of biological systems.
and provide the mean to carry out the following studies,
2. Historical BackgroundThe molecular dynamics method was first introduced by Alder and Wainwright in the late 1950's (Alder and Wainwright, 1957,1959) to study the interactions of hard spheres. Many important insights concerning the behavior of simple liquids emerged from their studies. The next major advance was in 1964, when Rahman carried out the first simulation using a realistic potential for liquid argon (Rahman, 1964). The first molecular dynamics simulation of a realistic system was done by Rahman and Stillinger in their simulation of liquid water in 1974 (Stillinger and Rahman, 1974). The first protein simulations appeared in 1977 with the simulation of the bovine pancreatic trypsin inhibitor (BPTI) (McCammon, et al, 1977). Today in the literature, one routinely finds molecular dynamics simulations of solvated proteins, proteinDNA complexes as well as lipid systems addressing a variety of issues including the thermodynamics of ligand binding and the folding of small proteins. The number of simulation techniques has greatly expanded; there exist now many specialized techniques for particular problems, including mixed quantum mechanical  classical simulations, that are being employed to study enzymatic reactions in the context of the full protein. Molecular dynamics simulation techniques are widely used in experimental procedures such as Xray crystallography and NMR structure determination.
Alder, B. J. and Wainwright, T. E. J. Chem. Phys. 27, 1208 (1957) Alder, B. J. and Wainwright, T. E. J. Chem. Phys. 31, 459 (1959) Rahman, A. Phys. Rev. A136, 405 (1964) Stillinger, F. H. and Rahman, A. J. Chem. Phys. 60, 1545 (1974) McCammon, J. A., Gelin, B. R., and Karplus, M. Nature (Lond.) 267, 585 (1977)
3. Statistical MechanicsMolecular dynamics simulations generate information at the microscopic level, including atomic positions and velocities. The conversion of this microscopic information to macroscopic observables such as pressure, energy, heat capacities, etc., requires statistical mechanics. Statistical mechanics is fundamental to the study of biological systems by molecular dynamics simulation. In this section, we provide a brief overview of some main topics. For more detailed information, refer to the numerous excellent books available on the subject. Introduction to Statistical Mechanics:In a molecular dynamics simulation, one often wishes to explore the macroscopic properties of a system through microscopic simulations, for example, to calculate changes in the binding free energy of a particular drug candidate, or to examine the energetics and mechanisms of conformational change. The connection between microscopic simulations and macroscopic properties is made via statistical mechanics which provides the rigorous mathematical expressions that relate macroscopic properties to the distribution and motion of the atoms and molecules of the Nbody system; molecular dynamics simulations provide the means to solve the equation of motion of the particles and evaluate these mathematical formulas. With molecular dynamics simulations, one can study both thermodynamic properties and/or time dependent (kinetic) phenomenon.
D. McQuarrie, Statistical Mechanics (Harper & Row, New York, 1976) D. Chandler, Introduction to Modern Statistical Mechanics (Oxford University Press, New York, 1987) R. E. Wilde and S. Singh, Statistical Mechanics, Fundamentals and Modern Applications (John Wiley & Sons, Inc, New York, 1998)
Statistical mechanics is the branch of physical sciences that studies macroscopic systems from a molecular point of view. The goal is to understand and to predict macroscopic phenomena from the properties of individual molecules making up the system. The system could range from a collection of solvent molecules to a solvated proteinDNA complex. In order to connect the macroscopic system to the microscopic system, time independent statistical averages are often introduced. We start this discussion by introducing a few definitions. Definitions The thermodynamic state of a system is usually defined by a small set of parameters, for example, the temperature, T, the pressure, P, and the number of particles, N. Other thermodynamic properties may be derived from the equations of state and other fundamental thermodynamic equations. The mechanical or microscopic state of a system is defined by the atomic positions, q, and momenta, p; these can also be considered as coordinates in a multidimensional space called phase space. For a system of N particles, this space has 6N dimensions. A single point in phase space, denoted by G, describes the state of the system. An ensemble is a collection of points in phase space satisfying the conditions of a particular thermodynamic state. A molecular dynamics simulations generates a sequence of points in phase space as a function of time; these points belong to the same ensemble, and they correspond to the different conformations of the system and their respective momenta. Several different ensembles are described below.
An ensemble is a collection of all possible systems which have different microscopic states but have an identical macroscopic or thermodynamic state. There exist different ensembles with different characteristics.

The Ergodic hypothesis states
Ensemble average = Time average 
The basic idea is that if one allows the system to evolve in time indefinitely, that system will eventually pass through all possible states. One goal, therefore, of a molecular dynamics simulation is to generate enough representative conformations such that this equality is satisfied. If this is the case, experimentally relevant information concerning structural, dynamic and thermodynamic properties may then be calculated using a feasible amount of computer resources. Because the simulations are of fixed duration, one must be certain to sample a sufficient amount of phase space.
Some examples of time averages:
where M is the number of configurations in the molecular dynamics trajectory and Vi is the potential energy of each configuration.
where M is the number of configurations in the simulation, N is the number of atoms in the system, mi is the mass of the particle i and vi is the velocity of particle i.
A molecular dynamics simulation must be sufficiently long so that enough representative conformations have been sampled.
The molecular dynamics simulation method is based on Newton’s second law or the equation of motion, F=ma, where F is the force exerted on the particle, m is its mass and a is its acceleration. From a knowledge of the force on each atom, it is possible to determine the acceleration of each atom in the system. Integration of the equations of motion then yields a trajectory that describes the positions, velocities and accelerations of the particles as they vary with time. From this trajectory, the average values of properties can be determined. The method is deterministic; once the positions and velocities of each atom are known, the state of the system can be predicted at any time in the future or the past. Molecular dynamics simulations can be time consuming and computationally expensive. However, computers are getting faster and cheaper. Simulations of solvated proteins are calculated up to the nanosecond time scale, however, simulations into the millisecond regime have been reported.
where F_{i} is the force exerted on particle i, m_{i }is the mass of particle i and a_{i }is the acceleration of particle i. The force can also be expressed as the gradient of the potential energy,
Combining these two equations yields
where V is the potential energy of the system. Newton’s equation of motion can then relate the derivative of the potential energy to the changes in position as a function of time.
Taking the simple case where the acceleration is constant,
we obtain an expression for the velocity after integration
and since
we can once again integrate to obtain
Combining this equation with the expression for the velocity, we obtain the following relation which gives the value of x at time t as a function of the acceleration, a, the initial position, x_{0} , and the initial velocity, v_{0}..
The acceleration is given as the derivative of the potential energy with respect to the position, r,
Therefore, to calculate a trajectory, one only needs the initial positions of the atoms, an initial distribution of velocities and the acceleration, which is determined by the gradient of the potential energy function. The equations of motion are deterministic, e.g., the positions and the velocities at time zero determine the positions and velocities at all other times, t. The initial positions can be obtained from experimental structures, such as the xray crystal structure of the protein or the solution structure determined by NMR spectroscopy.
The initial distribution of velocities are usually determined from a random distribution with the magnitudes conforming to the required temperature and corrected so there is no overall momentum, i.e.,
The velocities, vi, are often chosen randomly from a MaxwellBoltzmann or Gaussian distribution at a given temperature, which gives the probability that an atom i has a velocity v_{x }in the x direction at a temperature T.
The temperature can be calculated from the velocities using the relation
where N is the number of atoms in the system.
The potential energy is a function of the atomic positions (3N) of all the atoms in the system. Due to the complicated nature of this function, there is no analytical solution to the equations of motion; they must be solved numerically.
Numerous numerical algorithms have been developed for integrating the equations of motion. We list several here.
Important: In choosing which algorithm to use, one should consider the following criteria:
All the integration algorithms assume the positions, velocities and accelerations can be approximated by a Taylor series expansion:
Where r is the position, v is the velocity (the first derivative with respect to time), a is the acceleration (the second derivative with respect to time), etc.
To derive the Verlet algorithm one can write
Summing these two equations, one obtains
The Verlet algorithm uses positions and accelerations at time t and the positions from time tdt to calculate new positions at time t+dt. The Verlet algorithm uses no explicit velocities. The advantages of the Verlet algorithm are, i) it is straightforward, and ii) the storage requirements are modest . The disadvantage is that the algorithm is of moderate precision.
In this algorithm, the velocities are first calculated at time t+1/2dt; these are used to calculate the positions, r, at time t+dt. In this way, the velocities leap over the positions, then the positions leap over the velocities. The advantage of this algorithm is that the velocities are explicitly calculated, however, the disadvantage is that they are not calculated at the same time as the positions. The velocities at time t can be approximated by the relationship:
This algorithm yields positions, velocities and accelerations at time t. There is no compromise on precision.
This algorithm is closely related to the Verlet algorithm
The advantage of this algorithm is that it provides a more accurate expression for the velocities and better energy conservation. The disadvantage is that the more complex expressions make the calculation more expensive.