[Figure: rate matrices for the JC69 (left), K2P (middle), and HKY85 (right) models]

Models of Molecular Evolution

    Given the four nucleotides A, C, G, and T, there are (4 x 4) - 4 = 12 possible pairwise mutations among them that result in a SNP. Mutation rates among the four nucleotides can be set in various ways, based on data and assumptions.
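
    The count of 12 can be checked by listing every ordered pair of distinct nucleotides. The snippet below is a minimal sketch; the names NUCLEOTIDES and substitutions are illustrative and not part of the original notes.

```python
from itertools import permutations

NUCLEOTIDES = "ACGT"

# Each ordered pair (x, y) with x != y is one possible substitution x -> y.
# A -> C and C -> A are counted separately, which gives 4*4 - 4 = 12.
substitutions = [f"{x}->{y}" for x, y in permutations(NUCLEOTIDES, 2)]

print(len(substitutions))
print(substitutions)
```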

    [Left] The original and simplest model is the Jukes & Cantor (1969) model, called JC69, which assumes that all nucleotide frequencies are equal, and that all mutations leading to a SNP occur at the same rate, m. For example, the reciprocal rates A→C and C→A are equal to each other, and equal to A→G. At the time it was introduced, there were few data to suggest otherwise and, more importantly, the model was mathematically and computationally simple in the days before PCs.
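
    As a rough sketch of the JC69 assumption, the matrix below gives every off-diagonal entry the same rate m. Two things here are assumptions rather than part of the figure: the diagonal is filled with the negated row sum, following the convention of the GTR matrix further down, and the value m = 0.01 is arbitrary.

```python
NUCLEOTIDES = "ACGT"

def jc69_rate_matrix(m):
    """Return a 4x4 JC69 rate matrix as nested lists (rows = from, columns = to)."""
    n = len(NUCLEOTIDES)
    # Every off-diagonal entry is m; each diagonal entry is -3m,
    # so that every row sums to zero (assumed convention).
    return [[-(n - 1) * m if i == j else m for j in range(n)] for i in range(n)]

Q = jc69_rate_matrix(m=0.01)  # 0.01 is an arbitrary illustrative value
for base, row in zip(NUCLEOTIDES, Q):
    print(base, row)
```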

    [Middle] A simple adjustment is the Kimura Two-Parameter model (K2P), which recognized from early DNA data that, at least in comparisons within species and among closely related species, transitions (Ts) (A↔G and C↔T) are much more frequent than transversions (Tv) (all other pairwise mutations). This is calculated as the Transition Bias, K = [Ts] / [Tv]. As sequence divergence increases, the number of observed transversions increases, to a point where Tv-only models may be more accurate.
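
    The Ts / Tv ratio can be illustrated by counting differences between two aligned sequences. This is a minimal sketch: the two sequences are made up for illustration, the function names are hypothetical, and the code assumes the sequences contain only A, C, G, and T (no gaps or ambiguity codes).

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def is_transition(x, y):
    """True when x -> y stays within the purines (A<->G) or the pyrimidines (C<->T)."""
    return x != y and ({x, y} <= PURINES or {x, y} <= PYRIMIDINES)

def count_ts_tv(seq1, seq2):
    """Count transition and transversion differences between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    ts = tv = 0
    for x, y in zip(seq1, seq2):
        if x == y:
            continue  # identical sites contribute to neither count
        if is_transition(x, y):
            ts += 1
        else:
            tv += 1
    return ts, tv

# Hypothetical aligned sequences, invented for this example only.
seq_a = "ACGTACGTAC"
seq_b = "GCGTATGTCC"

ts, tv = count_ts_tv(seq_a, seq_b)
# Transition Bias K = Ts / Tv; left undefined (None) when no transversions are seen.
K = ts / tv if tv else None
print(ts, tv, K)
```

    With real data, Ts and Tv counted this way underestimate the true numbers of changes once sites have mutated more than once, which is one reason the observed ratio shifts as divergence increases.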

    [Right] Given the availability of data and increased understanding of molecular evolution, it became apparent that besides the Transition Bias, nucleotide frequencies in any one DNA strand are unequal, and that mutation rates between nucleotide pairs are unequal. The Hasegawa, Kishino, & Yano (1985) model (HKY85) incorporates all these factors. In the last column, for example, the mutation rate of any nucleotide A, C, or G to T is the same (π_T), except that the rate of the transition C→T is weighted by K as in the previous model. Every other column has the same arrangement.
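
    The column arrangement described for HKY85 can be sketched as follows. Several details are assumptions: the row and column order A, C, G, T (with T last, as in the text), the diagonal set to the negated row sum, the parameter name kappa standing in for K, and the example frequencies, which are invented.

```python
NUCLEOTIDES = "ACGT"  # assumed row/column order, with T as the last column
TRANSITIONS = {frozenset("AG"), frozenset("CT")}

def hky85_rate_matrix(pi, kappa):
    """Build an HKY85-style rate matrix (rows = from, columns = to).

    pi    -- dict mapping each nucleotide to its frequency
    kappa -- weight applied to transitions (the Transition Bias K in the text)
    """
    Q = []
    for i, x in enumerate(NUCLEOTIDES):
        row = []
        for y in NUCLEOTIDES:
            if x == y:
                row.append(0.0)  # placeholder, filled in below
            else:
                weight = kappa if frozenset((x, y)) in TRANSITIONS else 1.0
                row.append(weight * pi[y])  # rate into y depends on pi[y]
        row[i] = -sum(row)  # assumed convention: each row sums to zero
        Q.append(row)
    return Q

# Invented example values, for illustration only.
pi = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
Q = hky85_rate_matrix(pi, kappa=4.0)

# Last column: A->T and G->T equal pi_T, while C->T equals K * pi_T.
for base, row in zip(NUCLEOTIDES, Q):
    print(base, "-> T:", row[3])
```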

    [Below] For the advanced student: Now that computational power and extensive data have become available, it is possible to construct a universal model, called the General Time Reversible (GTR) model, which allows all available information to be incorporated into any particular evolutionary investigation. Estimates of mutation rates are calculated from the data themselves. In the last column, for example, π_T in the HKY85 model is weighted by three different nucleotide-specific factors, γ, ε, and η, where η incorporates the Transition Bias. The entry for T remaining unchanged is also explicitly calculated, as the negation of the sum of the other entries in the last row, which describe its change to A, G, or C. The GTR model was previously prohibitively machine-time intensive, and remains so for large analyses with bootstrap re-sampling methods.

\[
Q = \begin{pmatrix}
-(\alpha\pi_G + \beta\pi_C + \gamma\pi_T) & \alpha\pi_G & \beta\pi_C & \gamma\pi_T \\
\alpha\pi_A & -(\alpha\pi_A + \delta\pi_C + \epsilon\pi_T) & \delta\pi_C & \epsilon\pi_T \\
\beta\pi_A & \delta\pi_G & -(\beta\pi_A + \delta\pi_G + \eta\pi_T) & \eta\pi_T \\
\gamma\pi_A & \epsilon\pi_G & \eta\pi_C & -(\gamma\pi_A + \epsilon\pi_G + \eta\pi_C)
\end{pmatrix}
\]

where there are six distinct reciprocal mutation rates:

\[
\begin{aligned}
\alpha &= r(A \rightarrow G) = r(G \rightarrow A) \\
\beta &= r(A \rightarrow C) = r(C \rightarrow A) \\
\gamma &= r(A \rightarrow T) = r(T \rightarrow A) \\
\delta &= r(G \rightarrow C) = r(C \rightarrow G) \\
\epsilon &= r(G \rightarrow T) = r(T \rightarrow G) \\
\eta &= r(C \rightarrow T) = r(T \rightarrow C)
\end{aligned}
\]
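
The GTR matrix and its six reciprocal rates can be assembled in a few lines of code. The sketch below uses the row and column order A, G, C, T read off the matrix above; the function names and the numerical values are invented for illustration. It also checks the property that gives the model its name, on the assumption that time reversibility is expressed as π_x Q[x→y] = π_y Q[y→x] for every pair of nucleotides.

```python
ORDER = "AGCT"  # row/column order of the GTR matrix above

# Which of the six reciprocal rates applies to each unordered nucleotide pair.
PAIR_TO_RATE = {
    frozenset("AG"): "alpha",
    frozenset("AC"): "beta",
    frozenset("AT"): "gamma",
    frozenset("GC"): "delta",
    frozenset("GT"): "epsilon",
    frozenset("CT"): "eta",
}

def gtr_rate_matrix(pi, rates):
    """Build the GTR rate matrix from frequencies pi and the six named rates."""
    Q = []
    for i, x in enumerate(ORDER):
        row = [0.0 if x == y else rates[PAIR_TO_RATE[frozenset((x, y))]] * pi[y]
               for y in ORDER]
        row[i] = -sum(row)  # diagonal = negation of the other entries in the row
        Q.append(row)
    return Q

def is_time_reversible(Q, pi, tol=1e-12):
    """Check pi_x * Q[x][y] == pi_y * Q[y][x] for all pairs, within a tolerance."""
    return all(abs(pi[x] * Q[i][j] - pi[y] * Q[j][i]) <= tol
               for i, x in enumerate(ORDER) for j, y in enumerate(ORDER))

# Invented example values, for illustration only.
pi = {"A": 0.3, "G": 0.2, "C": 0.2, "T": 0.3}
rates = {"alpha": 4.0, "beta": 1.0, "gamma": 0.5,
         "delta": 1.0, "epsilon": 1.5, "eta": 4.0}

Q = gtr_rate_matrix(pi, rates)
for base, row in zip(ORDER, Q):
    print(base, row)
print(is_time_reversible(Q, pi))
```

Setting alpha and eta to K and the other four rates to 1 gives a matrix with the HKY85 arrangement; additionally setting K to 1 and all four frequencies equal gives a matrix of the JC69 form, with a single off-diagonal rate.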


Text material © 2024 by Steven M. Carr