JC69 vs HKY85 likelihood models

Models of Molecular Evolution

    Given four nucleotides A, C, G, and T, there are (4 × 4) − 4 = 12 possible pairwise mutations among them that result in a SNP; the four pairings of a nucleotide with itself are excluded because they produce no change. Mutation rates among the four nucleotides can be set in various ways, based on data and assumptions.

    [Left] The original and simplest model is the Jukes & Cantor (1969) model, called JC69, which assumes that all nucleotide frequencies are equal, and that all mutations leading to a SNP occur at the same rate, m. For example, the reciprocal rates A→C and C→A are equal to each other, and equal to A→G. At the time it was introduced, there were few data to suggest otherwise, and more importantly, the model was mathematically and computationally simple in the days before PCs.
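As a minimal sketch of the JC69 rate structure described above, the following Python builds a 4 × 4 matrix in which every off-diagonal entry equals the single rate m. The function name `jc69_rate_matrix`, the nucleotide order A, C, G, T, and the value m = 0.1 are illustrative assumptions, not part of the original description. The diagonal is filled so that each row sums to zero, the usual convention for rate matrices.

```python
NUCLEOTIDES = "ACGT"  # assumed row/column order for this sketch


def jc69_rate_matrix(m):
    """Build a JC69-style rate matrix: every change occurs at rate m."""
    n = len(NUCLEOTIDES)
    q = [[0.0 if i == j else m for j in range(n)] for i in range(n)]
    for i in range(n):
        # Diagonal entry: negative of the total rate of leaving nucleotide i.
        q[i][i] = -sum(q[i])
    return q


Q_jc = jc69_rate_matrix(0.1)  # m = 0.1 is an arbitrary illustrative value

# The 12 off-diagonal entries correspond to the 12 possible pairwise mutations.
off_diagonal = [Q_jc[i][j] for i in range(4) for j in range(4) if i != j]
print(len(off_diagonal), set(off_diagonal))
```

Because only one parameter is involved, the set of off-diagonal values collapses to a single number, which is the sense in which JC69 is the simplest model.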

    [Right] As more data became available and the understanding of molecular evolution improved, it became apparent that nucleotide frequencies in any one DNA strand are unequal, that mutation rates differ among nucleotides, and in particular that transitions (A↔G and C↔T) are much more frequent than transversions (all other pairwise mutations), by a factor K. The Hasegawa, Kishino, & Yano (1985) model, called HKY85, incorporates all these factors. In the last column, for example, the mutation rate of any nucleotide A, C, or G to T is the same (π_T), except that the rate of the transition C→T is weighted by K. Every other column has the same arrangement.
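The column arrangement described above can be sketched in Python as follows: the rate of change to nucleotide y is its frequency π_y, multiplied by K when the change is a transition. The function names, the A, C, G, T ordering, and the numerical frequencies and value of K are hypothetical choices for illustration only; the diagonal is again set so that each row sums to zero.

```python
NUCLEOTIDES = "ACGT"  # assumed row/column order for this sketch
PURINES = {"A", "G"}  # C and T are the pyrimidines


def is_transition(x, y):
    """A transition stays within the purines (A<->G) or pyrimidines (C<->T)."""
    return x != y and (x in PURINES) == (y in PURINES)


def hky85_rate_matrix(pi, k):
    """Build an HKY85-style rate matrix from frequencies pi and bias k."""
    q = []
    for i, x in enumerate(NUCLEOTIDES):
        row = []
        for y in NUCLEOTIDES:
            if x == y:
                row.append(0.0)          # placeholder, filled in below
            elif is_transition(x, y):
                row.append(k * pi[y])    # transition: weighted by K
            else:
                row.append(pi[y])        # transversion: frequency only
        row[i] = -sum(row)               # each row sums to zero
        q.append(row)
    return q


pi = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}  # hypothetical frequencies
Q_hky = hky85_rate_matrix(pi, k=2.0)           # hypothetical K

# Last column: rates of A, C, and G mutating to T.
t = NUCLEOTIDES.index("T")
for x in "ACG":
    print(x, "-> T:", Q_hky[NUCLEOTIDES.index(x)][t])
```

With these illustrative values, the A→T and G→T entries equal π_T, while the C→T entry is K times larger.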

    [Below] As computational power and extensive data have become available, it has become possible to construct a universal model, called the General Time Reversible (GTR) model, which allows all available information to be incorporated into any particular evolutionary investigation. Estimates of mutation rates are calculated from the data themselves. In the last column, for example, π_T in the HKY85 model is weighted by three different nucleotide-specific factors, γ, ε, and η, where η incorporates the transition bias. The diagonal entry for T, which accounts for T remaining unchanged, is also explicitly calculated, as the negative of the sum of the other entries in the last row.

\[
Q = \begin{pmatrix}
-(\alpha\pi_G + \beta\pi_C + \gamma\pi_T) & \alpha\pi_G & \beta\pi_C & \gamma\pi_T \\
\alpha\pi_A & -(\alpha\pi_A + \delta\pi_C + \epsilon\pi_T) & \delta\pi_C & \epsilon\pi_T \\
\beta\pi_A & \delta\pi_G & -(\beta\pi_A + \delta\pi_G + \eta\pi_T) & \eta\pi_T \\
\gamma\pi_A & \epsilon\pi_G & \eta\pi_C & -(\gamma\pi_A + \epsilon\pi_G + \eta\pi_C)
\end{pmatrix}
\]

where there are six distinct reciprocal mutation rates:

\[
\begin{aligned}
\alpha &= r(A \rightarrow G) = r(G \rightarrow A) \\
\beta &= r(A \rightarrow C) = r(C \rightarrow A) \\
\gamma &= r(A \rightarrow T) = r(T \rightarrow A) \\
\delta &= r(G \rightarrow C) = r(C \rightarrow G) \\
\epsilon &= r(G \rightarrow T) = r(T \rightarrow G) \\
\eta &= r(C \rightarrow T) = r(T \rightarrow C)
\end{aligned}
\]
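
The matrix Q and the six reciprocal rates above can be sketched in Python as follows. The row and column order A, G, C, T is read off the matrix itself; the function name `gtr_rate_matrix` and all numerical frequencies and rates are hypothetical, chosen only for illustration. The final check tests the property that gives the model its name: because the rates are reciprocal, π_x q_xy = π_y q_yx for every pair of nucleotides.

```python
ORDER = "AGCT"  # row/column order of the matrix Q above


def gtr_rate_matrix(pi, alpha, beta, gamma, delta, epsilon, eta):
    """Build a GTR-style rate matrix from frequencies and six reciprocal rates."""
    # One rate per unordered pair, so x->y and y->x share the same factor.
    rate = {
        frozenset("AG"): alpha,
        frozenset("AC"): beta,
        frozenset("AT"): gamma,
        frozenset("GC"): delta,
        frozenset("GT"): epsilon,
        frozenset("CT"): eta,
    }
    n = len(ORDER)
    q = [[0.0] * n for _ in range(n)]
    for i, x in enumerate(ORDER):
        for j, y in enumerate(ORDER):
            if i != j:
                q[i][j] = rate[frozenset((x, y))] * pi[y]
        # Diagonal: negative of the sum of the other entries in the row.
        q[i][i] = -sum(q[i])
    return q


pi = {"A": 0.3, "G": 0.2, "C": 0.2, "T": 0.3}  # hypothetical frequencies
Q_gtr = gtr_rate_matrix(pi, alpha=2.0, beta=1.0, gamma=0.5,
                        delta=0.8, epsilon=1.2, eta=2.5)  # hypothetical rates

# Time reversibility: pi_x * q_xy equals pi_y * q_yx for all pairs.
reversible = all(
    abs(pi[x] * Q_gtr[i][j] - pi[y] * Q_gtr[j][i]) < 1e-12
    for i, x in enumerate(ORDER)
    for j, y in enumerate(ORDER)
)
print(reversible)
```

Under the same assumptions, setting alpha and eta to K and the other four rates to 1 should reproduce the HKY85 arrangement, and setting all six rates equal with equal frequencies gives a JC69-like matrix.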


Text material © 2024 by Steven M. Carr