F-Statistics
        calculation example

F-Statistics as measures of genetic population structure
 
A numerical example

    Previously, we used F to measure the deficiency of heterozygotes due to (1) mating of closely-related individuals (inbreeding) within local populations, and (2) pooling of data from different local populations (Wahlund Effect). We can generalize the concept to estimate geographic population structure. Suppose a species' global population occurs as a series of local populations with different allele frequencies. If these local populations are separated geographically and do not exchange individuals at uniform rates, then individuals are more likely to mate with neighbors in the same local population. Local populations are then more likely to comprise related individuals, so geographic structure further increases non-random mating of relatives. The effect is greater if local populations are separated by larger geographic distances.

    Heterozygosity and F-statistics can also be thought of as random-draw-with-replacement experiments. Draw an allele at random from any particular sub-population, note and replace it. What is the expectation that a second allele drawn from the same sub-population will be different? That is, what is the chance of drawing a heterozygous pair? What is the expectation of heterozygosity if the second allele is drawn from a different sub-population? What if this experiment is repeated with alleles drawn from different pairs of sub-populations? From all possible pairs of sub-populations? Expectations differ for each of these experiments, according to the heterogeneity of allele and genotype frequencies among populations. Genetic structure reduces heterozygosity relative to the expectation calculated from the global allele frequencies: the deficiency is expressed as F.
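    The draw experiments can be worked through directly as probabilities. The sketch below is illustrative only: it assumes three sub-populations with f(A) = 0.6, 0.7, and 0.8 (the same values used in the numerical example), and the function and variable names are made up for this illustration.

```python
from itertools import combinations

# Assumed frequencies of allele A in three sub-populations (illustrative).
freq_A = [0.6, 0.7, 0.8]

def prob_different(p1, p2):
    """Chance that an allele drawn from a pool with f(A) = p1 and an allele
    drawn from a pool with f(A) = p2 are different (A then a, or a then A)."""
    return p1 * (1 - p2) + (1 - p1) * p2

# Both alleles drawn from the same sub-population (with replacement): 2pq.
within = [prob_different(p, p) for p in freq_A]

# One allele from each member of every possible pair of sub-populations.
between = {
    (i + 1, j + 1): prob_different(freq_A[i], freq_A[j])
    for i, j in combinations(range(len(freq_A)), 2)
}

# Both alleles drawn from a single pool at the mean (global) frequency.
p_bar = sum(freq_A) / len(freq_A)
pooled = prob_different(p_bar, p_bar)

print("within each sub-population:", [round(h, 4) for h in within])
print("between pairs:", {pair: round(h, 4) for pair, h in between.items()})
print("mean within:", round(sum(within) / len(within), 4))
print("global pool:", round(pooled, 4))
```

    Under these assumed frequencies, the mean within-sub-population expectation is lower than the expectation from the global pool, which is the deficiency that F expresses.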

    Consider a simple model of individuals distributed among three sub-populations of a global population, with observed genotype counts for each sub-population as indicated in the grey box. Based on these counts, we wish to compare the expected vs observed heterozygosity at each of three levels of population structure: individuals, sub-populations, and the total population. The population sizes N of the three sub-populations are equal. This simplifies the calculation of F-statistics; otherwise, contributions from sub-populations of different sizes must be weighted by sample size.
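    For readers curious about the unequal-N case, the following sketch shows one simple way such weighting could be done: a mean weighted by sample size. The sample sizes are hypothetical and the helper name is made up for illustration. This is not necessarily the weighting used by any particular published estimator, some of which apply additional sample-size corrections.

```python
def weighted_mean(values, sizes):
    """Mean of per-sub-population values, each weighted by its sample size."""
    total = sum(sizes)
    return sum(v * n for v, n in zip(values, sizes)) / total

freq_A = [0.6, 0.7, 0.8]        # f(A) in each sub-population
equal_n = [100, 100, 100]       # equal N: same as the simple mean
unequal_n = [100, 200, 700]     # hypothetical unequal sample sizes

print(round(weighted_mean(freq_A, equal_n), 4))    # simple mean
print(round(weighted_mean(freq_A, unequal_n), 4))  # pulled toward the largest sample
```

    With equal N the weights cancel, which is why the example in the text can use simple means throughout.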

    The observed data in the table are genotype counts for three populations at one locus. For each sub-population, observed allele counts #A & #a and frequencies f(A) & f(a) are easily calculated in the usual manner. Expected genotype counts & frequencies are also easily calculated from the observed allele frequencies, for example Hexp = (2)(fA)(fa). Global fA is the mean of the observed f(A) over all sub-populations, and global fa = (1 - global fA).
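    The "usual manner" can be sketched in a few lines. The genotype counts below are hypothetical: they assume N = 1000 per sub-population and were chosen only to be consistent with the frequencies quoted in the text, since the original table is not reproduced here. The function name is likewise made up for illustration.

```python
# Hypothetical genotype counts (AA, Aa, aa), assuming N = 1000 per sub-population.
counts = {
    "sub-pop 1": (384, 432, 184),
    "sub-pop 2": (511, 378, 111),
    "sub-pop 3": (656, 288, 56),
}

def summarize(n_AA, n_Aa, n_aa):
    """Allele frequencies, observed and expected heterozygosity, and
    expected genotype counts for one sub-population."""
    n = n_AA + n_Aa + n_aa
    f_A = (2 * n_AA + n_Aa) / (2 * n)   # each AA carries two A, each Aa one
    f_a = 1 - f_A
    return {
        "f_A": f_A,
        "f_a": f_a,
        "H_obs": n_Aa / n,              # observed f(Aa)
        "H_exp": 2 * f_A * f_a,         # Hexp = (2)(fA)(fa)
        "expected_counts": (f_A**2 * n, 2 * f_A * f_a * n, f_a**2 * n),
    }

results = {name: summarize(*c) for name, c in counts.items()}
for name, r in results.items():
    print(name, round(r["f_A"], 3), round(r["H_obs"], 3), round(r["H_exp"], 3))

# Global fA is the mean of f(A) over sub-populations (valid here because N is equal).
global_f_A = sum(r["f_A"] for r in results.values()) / len(results)
global_f_a = 1 - global_f_A
print("global fA, fa:", round(global_f_A, 3), round(global_f_a, 3))
```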

    Heterozygosity indices Hi, Hs, and Ht are simple H, calculated at different levels of the population structure. With equal N, these are easily calculated from the bold values in the table above, as


    Hi = mean of observed f(Aa) = (0.432 + 0.378 + 0.288) / 3 = 0.3660
            This is the observed probability of heterozygosity for an individual drawn at random from any sub-population.


    Hs = mean of expected f(Aa) =  (0.480 + 0.420 + 0.320) / 3 = 0.4067
            This is the expectation of heterozygosity for two alleles drawn at random from the same sub-population, averaged over sub-populations.
            Hi and Hs differ when sub-populations have different genetic structures. The difference is a measure of genetic population structure.

    Ht = expected "global" heterozygosity, calculated as global fA = (0.6 + 0.7 + 0.8) / 3 = 0.7, global fa = (1.0 - 0.7) = 0.3, and thus Ht = (2)(0.7)(0.3) = 0.4200
             This is simply the global (total) expectation of heterozygosity based on the observed total allele frequencies.
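    The three heterozygosity indices can be reproduced from the frequencies quoted above. This is a minimal sketch assuming equal N, so that simple means apply; the variable names are chosen for illustration.

```python
# Values quoted in the text for the three sub-populations.
obs_het = [0.432, 0.378, 0.288]   # observed f(Aa)
freq_A = [0.6, 0.7, 0.8]          # observed f(A)

def mean(values):
    return sum(values) / len(values)

# Expected f(Aa) within each sub-population: (2)(fA)(fa).
exp_het = [2 * p * (1 - p) for p in freq_A]

H_i = mean(obs_het)                  # mean observed heterozygosity
H_s = mean(exp_het)                  # mean expected heterozygosity within sub-populations
p_bar = mean(freq_A)                 # global fA
H_t = 2 * p_bar * (1 - p_bar)        # expected heterozygosity for the total

print(f"Hi = {H_i:.4f}, Hs = {H_s:.4f}, Ht = {H_t:.4f}")
```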

Note that Hi < Hs < Ht: there is a deficiency of heterozygotes with respect to the total expectation, both among and within sub-populations.

    This deficiency can be expressed as a set of three F-statistics, which are hierarchical versions of the same concept.
Again, these are easily calculated with equal N across sub-populations. Recall that F and H are related as F = (He - Ho) / He = 1 - (Ho / He). The analogous calculations are shown in the third box, lower right:

     Fis = mean deficiency of observed heterozygotes among individuals with respect to that expected within sub-populations.
                In this example, where local F is the same across sub-populations, Fis is equivalent to local F.

     Fit = mean deficiency of observed heterozygotes among individuals with respect to that expected for the total population,
                which is equivalent to Wahlund Effect, when allele frequencies differ across sub-populations.

     Fst = mean deficiency of expected heterozygotes among sub-populations with respect to that expected for the total population,
                which in this case is a measure of population differentiation among sub-populations within the total.
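    As a numerical check, the three statistics can be computed from the heterozygosity values in the text. The sketch assumes the standard definitions that follow the F = 1 - (Ho / He) pattern, namely Fis = 1 - (Hi / Hs), Fit = 1 - (Hi / Ht), and Fst = 1 - (Hs / Ht); the figure with the original calculations is not reproduced here, so these forms are stated as an assumption.

```python
# Per-sub-population values quoted in the text.
obs_het = [0.432, 0.378, 0.288]   # observed f(Aa)
exp_het = [0.480, 0.420, 0.320]   # expected f(Aa)

H_i = sum(obs_het) / len(obs_het)   # 0.3660
H_s = sum(exp_het) / len(exp_het)   # 0.4067
H_t = 2 * 0.7 * 0.3                 # 0.4200, from global fA = 0.7

# Local F within each sub-population: F = 1 - (Ho / He).
local_F = [1 - o / e for o, e in zip(obs_het, exp_het)]

F_is = 1 - H_i / H_s   # individuals relative to sub-populations
F_it = 1 - H_i / H_t   # individuals relative to the total
F_st = 1 - H_s / H_t   # sub-populations relative to the total

print("local F:", [round(f, 4) for f in local_F])
print(f"Fis = {F_is:.4f}, Fit = {F_it:.4f}, Fst = {F_st:.4f}")
```

    With these values local F is the same in every sub-population, so Fis equals local F, as noted above.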
               
    Fst in various forms is the most widely used descriptor of population genetic structure with diploid data (nuclear DNA sequences, or allozymes). The concept can be extended to multiple sub-populations within a population, or multiple population levels within species. Equivalent measures can be calculated for haploid data (mtDNA).

HOMEWORK: Two ways of calculating FST are shown, in terms of FIT & FIS  or HT & HS. SHOW that the two calculations are equivalent.


Figures & Text material © 2024 by Steven M. Carr