Calculation of F-statistics

F-Statistics as measures of genetic population structure: a numerical example

    Previously, we used F to measure the deficiency of heterozygotes due to (1) mating of closely-related individuals (inbreeding) within local populations, and (2) mixing of population data between two local populations (Wahlund Effect). We can generalize the latter to estimate geographic population structure. Suppose a global species' population occurs as a series of local populations with different allele frequencies. If these local populations are separated geographically and do not exchange individuals uniformly, then individuals are more likely to mate with closer neighbors in the same local population. Local populations are then more likely to comprise related individuals, so geographic structure further increases non-random mating of relatives. The same occurs if different local populations are separated by larger or smaller geographic distances.

    Consider individuals distributed among three sub-populations of a global population, with observed genotype counts for each sub-population as indicated in the grey box. Based on these counts, we wish to compare the expected versus observed heterozygosity at each of these three levels of population structure. The calculations of F-statistics are more easily appreciated if N is equal across sub-populations; otherwise, the contribution of each sub-population must be weighted by its size, which complicates the calculations.

    For each sub-population, observed allele counts #A & #a and frequencies f(A) & f(a) are easily calculated in the usual manner. Expected genotype frequencies are also easily calculated from the observed allele frequencies, for example Hexp = (2)(f(A))(f(a)). Global f(A) is the mean of the observed f(A) over all sub-populations, and global f(a) = (1 - global f(A)).
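
As a minimal sketch of the calculation just described, the Python below derives allele frequencies and heterozygosities from one set of genotype counts. The counts (384 AA, 432 Aa, 184 aa) are hypothetical: they are not taken from the original table, and were chosen only so that the resulting frequencies match the values quoted for the first sub-population. The function name `summarize` is likewise an illustrative assumption.

```python
def summarize(n_AA, n_Aa, n_aa):
    """Return (f_A, f_a, h_obs, h_exp) for one sub-population."""
    n = n_AA + n_Aa + n_aa        # number of diploid individuals, N
    count_A = 2 * n_AA + n_Aa     # each AA carries two A alleles, each Aa one
    f_A = count_A / (2 * n)       # N diploids carry 2N alleles in total
    f_a = 1 - f_A
    h_obs = n_Aa / n              # observed f(Aa)
    h_exp = 2 * f_A * f_a         # expected f(Aa) from the allele frequencies
    return f_A, f_a, h_obs, h_exp

# Hypothetical genotype counts (assumed for illustration, N = 1000)
f_A, f_a, h_obs, h_exp = summarize(384, 432, 184)
print(f"f(A)={f_A:.3f}  f(a)={f_a:.3f}  Hobs={h_obs:.3f}  Hexp={h_exp:.3f}")
```

With these assumed counts, the sketch returns f(A) = 0.6, observed f(Aa) = 0.432, and expected f(Aa) = 0.480, the same values used for the first sub-population in the heterozygosity calculations.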

    Heterozygosity indices Hi, Hs, and Ht are simply H, calculated at different levels of the population structure. With equal N, these are easily calculated from the bold values in the table above, as


    Hi = mean of observed f(Aa) = (0.432 + 0.378 + 0.288) / 3 = 0.3660
            This is the observed probability of heterozygosity for an individual drawn at random from any sub-population.


    Hs = mean of expected f(Aa) =  (0.480 + 0.420 + 0.320) / 3 = 0.4067
            This is the expectation of heterozygosity if the individual were drawn at random from any sub-population.
            Hi and Hs differ when sub-populations have different genetic structures. This is what we are looking for in the analysis of genetic population structure.

    Ht = expected "global" heterozygosity, calculated from global f(A) = (0.6 + 0.7 + 0.8) / 3 = 0.7 and global f(a) = (1.0 - 0.7) = 0.3, and thus Ht = (2)(0.7)(0.3) = 0.4200
             This is simply the global (total) expectation of heterozygosity based on the observed total allele frequencies.

Note that Ht > Hs > Hi : there is a deficiency of heterozygotes with respect to the total expectation, both among and within sub-populations.
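
The three heterozygosity indices can be reproduced with a few lines of Python. This is a sketch that uses only the frequencies quoted in the text; the variable names are illustrative, and the simple unweighted means are valid only under the equal-N assumption stated earlier.

```python
# Values quoted in the text for the three sub-populations
f_A = [0.6, 0.7, 0.8]             # observed f(A)
h_obs = [0.432, 0.378, 0.288]     # observed f(Aa)

# Expected f(Aa) within each sub-population: 2 * f(A) * f(a)
h_exp = [2 * p * (1 - p) for p in f_A]

k = len(f_A)                      # number of sub-populations
H_i = sum(h_obs) / k              # mean observed heterozygosity
H_s = sum(h_exp) / k              # mean expected heterozygosity
p_bar = sum(f_A) / k              # global f(A); unweighted mean assumes equal N
H_t = 2 * p_bar * (1 - p_bar)     # global expected heterozygosity

print(f"Hi={H_i:.4f}  Hs={H_s:.4f}  Ht={H_t:.4f}")
```

With unequal N, each mean would instead have to be weighted by sub-population size.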

    This deficiency can be expressed as a set of three F statistics, which are hierarchical versions of the same concept.
This is most obvious when N and local F are constant. Recall that F = (Hexp - Hobs) / Hexp = 1 - (Hobs / Hexp).

     Fis = mean deficiency of observed heterozygotes among individuals with respect to that expected across sub-populations.
                In this example, where local F is the same across sub-populations, Fis is equivalent to local F.
    
     Fit = mean deficiency of observed heterozygotes among individuals with respect to that expected for the total population,
                which is equivalent to the Wahlund Effect, when allele frequencies differ across sub-populations.
     Fst = mean deficiency of expected heterozygotes among sub-populations with respect to that expected for the total population,
                which in this case is a measure of population differentiation within the total.
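
The three definitions above can be sketched numerically by applying F = 1 - (Hobs / Hexp) at each level of the hierarchy. The pairing of heterozygosity indices used here (Hi with Hs, Hi with Ht, Hs with Ht) follows the verbal definitions and the standard form of these statistics; it is assumed, not confirmed, that the original figure uses the same form. The helper name `deficiency` is hypothetical.

```python
# Values quoted in the text for the three sub-populations
h_obs = [0.432, 0.378, 0.288]     # observed f(Aa)
h_exp = [0.480, 0.420, 0.320]     # expected f(Aa)

H_i = sum(h_obs) / len(h_obs)     # mean observed heterozygosity
H_s = sum(h_exp) / len(h_exp)     # mean expected heterozygosity
H_t = 2 * 0.7 * 0.3               # global expectation from global f(A) = 0.7

def deficiency(h_lower, h_upper):
    """Proportional shortfall of h_lower relative to h_upper."""
    return 1 - h_lower / h_upper

# Local F within each sub-population (the same for all three in this example)
local_F = [deficiency(o, e) for o, e in zip(h_obs, h_exp)]

F_is = deficiency(H_i, H_s)       # individuals relative to sub-populations
F_it = deficiency(H_i, H_t)       # individuals relative to the total
F_st = deficiency(H_s, H_t)       # sub-populations relative to the total

print("local F:", [round(f, 3) for f in local_F])
print(f"Fis={F_is:.3f}  Fit={F_it:.3f}  Fst={F_st:.3f}")
```

For these data, Fis comes out at about 0.100 (equal to the local F), Fit at about 0.129, and Fst at about 0.032.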
               
    Fst in various forms is the most widely used descriptor of population genetic structure with diploid data (nuclear DNA sequences, or allozymes). The concept can be extended to multiple sub-populations within a population, or multiple population levels within species. Equivalent measures can be calculated for haploid data (mtDNA).

HOMEWORK: Two ways of calculating FST are shown, in terms of FIT & FIS  or HT & HS. SHOW that the two calculations are equivalent.
 
    Heterozygosity and F-statistics can also be thought of as random-draw-with-replacement experiments.
 
    Draw an allele at random from any particular sub-population, and replace it. What is the expectation that a second allele drawn from the same sub-population will be different?
That is, what is the chance of drawing a heterozygous pair? What is the expectation of heterozygosity if the two alleles are drawn from different sub-populations? Genetic structure will always reduce the expectation calculated from the global allele frequency, if different sub-populations exhibit different degrees of inbreeding and (or) differences in allele frequencies.

    As a non-genetic thought experiment demonstration: Consider three bags, each with a set of black or white marbles, where the ratio of B:W in each bag is unknown.
    a) Starting with Bag #1, draw a marble, note the color, replace it in the bag, and draw a second marble. Note the color, so that the pair is BB, BW, or WW. Repeat 100 times. Do the same with Bags #2 & #3. This estimates the ratio in each bag [sub-population], and gives an observed count of BW [heterozygotes] within each bag.
    b) Draw a marble from Bag #1 and a marble from Bag #2. Note the color combination. Replace the marbles in their bags. Repeat 100 times for these two bags. Repeat the same process for the other two pairwise combinations of bags, #1 & #3 and #2 & #3. This gives an observed count of BW heterozygotes between bags [sub-populations].
    c) Dump all the marbles into one bag. Repeat the same draw-and-replace experiment as in (a). This gives an observed count of BW heterozygotes across all bags [global].

 
  
This thought experiment differs from the genetic experiment, because marbles in each bag do not occur in pairs [diploids]. In this special case, once you have estimated the B:W ratio within each bag, you should be able to predict the expectation of B:W for any pairwise combination of bags, and for the total combined bags.
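
The marble experiment can be simulated as a rough sketch. Everything specific here is an assumption for illustration: the proportions of black marbles (0.6, 0.7, 0.8, reusing the allele frequencies from the numerical example), the equal number of marbles per bag, the fixed random seed, and the function names `draw_pair` and `count_bw`.

```python
import random

def draw_pair(p_first, p_second, rng):
    """Draw one marble from each of two bags, with replacement.

    p_first and p_second are the proportions of black marbles.
    Returns True if the pair is mixed (BW).
    """
    first_black = rng.random() < p_first
    second_black = rng.random() < p_second
    return first_black != second_black

def count_bw(p_first, p_second, rng, trials=100):
    """Count BW pairs over repeated draws."""
    return sum(draw_pair(p_first, p_second, rng) for _ in range(trials))

rng = random.Random(1)            # fixed seed so the run is repeatable
p_black = [0.6, 0.7, 0.8]         # assumed proportions, "unknown" to the experimenter
pairs = [(0, 1), (0, 2), (1, 2)]  # Bags #1 & #2, #1 & #3, #2 & #3

# (a) both marbles from the same bag
within = [count_bw(p, p, rng) for p in p_black]
# (b) one marble from each of two different bags
between = [count_bw(p_black[i], p_black[j], rng) for i, j in pairs]
# (c) all marbles pooled; assumes equal numbers of marbles per bag
p_pool = sum(p_black) / len(p_black)
pooled = count_bw(p_pool, p_pool, rng)

# Expected BW counts per 100 draws, predicted from the proportions alone
exp_within = [100 * 2 * p * (1 - p) for p in p_black]
exp_between = [100 * (p_black[i] * (1 - p_black[j]) + p_black[j] * (1 - p_black[i]))
               for i, j in pairs]
exp_pooled = 100 * 2 * p_pool * (1 - p_pool)

print("within :", within, "expected", [round(x) for x in exp_within])
print("between:", between, "expected", [round(x) for x in exp_between])
print("pooled :", pooled, "expected", round(exp_pooled))
```

The simulated counts scatter around the predicted values because only 100 draws are made per experiment. The predictions themselves depend only on the proportions in each bag, which is the point of the special case described above.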


Figures & Text material © 2024 by Steven M. Carr