F-Statistics as measures of genetic population structure
a numerical example
Previously, we used F to measure the deficiency of heterozygotes due to (1) mating of closely-related individuals (inbreeding) within local populations, and (2) mixing of population data between two local populations (Wahlund Effect). We can generalize the latter to estimate geographic population structure.
Suppose the global population of a species occurs as a series of local populations with different allele frequencies. If these local populations are separated geographically and do not exchange individuals uniformly, then individuals are more likely to mate with closer neighbors in the same local population. Local populations are then more likely to comprise related individuals, so geographic structure further increases non-random mating of relatives. The same occurs if different local populations are separated by larger or smaller geographic distances.
Consider individuals distributed among three sub-populations of a global population, with observed genotype counts for each sub-population as indicated in the grey box. Based on these counts, we wish to compare the expected versus observed heterozygosity at each of three levels of population structure: individuals, sub-populations, and the total population. The calculations of F-statistics are more easily appreciated if N is equal across sub-populations; otherwise the calculations become complicated, because each sub-population's contribution must be weighted by its size.
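As a minimal sketch of what weighting by size means, the snippet below takes a size-weighted mean of per-sub-population values. The function name `weighted_mean` and the unequal sizes are hypothetical, chosen for illustration only; the heterozygosity values are the observed f(Aa) used later in this example. This is only the simplest form of weighting, not a full estimator for unequal samples.

```python
def weighted_mean(values, sizes):
    """Mean of per-sub-population values, weighted by sub-population size N."""
    if len(values) != len(sizes):
        raise ValueError("values and sizes must have the same length")
    return sum(v * n for v, n in zip(values, sizes)) / sum(sizes)

# Observed f(Aa) for the three sub-populations in this example.
h_obs = [0.432, 0.378, 0.288]

equal_n = [100, 100, 100]    # equal N: weighting changes nothing
unequal_n = [50, 100, 350]   # hypothetical unequal sizes, for illustration only

hi_equal = weighted_mean(h_obs, equal_n)      # same as the simple mean
hi_unequal = weighted_mean(h_obs, unequal_n)  # pulled toward the largest sub-population

print(round(hi_equal, 4), round(hi_unequal, 4))
```

With equal N the weighted mean reduces to the simple mean used in the rest of this example, which is why the equal-N case is easier to follow.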
For each sub-population, observed
allele counts #A & #a and frequencies
f(A) & f(a) are easily calculated in
the usual manner. Expected genotype
frequencies are also easily calculated from the observed
allele frequencies, for example Hexp = (2)(fA)(fa).
Global fA is the mean of the observed f(A) over all sub-populations, and global fa = (1 - global fA).
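The "usual manner" can be sketched in a few lines. The genotype counts below are hypothetical: they assume N = 1000 per sub-population and were chosen only to be consistent with the frequencies quoted in this example, not copied from the grey box. The function name `summarize` is likewise an illustration.

```python
def summarize(n_AA, n_Aa, n_aa):
    """Allele counts, allele frequencies, and observed/expected heterozygosity."""
    n = n_AA + n_Aa + n_aa        # number of diploid individuals
    count_A = 2 * n_AA + n_Aa     # each AA carries two A alleles, each Aa carries one
    count_a = 2 * n_aa + n_Aa
    f_A = count_A / (2 * n)       # 2N alleles in total
    f_a = count_a / (2 * n)
    return {
        "#A": count_A,
        "#a": count_a,
        "f(A)": f_A,
        "f(a)": f_a,
        "Hobs": n_Aa / n,         # observed f(Aa)
        "Hexp": 2 * f_A * f_a,    # expected f(Aa) = (2)(fA)(fa)
    }

# Hypothetical (AA, Aa, aa) counts, N = 1000 each (assumed, for illustration).
counts = [(384, 432, 184), (511, 378, 111), (656, 288, 56)]
summaries = [summarize(*c) for c in counts]

for s in summaries:
    print(s)

# Global f(A) is the mean of the per-sub-population f(A); global f(a) is its complement.
global_f_A = sum(s["f(A)"] for s in summaries) / len(summaries)
global_f_a = 1 - global_f_A
```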
Heterozygosity
indices Hi, Hs,
and Ht are simply H,
calculated at different levels of the
population structure. With equal N,
these are easily calculated from the bold
values in the table above, as
Hi = mean of observed f(Aa) = (0.432 + 0.378 + 0.288) / 3 = 0.3660
This is the observed
probability of
heterozygosity for an individual
drawn at random from any sub-population.
Hs = mean of expected f(Aa) = (0.480 + 0.420 + 0.320) / 3 = 0.4067
This is the expectation of heterozygosity if the individual were drawn at random from any sub-population.
Hi and Hs differ when observed genotype frequencies within sub-populations depart from their Hardy-Weinberg expectations, as they do here. This is what we are looking for in the analysis of genetic population structure.
Ht = expected "global" heterozygosity, calculated from the global allele frequencies: global fA = (0.6 + 0.7 + 0.8) / 3 = 0.7, global fa = (1.0 - 0.7) = 0.3, and thus Ht = (2)(0.7)(0.3) = 0.4200
This is simply the global (total) expectation of heterozygosity based on the observed total allele frequencies.
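The three heterozygosity indices can be reproduced directly from the frequencies quoted in this example. This is a sketch for the equal-N case only; the variable names are illustrative.

```python
# Per-sub-population values quoted in the example (equal N assumed).
f_A = [0.6, 0.7, 0.8]            # observed f(A)
h_obs = [0.432, 0.378, 0.288]    # observed f(Aa)

def mean(xs):
    return sum(xs) / len(xs)

# Expected f(Aa) within each sub-population: (2)(fA)(fa)
h_exp = [2 * p * (1 - p) for p in f_A]

H_i = mean(h_obs)                # mean observed heterozygosity
H_s = mean(h_exp)                # mean expected heterozygosity within sub-populations
p_bar = mean(f_A)                # global f(A)
H_t = 2 * p_bar * (1 - p_bar)    # expected heterozygosity for the total population

print(f"Hi = {H_i:.4f}, Hs = {H_s:.4f}, Ht = {H_t:.4f}")
```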
Note that Ht > Hs > Hi : there is a deficiency of heterozygotes with respect to the total expectation, both among and within sub-populations.
This deficiency can be expressed as a set of three F statistics, which are hierarchical versions of the same concept. This is most obvious when N and local F are constant. Recall that F = (Hexp - Hobs) / Hexp = 1 - (Hobs / Hexp).
Fis = mean deficiency of observed heterozygotes among individuals with respect to that expected across sub-populations. In this example, where local F is the same across sub-populations, Fis is equivalent to local F.
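The claim that Fis matches local F here can be checked numerically. The sketch below applies F = 1 - (Hobs / Hexp) within each sub-population, and then to the means Hi and Hs, using the standard form Fis = 1 - (Hi / Hs). Variable names are illustrative.

```python
# Observed and expected f(Aa) for the three sub-populations in the example.
h_obs = [0.432, 0.378, 0.288]
h_exp = [0.480, 0.420, 0.320]

# Local F within each sub-population: F = 1 - (Hobs / Hexp)
local_f = [1 - o / e for o, e in zip(h_obs, h_exp)]

# Fis applies the same formula to the means Hi and Hs.
h_i = sum(h_obs) / len(h_obs)
h_s = sum(h_exp) / len(h_exp)
f_is = 1 - h_i / h_s

print([round(f, 3) for f in local_f], round(f_is, 3))
```

Each local F comes out at about 0.1, and so does Fis. When local F varies among sub-populations, Fis need not equal any one of them.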
Fit = mean deficiency of observed heterozygotes among individuals with respect to that expected for the total population, which is equivalent to the Wahlund Effect when allele frequencies differ across sub-populations.
Fst = mean deficiency of expected heterozygotes among sub-populations with respect to that expected for the total population, which in this case is a measure of population differentiation within the total.
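Each of the three statistics has the same form, 1 - (H at the lower level / H at the higher level). A minimal sketch, with the function name `f_statistics` as an illustrative choice and the inputs taken from the values calculated in this example:

```python
def f_statistics(h_i, h_s, h_t):
    """Hierarchical F statistics from the three heterozygosity indices."""
    f_is = 1 - h_i / h_s   # individuals relative to sub-populations
    f_it = 1 - h_i / h_t   # individuals relative to the total
    f_st = 1 - h_s / h_t   # sub-populations relative to the total
    return f_is, f_it, f_st

# Heterozygosity indices from the example (Hs kept unrounded).
h_i = (0.432 + 0.378 + 0.288) / 3
h_s = (0.480 + 0.420 + 0.320) / 3
h_t = 2 * 0.7 * 0.3

f_is, f_it, f_st = f_statistics(h_i, h_s, h_t)
print(f"Fis = {f_is:.3f}, Fit = {f_it:.3f}, Fst = {f_st:.3f}")
```

For these data the values are approximately Fis = 0.100, Fit = 0.129, and Fst = 0.032. Using the rounded Hs = 0.4067 instead shifts the later decimal places slightly.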
Fst
in various forms is the most
widely used descriptor of population
genetic structure with diploid data (nuclear
DNA sequences, or
allozymes). The concept can be
extended to multiple sub-populations
within a population, or multiple
population levels within species.
Equivalent measures can be calculated
for haploid data (mtDNA).
HOMEWORK: Two ways of calculating Fst are shown, in terms of Fit & Fis or Ht & Hs. SHOW that the two calculations are equivalent.
Heterozygosity and F-statistics can also be thought of as random-draw-with-replacement experiments. Draw an allele at random from any particular sub-population, and replace it. What is the expectation that a second allele drawn from the same sub-population will be different? That is, what is the chance of drawing a heterozygous pair? What is the expectation of heterozygosity if the two alleles are drawn from different sub-populations? Genetic structure will always reduce the expectation calculated from the global allele frequency, if different sub-populations exhibit different degrees of inbreeding and (or) differences in allele frequencies.
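The two questions above have simple closed-form answers, sketched below for the allele frequencies used in this example. The function name `prob_different` is illustrative. Draws are with replacement, so the two alleles are independent.

```python
from itertools import product

def prob_different(p1, p2):
    """Chance that an allele drawn from a pool with f(A) = p1 and one drawn
    from a pool with f(A) = p2 differ (one A and one a)."""
    return p1 * (1 - p2) + (1 - p1) * p2

f_A = [0.6, 0.7, 0.8]   # f(A) in the three sub-populations of the example

# Both alleles from the same sub-population: reduces to 2pq.
within = [prob_different(p, p) for p in f_A]

# One allele from each of two different sub-populations.
between = {
    (i + 1, j + 1): prob_different(f_A[i], f_A[j])
    for i in range(3) for j in range(i + 1, 3)
}

# Averaging over all ordered pairs of sub-populations, including a
# sub-population paired with itself, gives the global expectation.
all_pairs = [prob_different(p, q) for p, q in product(f_A, repeat=2)]
global_expectation = sum(all_pairs) / len(all_pairs)

print([round(w, 2) for w in within])
print({k: round(v, 2) for k, v in between.items()})
print(round(global_expectation, 2))
```

The within-sub-population values are the expected f(Aa) that enter Hs. The average over all ordered pairs equals (2)(0.7)(0.3) = 0.42, which is Ht for equal N.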
As a non-genetic thought experiment, consider three bags, each with a set of black and white marbles, where the ratio of B:W in each bag is unknown.
a) Starting with Bag #1, draw a marble, note the color, replace it in the bag, and draw a second marble, noting the color so that the pair is BB, BW, or WW. Repeat 100 times. Do the same with Bags #2 & #3. This estimates the ratio in each bag [sub-population], and gives an observed count of BW [heterozygotes] within each bag.
b) Draw a marble from Bag #1 and a marble from Bag #2. Note the color combination. Replace the marbles in their bags. Repeat 100 times for these two bags. Repeat the same process for the other two pairwise combinations of bags, #1 & #3 and #2 & #3. This gives an observed count of BW heterozygotes between bags [sub-populations].
c) Dump all the marbles into one bag. Repeat the same draw & replace experiment as in (a). This gives an observed count of BW heterozygotes across all bags [global].
This thought experiment differs from the genetic experiment, because marbles in each bag do not occur in pairs [diploids].
In this special case, once you have estimated the B:W ratio within each bag, you should be able to predict the expectation of B:W for any pairwise combination of bags, and for the total combined bags.
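Steps (a) to (c) can be simulated as a rough sketch. The bag contents below are hypothetical, since the thought experiment leaves the B:W ratios unknown; they assume 100 marbles per bag at 60:40, 70:30, and 80:20. The function name `count_bw` and the seed are also illustrative choices.

```python
import random

def count_bw(bag_x, bag_y, n_draws, rng):
    """Draw one marble from each bag, with replacement, n_draws times;
    return how many of the pairs are BW (one black, one white)."""
    return sum(rng.choice(bag_x) != rng.choice(bag_y) for _ in range(n_draws))

# Hypothetical bag contents (assumed for illustration only).
bags = {
    1: ["B"] * 60 + ["W"] * 40,
    2: ["B"] * 70 + ["W"] * 30,
    3: ["B"] * 80 + ["W"] * 20,
}

rng = random.Random(1)   # fixed seed so a run can be repeated
n_draws = 100

# (a) both marbles drawn from the same bag
within = {k: count_bw(bag, bag, n_draws, rng) for k, bag in bags.items()}

# (b) one marble from each of two different bags
between = {
    (i, j): count_bw(bags[i], bags[j], n_draws, rng)
    for i, j in [(1, 2), (1, 3), (2, 3)]
}

# (c) all marbles dumped into one bag
pooled_bag = bags[1] + bags[2] + bags[3]
pooled = count_bw(pooled_bag, pooled_bag, n_draws, rng)

print("BW within bags: ", within)
print("BW between bags:", between)
print("BW pooled:      ", pooled)
```

With only 100 draws the counts vary noticeably from run to run. Increasing `n_draws` brings the BW proportions closer to the values predicted from the bag ratios, for example (0.6)(0.3) + (0.4)(0.7) = 0.46 for Bags #1 & #2 under these assumed contents.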
Figures & Text material © 2024 by Steven M. Carr