F-Statistics as measures of genetic population structure
A numerical example
Previously, we used F to measure the deficiency of heterozygotes due to (1) mating of closely related individuals (inbreeding) within local populations, and (2) mixing of population data among local populations (Wahlund Effect). We can generalize the concept to estimate geographic population structure.
Suppose a global species' population occurs as a series of local populations with different allele frequencies. If these local populations are separated geographically and do not exchange individuals at uniform rates, then individuals are more likely to mate with neighbors in the same local population. Local populations are then more likely to comprise related individuals, so geographic structure further increases non-random mating of relatives. The effect varies with the geographic distances separating local populations, and is generally greater at larger distances.
Heterozygosity and F-statistics can also be thought of as random-draw-with-replacement experiments. Draw an allele at random from any particular sub-population, note it, and replace it. What is the expectation that a second allele drawn from the same sub-population will be different? That is, what is the chance of drawing a heterozygous pair? What is the expectation of heterozygosity if the second allele is drawn from a different sub-population? What if this experiment is repeated with alleles drawn from different pairs of sub-populations? From all possible pairs of sub-populations? Expectations differ for each of these experiments, according to the heterogeneity of allele and genotype frequencies, and hence H, among populations.
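The draw experiments described above can be sketched numerically. This is a minimal illustration, assuming one locus with two alleles (A and a) and three sub-populations whose frequencies of A are 0.6, 0.7, and 0.8 (illustrative values); the helper name `prob_different` is hypothetical.

```python
from itertools import combinations

# Frequency of allele A in each of three sub-populations (illustrative values).
freq_A = [0.6, 0.7, 0.8]

def prob_different(p1, p2):
    """Chance that an allele drawn from a population with f(A) = p1 and an
    allele drawn from a population with f(A) = p2 are different (A,a or a,A)."""
    return p1 * (1 - p2) + (1 - p1) * p2

# Both alleles drawn from the same sub-population: this reduces to 2pq.
within = [prob_different(p, p) for p in freq_A]

# Alleles drawn from each possible pair of different sub-populations.
between = {(i, j): prob_different(freq_A[i], freq_A[j])
           for i, j in combinations(range(len(freq_A)), 2)}

# All possible ordered pairs of sub-populations (same-with-same included),
# each pair weighted equally.
n = len(freq_A)
overall = sum(prob_different(p1, p2) for p1 in freq_A for p2 in freq_A) / n**2

print(within, between, overall)
```

With equal weighting, the average over all possible pairs works out to twice the product of the mean frequencies of the two alleles, which is the heterozygosity expected from the global allele frequency.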
Genetic structure will always reduce heterozygosity below the expectation calculated from the global allele frequency: the deficiency is expressed as F.
Consider a simple model of individuals distributed among three sub-populations of a global population, with observed genotype counts for each sub-population as indicated in the grey box. Based on these counts, we wish to compare the expected vs observed heterozygosity at each of three levels of population structure: individuals, sub-populations, and the total population. The population sizes N of the three sub-populations are equal. This simplifies the calculations of F-statistics; otherwise the calculations become complicated, because contributions from sub-populations of different sizes must be weighted by sample size.
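To illustrate what weighting by sample size means, here is a minimal sketch of a sample-size-weighted mean. The sample sizes and the function name `weighted_mean` are hypothetical, and the per-sub-population heterozygosities are illustrative; estimators used in practice for unequal samples may apply further corrections that are not shown here.

```python
def weighted_mean(values, sizes):
    """Mean of per-sub-population values, each weighted by its sample size."""
    total = sum(sizes)
    return sum(v * n for v, n in zip(values, sizes)) / total

# Illustrative expected heterozygosity in each of three sub-populations.
expected_het = [0.48, 0.42, 0.32]

equal_sizes = [100, 100, 100]    # hypothetical: equal N, same as a simple mean
unequal_sizes = [50, 100, 350]   # hypothetical: unequal N shifts the mean

hs_equal = weighted_mean(expected_het, equal_sizes)
hs_unequal = weighted_mean(expected_het, unequal_sizes)

print(round(hs_equal, 4), round(hs_unequal, 4))
```

With equal N the weights cancel and the result is the simple mean, which is why the equal-N example needs no weighting.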
The observed data in the table are genotype counts for three populations at one locus. For each sub-population, observed allele counts #A & #a and frequencies f(A) & f(a) are easily calculated in the usual manner. Expected genotype counts & frequencies are also easily calculated from the observed allele frequencies, for example Hexp = (2)(fA)(fa).
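The calculation "in the usual manner" can be sketched as follows. The genotype counts here are hypothetical (N = 1000 per sub-population), chosen only so that the resulting frequencies agree with the heterozygosities and allele frequencies quoted in this example; the actual counts are those in the table. The function name `summarize` is likewise an assumption for illustration.

```python
def summarize(n_AA, n_Aa, n_aa):
    """Allele counts, allele frequencies, and observed vs expected
    genotype figures for one sub-population at one two-allele locus."""
    n = n_AA + n_Aa + n_aa              # number of diploid individuals
    count_A = 2 * n_AA + n_Aa           # each AA carries two A, each Aa one
    count_a = 2 * n_aa + n_Aa
    f_A = count_A / (2 * n)             # 2n alleles in total
    f_a = count_a / (2 * n)
    return {
        "count_A": count_A,
        "count_a": count_a,
        "f_A": f_A,
        "f_a": f_a,
        "h_obs": n_Aa / n,              # observed f(Aa)
        "h_exp": 2 * f_A * f_a,         # Hexp = (2)(fA)(fa)
        "expected_counts": (f_A**2 * n, 2 * f_A * f_a * n, f_a**2 * n),
    }

# Hypothetical (AA, Aa, aa) counts for three sub-populations of 1000 each.
counts = [(384, 432, 184), (511, 378, 111), (656, 288, 56)]
results = [summarize(*c) for c in counts]

for r in results:
    print(r["f_A"], r["h_obs"], round(r["h_exp"], 3))
```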
Global fA is the mean of the observed f(A) over all sub-populations, and global fa = (1 - global fA).
Heterozygosity indices Hi, Hs, and Ht are simple H, calculated at different levels of the population structure. With equal N, these are easily calculated from the bold values in the table above, as
Hi = mean of observed f(Aa) = (0.432 + 0.378 + 0.288) / 3 = 0.3660
This is the observed probability of heterozygosity for an individual drawn at random from any sub-population.
Hs = mean of expected f(Aa) = (0.480 + 0.420 + 0.320) / 3 = 0.4067
This is the expectation of heterozygosity for two alleles drawn at random from the same sub-population, averaged over sub-populations.
Hi and Hs differ when mating within sub-populations is non-random, and Hs and Ht differ when sub-populations have different allele frequencies. Each difference is a measure of genetic population structure.
Ht = expected "global" heterozygosity, calculated from the global allele frequencies: global fA = (0.6 + 0.7 + 0.8) / 3 = 0.7, global fa = (1.0 - 0.7) = 0.3, and thus Ht = (2)(0.7)(0.3) = 0.4200
This is simply the global (total) expectation of heterozygosity based on the observed total allele frequencies.
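The three heterozygosity indices can be reproduced with a few lines of arithmetic. This sketch assumes equal N across sub-populations, as in the example, and uses the observed f(Aa) and f(A) values quoted in the text; the variable names are illustrative.

```python
# Observed heterozygote frequency f(Aa) and allele frequency f(A)
# in each of the three sub-populations.
obs_het = [0.432, 0.378, 0.288]
freq_A = [0.6, 0.7, 0.8]
n = len(freq_A)

# Hi: mean observed heterozygosity across sub-populations.
H_i = sum(obs_het) / n

# Hs: mean of the expected heterozygosity 2pq within each sub-population.
H_s = sum(2 * p * (1 - p) for p in freq_A) / n

# Ht: expected heterozygosity from the global (mean) allele frequency.
p_bar = sum(freq_A) / n
H_t = 2 * p_bar * (1 - p_bar)

print(round(H_i, 4), round(H_s, 4), round(H_t, 4))
```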
Note that Hi < Hs < Ht: there is a deficiency of heterozygotes with respect to the total expected, both among and within sub-populations.
This deficiency can be expressed as a set of three F-statistics, which are hierarchical versions of the same concept. Again, these are easily calculated with equal N across sub-populations. Recall that F and H are related as F = (He - Ho) / He = 1 - (Ho / He).
The analogous calculations are shown in the third box, lower right:
Fis = mean deficiency of observed heterozygotes among individuals with respect to that expected within sub-populations. In this example, where local F is the same across sub-populations, Fis is equivalent to local F.
Fit = mean deficiency of observed heterozygotes among individuals with respect to that expected for the total population, which combines local inbreeding with the Wahlund Effect that arises when allele frequencies differ across sub-populations.
Fst = mean deficiency of expected heterozygotes among sub-populations with respect to that expected for the total population, which in this case is a measure of population differentiation among sub-populations within the total.
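The three definitions above all apply the same relation F = 1 - (Ho / He), with a different pair of heterozygosities at each level. The sketch below assumes the standard hierarchical forms Fis = 1 - (Hi / Hs), Fit = 1 - (Hi / Ht), and Fst = 1 - (Hs / Ht), which are taken here to match the calculations in the box; the helper name `f_stat` is hypothetical.

```python
def f_stat(h_lower, h_upper):
    """Deficiency of heterozygosity at the lower level relative to the
    expectation at the upper level: F = 1 - (Ho / He)."""
    return 1 - h_lower / h_upper

# Heterozygosity indices from the worked example (equal N).
H_i = (0.432 + 0.378 + 0.288) / 3   # observed, individuals
H_s = (0.480 + 0.420 + 0.320) / 3   # expected, within sub-populations
H_t = 2 * 0.7 * 0.3                 # expected, total population

F_is = f_stat(H_i, H_s)   # individuals relative to sub-populations
F_it = f_stat(H_i, H_t)   # individuals relative to total
F_st = f_stat(H_s, H_t)   # sub-populations relative to total

print(round(F_is, 4), round(F_it, 4), round(F_st, 4))
```

Under these assumptions Fis comes out at 0.1, the same deficiency seen in each sub-population individually (for example 1 - 0.432 / 0.480).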
Fst in various forms is the most widely used descriptor of population genetic structure with diploid data (nuclear DNA sequences, or allozymes). The concept can be extended to multiple sub-populations within a population, or multiple population levels within species. Equivalent measures can be calculated for haploid data (mtDNA).
HOMEWORK: Two ways of calculating FST are shown, in terms of FIT & FIS or HT & HS. SHOW that the two calculations are equivalent.
Figures & Text material © 2024 by Steven M. Carr