Estimation of allele frequencies from genotype
data, with multiple alleles and dominance
Estimation of allele frequencies for a locus with two co-dominant
alleles is straightforward from basic population genetics principles.
An example is the MN blood system, with two alleles (M & N) and three
genotypes (MM, MN, & NN) that correspond to three phenotypes (M, MN,
& N). Because each genotype has its own phenotype, the alleles can be
counted directly from the phenotype data.
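
For the two-allele co-dominant case, direct allele counting can be sketched as below. This is a minimal illustration, not part of the original notes: the function name and the genotype counts are hypothetical and chosen only to show the arithmetic.

```python
def mn_allele_frequencies(n_mm, n_mn, n_nn):
    """Estimate M and N allele frequencies by direct allele counting.

    Each MM individual carries two M alleles, each NN individual two N
    alleles, and each MN individual one of each.
    """
    total_alleles = 2 * (n_mm + n_mn + n_nn)
    if total_alleles == 0:
        raise ValueError("at least one individual is required")
    freq_m = (2 * n_mm + n_mn) / total_alleles
    freq_n = (2 * n_nn + n_mn) / total_alleles
    return freq_m, freq_n


# Hypothetical counts, for illustration only (not data from these notes).
freq_m, freq_n = mn_allele_frequencies(n_mm=30, n_mn=50, n_nn=20)
print(f"f(M) = {freq_m:.3f}, f(N) = {freq_n:.3f}")
```

No iteration is needed here, because every allele in the sample can be read off the phenotype.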
Estimation of
allele frequencies for a locus with multiple alleles &
dominance is more complicated. The ABO blood
group system is a
good example. There are three alleles (A, B,
& O) that give rise to six genotypes (AA,
AO, BB, BO, OO, & AB)
that determine four blood group phenotypes (A,
B, AB, & O).
Alleles A & B are each dominant to O and co-dominant with each other:
AA & AO are both type A, BB & BO are both type B,
AB is type AB, and only OO is type O.
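
The genotype-to-phenotype mapping above implies the following expected phenotype frequencies under Hardy-Weinberg proportions. This is a sketch only: the symbols p, q, and r for the frequencies of A, B, and O are a common convention assumed here, and the function name and example values are hypothetical.

```python
def abo_phenotype_frequencies(p, q, r):
    """Expected ABO phenotype frequencies under Hardy-Weinberg proportions.

    p, q, r are the assumed frequencies of alleles A, B, and O.
    """
    if abs(p + q + r - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return {
        "A": p * p + 2 * p * r,   # genotypes AA + AO
        "B": q * q + 2 * q * r,   # genotypes BB + BO
        "AB": 2 * p * q,          # genotype AB
        "O": r * r,               # genotype OO
    }


# Hypothetical allele frequencies, for illustration only.
expected = abo_phenotype_frequencies(p=0.3, q=0.1, r=0.6)
for phenotype, freq in expected.items():
    print(f"{phenotype}: {freq:.3f}")
```

Multiplying each frequency by the sample size gives the expected phenotype counts.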
Population genetic data are typically reported as the observed
frequencies or counts of phenotypes,
based on the agglutination test. The task is
to estimate the allele frequencies from the
data, so as to generate the expected frequencies and
counts of phenotypes. Observed and expected data can then be
compared. However, basic algebra says that an exact solution
for n variables cannot be obtained from n-1
quantities (the system is under-determined).
We instead use an approximate solution
based on a likelihood approach, with
successive corrections. Likelihood methods use observed data
or informed predictions to make or modify an a priori
expectation.
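
One standard way to carry out a likelihood estimate with successive corrections is the EM ("gene-counting") algorithm. The sketch below uses that technique as an assumption and is not necessarily the exact procedure intended in these notes. The function name, the starting values, the tolerance, and the phenotype counts are all hypothetical.

```python
def estimate_abo_frequencies(n_a, n_b, n_ab, n_o, tol=1e-10, max_iter=1000):
    """Estimate A, B, O allele frequencies from phenotype counts by EM."""
    n = n_a + n_b + n_ab + n_o
    if n == 0:
        raise ValueError("at least one individual is required")
    p = q = r = 1.0 / 3.0  # arbitrary starting guess (an assumption)
    for _ in range(max_iter):
        # E-step: split type A into expected AA and AO counts, and type B
        # into BB and BO, using the current allele frequencies.
        # p^2 / (p^2 + 2pr) simplifies to p / (p + 2r).
        n_aa = n_a * p / (p + 2 * r)
        n_ao = n_a - n_aa
        n_bb = n_b * q / (q + 2 * r)
        n_bo = n_b - n_bb
        # M-step: count alleles as if the expected genotypes were observed.
        p_new = (2 * n_aa + n_ao + n_ab) / (2 * n)
        q_new = (2 * n_bb + n_bo + n_ab) / (2 * n)
        r_new = (n_ao + n_bo + 2 * n_o) / (2 * n)
        change = max(abs(p_new - p), abs(q_new - q), abs(r_new - r))
        p, q, r = p_new, q_new, r_new
        if change < tol:
            break
    return p, q, r


# Hypothetical phenotype counts, for illustration only (not the data
# used in these notes).
p_hat, q_hat, r_hat = estimate_abo_frequencies(n_a=450, n_b=130, n_ab=60, n_o=360)
print(f"p = {p_hat:.4f}, q = {q_hat:.4f}, r = {r_hat:.4f}")
```

Each pass corrects the previous estimate: the current frequencies apportion the ambiguous A and B phenotypes among their possible genotypes, and the resulting allele counts give the next estimate. The estimates can then be used to reconstruct expected phenotype counts for comparison with the observed data.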
Let
HOMEWORK: Perform a Chi-Square
analysis of the difference between the Observed and
Expected ("Reconstructed") counts, based on n
= 163. Does the population show the expected Hardy-Weinberg
proportions? Would it make a difference if n = 1630, with the
same proportions?