ABO calculations

Estimation of allele frequencies from genotype data, with multiple alleles and dominance

    Estimation of allele frequencies for a locus with two co-dominant alleles is straightforward from basic algebra and population genetics principles. For example, the MN blood group system has two alleles (M & N) and three genotypes (MM, MN, & NN) that correspond to three distinguishable phenotypes (M, MN, & N), so the alleles can be counted directly from the phenotypes.
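
    As a minimal sketch of direct allele counting in the co-dominant case, the Python snippet below tallies M and N alleles from phenotype counts. The function name and the counts are hypothetical, chosen only for illustration; they are not data from this exercise.

```python
def mn_allele_freqs(n_mm, n_mn, n_nn):
    """Estimate M and N allele frequencies by direct allele counting."""
    total_alleles = 2 * (n_mm + n_mn + n_nn)   # each individual carries two alleles
    p_m = (2 * n_mm + n_mn) / total_alleles    # MM carries two M alleles, MN carries one
    q_n = (2 * n_nn + n_mn) / total_alleles    # NN carries two N alleles, MN carries one
    return p_m, q_n


# Hypothetical phenotype counts (illustration only)
p_m, q_n = mn_allele_freqs(n_mm=30, n_mn=50, n_nn=20)
print(f"f(M) = {p_m:.2f}, f(N) = {q_n:.2f}")
```

    Because every genotype has its own phenotype, no estimation step beyond counting is needed here.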

    Estimation of allele frequencies for a locus with multiple alleles & dominance is more complicated. The ABO blood group system is a good example. There are three alleles (A, B, & O) that give rise to six genotypes (AA, AO, BB, BO, OO, & AB) that determine four blood group phenotypes (A, B, AB, & O). Alleles A & B are dominant to O: AA & AO are both type A, and BB & BO are both type B. Population genetic data are typically reported as the observed frequencies or counts of phenotypes, based on the agglutination test. The task is to estimate the allele frequencies from the data, so as to generate the expected frequencies and counts of phenotypes. Observed and expected data can then be compared. However, the ABO system is under-determined: an exact algebraic solution of n variables from n-1 quantities cannot be obtained.
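
    The link between allele frequencies and the four phenotypes can be sketched as follows, assuming Hardy-Weinberg proportions and writing p, q, & r for the frequencies of A, B, & O (symbol choices made here for illustration). The allele frequencies passed in at the end are hypothetical.

```python
def abo_expected_phenotypes(p, q, r):
    """Expected ABO phenotype frequencies under Hardy-Weinberg proportions.

    p, q, r are the frequencies of alleles A, B, and O (assumed notation).
    """
    if abs(p + q + r - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return {
        "A": p * p + 2 * p * r,    # genotypes AA + AO
        "B": q * q + 2 * q * r,    # genotypes BB + BO
        "AB": 2 * p * q,           # genotype AB
        "O": r * r,                # genotype OO
    }


# Hypothetical allele frequencies (illustration only)
expected = abo_expected_phenotypes(0.3, 0.1, 0.6)
for phenotype, freq in expected.items():
    print(phenotype, round(freq, 4))
```

    Multiplying each expected frequency by the sample size n gives the expected phenotype counts.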

    Instead, we use an approximate solution based on a likelihood approach, with successive corrections. Likelihood methods start from an a priori expectation and use the observed data, or informed predictions, to revise it.
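
    One common way to carry out successive corrections of this kind is iterative gene counting (an expectation-maximization scheme). The sketch below illustrates that general technique and is not necessarily the procedure used in this exercise. The function name, starting values, tolerance, and input counts are all assumptions made for illustration.

```python
def abo_gene_counting(n_a, n_b, n_ab, n_o, tol=1e-10, max_iter=1000):
    """Estimate ABO allele frequencies (p, q, r) by iterative gene counting."""
    n = n_a + n_b + n_ab + n_o
    p = q = r = 1.0 / 3.0                      # a priori starting guess
    for _ in range(max_iter):
        # Split type A into expected AA and AO, and type B into BB and BO,
        # using the current allele frequency estimates.
        denom_a = p * p + 2 * p * r
        denom_b = q * q + 2 * q * r
        n_aa = n_a * p * p / denom_a if denom_a > 0 else 0.0
        n_bb = n_b * q * q / denom_b if denom_b > 0 else 0.0
        n_ao = n_a - n_aa
        n_bo = n_b - n_bb
        # Re-count alleles from the completed genotype counts.
        p_new = (2 * n_aa + n_ao + n_ab) / (2 * n)
        q_new = (2 * n_bb + n_bo + n_ab) / (2 * n)
        r_new = (2 * n_o + n_ao + n_bo) / (2 * n)
        change = abs(p_new - p) + abs(q_new - q) + abs(r_new - r)
        p, q, r = p_new, q_new, r_new
        if change < tol:                       # estimates have stabilized
            break
    return p, q, r


# Hypothetical phenotype counts (illustration only)
p, q, r = abo_gene_counting(n_a=4500, n_b=1300, n_ab=600, n_o=3600)
print(f"p(A) = {p:.4f}, q(B) = {q:.4f}, r(O) = {r:.4f}")
```

    Each pass uses the current estimates to apportion the ambiguous A and B phenotypes among their genotypes, then re-counts alleles. The three estimates sum to 1 after every pass.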

Let

HOMEWORK: Perform a Chi-Square test of the difference between the Observed and Expected ("Reconstructed") counts, based on n = 163. Does the population show the expected Hardy-Weinberg proportions? Would it make a difference if n = 1630, with the same proportions?
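
The Chi-Square statistic itself can be computed with a short helper such as the sketch below. The counts shown are placeholders and must be replaced with the Observed and Expected counts from the exercise. With four phenotype classes and two independently estimated allele frequencies there is 4 - 1 - 2 = 1 degree of freedom, for which the 5% critical value is 3.841.

```python
def chi_square(observed, expected):
    """Pearson Chi-Square statistic: sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Placeholder counts in the order A, B, AB, O (illustration only, NOT the exercise data)
observed = [44, 14, 6, 36]
expected = [45, 13, 6, 36]

chi2 = chi_square(observed, expected)
df = 4 - 1 - 2              # classes - 1 - independently estimated allele frequencies
critical_value = 3.841      # Chi-Square critical value for df = 1 at the 5% level
print(f"chi-square = {chi2:.4f}, df = {df}, reject at 5%: {chi2 > critical_value}")
```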