
The Site Frequency Spectrum (SFS) as a description of allelic variation
      
      
       
                  
        The sample data set
                      shows 10-bp sequences from five
                      individuals, including three sites that show SNP variability.
                      The binary-coded
                          data matrix [left] shows the
                      occurrence of these three variable sites among
                      five individuals, coded as 0 or 1,
                      to indicate whether each SNP site in each
                      individual has the same base as Individual 1
                      (hence ancestral) or not (hence derived).
                      These data define three alleles
                      (haplotypes), found in Individuals (1), (2
                      & 3), and (4 & 5). Recall the
                      assumption that Individual 1
                      shows the ancestral state
                      (0) at all three positions, and that
                      each of the other individuals shows the derived
                      state (1) at one or more positions.
                      The derived forms of the three 
                        SNP sites occur in 2/5, 2/5, and 4/5
                      individuals, respectively, as shown by shading. 
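
The counting step described above can be sketched in a few lines of Python. The matrix below is an assumption: it is one arrangement consistent with the text (three haplotypes in Individuals 1, 2 & 3, and 4 & 5; derived counts of 2, 2, and 4), and the column order in the actual figure may differ. The function name `derived_counts` is hypothetical.

```python
# Hypothetical 5 x 3 binary matrix consistent with the text.
# Rows are Individuals 1-5; columns are the three SNP sites.
# The column order is an assumption, not taken from the figure.
matrix = [
    [0, 0, 0],  # Individual 1: ancestral (0) at all three sites
    [1, 0, 1],  # Individual 2
    [1, 0, 1],  # Individual 3
    [0, 1, 1],  # Individual 4
    [0, 1, 1],  # Individual 5
]


def derived_counts(matrix):
    """For each site (column), count the individuals carrying the derived state (1)."""
    return [sum(column) for column in zip(*matrix)]


counts = derived_counts(matrix)               # one count per SNP site
haplotypes = {tuple(row) for row in matrix}   # distinct rows = distinct alleles
print(counts, len(haplotypes))
```

With this matrix, the counts correspond to the 2/5, 2/5, and 4/5 occurrences in the text, and the set of distinct rows contains the three haplotypes.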
                      
                      With n individuals, the derived allele
                      could occur in 1, 2, 3, ... (n-1)
                      individuals, which establishes (n-1)
                      classes, in this case 5 - 1 = 4 classes as
                      shown. [They are called "allele frequencies"
                      because they correspond to SNP variants in
                      20%, 40%, 60%, or 80% of the individuals]. The SFS
                      graph summarizes the allele frequency data
                      2/5, 2/5, and 4/5 by showing that 2/3 = 67% of the
                       SNPs occur in two individuals,
                      and that 1/3 = 33% occur in four individuals. Note
                      that the data show no cases where the derived
                      allele is present in only 1 or in 3 individuals.
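
As a minimal sketch of how the SFS is tallied, the following Python function takes the derived-allele count at each SNP site and returns the proportion of sites in each of the n-1 classes. The function name `site_frequency_spectrum` is an illustrative assumption; the input counts (2, 2, 4) and n = 5 come from the example.

```python
from collections import Counter


def site_frequency_spectrum(counts, n):
    """Proportion of SNP sites whose derived allele occurs in j = 1 .. n-1 individuals."""
    for j in counts:
        if not 1 <= j <= n - 1:
            # A count of 0 or n means the site is not variable in the sample.
            raise ValueError(f"count {j} is outside 1..{n - 1}")
    tally = Counter(counts)   # number of sites per class
    total = len(counts)       # total number of SNP sites
    return [tally[j] / total for j in range(1, n)]


sfs = site_frequency_spectrum([2, 2, 4], n=5)
print([round(x, 2) for x in sfs])
```

The four entries correspond to the classes "derived allele in 1, 2, 3, or 4 individuals"; classes 1 and 3 are empty for these data.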
                      
                      Advanced: The
                      SFS can also be written for n individuals
                      as a vector f = (f_1, f_2, ..., f_{n-1}),
                      in this case f = (0.0, 0.67, 0.0, 0.33). If the ancestral and
                      derived SNPs at each position are not
                      known, we can write a folded frequency
                      spectrum vector by combining ("folding")
                      sites with 10000 + 01111 and 11000 + 00111 SNP
                      distributions ("1+4" and "2+3"
                      patterns). In this case, the folded vector is
                      f* = (f*_1, f*_2) = (0.33, 0.67).
                      Observe that if there were n = 6
                      individuals, and there were a fourth site coded
                      from top to bottom as 000111 or 111000
                      ("3+3" pattern), it would be possible to write a
                      folded vector f* = (f*_1, f*_2, f*_3) = (0.25, 0.50, 0.25).
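
Folding can be sketched by replacing each derived count j with the smaller of j and n-j, then tallying as before. The function name `folded_sfs` is hypothetical. The n = 6 input is an assumption: the text does not say what the sixth individual carries at the first three sites, and the counts used here (2, 2, 5, 3) are one choice that reproduces the folded vector given above.

```python
from collections import Counter


def folded_sfs(counts, n):
    """Proportion of SNP sites in each minor-allele class 1 .. floor(n/2)."""
    # Without knowing which state is ancestral, j and n - j are indistinguishable.
    minor = [min(j, n - j) for j in counts]
    tally = Counter(minor)
    total = len(counts)
    return [tally[k] / total for k in range(1, n // 2 + 1)]


# n = 5, counts from the example: "2+3", "2+3", and "1+4" patterns.
folded_5 = folded_sfs([2, 2, 4], n=5)

# n = 6, assumed counts: two "2+4" sites, one "1+5" site, one "3+3" site.
folded_6 = folded_sfs([2, 2, 5, 3], n=6)

print([round(x, 2) for x in folded_5], [round(x, 2) for x in folded_6])
```

Because only min(j, n-j) is used, a site coded 000111 and a site coded 111000 fall into the same class.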
                        
                      For n = 5 as in the example, the notation
                      for a folded vector doesn't define an f*_3 term,
                      because the series is limited to terms f*_j with
                      j ≤ n/2 = 5/2 = 2.5, that is, j = 1 or 2. In English, a sample of
                      any odd size 2k+1 has the same number of folded allele
                      frequency classes as the next lower even size 2k, namely k.
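
The class counts can be checked with a short sketch: an unfolded spectrum has n-1 classes and a folded one has floor(n/2). Both function names are illustrative assumptions.

```python
def unfolded_classes(n):
    """Number of classes in the unfolded SFS: derived allele in 1 .. n-1 individuals."""
    return n - 1


def folded_classes(n):
    """Number of classes in the folded SFS: minor allele in 1 .. floor(n/2) individuals."""
    return n // 2


for n in (5, 6, 7):
    print(n, unfolded_classes(n), folded_classes(n))
```

For n = 5 this gives 4 unfolded and 2 folded classes; n = 6 and n = 7 both give 3 folded classes.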
              
       
      
Figure © 2013 by
          Sinauer; Text material © 2017 by Steven M. Carr