Calculation of the site frequency spectrum (SFS) and folded SFS for n
= 10 haploid individuals. With 10 positions, there are n-1 = 9 classes,
because any SNP variant could occur in 1/10, 2/10, 3/10, ..., or 9/10
individuals: a base that occurs in 10/10 individuals would be invariant.
The first matrix
shows the actual DNA sequence data. The
second matrix re-codes the first with a '0'
where the base matches the first individual and
a '1' where there is a SNP difference. The
third matrix shows only the SNPs, and
counts the differences with respect to the first
individual. [Note that the first individual
necessarily shows all '0's.]
The SFS matrix counts the number of SNPs in each
derived class in the third matrix: there are three '1's,
three '2's, two '3's, and so on.
Because it cannot be determined whether the
character state in Individual 1 is
actually ancestral or derived, the Folded SFS matrix combines
the '1' and '5' classes (both of
which have one character one way and five
the other), and the '2' and '4'
classes (both have two one way and four the
other). The '3' class remains unchanged
(it combines the three one way, three the other
types). Then, the total number of SNP differences
needed to explain the DNA data matrix
is (1 x 4) + (2 x 4) + (3 x 2) = 18.
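
The folding step and the final sum can be checked with a short Python sketch. It is illustrative only and rests on two assumptions: the classes run from '1' to '5', as the class descriptions (one versus five, three versus three) imply, and the '4' and '5' classes hold one SNP each, a value inferred from the folded totals in the sum rather than stated directly. The names fold_sfs, sfs, and folded are hypothetical.

```python
def fold_sfs(sfs, n):
    """Fold an unfolded SFS.

    sfs[i - 1] is the number of SNPs whose variant state occurs in
    i of the n sequences (i = 1 .. n-1). Class i is merged with
    class n - i, because without knowing the ancestral state the
    two are indistinguishable.
    """
    folded = []
    for i in range(1, n // 2 + 1):
        j = n - i
        count = sfs[i - 1]
        if j != i:  # the middle class (i == n - i) pairs with itself
            count += sfs[j - 1]
        folded.append(count)
    return folded


# Assumed counts: three '1's, three '2's, two '3's as described;
# one '4' and one '5' inferred from the folded totals.
n = 6
sfs = [3, 3, 2, 1, 1]

folded = fold_sfs(sfs, n)

# Minimum number of SNP differences: class size times SNPs in that class.
total = sum(k * c for k, c in enumerate(folded, start=1))

print(folded)  # [4, 4, 2]
print(total)   # 18
```

The folded counts [4, 4, 2] reproduce the terms (1 x 4) + (2 x 4) + (3 x 2) of the sum.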