Sample DNA data set
The top panel shows a typical set of 10 bp DNA sequences from five individuals in a population. DNA sequence variants occur at three positions, 2, 4, & 9, flagged here by dots (.) below the panel. Such variable positions are called Single Nucleotide Polymorphisms (SNPs) or Segregating Sites.

The middle panel re-codes the DNA sequences in binary form, in each case scoring the base in Sequence #1 as 0 and any different base at a SNP as 1. We will assume for the moment that all SNP changes are from 0 → 1, and we call any 0 the ancestral state and any 1 the derived state.

The bottom panel extracts the binary codes for the three SNPs for ease of comparison. Note that there are three alleles (haplotypes) among the five individuals: 000 in #1, 011 in ##2 & 3, and 101 in ##4 & 5. [All three SNPs are transversions (substitutions between a purine and a pyrimidine): t/g, c/a, & a/t.]
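The three panels above (finding the segregating sites, re-coding against Sequence #1, and extracting the SNP haplotypes), plus the transition/transversion check in the bracketed note, can be sketched in Python as follows. This is an illustrative sketch, not the author's method. The five sequences are HYPOTHETICAL: they reproduce only the SNP pattern described in the text (positions 2, 4, & 9; haplotypes 000, 011, 101; t/g, c/a, a/t), while the invariant bases and which base of each pair sits in Sequence #1 are invented. The helper names `segregating_sites`, `binary_recode`, `snp_haplotypes`, and `substitution_type` are likewise hypothetical.

```python
from collections import Counter

# HYPOTHETICAL aligned 10 bp sequences for five individuals. Only the SNP
# pattern described in the text is reproduced; the invariant bases, and which
# base of each pair sits in Sequence #1, are invented for illustration and
# are NOT the sequences shown in the figure.
SEQUENCES = [
    "ATGCTACGAC",  # Sequence #1
    "ATGATACGTC",  # Sequence #2
    "ATGATACGTC",  # Sequence #3
    "AGGCTACGTC",  # Sequence #4
    "AGGCTACGTC",  # Sequence #5
]

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def segregating_sites(seqs: list[str]) -> list[int]:
    """Return the 1-based positions at which the aligned sequences are not all identical."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    # zip(*seqs) walks the alignment one column (site) at a time.
    return [pos for pos, column in enumerate(zip(*seqs), start=1) if len(set(column)) > 1]


def binary_recode(seqs: list[str]) -> list[str]:
    """Score every site 0 if it matches Sequence #1 and 1 if it differs (middle panel)."""
    reference = seqs[0]
    return ["".join("0" if b == r else "1" for b, r in zip(s, reference)) for s in seqs]


def snp_haplotypes(seqs: list[str]) -> list[str]:
    """Keep only the binary codes at the segregating sites (bottom panel)."""
    sites = segregating_sites(seqs)
    return ["".join(code[pos - 1] for pos in sites) for code in binary_recode(seqs)]


def substitution_type(base1: str, base2: str) -> str:
    """Classify a pair of different bases as a transition or a transversion."""
    pair = {base1.upper(), base2.upper()}
    if len(pair) != 2 or not pair <= PURINES | PYRIMIDINES:
        raise ValueError("expected two different bases from A, C, G, T")
    # Transitions stay within purines or within pyrimidines; transversions cross classes.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


print(segregating_sites(SEQUENCES))  # [2, 4, 9]
haplotypes = snp_haplotypes(SEQUENCES)
print(haplotypes)                    # ['000', '011', '011', '101', '101']
print(dict(Counter(haplotypes)))     # {'000': 1, '011': 2, '101': 2}
for b1, b2 in [("t", "g"), ("c", "a"), ("a", "t")]:
    print(f"{b1}/{b2}: {substitution_type(b1, b2)}")  # each prints "transversion"
```

Treating the Sequence #1 base as 0 mirrors the text's working assumption that every change is 0 → 1; the counting logic itself does not depend on which state is truly ancestral.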
Equivalent results can be obtained simply by working with the DNA sequence data directly and counting 1 for each difference between any pair of sequences. Thus for Sequences ##1 & 2, the difference d = 1 + 1 = 2: 1 for the c/a difference at position 4 and 1 for the a/t difference at position 9. This does not require re-coding of the data.
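The direct count of pairwise differences described above amounts to counting mismatched sites between two aligned sequences. The Python sketch below is illustrative only: the five sequences are HYPOTHETICAL (they follow the SNP pattern in the text but the other bases are invented, so they are not the figure's data), and `pairwise_difference` is a hypothetical helper name. The last two lines show that counting on the raw DNA and counting on the binary SNP codes give the same d for Sequences #1 & #2.

```python
from itertools import combinations

# HYPOTHETICAL aligned sequences: SNP pattern from the text (positions 2, 4, 9),
# invariant bases invented for illustration, NOT the figure's data.
SEQUENCES = [
    "ATGCTACGAC",  # Sequence #1
    "ATGATACGTC",  # Sequence #2
    "ATGATACGTC",  # Sequence #3
    "AGGCTACGTC",  # Sequence #4
    "AGGCTACGTC",  # Sequence #5
]


def pairwise_difference(seq1: str, seq2: str) -> int:
    """d = number of aligned sites at which two sequences differ (1 per difference)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq1, seq2) if a != b)


# Compare every pair of individuals, numbered from 1 as in the text.
for (i, s1), (j, s2) in combinations(enumerate(SEQUENCES, start=1), 2):
    print(f"d(#{i}, #{j}) = {pairwise_difference(s1, s2)}")

# Raw DNA and binary SNP codes give the same d for Sequences #1 & #2.
print(pairwise_difference(SEQUENCES[0], SEQUENCES[1]))  # 2
print(pairwise_difference("000", "011"))                # 2
```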
Figure & Text material © 2022 by Steven M. Carr