Systematic analysis: an example using molecular data
Why use molecules?
Molecules provide an independent
estimate of phylogeny
Avoids a circular argument:
Morphology
is used to create a classification,
then the classification is interpreted to explain evolution
Ex.: Chinese water deer (Hydropotes) is the only antlerless deer
=> placed in a separate subfamily & assumed to be ancestral type
But (molecular) analysis shows antlers were lost secondarily
Molecules provide large numbers of characters
for analysis
Homo has ca. 200 bones and 3,000,000,000 nucleotide pairs
Typical morphological study involves <100 characters
Typical molecular study involves >1,000
Nucleotides at same locus
may evolve co-ordinately
Separate loci should be
independent
Patterns of molecular evolution are understood
Transitions (Ts)
are more frequent than Transversions (Tv)
[recall 3250: transitions = C<->T or A<->G interchange,
transversions are everything else]
'silent' >> 'replacement'
substitutions
3rd position >> 2nd &
1st substitutions (usually)
Relative importance of characters is
easier to judge
Is the # of toes more
important than # of teeth?
Are scales versus
feathers more important than # of temporal openings?
But: Any one nucleotide
position is more or less like any other
1. Defining the problem:
Evolutionary relationships of the Giant
Panda (Ailuropoda)
Ailuropoda has been considered
to be either a bear (Ursidae) or a raccoon (Procyonidae)
General morphology suggests ursid ancestors:
Details of skull, diet,
biogeography suggest procyonid ancestors
Ex.:
alar canal is present in Ursidae (including Ailuropoda),
absent in Procyonidae (except lesser panda,
Ailurus)
2. Collecting the data:
Measure homologous characters in a set of taxa:
with DNA, each nucleotide position is a
separate character
mitochondrial DNA (mtDNA)
is used in many systematic studies
"Small circular molecule ...", 16Kbp,
maternally-inherited (cytoplasmic)
13 protein-coding loci (fast), 22 tRNA genes & 2 rRNA genes (slow),
control region (very fast)
'Universal primers' permit PCR &
DNA sequencing from many taxa
cytochrome b
gene is widely used:
Large
data base for comparison
1140
bp in most vertebrates; we examine 401 bp in lab
3. Analyzing the data:
Phenetic
(how similar are taxa?)
versus cladistic
(how closely related are taxa?) criteria
These criteria agree, iff rates of evolution are constant
If evolutionary rates differ, closely related
organisms may appear different
Ex.: Crocodiles
are more closely related to birds, but more similar to lizards
Crocodiles resemble lizards more than birds
because birds rapidly evolved specializations for flight
A. Phenetic analysis
Simplest measure is % sequence
similarity (S)
p-distance (%) = (1 - S) x 100, with S taken as the proportion of identical sites
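The similarity and p-distance measures can be sketched in a few lines of Python. This is a minimal illustration, not the lab's own procedure: the function name `p_distance` is made up here, and the two sequences are taxa A and B from the example alignment in section B, written out in full on the assumption that a dot there means "same as A".

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    if not seq1:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for x, y in zip(seq1.lower(), seq2.lower()) if x != y)
    return differences / len(seq1)


# Taxa A and B from the example alignment (assumed reading; spaces removed)
a = "aat tcg ctt cta gga atc tgc cta atc ctg".replace(" ", "")
b = "aat tca ctg cta gta atc tgc tta atc cta".replace(" ", "")

p = p_distance(a, b)
similarity = 1 - p  # S as a proportion
print(f"S = {similarity * 100:.1f}%   p-distance = {p * 100:.1f}%")
```

A and B differ at 5 of 30 sites, so S is about 83% and the p-distance about 17%.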
Patterns of similarity can be inferred
from UPGMA cluster analysis
[Unweighted Pair Group Method, Arithmetic averaging],
a Sequential Agglomerative Hierarchical Nesting (SAHN) algorithm
[algorithm = a set of instructions for doing a repetitive task]
In (n) x (n)
matrix, join the most similar pair
re-calculate (n-1) x (n-1) matrix, re-join,
and so on, until last pair is joined
Results are shown as a phenogram:
a diagram of phenetic
relationships
UPGMA method assumes
that rates of evolution are equal
so branch
tips "come out even" (contemporaneous)
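The join-and-recalculate loop described above can be sketched as follows. This is an illustrative Python version under stated assumptions: the function name `upgma`, the nested-tuple tree format, and especially the distance values are invented for the example and are not measurements from the lab.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by UPGMA.

    names: list of taxon labels
    dist:  dict mapping frozenset({name1, name2}) -> pairwise distance
    Returns (tree, joins): tree is a nested tuple; joins lists
    (cluster_a, cluster_b, height) in the order the pairs were joined.
    """
    clusters = {name: 1 for name in names}   # cluster -> number of tips in it
    d = dict(dist)                           # working copy of the matrix
    joins = []
    while len(clusters) > 1:
        # 1. join the most similar (least distant) pair
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))] / 2    # equal rates: tips "come out even"
        joins.append((a, b, height))
        new = (a, b)
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        # 2. recalculate the smaller matrix: size-weighted mean distance
        for c in clusters:
            d[frozenset((new, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[new] = size_a + size_b
    (tree,) = clusters
    return tree, joins


# Hypothetical p-distances, for illustration only
raw = {
    ("Ailuropoda", "Ursus"): 0.08, ("Ailuropoda", "Procyon"): 0.18,
    ("Ailuropoda", "Martes"): 0.20, ("Ursus", "Procyon"): 0.17,
    ("Ursus", "Martes"): 0.19, ("Procyon", "Martes"): 0.15,
}
names = ["Ailuropoda", "Ursus", "Procyon", "Martes"]
dist = {frozenset(pair): value for pair, value in raw.items()}

tree, joins = upgma(names, dist)
print(tree)
for a, b, height in joins:
    print(f"join {a} + {b} at height {height:.4f}")
```

Each join is drawn at half the distance between the two clusters, which is where the equal-rates assumption enters: both descendants are placed the same distance from their common node.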
Some alternatives:
Neighbour-Joining
(NJ) analysis does not assume rate equality
branch
lengths are proportional to change: tips come out uneven
[algorithm joins nodes, rather than tips]
This method is more realistic
Differential
weighting of nucleotide substitutions
accords greater 'significance' to 'important' changes
Ex.:
Kimura 2-parameter distance (K2P)
model treats Ts & Tv separately
K transition
bias = [Ts]/[Tv]
There are twice as many kinds of transversions as transitions:
expected K = 0.5
But: recall results from Part 3 of Lab
#5:
Transversions (Tv) are rare for close comparisons,
more common in distant relationships
K is variable according to the evolutionary problem under consideration:
K > 6 for close comparisons
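Counting Ts and Tv separately, and turning the counts into a K2P distance, might look like the sketch below. The function names are made up for illustration. The distance uses the standard Kimura two-parameter formula d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of sites showing transitions and transversions. The sequences are taxa A and B from the example alignment in section B, assuming a dot there means "same as A".

```python
import math

PURINES = {"a", "g"}
PYRIMIDINES = {"c", "t"}


def count_ts_tv(seq1, seq2):
    """Count (transitions, transversions) between two aligned sequences.

    Assumes equal-length sequences containing only a, c, g, t.
    """
    ts = tv = 0
    for x, y in zip(seq1.lower(), seq2.lower()):
        if x == y:
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            ts += 1   # A<->G or C<->T
        else:
            tv += 1   # purine <-> pyrimidine
    return ts, tv


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance from Ts and Tv proportions."""
    ts, tv = count_ts_tv(seq1, seq2)
    n = len(seq1)
    p, q = ts / n, tv / n
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Taxa A and B from the example alignment (assumed reading; spaces removed)
a = "aat tcg ctt cta gga atc tgc cta atc ctg".replace(" ", "")
b = "aat tca ctg cta gta atc tgc tta atc cta".replace(" ", "")

ts, tv = count_ts_tv(a, b)
print(f"Ts = {ts}, Tv = {tv}, K = Ts/Tv = {ts / tv:.2f}")
print(f"K2P distance = {k2p_distance(a, b):.3f}")
```

For this pair there are 3 transitions and 2 transversions, so K = 1.5. The K2P distance is somewhat larger than the raw p-distance because the model corrects for multiple substitutions at the same site.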
B. Cladistic Analysis
Principles of homology
& analogy can be applied to nucleotide
changes
We rely only
on shared derived (synapomorphic) nucleotides,
& avoid
shared ancestral (symplesiomorphic) nucleotides,
and changes unique to single taxa (autapomorphies),
and convergent nucleotides between unrelated taxa.
Choice of preferred hypothesis is made on the
Principle of Parsimony
In general:
parsimony means that the simpler hypothesis is to be preferred
complex hypotheses are less probable
Evolutionary
parsimony:
a hypothesis that requires fewer character changes is preferred
Ex.: to explain the origin of a complex structure
it is more parsimonious to hypothesize that it has evolved only once
In molecular systematics, these changes are nucleotide substitutions
[DNA mutations]
The "Four-Taxon
Problem" and the "Three-Taxon Statement":
Among four taxa A,
B, C, & D, there are 3 hypotheses of relationship:
either
A is most closely related to B, or to C, or to D
[We want to be able to reach conclusions such as:
"X and Y are more closely related to each other than either is to Z"]
A   C      A   B      A   B
|___|      |___|      |___|
|   |      |   |      |   |
B   D      C   D      D   C
3 networks [cladograms]
If (for example), A is most closely related to B
A & B will share characters inherited from their common
ancestor
A  aat tcg ctt cta gga atc tgc cta atc ctg
B  ... ..a ..g ..a .t. ... ... t.. ... ..a
C  ... ..a ..c ..c ... ..t ... ... ... t.a
D  ... ..a ..a ..g ..g ..t ... t.t ..t t..
     1   2   3   4       5     6         7
Seven
classes of nucleotide sites can be identified
(for details, see Notes
on Parsimony Analysis)
Types 1 - 4 are uninformative:
They give
no information about relationships, because
all hypotheses require the same number of changes,
so none is more parsimonious than the
others.
Type 1
is invariant.
No changes are required.
Type 2
indicates only that one taxon is unique
wrt the others:
all hypotheses require a single nucleotide change.
Type 3
indicates that all taxa are distinct & unique:
all hypotheses require three nucleotide changes.
Type 4
indicates that two taxa are similar,
but not whether this is ancestral or derived:
all hypotheses require two nucleotide changes.
[a '+'
indicates a change along a particular network branch]
a   c      a   c      a   a      a   a
A   C      A   C      A   B      A   B
|___+  or  |_+_+      |___|      |___|
|   +      |   |      +   +      +   +
B   D      B   D      C   D      D   C
a   g      a   g      c   g      g   c
Types 5, 6 & 7 are informative:
They give information
about relationships, because
one hypothesis requires fewer changes than the others
& is therefore more parsimonious than the others
Type 5 indicates that A & B are most closely related:
The first hypothesis can explain the distribution of nucleotides with a single change,
the latter two require two changes each.
The first hypothesis is a more parsimonious explanation of the data than the others.
a   g      a   a      a   a
A   C      A   B      A   B
|_+_|      +___+      +___+
|   |      |   |      |   |
B   D      C   D      D   C
a   g      g   g      g   g
By the same logic:
Type 6 indicates that A & C are most closely related.
Type 7 indicates that A & D are most closely related.
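The change counts quoted for each site type can be checked with a small sketch. It uses Fitch-style set counting, a standard way of finding the minimum number of changes on a given tree. That technique is brought in here for illustration and is not named in these notes. The function names and hypothesis labels are likewise illustrative.

```python
def fitch_changes(pair1, pair2):
    """Minimum changes at one site on the network joining pair1 to pair2.

    Each pair holds the nucleotides of the two tips attached to one node.
    """
    changes = 0
    node_sets = []
    for x, y in (pair1, pair2):
        if x == y:
            node_sets.append({x})        # tips agree: no change needed here
        else:
            node_sets.append({x, y})     # tips differ: one change on this side
            changes += 1
    if not (node_sets[0] & node_sets[1]):
        changes += 1                     # one more change along the internode
    return changes


def changes_per_hypothesis(a, b, c, d):
    """Changes required by each of the three four-taxon hypotheses."""
    return {
        "(A,B),(C,D)": fitch_changes((a, b), (c, d)),
        "(A,C),(B,D)": fitch_changes((a, c), (b, d)),
        "(A,D),(B,C)": fitch_changes((a, d), (b, c)),
    }


# One example nucleotide pattern (A, B, C, D) for each site type
examples = {
    "Type 2": "gaaa", "Type 3": "tgca", "Type 4": "aacg",
    "Type 5": "aagg", "Type 6": "agag", "Type 7": "agga",
}
for label, pattern in examples.items():
    print(label, pattern, changes_per_hypothesis(*pattern))
```

For Types 2 to 4 the three hypotheses tie at 1, 3, and 2 changes respectively. For Types 5 to 7 exactly one hypothesis needs a single change and the other two need two.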
A cladistic analysis counts the number of
informative characters favouring each hypothesis
The hypothesis with the "highest
score" requires the fewest changes
and is therefore the 'most
parsimonious' explanation.
This is also called the
'minimum length' solution.
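The counting step can be sketched by sorting every site of the example alignment into the seven classes and tallying the informative ones. This is a minimal illustration: the sequences for B, C, and D are written out in full on the assumption that a dot in the alignment means "same as A", and the names `site_type` and `ALIGNMENT` are made up for the example.

```python
from collections import Counter

# Example alignment from above, dots expanded (assumed reading)
ALIGNMENT = {
    "A": "aat tcg ctt cta gga atc tgc cta atc ctg",
    "B": "aat tca ctg cta gta atc tgc tta atc cta",
    "C": "aat tca ctc ctc gga att tgc cta atc tta",
    "D": "aat tca cta ctg ggg att tgc ttt att ttg",
}


def site_type(a, b, c, d):
    """Return the site class (1-7) for the nucleotides of taxa A, B, C, D."""
    states = Counter((a, b, c, d))
    if len(states) == 1:
        return 1                  # invariant
    if len(states) == 4:
        return 3                  # all four taxa distinct
    if len(states) == 3:
        return 4                  # two alike, the other two each different
    if max(states.values()) == 3:
        return 2                  # one taxon unique
    if a == b:
        return 5                  # two-and-two split: A with B
    if a == c:
        return 6                  # A with C
    return 7                      # A with D


rows = [ALIGNMENT[taxon].replace(" ", "") for taxon in "ABCD"]
counts = Counter(site_type(*column) for column in zip(*rows))

support = {"A+B": counts[5], "A+C": counts[6], "A+D": counts[7]}
best = max(support, key=support.get)
print("sites per type:", dict(sorted(counts.items())))
print("informative support:", support, "-> most parsimonious:", best)
```

On this reading of the alignment, 26 of the 30 sites are uninformative. Of the four informative sites, two are Type 5 and one each is Type 6 and Type 7, so grouping A with B has the highest score.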
Cladistic analyses may also be weighted:
Ex.: Count Tv:Ts
as 3:1 => Tv are 3x as meaningful
or, count Tv only (Transversion parsimony)
for "deep" analyses
or, count 1st & 2nd position substitutions >> 3rd
C. Placing the
root & Inferring the direction of evolutionary change
Suppose the first hypothesis (A & B
are most closely related) is most parsimonious
Ex.: In Lab
#5, we found that the majority of sites were of type #5.
We said:
"Ailuropoda
& Ursus are more closely related to each other than
either is to Procyon (or Martes)."
The hypothesis can be drawn as an
unrooted network
But: this evidence can also be used to argue
"Procyon & Martes are more closely related to each
other
than either is to Ursus (or Ailuropoda)."
To resolve this, we need to know where their common ancestor fits in.
There are four branches and one internode in
this network
An evolutionary tree is a network with a root:
The root indicates the relationship
of the common ancestor
A 'root'
can be placed on any of the branches or the internode.
So, there
are five possible rooted trees for this network.
All are equally
parsimonious:
not
all place A & B as each other's closest relatives.
Some
of these make the shared character a symplesiomorphy.
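The five rootings can be enumerated directly. The sketch below is illustrative only: trees are written as nested tuples, and the helper names `rootings` and `are_sisters` are made up here. It lists each rooted tree for the network joining (Ailuropoda, Ursus) to (Procyon, Martes) and checks whether Ailuropoda and Ursus remain each other's closest relatives.

```python
def rootings(pair1, pair2):
    """The five rooted trees obtainable from the network pair1 -- pair2."""
    (w, x), (y, z) = pair1, pair2
    return {
        f"root on branch to {w}": (w, (x, (y, z))),
        f"root on branch to {x}": (x, (w, (y, z))),
        f"root on branch to {y}": (y, (z, (w, x))),
        f"root on branch to {z}": (z, (y, (w, x))),
        "root on internode": ((w, x), (y, z)),
    }


def are_sisters(tree, t1, t2):
    """True if t1 and t2 are the only two members of some clade in tree."""
    if isinstance(tree, str):
        return False
    if set(tree) == {t1, t2}:
        return True
    return any(are_sisters(subtree, t1, t2) for subtree in tree)


trees = rootings(("Ailuropoda", "Ursus"), ("Procyon", "Martes"))
for label, tree in trees.items():
    verdict = "sisters" if are_sisters(tree, "Ailuropoda", "Ursus") else "not sisters"
    print(f"{label:32s} {tree}  -> Ailuropoda & Ursus {verdict}")
```

Rooting on the branch to Ailuropoda or to Ursus breaks up that pair, so only three of the five rooted trees keep them as closest relatives.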
There are several ways to determine the correct placement
of the root
(1) Outgroup rooting:
Include a taxon that is
known to be less closely related
to any
of the ingroup taxa
than they are to each other.
Such
a taxon is called an outgroup or sister
taxon.
Ex.: Lynx
(Feliformia) is an outgroup to the Caniformia
(Note that this tree is equivalent to the NJ phenogram)
(2) Midpoint rooting:
Place the root halfway
between the two most different taxa.
This
assumes that molecular evolution is clock-like.
(Here, this places the root on the internode.)
(3) Character Polarity:
If the character state of the ancestor is known (or can be inferred),
root the tree accordingly
Use of polarity is
usually not possible with molecular data
Any nucleotide can mutate to any other, in either direction
any a c g t looks exactly like any other a
c g t
[Some models allow for differential probabilities of mutation]
Use of polarity with
morphological data is standard
Ex.: In an
analysis of the evolution of the number of heart chambers in
codfish (2), lizard (3), crocodile (4), & bird
(4)
we know that the evolutionary order is 2 -> 3 -> 4
(this is called a transformation series)
=> The root will be placed on the codfish
branch,
because we know the codfish most resembles the ancestor.
Crocs & Birds have a recent common ancestor with a four-chambered heart.
D. What does this analysis explain about the evolution & biology
of Pandas?
1.
Ailuropoda and Ursus are each other's closest relatives:
The
Giant Panda is a highly derived bear, not a raccoon.
Ailuropoda
should be classified in Ursidae.
2. Similarities of Ailuropoda
and Ailurus are convergent
(analogous):
these represent parallel feeding specializations.
"Hypertrophied
masticatory apparatus" permits feeding on bamboo:
(expanded
zygomatic arch, high mandibular ramus, and molariform teeth)
Jaw articulation above
toothrow gives mechanical advantage:
(similar
modifications occur in Hyaena for crushing bones).
3. Some similarities between Ailuropoda
and other ursids are ancestral homologies:
Bears (including pandas)
have short gestation and tiny neonates.
In most
bears, gestation & birth occur during winter hibernation:
=> early
birth gives access to milk, when no other food is available
Pandas do not hibernate:
young are carried during foraging:
Why
have altricial (underdeveloped) young when food is available?
"Small young could be explained if the suite
of physiological and behavioural
adaptations associated with the production of small neonates were
established
before splitting of the panda and ursid lines." (Ramsay
& Dunbrack, 1987)
4. Panda evolution seems to be quite recent:
Ailuropoda &
Ursus are about as similar genetically as dog & fox.
Fossils are known only
from Pleistocene (< 2 MYBP).
Development & growth
of cranial vs. axial skeleton in pandas
resembles
that of Hyaenas and boxer dogs: all have heavy crania.
These
species have less-developed post-cranial (axial) skeletons.
Selection
may operate on similar, hypothetical 'growth fields'
"The basic adaptive transition from Ursus
to Ailuropoda required the changing
of very few genetic messages [during an] origin by way of a very
small
population occupying a local bamboo forest." (Stanley
1979)
=> Pandas may be a textbook
case of quantum speciation:
the origin of a
new adaptive type in one or a few speciation events.
Text material
© 2000 by Steven M. Carr