Inferring the degree of evolutionary relationship
How can we describe the position
of each 'twig' with respect to all others?
distance: amount of change between twigs
How similar (or different) are species?
phenetic distance: distance measured between tips
(i.e., "as the crow flies" from one twig to another)
patristic distance: distance measured along connecting branches
(i.e., "as the ant runs" from one twig to another)
relationship: pattern of connection between twigs
How closely related are species?
cladistic relationship: pattern of branching back to most recent common ancestor (MRCA)
(i.e., where do twigs join lower in tree?)
Phenetic (how similar are taxa?) versus cladistic (how closely related are taxa?) criteria
These criteria agree, iff rates of evolution are constant
If evolutionary rates differ, closely related organisms may appear different
Ex.: Crocodiles are more closely related to birds, but more similar to lizards
Crocodiles resemble lizards more than birds because birds rapidly evolved specializations for flight
Patterns of similarity can be inferred from UPGMA cluster analysis
[Unweighted Pair Group Method, Arithmetic averaging],
a Sequential Agglomerative Hierarchical Nesting (SAHN) algorithm
[algorithm = a set of instructions for doing a repetitive task]
In an (n) x (n) matrix, join the most similar pair:
re-calculate the (n-1) x (n-1) matrix, re-join, and so on, until the last pair is joined
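The join / re-calculate / re-join cycle above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `upgma`, the dictionary layout, and the four-taxon distances are all assumptions made for the example. Each join is reported with its height (half the distance between the joined clusters), which is what makes the tips line up.

```python
from itertools import combinations

def upgma(names, dist):
    """Return the sequence of UPGMA joins as (cluster_label, height) pairs.

    names: list of taxon labels
    dist:  dict mapping a pair (a, b) to the distance between a and b
    """
    sizes = {name: 1 for name in names}          # number of tips per active cluster
    d = {frozenset(pair): value for pair, value in dist.items()}
    joins = []
    while len(sizes) > 1:
        # Join the most similar (least distant) pair in the current matrix.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2        # both tips sit at the same depth
        n_a, n_b = sizes.pop(a), sizes.pop(b)
        new = f"({a},{b})"
        # Re-calculate the smaller matrix: size-weighted arithmetic average.
        for c in sizes:
            d[frozenset((new, c))] = (
                n_a * d[frozenset((a, c))] + n_b * d[frozenset((b, c))]
            ) / (n_a + n_b)
        sizes[new] = n_a + n_b
        joins.append((new, height))
    return joins

# Hypothetical distances among four taxa (illustrative values only).
example = {
    ("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
    ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10,
}
for label, height in upgma(["A", "B", "C", "D"], example):
    print(label, height)
```

With these made-up distances A and B join first, then C, then D, at heights 1, 3, and 5.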
Results are shown as a phenogram: a diagram of phenetic similarity
Similarity approximates evolutionary relationship under certain assumptions
cladogram: a diagram of evolutionary relationships (a tree)
UPGMA method assumes that rates of evolution are equal,
so branch tips "come out even" (contemporaneous)
This assumes that DNA sequences evolve as a molecular clock
Homework: Five practice problems in UPGMA phenogram calculations
Some alternatives:
Neighbour-Joining (NJ) analysis does not assume rate equality
when rate equality is assumed, large evolutionary rate differences lead to incorrect trees
NJ allows branch lengths proportional to change: tips come out uneven
[algorithm joins nodes, rather than tips]
This method is more realistic, but computationally harder
[see www.megasoftware.net for free software]
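To see why NJ tips "come out uneven", here is a sketch of the first joining step only, using the standard NJ selection criterion Q(i,j) = (n-2) d(i,j) - r(i) - r(j), where r(i) is the sum of row i. The function name `nj_first_join` and the five-taxon matrix are assumptions for illustration; a full analysis would repeat the step on the reduced matrix.

```python
def nj_first_join(names, D):
    """Pick the first pair joined by NJ and the two (unequal) branch lengths.

    names: list of taxon labels
    D:     symmetric distance matrix as a list of lists (needs at least 3 taxa)
    """
    n = len(names)
    r = [sum(row) for row in D]                  # net divergence of each taxon
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * D[i][j] - r[i] - r[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best
    # Branch lengths from each tip to the new node need not be equal.
    len_i = D[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
    len_j = D[i][j] - len_i
    return names[i], names[j], len_i, len_j

# Hypothetical distance matrix for five taxa (illustrative values only).
names = ["a", "b", "c", "d", "e"]
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_first_join(names, D))
```

Here d and e are the closest pair (distance 3), yet a and b are joined first, with branch lengths 2 and 3: the criterion corrects for how far each taxon is from everything else rather than assuming equal rates.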
Differential weighting of nucleotide substitutions
accords greater 'significance' to 'important' changes
Ex.: Kimura 2-parameter (K2P) distance model treats transitions (Ts) & transversions (Tv) separately
K = transition bias = [Transitions] / [Transversions] = [Ts] / [Tv]
There are twice as many kinds of transversions as transitions: expected K = 0.5
But: Tv are rare for close comparisons, more common for distant relationships
K is variable according to the evolutionary problem under consideration:
K > 6 for close comparisons
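A short sketch of how Ts and Tv might be counted between two aligned sequences, and how the counts feed the K2P distance d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of sites showing a transition or a transversion. The function names and the two 20-bp sequences are assumptions for illustration; sites with gaps or ambiguous bases are simply skipped.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def count_ts_tv(seq1, seq2):
    """Return (transitions, transversions, sites compared) for aligned sequences."""
    ts = tv = n = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue                             # skip gaps and ambiguity codes
        n += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            ts += 1                              # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1                              # purine<->pyrimidine
    return ts, tv, n

def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences."""
    ts, tv, n = count_ts_tv(seq1, seq2)
    p, q = ts / n, tv / n
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)

# Hypothetical 20-bp alignment: three transitions and one transversion.
s1 = "ACGTACGTACGTACGTACGT"
s2 = "GTAAACGTACGTACGTACGT"
ts, tv, n = count_ts_tv(s1, s2)
print("Ts =", ts, "Tv =", tv, "K =", ts / tv)
print("K2P distance =", round(k2p_distance(s1, s2), 4))
```

In this made-up pair K = 3, well above the 0.5 expected if all substitution types were equally likely, and the K2P distance comes out larger than the raw proportion of differing sites (4/20) because the model corrects for multiple hits.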
Choice of preferred hypothesis is made on the Principle of Parsimony
In general: parsimony means that the simpler hypothesis is to be preferred
Ex.: to explain the occurrence of a complex structure in multiple species
it is more parsimonious to hypothesize that it has evolved only once
Therefore, the species have a common evolutionary origin
Evolutionary parsimony: a hypothesis that requires fewer character changes is preferred
In molecular systematics, these changes are nucleotide substitutions [SNPs]
The "Four-Taxon Problem" and the "Three-Taxon Statement":
Among four taxa A, B, C, & D, there are three hypotheses of relationship:
either A is most closely related to B, or to C, or to D
We want to be able to evaluate hypotheses of the form:
"X and Y are more closely related to each other than either is to Z"
The alternative hypotheses can be shown as networks with branches and an internode
Count number of changes at informative characters favouring each hypothesis
The hypothesis with the "highest score" requires the fewest changes
and is therefore the 'most parsimonious' explanation.
This is also called the 'minimum length' or 'minimum spanning' solution.
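The scoring described above can be sketched as follows. For four taxa, only sites with exactly two states, each shared by two taxa, distinguish the three networks; constant sites and sites where a single taxon differs cost the same on every network. The function name `score_quartet` and the 8-bp sequences are assumptions for illustration.

```python
def score_quartet(a, b, c, d):
    """Count informative sites favouring each of the three four-taxon networks."""
    scores = {"(A,B),(C,D)": 0, "(A,C),(B,D)": 0, "(A,D),(B,C)": 0}
    for w, x, y, z in zip(a, b, c, d):
        if len({w, x, y, z}) != 2:
            continue                 # constant, or 3+ states: same cost on every network
        if w == x and y == z:
            scores["(A,B),(C,D)"] += 1
        elif w == y and x == z:
            scores["(A,C),(B,D)"] += 1
        elif w == z and x == y:
            scores["(A,D),(B,C)"] += 1
        # remaining two-state sites have one odd taxon out: uninformative
    return scores

# Hypothetical aligned sequences for taxa A, B, C, D (illustrative only).
seq_a = "AAGCTTAG"
seq_b = "AAGCTCAG"
seq_c = "GAGTTCAA"
seq_d = "GAGTCTAA"
scores = score_quartet(seq_a, seq_b, seq_c, seq_d)
print(scores)
print("most parsimonious:", max(scores, key=scores.get))
```

For these made-up data three sites favour A with B, none favours A with C, and one favours A with D, so the network pairing A with B has the highest score and requires the fewest changes.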
[ Cladistic analyses may also be weighted: objective criteria exist for DNA data
Ex.: Count Tv:Ts as 3:1 => Tv are 3x as meaningful
or, count Tv only (Transversion parsimony) for "deep" analyses
or, count 1st & 2nd position substitutions >> 3rd
]
Homework:
Practice four-taxon problems
Statistical tests determine confidence in branching order
Bootstrap Analysis: a re-sampling technique
statistical tests usually involve replication / repetition of experiment
this is inconvenient with DNA data: $$$
Suppose existing data set (400bp) is an accurate sample of parametric data set (complete genome)
re-sample existing n sites (with replacement) 1000 times, repeat phylogenetic analysis:
how often do same clades / clusters appear?
"More than 50% bootstrap support" indicates group occurs more frequently than all others combined
95% criterion is desirable, not often obtained with small
data sets
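A minimal sketch of the re-sampling idea, assuming a four-taxon alignment and using a simple count of informative sites as the "phylogenetic analysis" repeated on each replicate. The function names, the grouping labels, the seed, and the sequences are all assumptions for illustration; real analyses re-run a full tree-building method on every replicate.

```python
import random
from collections import Counter

GROUPINGS = ("AB|CD", "AC|BD", "AD|BC")

def favoured_grouping(columns):
    """Return the grouping backed by the most informative sites, or None on a tie."""
    score = Counter()
    for w, x, y, z in columns:
        if len({w, x, y, z}) != 2:
            continue                             # not informative for four taxa
        if w == x and y == z:
            score["AB|CD"] += 1
        elif w == y and x == z:
            score["AC|BD"] += 1
        elif w == z and x == y:
            score["AD|BC"] += 1
    ranked = score.most_common(2)
    if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
        return None
    return ranked[0][0]

def bootstrap_support(a, b, c, d, replicates=1000, seed=0):
    """Fraction of bootstrap replicates in which each grouping wins outright."""
    rng = random.Random(seed)
    columns = list(zip(a, b, c, d))              # one tuple per aligned site
    n = len(columns)
    wins = Counter()
    for _ in range(replicates):
        # Draw n sites with replacement from the existing n sites.
        sample = [columns[rng.randrange(n)] for _ in range(n)]
        winner = favoured_grouping(sample)
        if winner is not None:
            wins[winner] += 1
    return {g: wins[g] / replicates for g in GROUPINGS}

# Hypothetical aligned sequences for taxa A, B, C, D (illustrative only).
support = bootstrap_support("AAGCTTAG", "AAGCTCAG", "GAGTTCAA", "GAGTCTAA")
for grouping, fraction in support.items():
    print(grouping, fraction)
```

Replicates that end in a tie are counted for no grouping, so the fractions may sum to less than 1. With only eight sites, as here, support values are unstable, which is the small-data-set problem noted above.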
Placing the root & Inferring the direction of evolutionary change
With four taxa, there are four branches and one internode
The most closely related "sister taxon" may occur on any of these
Where does this "common ancestor" fit in the tree?
An evolutionary tree is a network with a root:
The root indicates the relationship with the common ancestor
A 'root' can be placed on any of the branches or the internode.
So, there are five possible rooted trees for this unrooted network.
All are equally parsimonious: not all place A & B as each other's closest relatives.
Some of these make shared characters symplesiomorphic
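The five rootings can be written out explicitly. The sketch below assumes the unrooted network places A and B on one side of the internode and C and D on the other, and writes each rooted tree in Newick (nested-parenthesis) notation; the function name `rootings` is an assumption for illustration.

```python
def rootings(a="A", b="B", c="C", d="D"):
    """Newick strings for the five rooted trees of the network (A,B)-(C,D)."""
    return {
        "root on branch to A": f"({a},({b},({c},{d})));",
        "root on branch to B": f"({b},({a},({c},{d})));",
        "root on branch to C": f"({c},({d},({a},{b})));",
        "root on branch to D": f"({d},({c},({a},{b})));",
        "root on internode": f"(({a},{b}),({c},{d}));",
    }

trees = rootings()
for placement, newick in trees.items():
    sisters = "(A,B)" in newick      # are A and B each other's closest relatives?
    print(f"{placement:20s} {newick:22s} A+B sisters: {sisters}")
```

A and B come out as sister taxa in three of the five rooted trees; rooting on the branch to A or to B makes one of them the sister to everything else instead.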
(1) Outgroup rooting:
Include a taxon that is known to be less closely related
to any of the ingroup taxa than they are to each other.
Such a taxon is called an outgroup or sister taxon.
Ex.: Lynx (Feloidea) is an outgroup to the Canoidea
(Note that this tree is equivalent to the NJ phenogram)
Problematic in groups where relationships are uncertain (Ex.: Wolffish, Anarhichas)
(2) Midpoint rooting:
Place the root halfway between the two most different taxa.
This assumes that molecular evolution is clock-like.
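Midpoint rooting can be sketched on a small network with branch lengths: find the pair of tips with the largest patristic distance (summed along connecting branches), then walk half that distance along the path between them. The function names, the node labels X and Y for the two ends of the internode, and the branch lengths are assumptions for illustration.

```python
from itertools import combinations

def path_between(adj, start, goal):
    """Return the steps (node, next_node, length) along the path from start to goal."""
    stack = [(start, None, [])]
    while stack:
        node, parent, steps = stack.pop()
        if node == goal:
            return steps
        for nxt, length in adj[node].items():
            if nxt != parent:
                stack.append((nxt, node, steps + [(node, nxt, length)]))
    raise ValueError("no path between the two nodes")

def midpoint_root(edges, tips):
    """Return (most distant tip pair, branch holding the root, offset along it)."""
    adj = {}
    for u, v, length in edges:
        adj.setdefault(u, {})[v] = length
        adj.setdefault(v, {})[u] = length

    def patristic(pair):
        return sum(length for _, _, length in path_between(adj, *pair))

    far_pair = max(combinations(tips, 2), key=patristic)
    half = patristic(far_pair) / 2
    walked = 0.0
    for u, v, length in path_between(adj, *far_pair):
        if walked + length >= half:
            return far_pair, (u, v), half - walked   # offset measured from node u
        walked += length

# Hypothetical four-taxon network: A and B attach to X, C and D attach to Y.
edges = [("A", "X", 1), ("B", "X", 3), ("X", "Y", 2), ("C", "Y", 1), ("D", "Y", 6)]
print(midpoint_root(edges, ["A", "B", "C", "D"]))
```

With these made-up lengths B and D are the most different taxa (patristic distance 11), so the root falls on the branch leading to D, 0.5 units from node Y. If D simply evolved faster than the others, that placement would be wrong, which is why the clock-like assumption matters.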