Inferring the nature of evolutionary relationship
A complete evolutionary "tree" describes position of any 'twig', with respect to all others
Optimization criterion: How to choose 'correct' solution
Distance: amount of evolutionary change between twigs
Or: How similar (alike) are they?
phenetic: distance measured between tips
"As the crow flies" from one twig to another
patristic: distance measured along connecting branches
"As the ant runs" from one twig to another
Relationship: pattern of connection between twigs
How closely related are species?
cladistic relationship: "As the branches join" back to Most Recent Common Ancestor (MRCA)
How do twigs join lower 'stems', 'branches', 'limbs', etc. in tree?
Phenetic & Cladistic criteria agree, iff rates of evolution are constant
If evolutionary rates differ, closely related organisms appear dissimilar
Ex.: Crocodilia more similar to Squamata (lizards & snakes) BUT more closely related to birds
Historically: Reptilia include scaly, four-legged crocs, lizards & snakes, turtles & tortoises
Aves include feathery, two-legged, two-winged creatures with evolved adaptation for flight
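A minimal sketch of the crocodile example, assuming a hypothetical three-taxon tree with invented branch lengths (the bird lineage is given a long branch to mimic faster change). The names PARENT, path_to_root, and mrca_and_patristic are illustrative only. The sketch measures distance "as the ant runs" and finds the MRCA "as the branches join".

```python
# Hypothetical tree: child -> (parent, branch length). All lengths are invented.
PARENT = {
    "lizard": ("root", 3.0),
    "archosaur": ("root", 1.0),
    "croc": ("archosaur", 2.0),
    "bird": ("archosaur", 8.0),   # long branch: faster rate of change (assumption)
}

def path_to_root(node):
    """Return [(ancestor, distance from node)], starting with the node itself."""
    path, dist = [(node, 0.0)], 0.0
    while node in PARENT:
        node, length = PARENT[node]
        dist += length
        path.append((node, dist))
    return path

def mrca_and_patristic(a, b):
    """Return (most recent common ancestor, distance along connecting branches)."""
    depth_a = dict(path_to_root(a))
    for ancestor, dist_b in path_to_root(b):
        if ancestor in depth_a:
            return ancestor, depth_a[ancestor] + dist_b
    raise ValueError("no common ancestor")

print(mrca_and_patristic("croc", "bird"))
print(mrca_and_patristic("croc", "lizard"))
```

With these made-up lengths, croc is nearer to lizard by distance (6 vs 10) yet shares its MRCA with bird: the distance and relationship criteria disagree because the rates differ.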
Likelihood: Given a model of molecular evolutionary change, which tree is least unlikely (maximally likely)?
Math known, computationally unfeasible until recently.
Theoretical & technical breakthroughs late 1960s ~ 1990s ~ 21st cent.:
Theory of Phylogenetic Systematics formalized
Molecular data (allozymes & DNA) replace morphology as primary data for phylogenetic inference
Computational power increases
DNA sequencing capacity increases
*** Patterns of evolutionary relationship to be understood from molecular data; then, patterns of organismal evolution to be analyzed based on relationships ***
Patterns of similarity inferred from UPGMA cluster analysis
[Unweighted Pair Group Method, Arithmetic averaging], a Sequential Agglomerative Hierarchical Nesting (SAHN) algorithm
algorithm: set of instructions for repetitive task
In (n) x (n) matrix, join most similar pair:
re-calculate (n-1) x (n-1) matrix, re-join, & so on, until last pair joined
Clustering results shown as phenogram: diagram of phenetic similarity
Similarity estimates relationships under certain assumptions
UPGMA method assumes rates of evolution equal, so branch tips "come out even" (contemporaneous)
Rate differences lead to incorrect trees
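The join / re-calculate / re-join loop can be sketched as follows. This is an illustrative version only, assuming distances (not similarities) supplied as a dictionary keyed by taxon pairs; the function name upgma and the toy matrix D are hypothetical. Each join is placed at half the distance between the joined clusters, which is what makes the tips "come out even".

```python
from itertools import combinations

def upgma(dist):
    """dist: {(a, b): distance} for every unordered pair of taxa.
    Returns the joins in order as (members of new cluster, node height)."""
    taxa = sorted({t for pair in dist for t in pair})
    clusters = [frozenset([t]) for t in taxa]
    # Key every distance by the unordered pair of clusters it separates.
    d = {frozenset([frozenset([a]), frozenset([b])]): v for (a, b), v in dist.items()}
    joins = []
    while len(clusters) > 1:
        # Join the most similar (least distant) pair of clusters.
        x, y = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        merged = x | y
        clusters = [c for c in clusters if c not in (x, y)]
        # Re-calculate: arithmetic average, weighted by cluster size,
        # which equals the unweighted mean over all member pairs.
        for c in clusters:
            d[frozenset([merged, c])] = (
                len(x) * d[frozenset([x, c])] + len(y) * d[frozenset([y, c])]
            ) / len(merged)
        joins.append((merged, d[frozenset([x, y])] / 2))
        clusters.append(merged)
    return joins

# Toy distance matrix (invented values).
D = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
     ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10}
for members, height in upgma(D):
    print(sorted(members), height)
```

For this toy matrix the joins are A+B at height 1, then C at height 3, then D at height 5. Ties between equally similar pairs are broken arbitrarily here (first pair found); real software may handle them differently.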
HOMEWORK: Practice problems for UPGMA phenogram calculations
Alternative phenetic methods
Neighbor-Joining (NJ) analysis does not assume rate equality
NJ allows branch lengths proportional to change: tips come out uneven
algorithm joins nodes, rather than tips
More realistic, recognizes stochastic "Molecular Clock"
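A sketch of the first NJ step, using the standard criterion Q(i,j) = (n-2) d(i,j) - r(i) - r(j), where r is the sum of a taxon's distances to all others. The criterion and branch-length formula are the usual textbook ones and do not appear in these notes; the function name nj_first_join and the toy matrix are hypothetical. The point is that the pair chosen need not be the pair with the smallest raw distance, and the two new branches need not be equal.

```python
from itertools import combinations

def nj_first_join(dist):
    """dist: {(a, b): distance} for every unordered pair of taxa.
    Returns (pair joined, its Q value, (branch length to a, branch length to b))."""
    taxa = sorted({t for pair in dist for t in pair})
    n = len(taxa)

    def d(a, b):
        return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

    # r[a]: total distance from a to every other taxon.
    r = {a: sum(d(a, b) for b in taxa if b != a) for a in taxa}
    # Q corrects each raw distance for how far both taxa are from everything else.
    q = {(a, b): (n - 2) * d(a, b) - r[a] - r[b] for a, b in combinations(taxa, 2)}
    a, b = min(q, key=q.get)
    # Branch lengths from the new node to a and b; unequal if rates differ.
    len_a = d(a, b) / 2 + (r[a] - r[b]) / (2 * (n - 2))
    len_b = d(a, b) - len_a
    return (a, b), q[(a, b)], (len_a, len_b)

# Toy distances (invented values); the smallest raw distance is d-e = 3.
D = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
     ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
     ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
print(nj_first_join(D))
```

Here NJ joins a and b (Q = -50) rather than the closest pair d-e (Q = -48), with branch lengths 2 and 3: the tips "come out uneven". A full NJ run would replace a and b with the new node, recompute the matrix, and repeat.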
Differential weighting of nucleotide substitutions accords greater 'significance' to certain classes of change
Ex.: Kimura 2-parameter (K2P) model treats Transitions (Ts) & Transversions (Tv) differently
K: Transition Bias = [Ts] / [Tv]
Twice as many kinds of Tv as Ts: expect K = 0.5
But: Tv rare for close comparisons, more common for distant relationships
Set K according to nature of evolutionary problem under consideration:
K = 1 for close comparisons, K = 3 for moderate comparisons
K = 10 or Tv-only for distant comparisons
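A short sketch of the Ts / Tv bookkeeping. It assumes aligned, equal-length sequences containing only uppercase A, C, G, T (no gaps or ambiguity codes); the function names and example sequences are hypothetical. The distance uses the standard K2P formula, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), with P and Q the observed proportions of transitions and transversions; the formula itself is not given in these notes.

```python
import math

PURINES = {"A", "G"}   # C and T are pyrimidines

def is_transition(x, y):
    """Purine <-> purine or pyrimidine <-> pyrimidine."""
    return x != y and (x in PURINES) == (y in PURINES)

def count_ts_tv(s1, s2):
    """Count observed transitions and transversions between two aligned sequences."""
    ts = sum(1 for x, y in zip(s1, s2) if is_transition(x, y))
    tv = sum(1 for x, y in zip(s1, s2) if x != y and not is_transition(x, y))
    return ts, tv

def k2p_distance(s1, s2):
    """K2P distance; only defined while both log arguments stay positive."""
    ts, tv = count_ts_tv(s1, s2)
    p, q = ts / len(s1), tv / len(s1)
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)

# Kinds of change among the 12 possible substitutions: 4 Ts, 8 Tv.
kinds_ts = sum(1 for x in "ACGT" for y in "ACGT" if is_transition(x, y))
kinds_tv = 12 - kinds_ts
print(kinds_ts, kinds_tv, kinds_ts / kinds_tv)

s1, s2 = "AAGGCCTTAA", "AGGGCTTTCA"   # invented sequences
print(count_ts_tv(s1, s2), round(k2p_distance(s1, s2), 4))
```

The first line of output reproduces the "expect K = 0.5" count; the invented pair shows an observed bias of 2 Ts to 1 Tv, i.e. above that expectation.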
Choice of preferred hypothesis made on Principle of Maximum Parsimony
Parsimony: simpler hypothesis preferred
Ex.: If complex trait occurs in multiple species, more parsimonious to hypothesize it evolved only once
=> Trait evolved in single common ancestor
Ex.: Evolution of ice-breeding in Phocidae ("True" seals), from ecological & molecular parsimony perspectives
Evolutionary parsimony: Hypothesis that requires fewest character changes preferred
In molecular systematics, hypothesis that requires fewest SNP changes
"Four-Taxon Problem" & "Three-Taxon Statement":
Four taxa A, B, C, & D have three hypotheses of relationship:
A most closely related to B, or C, or D
Evaluate alternative hypotheses as:
"X and Y are more closely related to each other than either is to Z"
Alternative hypotheses shown as networks with branches & internode
Count changes at informative SNPs that favor each hypothesis
Hypothesis that requires fewest changes is Maximum Parsimony explanation:
AKA 'Minimum Length' or 'Minimum Spanning' solution
Modifications: use K2P criteria, weight # changes by K = [Ts] / [Tv]
Protein Parsimony: count amino acid substitutions
Count 1st & 2nd position SNPs only
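The four-taxon count can be sketched as below, assuming four aligned sequences of equal length and unweighted changes. The function name and the sequences are hypothetical. A site is informative only when two taxa share one base and the other two share a different base; such a site needs one change on the network it favors and two on each of the others, so the hypothesis with the most favoring sites requires the fewest changes overall.

```python
def four_taxon_support(a, b, c, d):
    """Count informative sites favoring each of the three four-taxon networks."""
    support = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for w, x, y, z in zip(a, b, c, d):
        if w == x and y == z and w != y:
            support["AB|CD"] += 1
        elif w == y and x == z and w != x:
            support["AC|BD"] += 1
        elif w == z and x == y and w != x:
            support["AD|BC"] += 1
        # every other pattern (invariant, singleton, 3+ states) is uninformative
    return support

# Invented sequences for taxa A, B, C, D.
A = "AAGCTTAC"
B = "AAGCTCAT"
C = "GGGTTCAC"
D = "GGATTTAT"
print(four_taxon_support(A, B, C, D))
```

For these invented data three sites favor AB|CD and one site favors each alternative, so "A and B are more closely related to each other than either is to C or D" is the Maximum Parsimony hypothesis.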
HOMEWORK: What triplets are exceptions & why?
Evolutionary trees are networks with roots
With four taxa, network has four branches & one internode
Root indicates relationship with common ancestor
'root' can be placed on any branch or internode
Thus five possible rooted trees (cladograms) for four-taxon network
All equally parsimonious: not all place A & B as each other's closest relatives
Some make shared SNPs symplesiomorphic
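The five rootings can be listed explicitly. A minimal sketch, assuming the network pairs A with B and C with D, and writing trees as nested tuples; the function names are hypothetical.

```python
def rootings(pair1, pair2):
    """Five rooted trees obtained by rooting the network (pair1 | pair2)
    on each of its four branches or on its one internode."""
    (a, b), (c, d) = pair1, pair2
    return {
        f"branch {a}": (a, (b, (c, d))),
        f"branch {b}": (b, (a, (c, d))),
        f"branch {c}": (c, (d, (a, b))),
        f"branch {d}": (d, (c, (a, b))),
        "internode": ((a, b), (c, d)),
    }

def are_sisters(tree, x, y):
    """True if x and y alone make up some clade of the nested-tuple tree."""
    if not isinstance(tree, tuple):
        return False
    if set(tree) == {x, y}:
        return True
    return any(are_sisters(sub, x, y) for sub in tree)

trees = rootings(("A", "B"), ("C", "D"))
for where, tree in trees.items():
    print(where, tree, are_sisters(tree, "A", "B"))
```

Only three of the five rooted trees keep A and B as each other's closest relatives; rooting on the branch to A or to B makes the bases they share ancestral rather than derived.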
Outgroup rooting
Include taxon known to be less closely related to any ingroup taxon than they are to each other
Call this an outgroup
Ex.: Use feliform as outgroup to caniform problem
Note cladistic tree has same topology as NJ phenogram
Ex.: Wolffish (Anarhichas): Johnstone et al. (2007)
HOMEWORK: Practice four-taxon cladistic problems
Maximum Likelihood analysis
Different approach to evolutionary trees based on Bayes' Theorem
Given estimates of all possible SNP rates among A, C, G, & T (n = 12)
Calculate probability of simultaneous occurrence of all events necessary to produce any particular tree
Any particular tree is (extremely) unlikely, but one tree is least unlikely ( = maximally likely)
Ratio of likelihoods expresses how much better that tree is wrt any other
Heuristic example: five-card stud poker with standard 52-card deck
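One possible reading of the poker heuristic, offered as an assumption about how the analogy is meant: every named five-card hand is very unlikely, yet the hands can still be ranked, and the ratio of their probabilities says how much more likely one is than another. The hand counts below are standard combinatorics; the names WAYS, probability, and likelihood_ratio are hypothetical.

```python
from math import comb

TOTAL_HANDS = comb(52, 5)   # number of distinct five-card hands

# Number of ways to be dealt each hand.
WAYS = {
    "royal flush": 4,                                    # one per suit
    "four of a kind": 13 * 48,                           # rank x fifth card
    "full house": 13 * comb(4, 3) * 12 * comb(4, 2),     # triple x pair
}

def probability(hand):
    return WAYS[hand] / TOTAL_HANDS

def likelihood_ratio(hand1, hand2):
    """How many times more likely hand1 is than hand2."""
    return WAYS[hand1] / WAYS[hand2]

for hand in WAYS:
    print(hand, probability(hand))
print(likelihood_ratio("full house", "royal flush"))
```

A full house is itself rare (3,744 of 2,598,960 hands) but is 936 times more likely than a royal flush; in the same way, the preferred tree is the least unlikely one, and the ratio of likelihoods measures its advantage over a competitor.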
Statistical tests determine confidence in branching order
Bootstrap Analysis: a re-sampling technique
statistical tests usually involve replication / repetition of experiment: this is (?) inconvenient with DNA data
Suppose sample data set of n bases accurately estimates parametric data (complete genome)
re-sample n sites (with replacement) ~3,000 times
repeat phylogenetic analysis on each 'new' set:
among all of these sets, how often do same clades / clusters appear?
"50% bootstrap support" identifies groups that occur more frequently than all others combined
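A minimal bootstrap sketch, assuming the "phylogenetic analysis" repeated on each re-sampled set is the simple four-taxon parsimony count (a stand-in chosen for brevity, not the method any particular program uses). Function names, the seed, and the example sequences are hypothetical; ties between groupings are counted as support for none.

```python
import random

def best_grouping(columns):
    """Four-taxon parsimony on a list of site columns (w, x, y, z).
    Returns the uniquely best-supported grouping, or None on a tie."""
    counts = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for w, x, y, z in columns:
        if w == x and y == z and w != y:
            counts["AB|CD"] += 1
        elif w == y and x == z and w != x:
            counts["AC|BD"] += 1
        elif w == z and x == y and w != x:
            counts["AD|BC"] += 1
    top = max(counts.values())
    winners = [g for g, n in counts.items() if n == top]
    return winners[0] if len(winners) == 1 else None

def bootstrap_support(seqs, grouping="AB|CD", replicates=3000, seed=0):
    """Fraction of re-sampled data sets in which `grouping` is recovered."""
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    columns = list(zip(*seqs))         # one tuple per aligned site
    hits = 0
    for _ in range(replicates):
        # Re-sample n sites with replacement from the n observed sites.
        sample = rng.choices(columns, k=len(columns))
        if best_grouping(sample) == grouping:
            hits += 1
    return hits / replicates

seqs = ["AAGCTTAC", "AAGCTCAT", "GGGTTCAC", "GGATTTAT"]   # invented data
print(bootstrap_support(seqs))
```

A returned value above 0.5 would mean the grouping appears in more replicates than all alternatives (and ties) combined. With only eight invented sites the estimate is crude; real data sets use hundreds or thousands of sites.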
Download & install MEGA [Molecular Evolutionary Genetics Analysis] software [Version 11 as of November 2024]
GenBank links to Carnivora / Primates
Lab Exercise: Are Giant Pandas (Ailuropoda) and Red (Lesser) Pandas (Ailurus) each other's closest relatives?
1,140 bp Cytochrome b data set (.meg format)
15,582 bp mtDNA Coding Region data set (.meg format) (ZIP file)
15,600 bp mtDNA Coding Region, 12 taxonomic families (.meg format) (annotation)
HOMEWORK: Results for the Panda Problem from UPGMA, Neighbor Joining, Maximum Parsimony, & Maximum Likelihood methods