Inferring the nature of evolutionary relationship
In a "tree," how to describe position of each 'twig tip' with respect to any / all others?
Distance: amount of evolutionary change between twigs
Or: How similar (close) are they?
phenetic: distance measured between tips
"As the crow flies" from one twig to another
patristic: distance measured along connecting branches
"As the ant runs" from one twig to another
Relationship: pattern of connection between twigs
How closely related are species?
cladistic relationship: pattern of branching back to Most Recent Common Ancestor (MRCA)
How do twigs join lower in tree?
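A minimal sketch of the two kinds of distance, assuming a small hypothetical tree (tips A, B, C; internal node X; root R) with made-up branch lengths. Phenetic distance is taken here as the simple proportion of differing sites between two aligned sequences; patristic distance sums the branch lengths from each tip back to their MRCA. All names and numbers are illustrative only.

```python
# Hypothetical rooted tree: child -> (parent, branch length to parent).
# Node names and branch lengths are invented for illustration.
TREE = {"A": ("X", 1.0), "B": ("X", 3.0), "X": ("R", 1.0), "C": ("R", 2.0)}


def p_distance(seq1, seq2):
    """Phenetic distance: proportion of aligned sites that differ."""
    return sum(x != y for x, y in zip(seq1, seq2)) / len(seq1)


def path_to_root(tree, node):
    """Return {ancestor: distance from node}, including the node itself."""
    dist, out = 0.0, {node: 0.0}
    while node in tree:
        parent, length = tree[node]
        dist += length
        out[parent] = dist
        node = parent
    return out


def patristic(tree, a, b):
    """Patristic distance: branch lengths summed from a and b to their MRCA."""
    up_a = path_to_root(tree, a)
    up_b = path_to_root(tree, b)
    # The MRCA is the shared ancestor that gives the shortest connecting path.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


print(p_distance("ACGT", "ACGA"))   # one of four sites differs
print(patristic(TREE, "A", "B"))    # path A -> X -> B
print(patristic(TREE, "B", "C"))    # path B -> X -> R -> C
```

In this toy tree A and B are each other's closest relatives (they share MRCA X), yet A is as far from B as it is from C, because the branch leading to B is long.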
Phenetic (how similar are taxa?) versus cladistic (how closely related are taxa?) criteria
Criteria agree, iff rates of evolution are constant
If evolutionary rates differ, closely related organisms may appear dissimilar
Ex.: Crocodiles more closely related to birds, but more similar (?) to lizards
Crocodilia resemble Squamata more than Aves (hence, Class Reptilia) because avian ancestor(s) rapidly evolved specializations for flight
Theoretical & technical breakthroughs, late 1960s ~ 1980s:
Theory of Phylogenetic Systematics formalized
Molecular data (allozymes & DNA) replace morphology as primary data for phylogenetic inference
Computational power increases
DNA sequencing capacity increases
***Patterns of evolutionary relationship to be understood from molecular data;
Patterns of organismal evolution to be understood from relationships***
Patterns of similarity inferred from UPGMA cluster analysis
[Unweighted Pair Group Method, Arithmetic averaging],
a Sequential Agglomerative Hierarchical Nesting (SAHN) algorithm
algorithm: set of instructions for repetitive task
In (n) x (n) matrix, join most similar pair:
re-calculate (n-1) x (n-1) matrix, re-join, & so on, until last pair joined
Clustering results shown as phenogram: diagram of phenetic similarity
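The join / re-calculate / re-join cycle can be sketched as below. This is a minimal illustration, not the procedure of any particular software package: the function name `upgma`, the distance-table layout, and the four-taxon distances are all assumptions made for the example. Each join is reported with the height of the new node (half the distance between the joined clusters), which is what a phenogram draws.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by UPGMA.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) -> distance between taxa a and b.
    Returns a list of (cluster_1, cluster_2, node_height) in joining order;
    each cluster is a tuple of taxon labels.
    """
    clusters = [(n,) for n in names]
    d = {frozenset({(a,), (b,)}): dist[frozenset({a, b})]
         for a, b in combinations(names, 2)}
    joins = []
    while len(clusters) > 1:
        # Join the most similar (least distant) pair in the current matrix.
        x, y = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        joins.append((x, y, d[frozenset({x, y})] / 2))
        merged = x + y
        # Re-calculate the smaller matrix: distance from the new cluster to
        # each remaining cluster is the average over all member pairs.
        clusters = [c for c in clusters if c not in (x, y)]
        for c in clusters:
            d[frozenset({merged, c})] = (
                len(x) * d[frozenset({x, c})] + len(y) * d[frozenset({y, c})]
            ) / (len(x) + len(y))
        clusters.append(merged)
    return joins


# Invented pairwise distances for four taxa.
pairs = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
         ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10}
dist = {frozenset(k): v for k, v in pairs.items()}
joins = upgma(["A", "B", "C", "D"], dist)
for left, right, height in joins:
    print(left, "+", right, "join at height", height)
```

With these distances A and B join first (height 1.0), then C (3.0), then D (5.0).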
Similarity estimates relationships under certain assumptions
UPGMA method assumes rates of evolution equal, so branch tips "come out even" (contemporaneous)
DNA sequences evolve as stochastic "Molecular Clock"
HOMEWORK: Practice problems for UPGMA phenogram calculations
Alternative phenetic methods
Neighbor-Joining (NJ) analysis does not assume rate equality
large rate differences lead to incorrect trees
NJ allows branch lengths proportional to change: tips come out uneven
algorithm joins nodes, rather than tips
More realistic, computationally harder
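A sketch of a single Neighbor-Joining step, to show how the two joined branches can come out with different lengths. It uses the standard NJ selection criterion Q(a,b) = (n - 2) d(a,b) - sum d(a,.) - sum d(b,.). The function name `nj_join_step` and the five-taxon distance matrix are assumptions for illustration; a full analysis would repeat this step on the reduced matrix until the tree is resolved.

```python
from itertools import combinations


def nj_join_step(names, d):
    """Perform one Neighbor-Joining step on a distance table d[a][b].

    Returns the joined pair, the two branch lengths from the new node to
    each member of the pair, and the distances from the new node to every
    remaining taxon.
    """
    n = len(names)
    total = {a: sum(d[a][b] for b in names if b != a) for a in names}
    # Q corrects each raw distance for how divergent both taxa are overall.
    q = {(a, b): (n - 2) * d[a][b] - total[a] - total[b]
         for a, b in combinations(names, 2)}
    a, b = min(q, key=q.get)
    # Branch lengths need not be equal: tips "come out uneven".
    len_a = d[a][b] / 2 + (total[a] - total[b]) / (2 * (n - 2))
    len_b = d[a][b] - len_a
    # Distances from the new internal node to the taxa not yet joined.
    new_d = {c: (d[a][c] + d[b][c] - d[a][b]) / 2
             for c in names if c not in (a, b)}
    return (a, b), (len_a, len_b), new_d


# Invented distance matrix for five taxa.
names = ["a", "b", "c", "d", "e"]
rows = [[0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0]]
d = {x: dict(zip(names, row)) for x, row in zip(names, rows)}

pair, lengths, new_d = nj_join_step(names, d)
print(pair, lengths, new_d)
```

Here d and e have the smallest raw distance (3), but the Q criterion joins a and b first, with branch lengths 2.0 and 3.0. UPGMA would instead start with d and e and give them equal branch lengths.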
Differential weighting of nucleotide substitutions
accord greater 'significance' to 'important' changes
Ex.: Kimura 2-parameter (K2P) model treats Transitions (Ts) & Transversions (Tv) differently
K = Transition Bias = [Ts] / [Tv]
Twice as many kinds of Tv as Ts: expect K = 0.5
But: Tv rare for close comparisons, more common for distant relationships
K variable according to evolutionary problem under consideration:
K > 10 for close comparisons, K ~ 3 for moderate comparisons
Tv-only for distant comparisons
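A minimal sketch of counting Ts and Tv between two sequences and converting them to a K2P distance, using Kimura's formula d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of sites showing transitions and transversions. The sequences are invented, and the code assumes they are aligned, of equal length, and contain only A, C, G, T.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq1, seq2):
    """Count transitions (Ts) and transversions (Tv) between aligned sequences."""
    ts = tv = 0
    for x, y in zip(seq1, seq2):
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            ts += 1   # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            tv += 1   # purine <-> pyrimidine
    return ts, tv


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance (expected substitutions per site)."""
    n = len(seq1)
    ts, tv = count_ts_tv(seq1, seq2)
    p, q = ts / n, tv / n
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# Invented 20-base sequences differing by two Ts and one Tv.
seq1 = "ACGTACGTACGTACGTACGT"
seq2 = "GTTTACGTACGTACGTACGT"

ts, tv = count_ts_tv(seq1, seq2)
bias = ts / tv if tv else float("inf")   # K = [Ts] / [Tv]
print("Ts =", ts, "Tv =", tv, "K =", bias)
print("K2P distance =", round(k2p_distance(seq1, seq2), 4))
```

The K2P distance is slightly larger than the raw proportion of differences (3/20 = 0.15), because the model allows for multiple substitutions at the same site.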
Choice of preferred hypothesis made on Principle of Maximum Parsimony
Parsimony: simpler hypothesis preferred
Ex.: If complex trait occurs in multiple species, more parsimonious to hypothesize it evolved only once
=> Trait evolved in single common ancestor
Ex.: Evolution of ice-breeding in Phocidae ("True" seals), from ecological & molecular parsimony perspectives
Evolutionary parsimony: Hypothesis that requires fewer character changes preferred
In molecular systematics, count SNP changes
"Four-Taxon Problem" & "Three-Taxon Statement":
Four taxa A, B, C, & D have three hypotheses of relationship:
A most closely related to B, or C, or D
Evaluate alternative hypotheses as:
"X and Y are more closely related to each other than either is to Z"
Alternative hypotheses shown as networks with branches & internode
Count changes at informative SNPs that favor each hypothesis
Hypothesis with fewest changes is Maximum Parsimony explanation:
AKA 'Minimum Length' or 'Minimum Spanning' solution
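The four-taxon count can be sketched as follows, assuming four aligned sequences of equal length. A site is informative only if two taxa share one base and the other two share a different base; such a site needs one change on the network it favors and two changes on each of the other two. The function names and the eight-base sequences are invented for illustration.

```python
def four_taxon_support(a, b, c, d):
    """Count informative sites favoring each of the three four-taxon networks."""
    support = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for w, x, y, z in zip(a, b, c, d):
        if w == x and y == z and w != y:
            support["AB|CD"] += 1
        elif w == y and x == z and w != x:
            support["AC|BD"] += 1
        elif w == z and x == y and w != x:
            support["AD|BC"] += 1
        # Invariant sites and single-taxon differences are uninformative.
    return support


def network_lengths(support):
    """Changes required at informative sites under each hypothesis."""
    total = sum(support.values())
    # A site costs 1 change on the network it favors, 2 on the others.
    return {h: s + 2 * (total - s) for h, s in support.items()}


# Invented aligned sequences for taxa A, B, C, D.
A = "ACATAAGC"
B = "ACGTAAAC"
C = "GTACAAAC"
D = "GTGCAGGC"

support = four_taxon_support(A, B, C, D)
lengths = network_lengths(support)
best = min(lengths, key=lengths.get)
print(support)
print(lengths)
print("Maximum Parsimony hypothesis:", best)
```

In this example three sites favor AB|CD and one site favors each alternative, so AB|CD needs 7 changes against 9 for either of the others.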
Cladistic analyses weighted: objective criteria exist for DNA data
Ex.: Count Tv:Ts as 3:1 => Tv are 3x as 'informative'
or, count Tv only (Transversion parsimony) for "deep" analyses
or, count 1st & 2nd position substitutions >> 3rd : replacement substitutions
HOMEWORK: What triplets are exceptions & why?
Evolutionary trees are networks with roots
With four taxa, network has four branches & one internode
Root indicates relationship with common ancestor
'root' can be placed on any branch or internode
Thus five possible rooted trees (cladograms) for four-taxon network
All equally parsimonious: not all place A & B as each other's closest relatives
Some make shared SNPs symplesiomorphic
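To make the five root placements concrete, the sketch below writes out each rooted tree for the network AB|CD in Newick (parenthesis) notation. The function name `rooted_versions` is hypothetical, and the code simply enumerates the five cases rather than deriving them from a general tree structure.

```python
def rooted_versions(left=("A", "B"), right=("C", "D")):
    """List the five rooted trees (as Newick strings) for the network left|right."""
    (a, b), (c, d) = left, right
    return {
        "root on internode": f"(({a},{b}),({c},{d}));",
        f"root on branch to {a}": f"({a},({b},({c},{d})));",
        f"root on branch to {b}": f"({b},({a},({c},{d})));",
        f"root on branch to {c}": f"({c},({d},({a},{b})));",
        f"root on branch to {d}": f"({d},({c},({a},{b})));",
    }


trees = rooted_versions()
for placement, newick in trees.items():
    sisters = "(A,B)" in newick   # are A & B each other's closest relatives?
    print(f"{placement:24s} {newick:20s} A+B sister taxa: {sisters}")
```

Only three of the five rootings keep A and B as sister taxa. Rooting on the branch to A or to B makes that taxon the earliest branch, so the bases A and B share at a site become symplesiomorphic (shared ancestral) and no longer mark A + B as a group.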
Outgroup rooting
Include taxon known to be less closely related to any ingroup taxon than they are to each other
Call this an outgroup
Ex.: Use feliform as outgroup to caniform problem
Note cladistic tree has same topology as NJ phenogram
Ex.: Wolffish (Anarhichas): Johnstone et al. (2007)
HOMEWORK: Practice four-taxon cladistic problems
Maximum Likelihood analysis
Different approach to evolutionary trees based on Bayes' Theorem
Optimization criterion: How to choose 'correct' solution
Phenetic methods look for shortest tree
Cladistic methods look for minimum number of events
Likelihood methods look for most probable tree ("least unlikely" = "maximally likely"), given a priori model of evolutionary events
E.g., given estimates of all possible SNP rates among A, C, G, & T (n = 12)
Calculate probability of simultaneous occurrence of all events necessary to produce any particular tree
Any particular tree is (extremely) unlikely, but some tree is least unlikely ( = maximally likely)
Ratio of likelihoods expresses how much better one tree is wrt any other
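A deliberately simplified sketch of the likelihood idea. Instead of a full tree with 12 separate substitution rates, it assumes the one-rate Jukes-Cantor model and the smallest possible "tree": two sequences joined by a single branch of length d. The data (100 aligned sites, 10 differences) are invented. The likelihood of the data is computed for many candidate values of d; the value that makes the observed data least unlikely is the maximum likelihood estimate.

```python
import math


def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of two aligned sequences under Jukes-Cantor.

    d is the branch length (expected substitutions per site, d > 0).
    """
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # probability a site shows the same base
    p_diff = 0.25 - 0.25 * e   # probability of one particular different base
    # Probabilities at all sites are multiplied; logs are summed instead.
    return (n_sites * math.log(0.25)
            + (n_sites - n_diff) * math.log(p_same)
            + n_diff * math.log(p_diff))


n_sites, n_diff = 100, 10   # invented data
candidates = [i / 1000 for i in range(1, 1001)]
best = max(candidates, key=lambda d: jc69_log_likelihood(d, n_sites, n_diff))

log_l_best = jc69_log_likelihood(best, n_sites, n_diff)
log_l_alt = jc69_log_likelihood(0.5, n_sites, n_diff)
print("maximum likelihood branch length:", best)
print("log-likelihood at best:", round(log_l_best, 2))
print("likelihood ratio, best vs d = 0.5:", math.exp(log_l_best - log_l_alt))
```

Even at its maximum, the likelihood is a very small number (a large negative log-likelihood): any particular explanation is extremely unlikely, and hypotheses are compared by the ratio of their likelihoods.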
Heuristic example: Consider game of five-card stud poker with standard 52-card deck
Consider game of Fizzbin with unknown deck
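The arithmetic behind the poker half of the analogy, as a short sketch: with a known deck (a known model), the probability of any hand can be calculated, and two kinds of hand can be compared by their ratio. With an unknown deck, as in Fizzbin, no such calculation is possible. The two hands chosen for comparison are an arbitrary illustration.

```python
from math import comb

hands = comb(52, 5)        # number of distinct five-card hands
p_specific = 1 / hands     # any one particular hand is extremely unlikely

royal_flushes = 4          # one per suit
four_of_a_kind = 13 * 48   # 13 ranks, times 48 choices for the fifth card

print("distinct hands:", hands)
print("P(one particular hand):", p_specific)
print("P(royal flush):", royal_flushes / hands)
print("P(four of a kind):", four_of_a_kind / hands)
# The ratio says how much more probable one outcome is than the other.
print("ratio:", four_of_a_kind / royal_flushes)
```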
Statistical tests determine confidence in branching order
Bootstrap Analysis: a re-sampling technique
statistical tests usually involve replication / repetition of experiment: this is (?) inconvenient with DNA data
Suppose sample data set of n bases accurately estimates parametric data (complete genome)
re-sample n sites (with replacement) ~3,000 times
repeat phylogenetic analysis on each 'new' set:
among all of these sets, how often do same clades / clusters appear?
"50% bootstrap support" identifies groups that occur more frequently than all others combined
95% criterion desirable, sometimes not obtained with smaller data sets
cf. 1,140 bp vs 11,582 bp data sets
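A minimal sketch of the bootstrap, assuming four aligned sequences and using a four-taxon parsimony count as the "phylogenetic analysis" repeated on each re-sampled data set. The function names, the short invented sequences, and the fixed random seed are assumptions for illustration; replicates in which two hypotheses tie are counted for neither.

```python
import random

HYPOTHESES = ("AB|CD", "AC|BD", "AD|BC")


def best_hypothesis(columns):
    """Return the network favored by most informative sites, or None on a tie."""
    counts = dict.fromkeys(HYPOTHESES, 0)
    for w, x, y, z in columns:
        if w == x and y == z and w != y:
            counts["AB|CD"] += 1
        elif w == y and x == z and w != x:
            counts["AC|BD"] += 1
        elif w == z and x == y and w != x:
            counts["AD|BC"] += 1
    top = max(counts.values())
    winners = [h for h in HYPOTHESES if counts[h] == top]
    return winners[0] if len(winners) == 1 else None


def bootstrap_support(seqs, replicates=1000, seed=1):
    """Fraction of bootstrap replicates in which each hypothesis is favored."""
    rng = random.Random(seed)
    columns = list(zip(*seqs))   # one tuple of four bases per site
    n = len(columns)
    wins = dict.fromkeys(HYPOTHESES, 0)
    for _ in range(replicates):
        # Re-sample n sites with replacement to build a 'new' data set.
        sample = [columns[rng.randrange(n)] for _ in range(n)]
        winner = best_hypothesis(sample)
        if winner is not None:
            wins[winner] += 1
    return {h: wins[h] / replicates for h in HYPOTHESES}


# Invented aligned sequences for taxa A, B, C, D (8 sites).
seqs = ("ACATAAGC", "ACGTAAAC", "GTACAAAC", "GTGCAGGC")
support = bootstrap_support(seqs, replicates=3000)
for hypothesis, fraction in support.items():
    print(hypothesis, f"{100 * fraction:.1f}%")
```

With only eight sites, support for the favored grouping is likely to fall short of 95%, which illustrates why larger data sets are preferred.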
Download & install free MEGA X [Molecular Evolutionary Genetics Analysis] software
Lab Exercise: Are Giant Pandas (Ailuropoda) and Red (Lesser) Pandas (Ailurus) each other's closest relatives?
1,140 bp Cytochrome b data set (.meg format)
15,582 bp mtDNA Coding Region data set (.meg format) (ZIP file)
15,600 bp mtDNA Coding Region, 12 families (.meg format) (annotation)
HOMEWORK: Results for the Panda Problem from UPGMA, Neighbor Joining, Maximum Parsimony, & Maximum Likelihood methods
Evolutionary genetic analysis of Newfoundland Caribou (Rangifer tarandus terranovae) (Wilkerson et al. 2018)
Phylogenetic analysis of codfish & relatives (Gadidae) (Coulson et al. 2006)
A molecular understanding of the evolutionary history of birds (Jarvis et al. 2014)
Applications to the evolution of SARS-CoV-2, the virus that causes COVID-19