Inferring the degree of evolutionary relationship
   How can we describe the position of each 'twig' with respect to all others?
       distance: amount of change between twigs
           How similar (or different) are species?
           phenetic distance: distance measured between tips
                   (i.e., "as the crow flies" from one twig to another)
           patristic distance: distance measured along connecting branches
                   (i.e., "as the ant runs" from one twig to another)
       relationship: pattern of connection between twigs
           How closely related are species?
           cladistic relationship: pattern of branching back to most recent common ancestor (MRCA)
                   (i.e., where do twigs join lower in the tree?)
   Phenetic (how similar are taxa?) versus cladistic (how closely related are taxa?) criteria
   These criteria agree if and only if rates of evolution are constant
       If evolutionary rates differ, closely related organisms may appear different
       Ex.: Crocodiles are more closely related to birds, but more similar to lizards
                Crocodiles resemble lizards more than birds
                because birds rapidly evolved specializations for flight
   Patterns of similarity can be inferred from UPGMA cluster analysis
       [Unweighted Pair Group Method, Arithmetic averaging],
       a Sequential Agglomerative Hierarchical Nesting (SAHN) algorithm
       [algorithm = a set of instructions for doing a repetitive task]
           In an (n) x (n) matrix, join the most similar pair;
           re-calculate the (n-1) x (n-1) matrix, re-join,
           and so on, until the last pair is joined
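The join / re-calculate / re-join loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own procedure: the function name `upgma`, the nested-tuple output, and the four-taxon distance matrix are all hypothetical, and the join height is taken as half the distance between the joined clusters (which presumes equal rates).

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster taxa by UPGMA.

    names: list of taxon labels
    dist:  dict mapping frozenset({a, b}) -> distance between taxa a and b
    Returns (nested-tuple phenogram, list of (cluster, join height)).
    """
    sizes = {name: 1 for name in names}   # active clusters -> number of tips
    d = dict(dist)                         # working copy of the matrix
    joins = []
    while len(sizes) > 1:
        # Join the most similar (least distant) pair of clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        new = (a, b)
        joins.append((new, d[frozenset((a, b))] / 2))
        # Re-calculate distances to the new cluster as a size-weighted mean,
        # i.e. the arithmetic average over all tip-to-tip distances.
        for c in sizes:
            if c not in (a, b):
                d[frozenset((new, c))] = (
                    sizes[a] * d[frozenset((a, c))]
                    + sizes[b] * d[frozenset((b, c))]
                ) / (sizes[a] + sizes[b])
        sizes[new] = sizes.pop(a) + sizes.pop(b)
    (root,) = sizes
    return root, joins

# Hypothetical distances among four taxa (illustrative values only).
pairs = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
         ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10}
dist = {frozenset(k): v for k, v in pairs.items()}
tree, joins = upgma(["A", "B", "C", "D"], dist)
heights = [h for _, h in joins]
print(tree)
print(heights)
```

With these values A and B join first, then C, then D, at heights 1, 3 and 5: the (n) x (n) matrix shrinks by one row and column at every pass.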
       Results are shown as a phenogram:
           a diagram of phenetic similarity
       Similarity approximates evolutionary relationships under certain assumptions
           cladogram: a diagram of evolutionary relationships (a tree)
       UPGMA method assumes that rates of evolution are equal
           so branch tips "come out even" (contemporaneous)
           i.e., DNA sequences are assumed to evolve as a molecular clock
       Homework: Five practice problems in UPGMA phenogram calculations
   Some alternatives:
       Neighbour-Joining (NJ) analysis does not assume rate equality
           (large evolutionary rate differences lead to incorrect UPGMA trees)
           NJ allows branch lengths proportional to change: tips come out uneven
           [algorithm joins nodes, rather than tips]
           This method is more realistic, but computationally harder
               [see www.megasoftware.net for free software]
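To see why NJ tips "come out uneven", here is a minimal sketch of only the first NJ step: choosing a pair with the standard Q criterion, Q(i,j) = (n-2) d(i,j) - sum_k d(i,k) - sum_k d(j,k), and assigning the two new branch lengths. The criterion is not spelled out in these notes, and the function name `nj_first_join` and the five-taxon distances are hypothetical.

```python
from itertools import combinations

def nj_first_join(names, d):
    """Return the first pair NJ would join and its two branch lengths.

    d maps frozenset({i, j}) -> distance. Only the first step is shown;
    a full NJ run repeats this on a reduced matrix.
    """
    n = len(names)
    # Row totals: summed distance from each taxon to all others.
    total = {i: sum(d[frozenset((i, k))] for k in names if k != i)
             for i in names}
    # The Q criterion corrects raw distance for each taxon's overall divergence.
    def q(pair):
        i, j = pair
        return (n - 2) * d[frozenset(pair)] - total[i] - total[j]
    i, j = min(combinations(names, 2), key=q)
    dij = d[frozenset((i, j))]
    # Branch lengths from i and j to their new node need not be equal.
    len_i = dij / 2 + (total[i] - total[j]) / (2 * (n - 2))
    return (i, j), (len_i, dij - len_i)

# Hypothetical distances (illustrative values only).
pairs = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
         ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
         ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
d = {frozenset(k): v for k, v in pairs.items()}
pair, lengths = nj_first_join(["a", "b", "c", "d", "e"], d)
print(pair, lengths)
```

In this example d and e are the closest pair (distance 3), so UPGMA would join them first, but NJ joins a and b, with unequal branch lengths of 2 and 3.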
       Differential weighting of nucleotide substitutions
           accord greater 'significance' to 'important' changes
           Ex.: Kimura 2-parameter distance (K2P) model treats transitions (Ts) & transversions (Tv) separately
               K = transition bias = [Transitions] / [Transversions] = [Ts] / [Tv]
                   There are twice as many kinds of transversions as transitions:
                       expected K = 0.5 (if all substitutions were equally likely)
               But: Tv are rare for close comparisons,
                    more common for distant relationships
               K is variable according to the evolutionary problem under consideration:
                   K > 6 for close comparisons
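As a worked illustration of counting Ts and Tv separately, the sketch below computes the transition bias K = [Ts] / [Tv] and the K2P distance for a pair of aligned sequences. The distance formula used, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)] with P and Q the proportions of transitional and transversional differences, is the standard K2P formula and is not given in these notes; the function names and the two 20-bp sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def count_ts_tv(seq1, seq2):
    """Count transitions and transversions between two aligned sequences."""
    ts = tv = 0
    for x, y in zip(seq1, seq2):
        if x == y:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            ts += 1
        else:
            tv += 1
    return ts, tv

def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance (assumes equal-length, gap-free input)."""
    n = len(seq1)
    ts, tv = count_ts_tv(seq1, seq2)
    p, q = ts / n, tv / n
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))

# Hypothetical aligned sequences (illustrative only): 3 Ts and 1 Tv.
s1 = "AACCGGTTAACCGGTTAACC"
s2 = "GACTGGTTAACCAGTTAACG"
ts, tv = count_ts_tv(s1, s2)
print("Ts, Tv:", ts, tv, " K =", ts / tv)
print("uncorrected p-distance:", (ts + tv) / len(s1))
print("K2P distance:", round(k2p_distance(s1, s2), 4))
```

The K2P distance is somewhat larger than the uncorrected proportion of differing sites, because the model allows for multiple substitutions at the same site.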
   Choice of preferred hypothesis is made on the Principle of Parsimony
       In general: parsimony means that the simpler hypothesis is to be preferred
           Ex.: to explain the occurrence of a complex structure in multiple species
                it is more parsimonious to hypothesize that it has evolved only once
                Therefore, the species have a common evolutionary origin
       Evolutionary parsimony:
           a hypothesis that requires fewer character changes is preferred
           In molecular systematics, these changes are nucleotide substitutions [SNPs]
   The "Four-Taxon Problem" and the "Three-Taxon Statement":
       Among four taxa A, B, C, & D, there are three hypotheses of relationship:
           either A is most closely related to B, or to C, or to D
       We want to be able to evaluate hypotheses of the form:
           "X and Y are more closely related to each other than either is to Z"
           The alternative hypotheses can be shown as networks with branches and an internode
       Count the number of informative characters favouring each hypothesis
       The hypothesis with the "highest score" requires the fewest changes
           and is therefore the 'most parsimonious' explanation.
           This is also called the 'minimum length' or 'minimum spanning' solution.
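The counting step can be sketched as follows. A site is treated as informative when it has exactly two states, each shared by two taxa, so that it favours one of the three groupings. The function name `four_taxon_scores`, the grouping labels, and the 10-bp sequences are hypothetical.

```python
def four_taxon_scores(a, b, c, d):
    """Count informative sites favouring each of the three groupings."""
    scores = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for w, x, y, z in zip(a, b, c, d):
        # Each test requires two states, each shared by two taxa.
        if w == x and y == z and w != y:
            scores["AB|CD"] += 1
        elif w == y and x == z and w != x:
            scores["AC|BD"] += 1
        elif w == z and x == y and w != x:
            scores["AD|BC"] += 1
        # All other site patterns are uninformative for four taxa.
    return scores

# Hypothetical aligned sequences (illustrative only).
seq_a = "AAGCTTGACA"
seq_b = "AAGCTCGATA"
seq_c = "GAACTCGGTA"
seq_d = "GAACTTGGCA"
scores = four_taxon_scores(seq_a, seq_b, seq_c, seq_d)
best = max(scores, key=scores.get)
print(scores, "->", best)
```

Here three sites favour grouping A with B, two favour A with D, and none favour A with C, so "A and B are more closely related to each other than either is to C or D" is the most parsimonious statement.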
[   Cladistic analyses may also be weighted: objective criteria exist for DNA data
           Ex.: Count Tv:Ts as 3:1 => Tv are 3x as meaningful
             or, count Tv only (Transversion parsimony) for "deep" analyses
             or, weight 1st & 2nd codon position substitutions >> 3rd
]
       Homework: Practice four-taxon problems
   Statistical tests determine confidence in branching order
       Bootstrap Analysis: a re-sampling technique
           statistical tests usually involve replication / repetition of an experiment
           this is inconvenient with DNA data: $$$
       Suppose the existing data set (400 bp) is an accurate sample of the parametric data set (complete genome)
           re-sample the existing n sites (with replacement) 1000 times, repeat the phylogenetic analysis:
               how often do the same clades / clusters appear?
           "50% bootstrap support" indicates group occurs more frequently than all others combined
           95% criterion is desirable, not often obtained with small data sets
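A minimal sketch of the bootstrap, assuming the "phylogenetic analysis" is the simple four-taxon parsimony count: sites (alignment columns) are re-sampled with replacement, the analysis is repeated, and support is the fraction of replicates in which a grouping is the unique winner. The function names, the fixed random seed, and the 10-bp sequences are hypothetical.

```python
import random

def best_grouping(columns):
    """Return the uniquely best-supported grouping, or None on a tie."""
    scores = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for w, x, y, z in columns:
        if w == x and y == z and w != y:
            scores["AB|CD"] += 1
        elif w == y and x == z and w != x:
            scores["AC|BD"] += 1
        elif w == z and x == y and w != x:
            scores["AD|BC"] += 1
    top = max(scores.values())
    winners = [g for g, s in scores.items() if s == top]
    return winners[0] if len(winners) == 1 else None

def bootstrap_support(seqs, grouping, replicates=1000, seed=1):
    """Fraction of bootstrap replicates in which `grouping` wins outright."""
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    columns = list(zip(*seqs))         # one tuple of four bases per site
    n = len(columns)
    hits = 0
    for _ in range(replicates):
        # Re-sample n sites with replacement from the existing n sites.
        resampled = [columns[rng.randrange(n)] for _ in range(n)]
        if best_grouping(resampled) == grouping:
            hits += 1
    return hits / replicates

# Hypothetical aligned sequences for taxa A, B, C, D (illustrative only).
seqs = ["AAGCTTGACA", "AAGCTCGATA", "GAACTCGGTA", "GAACTTGGCA"]
support = {g: bootstrap_support(seqs, g) for g in ("AB|CD", "AC|BD", "AD|BC")}
print(support)
```

With only ten sites, three of which favour A with B and two of which favour A with D, support for the best grouping falls far short of the 95% criterion, which illustrates why small data sets rarely reach it.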
Placing the root & Inferring the direction of evolutionary change
   With four taxa, there are four branches and one internode
       The most closely related "sister taxon" may occur on any of these
           Where does this "common ancestor" fit in the tree?
   An evolutionary tree is a network with a root:
       The root indicates the relationship with the common ancestor
           A 'root' can be placed on any of the branches or the internode.
           So, there are five possible rooted trees for this unrooted network.
               All are equally parsimonious:
               not all place A & B as each other's closest relatives.
               Some of these make shared characters symplesiomorphic
   (1) Outgroup rooting:
       Include a taxon that is known to be less closely related
           to any of the ingroup taxa than they are to each other.
           Such a taxon is called an outgroup or sister taxon.
           Ex.: Lynx (Feloidea) is an outgroup to the Canoidea
               (Note that this tree is equivalent to the NJ phenogram)
       Problematic in groups where relationships are uncertain (Ex.: Wolffish (Anarhichas))
   (2) Midpoint rooting:
       Place the root halfway between the two most different taxa.
           This assumes that molecular evolution is clock-like.
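Midpoint rooting can be sketched on a four-taxon network stored as a dictionary of branch lengths. The code finds the two tips with the greatest patristic distance ("as the ant runs") and walks half that distance along the path between them. The node names X and Y, the branch lengths, and the function names are hypothetical.

```python
from itertools import combinations

def path(tree, start, goal, seen=None):
    """Return the list of nodes from start to goal in an unrooted tree."""
    seen = seen or {start}
    if start == goal:
        return [start]
    for nxt in tree[start]:
        if nxt not in seen:
            rest = path(tree, nxt, goal, seen | {nxt})
            if rest:
                return [start] + rest
    return None

def length(tree, nodes):
    """Patristic length: sum of branch lengths along a path of nodes."""
    return sum(tree[u][v] for u, v in zip(nodes, nodes[1:]))

def midpoint_root(tree, tips):
    """Return (branch, distance from its first node) where the root falls."""
    # Find the two most different (most distant) tips.
    far = max(combinations(tips, 2),
              key=lambda p: length(tree, path(tree, *p)))
    nodes = path(tree, *far)
    half = length(tree, nodes) / 2
    walked = 0.0
    for u, v in zip(nodes, nodes[1:]):
        if walked + tree[u][v] >= half:
            return (u, v), half - walked
        walked += tree[u][v]

# Hypothetical network: tips A-D, internal nodes X and Y joined by the internode.
tree = {"A": {"X": 1}, "B": {"X": 3}, "C": {"Y": 4}, "D": {"Y": 1},
        "X": {"A": 1, "B": 3, "Y": 2}, "Y": {"C": 4, "D": 1, "X": 2}}
branch, offset = midpoint_root(tree, ["A", "B", "C", "D"])
print(branch, offset)
```

In this example B and C are the most distant tips (9 units apart), so the root falls 4.5 units from B, which is on the internode, 1.5 units from X. If rates are not clock-like, the longest path need not straddle the true root.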