Consider five taxa (A, B, C, D, E)
with the following distance matrix. Note that distances involving taxon C are
unusually large:
 | A | B | C | D | E |
A | 0 | - | - | - | - |
B | 20 | 0 | - | - | - |
C | 80 | 80 | 0 | - | - |
D | 60 | 60 | 100 | 0 | - |
E | 80 | 80 | 120 | 80 | 0 |
As before, A and B are the closest pair (20 units). Join them
into one cluster (AB), joining at 20, and
recalculate the average distances from (AB) to the other taxa.
 | (AB) | C | D | E |
(AB) | 0 | - | - | - |
C | 80 | 0 | - | - |
D | 60 | 100 | 0 | - |
E | 80 | 120 | 80 | 0 |
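The recalculation of the (AB) row can be sketched in a few lines of Python. This is a minimal illustration, assuming each new distance is the simple mean of the distances from A and from B; the names `from_A`, `from_B`, and `from_AB` are hypothetical and chosen only for this example.

```python
# Distances from A and from B to each remaining taxon (first matrix above).
from_A = {"C": 80, "D": 60, "E": 80}
from_B = {"C": 80, "D": 60, "E": 80}

# Assumed rule: the distance from the new cluster (AB) to another taxon
# is the mean of the distances from A and from B to that taxon.
from_AB = {taxon: (from_A[taxon] + from_B[taxon]) / 2 for taxon in from_A}

for taxon, d in from_AB.items():
    print(f"(AB)-{taxon}: {d}")
```

Because A and B are equidistant from every other taxon in this data set, the (AB) row is identical to the original A and B rows.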
(AB) and D are now the closest pair (60 units). Join them
into one cluster (ABD), joining at 60, and
recalculate the average distances in the same way.
 | (ABD) | C | E |
(ABD) | 0 | - | - |
C | 90 | 0 | - |
E | 80 | 120 | 0 |
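The entries in the (ABD) row can be checked by hand. The tabulated values are consistent with taking the simple mean of the two distances being merged, which is the assumption made here; the symbol d(X, Y) for the distance between clusters X and Y is notation introduced only for this check.

```latex
\begin{align*}
d\bigl((ABD), C\bigr) &= \tfrac{1}{2}\bigl[\, d\bigl((AB), C\bigr) + d(D, C) \,\bigr] = \tfrac{1}{2}(80 + 100) = 90 \\
d\bigl((ABD), E\bigr) &= \tfrac{1}{2}\bigl[\, d\bigl((AB), E\bigr) + d(D, E) \,\bigr] = \tfrac{1}{2}(80 + 80) = 80
\end{align*}
```

Averaging over the three individual members A, B, and D instead would give (80 + 80 + 100)/3 ≈ 86.7 for C, so the tabulated 90 reflects the simple-mean rule.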
(ABD) and E are now the closest pair (80 units). Join them
into one cluster (ABDE), joining at 80, and
recalculate the one remaining average distance. This gives:
 | (ABDE) | C |
(ABDE) | 0 | - |
C | 105 | 0 |
C joins the remaining taxa at 105. This completes the analysis.
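The whole sequence of joins can be reproduced with a short Python sketch. It is not a definitive implementation: it assumes the simple-mean recalculation rule that matches the tables above, and the helper names `pair` and `cluster` are hypothetical, introduced only for this example.

```python
from itertools import combinations

def pair(x, y):
    """Order-independent key for the distance between clusters x and y."""
    return frozenset((x, y))

def cluster(labels, dist):
    """Repeatedly join the closest pair; return a list of (cluster, height)."""
    labels, dist, joins = list(labels), dict(dist), []
    while len(labels) > 1:
        # Find the pair of current clusters with the smallest distance.
        x, y = min(combinations(labels, 2), key=lambda p: dist[pair(*p)])
        merged = "".join(sorted(x + y))
        joins.append((merged, dist[pair(x, y)]))
        rest = [name for name in labels if name not in (x, y)]
        for other in rest:
            # Assumed rule: simple mean of the two merged clusters' distances.
            dist[pair(merged, other)] = (
                dist[pair(x, other)] + dist[pair(y, other)]
            ) / 2
        labels = [merged] + rest
    return joins

# Lower triangle of the five-taxon distance matrix, row by row.
taxa = ["A", "B", "C", "D", "E"]
lower = [[], [20], [80, 80], [60, 60, 100], [80, 80, 120, 80]]
dist = {
    pair(taxa[i], taxa[j]): d
    for i, row in enumerate(lower)
    for j, d in enumerate(row)
}

joins = cluster(taxa, dist)
for name, height in joins:
    print(name, height)
```

The join heights come out as 20, 60, 80, and 105, with C the last taxon to be added, in agreement with the tables.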
Taken at face value, the analysis suggests that C is the least similar to the other four taxa and, if similarity is accepted as an estimate of relationship, the most distantly related to them (below, left). In fact, the evolutionary tree from which the data were derived (below, right) shows that C is most closely related to (AB), with which it shares the most recent common ancestor, but has evolved at twice the rate of the other taxa. The method assumes that all lineages evolve at the same rate, and the violation of that assumption here gives the wrong answer (below, left).