Coalescence Theory: of Marbles, SNPs & Taxis
(I) Place 2N marbles on a circular table, and let them roll about at random. It is certain that one marble will be the first to roll off, then another, and another, and so on until only one marble remains on the table. We want to know (or predict) something about the statistics of this process, and proceed as follows.
The chance that any particular marble will be the first to roll off the table is 1/(2N); the chance that any particular marble will be the last to roll off is also 1/(2N). If the rate at which marbles roll off the table is r, then the expected interval between events is 1/r. Note that r is stochastically constant: each roll-off happens at random, but the average rate does not change over time.
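A short Python simulation can make the marble argument concrete. It is a sketch under stated assumptions, not part of the original exercise: roll-offs are modeled as occurring at a constant rate r (exponential waiting times drawn with `random.expovariate`), and at each roll-off one of the marbles still on the table is chosen with equal probability. The function `simulate_table` is hypothetical, and the values 2N = 10 and r = 0.5 are arbitrary choices for illustration.

```python
import random
from collections import Counter

def simulate_table(two_n, r, rng):
    """One run of the marble table. Assumptions: roll-offs occur at a constant
    rate r, and each roll-off removes a remaining marble chosen uniformly."""
    on_table = list(range(two_n))
    order, intervals = [], []
    while len(on_table) > 1:
        intervals.append(rng.expovariate(r))   # waiting time to next roll-off, mean 1/r
        idx = rng.randrange(len(on_table))     # every remaining marble equally likely
        order.append(on_table.pop(idx))
    return order, intervals

rng = random.Random(1)
two_n, r, runs = 10, 0.5, 20_000
first, last, all_intervals = Counter(), Counter(), []
for _ in range(runs):
    order, intervals = simulate_table(two_n, r, rng)
    first[order[0]] += 1     # which marble rolled off first
    last[order[-1]] += 1     # which marble was the last to roll off
    all_intervals.extend(intervals)

print("P(marble 0 first) ~", first[0] / runs, "vs 1/2N =", 1 / two_n)
print("P(marble 0 last)  ~", last[0] / runs, "vs 1/2N =", 1 / two_n)
print("mean interval     ~", sum(all_intervals) / len(all_intervals), "vs 1/r =", 1 / r)
```

With these settings the estimated probabilities should fall near 1/(2N) = 0.1 and the mean interval near 1/r = 2; the exact figures depend on the random seed.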
(II) Now consider a population of 2N individuals [gene copies]; the factor of 2 allows the population to be treated as N diploid individuals. Nielsen & Slatkin (2013) show the following.
Eqn 3.1: The probability that any particular pair of individuals will coalesce [trace back to the same parent, disappearing into the population past] in a given generation is $\Pr(C) = \frac{1}{2N}$.
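Eqn 3.1 can be checked by simulation if we assume the standard Wright-Fisher picture (an assumption not spelled out above): each of the 2N gene copies in the present generation picks its parent at random from the 2N copies in the previous generation. Two particular copies coalesce when they happen to pick the same parent. The helper `same_parent_fraction` is hypothetical, written only for this sketch.

```python
import random

def same_parent_fraction(two_n, trials, rng):
    """Fraction of trials in which two particular gene copies pick the same
    parent, assuming each picks uniformly among the 2N parental copies."""
    hits = 0
    for _ in range(trials):
        parent_a = rng.randrange(two_n)   # parent of the first copy
        parent_b = rng.randrange(two_n)   # parent of the second copy
        hits += parent_a == parent_b      # True counts as 1
    return hits / trials

rng = random.Random(2)
for two_n in (10, 50, 200):
    est = same_parent_fraction(two_n, 100_000, rng)
    print(f"2N = {two_n:3d}: simulated {est:.4f}  vs  1/(2N) = {1 / two_n:.4f}")
```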
Then the probability that any particular pair of individuals will not coalesce is $\Pr(C') = 1 - \frac{1}{2N}$.
Eqn 3.2: The probability that any particular pair of individuals will not coalesce in r successive generations is $\Pr(C'_r) = \left(1 - \frac{1}{2N}\right)^{r}$.
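The following sketch, assuming the pair coalesces independently with probability 1/(2N) in each generation, simulates the number of generations until coalescence and compares the fraction of pairs still uncoalesced after r generations with Eqn 3.2. The function name `generations_to_coalescence` and the choice 2N = 20 are illustrative only.

```python
import random

def generations_to_coalescence(two_n, rng):
    """Generations until a pair coalesces, assuming a coalescence chance of
    1/(2N) in each generation independently (Eqn 3.1)."""
    r = 1
    while rng.random() >= 1 / two_n:   # this generation: no coalescence
        r += 1
    return r

rng = random.Random(3)
two_n, trials = 20, 50_000
times = [generations_to_coalescence(two_n, rng) for _ in range(trials)]

for r in (1, 5, 20, 40):
    simulated = sum(t > r for t in times) / trials   # still separate after r generations
    predicted = (1 - 1 / two_n) ** r                 # Eqn 3.2
    print(f"r = {r:2d}: simulated {simulated:.4f}  vs  Eqn 3.2 {predicted:.4f}")
```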
Eqn 3.3: The probability that any particular pair of individuals will not coalesce in (r - 1) generations, and then coalesce in the rth generation, is $\Pr(C_r) = \left(1 - \frac{1}{2N}\right)^{r-1}\left(\frac{1}{2N}\right)$.
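Eqn 3.3 is a geometric distribution in r. As a rough numerical check (a sketch, with the infinite sum truncated at r = 2000, which is far into the negligible tail for 2N = 20), the probabilities should sum to 1 and the mean time to coalescence should be 2N generations, the standard mean of a geometric distribution with success probability 1/(2N). The function name `prob_coalesce_at` is hypothetical.

```python
def prob_coalesce_at(r, two_n):
    """Eqn 3.3: no coalescence for r - 1 generations, then coalescence at r."""
    p = 1 / two_n
    return (1 - p) ** (r - 1) * p

two_n = 20
rs = range(1, 2001)   # truncated sum; the tail beyond r = 2000 is negligible here
total = sum(prob_coalesce_at(r, two_n) for r in rs)
mean = sum(r * prob_coalesce_at(r, two_n) for r in rs)
print(f"sum of Pr(C_r) = {total:.6f}")
print(f"mean generations to coalescence = {mean:.3f}  (2N = {two_n})")
```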
Eqn 3.4: Define $t = \frac{r}{2N}$, the time to coalescence measured in units of 2N generations. Then $r = 2Nt$.
Substituting $r = 2Nt$ into Eqn 3.2, the probability that no coalescence has occurred by time t is then $\Pr(C'_r) = \left(1 - \frac{1}{2N}\right)^{2Nt}$.
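Before taking the limit, it may help to evaluate $(1 - 1/(2N))^{2Nt}$ for increasing N at a fixed t and compare it with $e^{-t}$. This is a plain numerical illustration; t = 1.5 is an arbitrary choice.

```python
import math

t = 1.5                                # arbitrary time, in units of 2N generations
for two_n in (10, 100, 1_000, 10_000):
    r = two_n * t                      # Eqn 3.4: r = 2Nt generations
    value = (1 - 1 / two_n) ** r       # Eqn 3.2 evaluated at r = 2Nt
    print(f"2N = {two_n:6d}: (1 - 1/2N)^(2Nt) = {value:.6f}   e^-t = {math.exp(-t):.6f}")
```

The gap between the two columns shrinks roughly in proportion to 1/(2N).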
"The Calculus" shows that as N
approaches infinity, Pr(C'r)
= e-t,
where e is the base of natural
logarithms.
That is, the waiting time to a coalescence event follows an exponential distribution, with a mean of 1 in units of 2N generations (i.e., 2N generations).
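As a simulation check of the exponential result, the sketch below assumes, as before, coalescence with probability 1/(2N) per generation, and takes 2N = 100 as a stand-in for "large N". It rescales simulated coalescence times by 2N, per Eqn 3.4, and compares the fraction of times greater than t with $e^{-t}$. The function name `scaled_coalescence_time` is hypothetical.

```python
import math
import random

def scaled_coalescence_time(two_n, rng):
    """Generations until a pair coalesces (chance 1/(2N) per generation),
    divided by 2N so that time is in units of 2N generations (Eqn 3.4)."""
    r = 1
    while rng.random() >= 1 / two_n:
        r += 1
    return r / two_n

rng = random.Random(4)
two_n, trials = 100, 20_000
ts = [scaled_coalescence_time(two_n, rng) for _ in range(trials)]
print(f"mean scaled time = {sum(ts) / trials:.3f}  (exponential with rate 1 has mean 1)")

for t in (0.5, 1.0, 2.0):
    frac = sum(x > t for x in ts) / trials
    print(f"Pr(T > {t}): simulated {frac:.3f}  vs  e^-t = {math.exp(-t):.3f}")
```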
(III) This function is also called the Taxi Cab Function.
Suppose taxis arrive at a taxi stand at a stochastically constant rate of one every six minutes (r = 1/6 per minute). Having just arrived, we ask the other person waiting, "How long since the last cab?" She says, "10 minutes." Intuitively, we might think this means that a cab is more likely to arrive sooner rather than later. However, the probability of a cab arriving in the next minute remains about 1/6, no matter what has happened before: the process has no memory. The next cab arrives after a further 2 minutes. Another person arrives and asks you the same question. You answer, "2 minutes," and the newcomer says, "Well, I guess it will be a while." A cab arrives after 3 minutes; you get in and commence to cypher. The previous two cabs arrived at an average interval of (10+2)/2 = 6 minutes, and all three at (10+2+3)/3 = 5 minutes.
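The memoryless property of the taxi stand can be checked with simulated exponential intervals with a mean of 6 minutes (via `random.expovariate(1/6)`); this continuous-time model is an assumption of the sketch. In it, the chance of a cab within the next minute is $1 - e^{-1/6} \approx 0.154$, close to 1/6, whether or not we have already waited 10 minutes since the last cab.

```python
import math
import random

rng = random.Random(6)
rate = 1 / 6                                            # assumed: one cab per 6 minutes on average
gaps = [rng.expovariate(rate) for _ in range(200_000)]  # simulated intervals between cabs (minutes)

# Chance that a cab arrives within 1 minute, with no prior information.
p_next_minute = sum(g <= 1 for g in gaps) / len(gaps)

# Same chance, but only among intervals that already exceeded 10 minutes.
waited = [g for g in gaps if g > 10]
p_next_minute_after_10 = sum(g <= 11 for g in waited) / len(waited)

exact = 1 - math.exp(-rate)                             # theoretical value, about 0.154
print(f"Pr(cab within 1 min)                 ~ {p_next_minute:.3f}")
print(f"Pr(cab within 1 min | waited 10 min) ~ {p_next_minute_after_10:.3f}")
print(f"theory 1 - e^(-1/6)                  = {exact:.3f}")
print(f"mean interval                        ~ {sum(gaps) / len(gaps):.2f} min")
```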
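(IV) Simulate this with a six-sided die [one of a pair of dice]. A roll of six is the arrival of a cab. Keeping count of the rolls, roll the die until you get a six. Repeat for 50 to 100 or more sixes: determine the average interval between sixes, and plot the distribution of intervals. As the number of sixes n increases, the distribution of intervals will approach $\Pr(k) = \left(\frac{5}{6}\right)^{k-1}\left(\frac{1}{6}\right)$ (compare Eqn 3.3, with 1/6 in place of 1/(2N)), the discrete counterpart of an exponential with rate 1/6, with a mean interval of 6 rolls.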
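The dice exercise can also be run in Python. This sketch uses `random.randint(1, 6)` as the die and prints a text histogram (one # per 10 intervals) instead of a plot; n = 2000 sixes and the helper name `rolls_until_six` are illustrative choices, not part of the original exercise.

```python
import random
from collections import Counter

def rolls_until_six(rng):
    """Roll a fair die until a six appears; return how many rolls it took."""
    count = 1
    while rng.randint(1, 6) != 6:
        count += 1
    return count

rng = random.Random(7)
n = 2000                                   # number of sixes ("cab arrivals") to collect
intervals = [rolls_until_six(rng) for _ in range(n)]
print(f"average interval = {sum(intervals) / n:.2f} rolls (expected 6)")

# Compare observed frequencies with the geometric distribution (5/6)^(k-1) * (1/6).
freq = Counter(intervals)
for k in range(1, 16):
    expected = (5 / 6) ** (k - 1) * (1 / 6)
    bar = "#" * (freq[k] // 10)            # one '#' per 10 intervals of length k
    print(f"k = {k:2d}: observed {freq[k] / n:.3f}  expected {expected:.3f}  {bar}")
```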
For the advanced student: a larger sample can be obtained in Excel with the RANDBETWEEN function, by entering =RANDBETWEEN(1,6) in cell A1, which returns a random integer between 1 and 6 inclusive. If this is repeated for rows A1 to A12000, the expectation is n = 2000 sixes (12000 × 1/6), and the distribution of intervals between sixes more closely approximates the expected distribution.
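A Python analogue of the spreadsheet version, offered as a sketch: it fills a list with 12000 die rolls (standing in for =RANDBETWEEN(1,6) copied down A1:A12000), finds the rows holding a six, and takes the differences between successive six rows as the intervals, counting the first interval from the top of the column.

```python
import random

rng = random.Random(8)
# Column of 12000 die rolls, like =RANDBETWEEN(1,6) filled down A1:A12000.
column = [rng.randint(1, 6) for _ in range(12_000)]

# Row numbers (1-based, as in the spreadsheet) of every six.
six_rows = [i for i, v in enumerate(column, start=1) if v == 6]
# Interval from the top of the column to the first six, then between successive sixes.
intervals = [b - a for a, b in zip([0] + six_rows, six_rows)]

print(f"number of sixes = {len(six_rows)} (expected 12000/6 = 2000)")
print(f"mean interval   = {sum(intervals) / len(intervals):.3f} rolls (expected 6)")
```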
Figure & Text material © 2022 by Steven M. Carr