Coalescence Theory: of Marbles, SNPs & Taxis
              
                
      
(I) Place 2N marbles on a circular table, and let them roll about at random. It is a certainty that one marble will be the first to roll off, then another, and another, and so on until only one marble remains on the table. We want to know (or predict) something about the statistics of marbles, and proceed as follows.
                      
The chance that any particular marble will be the first to roll off the table is 1/(2N); the chance that any particular marble will be the last to roll off is also 1/(2N). If the rate at which marbles roll off the table is r, then the expected interval between events is 1/r. Note that r is stochastically constant.
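The 1/(2N) claim for the first and last marble can be checked with a small simulation. This is only a sketch: it assumes that every order of departure is equally likely, and the function name and parameters are illustrative, not part of the original text.

```python
import random


def first_and_last_frequencies(num_marbles, trials, seed=0):
    """Estimate how often marble 0 is the first, and the last, to roll off.

    Assumption: each trial is one uniformly random order of departure.
    """
    rng = random.Random(seed)
    marbles = list(range(num_marbles))
    first = 0
    last = 0
    for _ in range(trials):
        rng.shuffle(marbles)      # one random order in which marbles leave
        if marbles[0] == 0:       # marble 0 left first
            first += 1
        if marbles[-1] == 0:      # marble 0 left last
            last += 1
    return first / trials, last / trials


if __name__ == "__main__":
    n = 5                         # so there are 2N = 10 marbles
    f, l = first_and_last_frequencies(2 * n, trials=20000)
    print(f"expected 1/(2N) = {1 / (2 * n):.3f}")
    print(f"first: {f:.3f}   last: {l:.3f}")
```

Both estimated frequencies should settle near 1/(2N) as the number of trials grows.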
                      
(II) Now consider a population of 2N individuals (gene copies), where the factor of 2 allows the population to be treated as N diploid individuals. Nielsen & Slatkin (2013) show:
                      
Eqn 3.1: The probability that any particular individual will coalesce [disappear into the population past] in the present generation is

Pr(C) = 1/(2N)

Then the probability that any particular individual will not coalesce is

Pr(C') = 1 - 1/(2N)
                      
Eqn 3.2: The probability that any particular individual will not coalesce in r successive generations is

Pr(C'_r) = (1 - 1/(2N))^r
                      
Eqn 3.3: The probability that any particular individual will not coalesce in (r - 1) generations, and then coalesce in the rth generation, is

Pr(C_r) = [(1 - 1/(2N))^{r-1}] [1/(2N)]
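Eqns 3.2 and 3.3 can be evaluated directly. The sketch below uses hypothetical function names and an arbitrary N = 50. It checks two standard properties of this geometric distribution: the probabilities over all generations sum to 1, and the mean waiting time is 2N generations.

```python
def prob_no_coalescence(r, n):
    """Eqn 3.2: no coalescence in r successive generations."""
    return (1.0 - 1.0 / (2 * n)) ** r


def prob_coalesce_at(r, n):
    """Eqn 3.3: no coalescence for r - 1 generations, then coalescence in generation r."""
    return prob_no_coalescence(r - 1, n) * (1.0 / (2 * n))


if __name__ == "__main__":
    n = 50                                   # illustrative value, so 2N = 100
    generations = range(1, 5001)             # long enough that the tail is negligible
    total = sum(prob_coalesce_at(r, n) for r in generations)
    mean = sum(r * prob_coalesce_at(r, n) for r in generations)
    print(f"sum of Pr(C_r) over r: {total:.6f}")
    print(f"mean generations to coalescence: {mean:.3f} (2N = {2 * n})")
```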
                        
Eqn 3.4: Define an interval t = r/(2N), that is, time measured in units of 2N generations. Then r = 2Nt.
                             
                             
From Eqn 3.2, the probability that any particular individual will not have coalesced after r = 2Nt generations is then

Pr(C'_r) = (1 - 1/(2N))^{2Nt}

"The Calculus" shows that as N approaches infinity, Pr(C'_r) approaches e^{-t}, where e is the base of natural logarithms.
                      
                             
That is, the intervals between coalescence events are exponentially distributed: they follow an Exponential Function.
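The limit can also be seen numerically, without the calculus. The sketch below (the function name and the values of N are illustrative) evaluates (1 - 1/(2N))^{2Nt} for increasing N and prints it beside e^{-t}.

```python
import math


def prob_no_coalescence_scaled(t, n):
    """(1 - 1/(2N))^(2Nt): probability of no coalescence by scaled time t."""
    return (1.0 - 1.0 / (2 * n)) ** (2 * n * t)


if __name__ == "__main__":
    t = 1.0
    limit = math.exp(-t)
    for n in (5, 50, 500, 5000, 50000):
        value = prob_no_coalescence_scaled(t, n)
        print(f"N = {n:>6}: {value:.6f}   difference from e^-t: {limit - value:.2e}")
```

The difference shrinks roughly in proportion to 1/N, so even modest population sizes are well described by the exponential approximation.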
                      
(III) This function is also called the Taxi Cab Function. Suppose taxis arrive at a taxi stand at a stochastically constant rate of one every six minutes (r = 1/6). Having just arrived, we ask the other person waiting, "How long since the last cab?" She says, "10 minutes." Intuitively, we might think that this means that a cab is more likely to arrive sooner rather than later. However, the probability of a cab arriving in the next minute remains 1/6, no matter what has happened before. The next cab arrives after a further 2 minutes. Another person arrives, and asks you the same question. You answer, "2 minutes," and the newcomer says, "Well, I guess it will be a while." A cab arrives after 3 minutes, you get in and commence to cypher. The previous two cabs arrived at an average interval of (10+2)/2 = 6 minutes, and all three at (10+2+3)/3 = 5 minutes.
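Why the 10-minute wait tells us nothing can be shown with a short calculation. The sketch assumes that time is divided into one-minute steps, each with an independent 1/6 chance of a cab (the same simplification as rolling a die); the function names are illustrative.

```python
def prob_wait_exceeds(minutes, p=1/6):
    """P(no cab arrives during the next `minutes` minutes), per-minute chance p."""
    return (1 - p) ** minutes


def prob_cab_next_minute_given_waited(already_waited, p=1/6):
    """P(cab arrives in the next minute | no cab for `already_waited` minutes)."""
    no_cab_so_far = prob_wait_exceeds(already_waited, p)
    no_cab_one_more = prob_wait_exceeds(already_waited + 1, p)
    # Conditional probability: P(arrives in the next minute and none so far) / P(none so far)
    return (no_cab_so_far - no_cab_one_more) / no_cab_so_far


if __name__ == "__main__":
    for waited in (0, 2, 10, 30):
        chance = prob_cab_next_minute_given_waited(waited)
        print(f"waited {waited:>2} min: chance of a cab in the next minute = {chance:.4f}")
```

Under this assumption the conditional chance is 1/6 for every waiting time, which is the "memoryless" property of the taxi stand.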
                      
(IV) Simulate this with a six-sided die [one of a pair of dice]. A roll of six is the arrival of a cab. Keeping track of the count, roll the die until you get a six. Repeat for 50 ~ 100 or more sixes: determine the average interval between sixes, and plot the distribution of intervals. As the sample size n increases, the distribution will approach the exponential form expected for r = 1/6 (strictly, its discrete counterpart, the geometric distribution with p = 1/6), with a mean interval of 6 rolls.
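The die experiment can also be run in software. The sketch below is one possible version, using Python's standard `random` module with a fixed seed for repeatability; the function name, the number of rolls, and the text histogram are illustrative choices.

```python
import random
from collections import Counter


def intervals_between_sixes(num_rolls, seed=0):
    """Roll a fair die num_rolls times; return the number of rolls needed for each six."""
    rng = random.Random(seed)
    intervals = []
    count = 0
    for _ in range(num_rolls):
        count += 1
        if rng.randint(1, 6) == 6:   # a six is the arrival of a cab
            intervals.append(count)
            count = 0                # start counting toward the next six
    return intervals


if __name__ == "__main__":
    intervals = intervals_between_sixes(12000)
    n = len(intervals)
    print(f"sixes observed: {n}")
    print(f"average interval: {sum(intervals) / n:.2f} rolls")

    # Compare observed frequencies with the geometric expectation (5/6)^(k-1) * (1/6).
    histogram = Counter(intervals)
    for k in range(1, 16):
        observed = histogram[k] / n
        expected = (5 / 6) ** (k - 1) * (1 / 6)
        print(f"{k:>2}  obs {observed:.3f}  exp {expected:.3f}  {'#' * round(observed * 200)}")
```

Rolls after the final six are discarded, since that last interval is incomplete.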
                      
For the advanced student: A larger sample can be obtained in Excel with the RANDBETWEEN function: entering =RANDBETWEEN(1,6) in cell A1 returns a random integer between 1 and 6. If repeated for cells A1 - A12000, the expectation is n = 2000 '6s', and the distribution of intervals between '6s' approaches the expected distribution more accurately.
                        
         
            
      
Figure & Text material © 2022 by Steven M. Carr