Bayes’ Theorem

Introduction to Bayes’ Theorem

Conventional statistics relies on a Probabilistic Model of events, such as the probability (p) that one will draw an Ace from a deck of cards (p = 4/52 = 1/13) or roll Boxcars with two dice (p = 1/36). Because these two events are independent, the joint probability of drawing an Ace AND rolling Boxcars is then simply p' = (1/13)(1/36) = 0.00214. This can be extended to biological situations, for example the probability that the next hospital patient you see will be male and (or) have hemophilia, based on data that about half the population is male, and that a certain fraction of the population has hemophilia. The probabilistic model is already complicated by the recognition that hemophilia is typically (but not always) a male trait, and further that in a hospital ward, there will be a higher proportion of hemophiliacs than in the outside population. Note that the probabilistic approach will be different again when applied to optometric patients who are male and (or) color-blind. A simple probabilistic approach that multiplies the separate probabilities may fail under these circumstances, because the events are not independent.
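The card-and-dice arithmetic above can be checked with exact fractions. This is a minimal sketch, assuming only a standard 52-card deck and two fair dice; the variable names are illustrative.

```python
from fractions import Fraction

p_ace = Fraction(4, 52)       # four Aces in a 52-card deck
p_boxcars = Fraction(1, 36)   # double six with two fair dice

# The two events are independent, so the joint probability is the product.
p_joint = p_ace * p_boxcars

print(p_joint, round(float(p_joint), 5))
```

The product rule used here holds only for independent events, which is why it does not carry over directly to the hemophilia and color-blindness examples.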

Alternatively, the Bayes Model is concerned with the likelihood of events, which explicitly considers the co-occurrence of events, especially where those events are not independent. This is phrased as, What is the probability of event A, given that event B also occurs?

Bayes’ Theorem is stated mathematically as

p(A|B) = [ p(B|A) x p(A) ] / p(B)

where A & B are events, and p(B) ≠ 0. An event is something that can be true or false, for example, that a person is color blind, or male.

p(A|B) and p(B|A) are conditional probabilities: p(A|B) is the likelihood of event A occurring given that B is true, and p(B|A) is the likelihood of B given that A is true. Read p(A|B) as "the probability of A given B". p(A) and p(B) are the marginal probabilities of observing A and B, each without regard to the other: for example, the proportion of color-blind people, or of males.
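The theorem can be verified by direct counting on a small sample space. The sketch below reuses the deck of cards from the introduction; the choice of events (A = the card is a King, B = the card is a face card) is an illustrative assumption and not part of the original example.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs


def prob(event):
    """Probability of an event as the fraction of cards that satisfy it."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


def is_king(card):          # event A (hypothetical choice)
    return card[0] == "K"


def is_face(card):          # event B (hypothetical choice)
    return card[0] in {"J", "Q", "K"}


def is_king_and_face(card):
    return is_king(card) and is_face(card)


p_a = prob(is_king)
p_b = prob(is_face)
p_b_given_a = prob(is_king_and_face) / p_a   # every King is a face card

direct = prob(is_king_and_face) / p_b        # p(A|B) by counting
via_bayes = p_b_given_a * p_a / p_b          # p(A|B) by Bayes' Theorem

print(direct, via_bayes)
```

Both routes give the same value, because the theorem is a rearrangement of the definition of conditional probability.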

Among other uses, Bayes’ Theorem provides an improved method of assessing the likelihood of two non-independent events occurring simultaneously.

Example: Sensitivity & Specificity of Drug Testing

Suppose a urine test used to detect the presence of a particular banned drug is 99.9% sensitive and 99.0% specific. That is, the test will provide 99.9% true positive results for drug users, and 99% true negative results for non-users. Suppose further that 0.5% of the population tested are drug users (incidence). We ask: What is the probability that an individual who tests positive is a user? Bayes’ Theorem phrases this as, what is p(User|+), that is, what is the probability that an individual is a User, given that s/he tests positive?

Let p(A) = p(User) and p(B) = p(+), then

p(User|+) = [ p(+|User) x p(User) ] / p(+)

Here, p(+|User) is the sensitivity: 0.999 of Users tested will be detected. p(+|Non-User) = (1 – specificity) = (1 – 0.99) = 0.01 is the false-positive rate: the fraction of Non-Users who will be reported (incorrectly) as Users.

Then, p(+) is the overall probability of a positive test, including true as well as false positives. It is the sum of these two components:

p(+) = [ p(+|User) x p(User) ] + [ p(+|Non-User) x p(Non-User) ]

Substituting the values defined above

p(+) = (0.999)(0.005) + (1 - 0.99)(1 - 0.005) = 0.01495

So that

p(User|+) = [ p(+|User) x p(User) ] / p(+) = (0.999 x 0.005) / [(0.999)(0.005) + (1 - 0.99)(1 - 0.005)] = 0.3342
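The calculation above can be written as a small function. This is a sketch using the example's values; the function and argument names are illustrative assumptions, not part of the original notes.

```python
def p_user_given_positive(sensitivity, specificity, incidence):
    """p(User|+) from Bayes' Theorem for a two-outcome test."""
    true_pos = sensitivity * incidence                # p(+|User) x p(User)
    false_pos = (1 - specificity) * (1 - incidence)   # p(+|Non-User) x p(Non-User)
    return true_pos / (true_pos + false_pos)          # denominator is p(+)


posterior = p_user_given_positive(sensitivity=0.999,
                                  specificity=0.99,
                                  incidence=0.005)
print(round(posterior, 4))
```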

    That is, even if an individual tests positive, it is twice as likely as not (1 – 33.42% = 66.58%) that s/he is not a User. Why? Even though the test appears to be highly “accurate” (99.9% sensitivity & 99% specificity), the number of non-Users is very large compared to the number of Users. Under such conditions, the count of false positives exceeds the count of true positives. For example, if 1,000 individuals are tested, we expect 995 non-Users and 5 Users. Among the 995 non-Users, we expect 0.01 x 995 ≈ 10 false positives. Among the 5 Users, we expect 0.999 x 5 ≈ 5 true positives. So, out of 15 positive tests, only 5 (33%) are genuine. The test cannot be used to screen the general population for Users.
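The head count for 1,000 tested individuals can be reproduced without rounding. This is a sketch with illustrative variable names; the expected counts are fractional because they are averages over many such groups.

```python
n_tested = 1000
incidence, sensitivity, specificity = 0.005, 0.999, 0.99

users = n_tested * incidence          # expected Users among those tested
non_users = n_tested - users          # expected non-Users

true_positives = sensitivity * users               # Users correctly flagged
false_positives = (1 - specificity) * non_users    # non-Users wrongly flagged

# Fraction of all positive tests that are genuine; this equals p(User|+).
share_genuine = true_positives / (true_positives + false_positives)

print(f"Users: {users:.0f}, non-Users: {non_users:.0f}")
print(f"True positives: {true_positives:.3f}, false positives: {false_positives:.3f}")
print(f"Share of positives that are genuine: {share_genuine:.4f}")
```

Counting expected individuals in this way gives the same result as the formula.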

    What are the effects of improving “accuracy” of the test? If sensitivity were increased to 100%, and specificity remained at 99%, p(User|+) = 33.44%, a minuscule improvement. Alternatively, if sensitivity remains at 99.9% and specificity is increased to 99.5%, then p(User|+) = 50.10%, so that only about half of the positive tests are genuine. The test remains unreliable.
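The comparison can be tabulated by changing one parameter at a time while incidence stays at 0.005. This is a sketch; the scenario labels and names are illustrative assumptions.

```python
def posterior(sensitivity, specificity, incidence):
    """p(User|+) for a two-outcome test."""
    true_pos = sensitivity * incidence
    false_pos = (1 - specificity) * (1 - incidence)
    return true_pos / (true_pos + false_pos)


incidence = 0.005
scenarios = {
    "base values":         (0.999, 0.99),
    "perfect sensitivity": (1.0,   0.99),
    "higher specificity":  (0.999, 0.995),
}

results = {label: posterior(sens, spec, incidence)
           for label, (sens, spec) in scenarios.items()}

for label, value in results.items():
    print(f"{label:20s} p(User|+) = {value:.2%}")
```

Specificity matters far more than sensitivity here because the false positives come from the much larger non-User group.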

    How can testing be improved? If sensitivity and specificity remain unchanged at 0.999 and 0.99 respectively, but in the population of interest the incidence of users increases to 0.1, p(User|+) = 0.91736, and the test is reasonably reliable (but not at a 95% criterion). Alternatively, testing may be applied as a population screen, and any individual who tests positive may be re-tested: assuming the two results are independent, the probability that any non-user will test positive twice is (0.01)² = 0.0001, while the probability that a user will escape detection twice is only (0.001)² = 0.000001.
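Both remedies can be explored numerically. The sketch below treats a re-test as a second Bayes update in which the probability after the first positive result becomes the new starting probability. It assumes that errors on the two tests are independent, which the notes do not state and which may not hold for a real assay; the function and variable names are illustrative.

```python
def update_on_positive(prior, sensitivity, specificity):
    """Probability of being a User after one positive result."""
    true_pos = sensitivity * prior
    false_pos = (1 - specificity) * (1 - prior)
    return true_pos / (true_pos + false_pos)


sensitivity, specificity = 0.999, 0.99

# Remedy 1: test a population in which incidence is higher (0.1).
high_incidence = update_on_positive(0.1, sensitivity, specificity)

# Remedy 2: screen the general population (incidence 0.005), then re-test
# positives. ASSUMPTION: the two test results are independent.
after_first = update_on_positive(0.005, sensitivity, specificity)
after_second = update_on_positive(after_first, sensitivity, specificity)

print(f"Incidence 0.1, one positive:     {high_incidence:.5f}")
print(f"Incidence 0.005, one positive:   {after_first:.5f}")
print(f"Incidence 0.005, two positives:  {after_second:.5f}")
```

Under the independence assumption, a second positive result raises the probability well above that of a single positive test at the same incidence.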

    HOMEWORK: Write an Excel spreadsheet program to calculate p(User|+) for various values of Sensitivity, Specificity, and Incidence. Use the base values above as a starting point. Under what circumstances is the test most “useful”? Explain.