Introduction to Bayes' Theorem
Conventional statistics rely on a Probabilistic
Model of events, such as the probability (p)
that one will draw an Ace from a deck of cards
(p = 4/52 = 1/13) or roll Boxcars with two dice (p = 1/36).
The joint probability of drawing an Ace AND rolling Boxcars
is then simply p' = (1/13)(1/36) = 1/468 ≈ 0.00214.
The probability of drawing an Ace OR rolling Boxcars is
p'' = (1/13) + (1/36) = (36 + 13)/(13 x 36) = 49/468 ≈ 0.1047,
about fifty times as great (strictly, the joint probability 1/468
should be subtracted from this sum, since the two events can occur
together, giving ≈ 0.1026). Here AND and OR are the
operative words for multiplication and addition, respectively.
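As a quick check of this arithmetic, here is a minimal sketch in Python (used purely for illustration; the variable names are not part of the original notes):

    # Independent events: an Ace from a 52-card deck, Boxcars with two dice
    p_ace = 4 / 52                        # = 1/13
    p_boxcars = 1 / 36

    p_and = p_ace * p_boxcars             # AND: multiply -> ~0.00214
    p_or = p_ace + p_boxcars - p_and      # OR: add, minus the overlap -> ~0.1026

    print(p_and, p_or)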
This can be extended to biological
situations, for example that the next hospital patient you
see will be male and (or) have hemophilia, based on data
that about half the population is male, and that a certain
fraction of the population has hemophilia. The probabilistic
model becomes complicated once it is recognized that
hemophilia is typically a male trait, and further that in a
hospital ward, there will be a higher proportion of
hemophiliacs than in the outside population. Note that the
probabilistic approach will be different when applied to
optometric patients who are male and (or) color-blind. A
simple probabilistic approach may fail under these
circumstances.
Alternatively, the Bayes Model is
concerned with the likelihood of events, which
explicitly considers the co-occurrence of events,
especially where those events are not independent.
This is phrased as, What is the
probability of event A, given that event B also occurs?
Bayes’ Theorem as stated
mathematically is:
p(A|B) = [ p(B|A) x p(A) ] / p(B)
where A & B are events,
and p(B) ≠ 0. An event is something
that can be true or false, or one way or another: for
example, that a person is color-blind, or female. Bayes'
Theorem can then be stated in words as
The probability of Event A given
that B is True, is equal to
the probability of Event B given
that A is True, times the
probability of Event A, all divided
by the probability of Event B.
p(A|B) and p(B|A) are conditional probabilities:
the likelihood of event A occurring, given that B is true,
and vice versa. The shorthand p(A|B) is read as
the probability of A given B.
p(A) and p(B) are the marginal probabilities
of observing A
and B,
independently of each other: for example, the proportion of
color-blind people, or females.
Among other
uses, Bayes’ Theorem provides an improved method of
assessing the likelihood that two non-independent
events will occur simultaneously.
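For concreteness, the theorem can be written as a small Python function; this is only an illustrative sketch, and the name bayes is not from the text:

    def bayes(p_b_given_a, p_a, p_b):
        """Return p(A|B) = p(B|A) x p(A) / p(B), requiring p(B) != 0."""
        if p_b == 0:
            raise ValueError("p(B) must be non-zero")
        return p_b_given_a * p_a / p_b

    # e.g. the drug-test example worked out below:
    # bayes(0.999, 0.005, 0.999*0.005 + 0.01*0.995) -> ~0.3342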
Example: Sensitivity &
Specificity of Drug Testing
Suppose a urine test used to detect the presence of a
particular banned performance-enhancing drug is 99.9% sensitive and 99.0% specific. That
is, the test will provide 99.9%
true positive
results for drug users, and 99.0%
true negative
results for non-users. Suppose further that 0.5%
of the athlete population tested are drug users (incidence). We
ask: What is the
probability that an individual who tests positive is a
User? Bayes’ Theorem phrases this as, what is p(User|+), that is, what
is the probability that an individual is a User,
given that s/he tests positive?
Let p(A) = p(User)
and p(B) = p(+), then
p(User|+) = [ p(+|User) x p(User) ] / p(+)
Here, p(+|User) expresses the sensitivity: 0.999 of Users tested
will be detected. Specificity enters through
p(+|Non-User) = 1 - specificity, so that only
(1 – 0.99) = 0.01 of Non-Users
will be reported (incorrectly) as Users.
Then, p(+) is the total probability of a positive test,
combining true as well as false positives. Its two
components are
p(+) = [ p(+|User) x p(User) ] + [ p(+|Non-User) x p(Non-User) ]
Keeping the same numerical values as defined above, we obtain
p(User|+) = [ p(+|User) x p(User) ] / p(+) = (0.999 x 0.005) / [(0.999)(0.005) + (1 - 0.99)(1 - 0.005)] = 0.3342
That is, even if an individual tests positive, it is
twice as likely as not (1 – 33.42% = 66.58%) that s/he
is not a User.
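The same calculation can be sketched in Python (illustrative only; the function name p_user_given_positive is an assumption, not part of the original notes):

    def p_user_given_positive(sensitivity, specificity, incidence):
        """p(User|+) via Bayes' Theorem, expanding p(+) over Users and Non-Users."""
        p_pos_given_user = sensitivity                     # true-positive rate
        p_pos_given_nonuser = 1 - specificity              # false-positive rate
        p_pos = (p_pos_given_user * incidence
                 + p_pos_given_nonuser * (1 - incidence))  # total probability of a "+"
        return p_pos_given_user * incidence / p_pos

    print(p_user_given_positive(0.999, 0.99, 0.005))       # ~0.3342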
Why? Even though the test appears to be highly “accurate”
(99.9% sensitivity & 99% specificity), the number
of non-Users is very large
compared to the number of Users. Under such conditions, the
count of false
positives exceeds the count of true positives. For
example, if 1,000 individuals are tested, we expect 995
non-Users and 5 Users. Among the 995 non-Users, we expect
0.01 x 995 ≈ 10 false positives.
Among the 5 Users, we expect 0.999 x 5 ≈ 5 true positives.
So, out of 15 positive tests, only 5 (33%) are genuine. The
test cannot be used to screen the general population for
Users.
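The counting argument translates directly into a few lines of Python; this simply repeats the arithmetic of the preceding paragraph:

    n = 1000                                   # individuals tested
    users = 0.005 * n                          # 5 expected Users
    non_users = n - users                      # 995 expected Non-Users

    true_positives = 0.999 * users             # ~5
    false_positives = (1 - 0.99) * non_users   # ~10

    print(true_positives / (true_positives + false_positives))   # ~0.33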
What are the effects of improving the “accuracy” of the test? If
sensitivity were
increased to 100%,
and specificity
remained at 99%,
p(User|+) = 33.44%,
a minuscule improvement. Alternatively, if sensitivity
remains at 99.9%
and specificity is increased to 99.5%,
then p(User|+) = 50.10%,
so only about half the positive tests are genuine. The test remains
unreliable.
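Both scenarios can be checked with the same Bayes calculation; the helper below is an illustrative sketch that repeats the formula used earlier:

    def p_user_given_positive(sensitivity, specificity, incidence):
        p_pos = sensitivity * incidence + (1 - specificity) * (1 - incidence)
        return sensitivity * incidence / p_pos

    print(p_user_given_positive(1.000, 0.990, 0.005))   # sensitivity 100%  -> ~0.3344
    print(p_user_given_positive(0.999, 0.995, 0.005))   # specificity 99.5% -> ~0.5010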
How can testing be improved? If
sensitivity and specificity remain unchanged at 0.999
and 0.99 respectively, but in the population of
interest the incidence of users increases to 0.1, p(User|+)
= 0.91736, and the test is reasonably reliable (but
not at a 95% criterion). Alternatively, testing may be
applied as a population screen, and any individual who tests
positive may be re-tested: the probability
that any Non-User will (incorrectly) test positive twice
is (0.01)^2 = 0.0001, while the probability
that a User will escape detection twice is
only (0.001)^2 = 0.000001.
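Both remedies can be checked numerically; the sketch below assumes, as the text implies, that repeat tests are independent:

    sensitivity, specificity = 0.999, 0.99

    # Population with a higher incidence of Users (10%)
    incidence = 0.1
    p_pos = sensitivity * incidence + (1 - specificity) * (1 - incidence)
    print(sensitivity * incidence / p_pos)    # ~0.9174

    # Re-testing anyone who tests positive
    print((1 - specificity) ** 2)             # Non-User tests positive twice: ~1e-4
    print((1 - sensitivity) ** 2)             # User escapes detection twice:  ~1e-6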
HOMEWORK: Write an Excel spreadsheet program to
calculate p(User|+) for various
values of Sensitivity, Specificity, and Incidence.
Use the base values above as a starting point. Under what
circumstances is the test most “useful”? Explain.