Introduction to Bayes’ Theorem
Conventional statistics rely on a probabilistic model of events, such as the probability (p) that one will draw an Ace from a deck of cards (p = 4/52 = 1/13) or roll boxcars with two dice (p = 1/36). Because these two events are independent, the joint probability of drawing an Ace AND rolling boxcars is then simply p' = (1/13)(1/36) = 0.00214.
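The arithmetic above can be checked with a short Python sketch using exact fractions. It assumes a standard 52-card deck, two fair six-sided dice, and (as the text does) that the two events are independent, so their probabilities multiply.

```python
from fractions import Fraction

# Probability of drawing an Ace from a standard 52-card deck.
p_ace = Fraction(4, 52)                        # reduces to 1/13

# Probability of rolling boxcars (double six) with two fair dice.
p_boxcars = Fraction(1, 6) * Fraction(1, 6)    # 1/36

# The events are independent, so the joint probability is the product.
p_joint = p_ace * p_boxcars

print(p_ace, p_boxcars, p_joint)   # 1/13 1/36 1/468
print(round(float(p_joint), 5))    # 0.00214
```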
This can be extended to biological situations, for example the probability that the next hospital patient you see will be male and (or) have hemophilia, based on data that about half the population is male and that a certain fraction of the population has hemophilia. The probabilistic model is already complicated by the recognition that hemophilia is typically (but not always) a male trait, and further that a hospital ward will contain a higher proportion of hemophiliacs than the outside population. Note that the probabilistic approach will be different again when applied to optometric patients who are male and (or) color-blind. A simple probabilistic approach, which multiplies probabilities as if the events were independent, may fail under these circumstances.
Alternatively, the Bayes model is concerned with the likelihood of events, explicitly considering their co-occurrence, especially where those events are not independent. This is phrased as: “What is the probability of event A, given that event B also occurs?”
Bayes’ Theorem is stated
mathematically as
p(A|B) = [ p(B|A) x p(A) ] / p(B)
where A & B are events, and
p(B) ≠ 0. An event is something
that can be true or false, for example, that a person is
color blind, or male.
p(A|B) and p(B|A) are conditional probabilities: p(A|B) is the likelihood of event A occurring, given that B is true, and vice versa for p(B|A). Read p(A|B) as “the probability of A given B.”
p(A) and p(B) are the marginal probabilities of observing A and B without regard to each other: for example, the proportion of people who are color-blind, or male.
Among other uses, Bayes’ Theorem provides an improved method of assessing the likelihood of two non-independent events occurring simultaneously.
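A minimal Python sketch of the formula above. The helper names `bayes` and `joint` are illustrative only, not from any library. The usage lines apply the formula to a deck of cards, with A = “the card is a King” and B = “the card is a face card (J, Q, K)”. These events are not independent: every King is a face card, so p(B|A) = 1.

```python
def bayes(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Return p(A|B) = p(B|A) * p(A) / p(B); p(B) must be non-zero."""
    if p_b == 0:
        raise ValueError("p(B) must be non-zero")
    return p_b_given_a * p_a / p_b


def joint(p_b_given_a: float, p_a: float) -> float:
    """Return p(A and B) = p(B|A) * p(A); valid whether or not A and B are independent."""
    return p_b_given_a * p_a


# A = "card is a King" (4 of 52), B = "card is a face card" (12 of 52).
p_king = 4 / 52
p_face = 12 / 52
p_face_given_king = 1.0          # every King is a face card

print(bayes(p_face_given_king, p_king, p_face))   # about 0.3333 (= 4/12)
print(joint(p_face_given_king, p_king))           # about 0.0769 (= 4/52)
```

The result p(King|face card) ≈ 1/3 matches direct counting: 4 of the 12 face cards are Kings.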
Example: Sensitivity & Specificity of Drug Testing
Suppose a urine test used to detect the presence of a particular banned drug is 99.9% sensitive and 99.0% specific. That is, the test will give true positive results for 99.9% of drug users, and true negative results for 99% of non-users. Suppose further that 0.5% of the population tested are drug users (the incidence; strictly speaking, this proportion is the prevalence).
We ask: what is the probability that an individual who tests positive is a User? Bayes’ Theorem phrases this as: what is p(User|+)?
Let p(A) = p(User) and p(B) = p(+); then
p(User|+) = [ p(+|User) x p(User) ] / p(+)
Here, p(+|User) is the sensitivity: 0.999 of Users tested will be detected. Likewise, p(+|Non-User) = 1 − specificity: only (1 − 0.99) = 0.01 of Non-Users will be reported (incorrectly) as Users.
The denominator p(+) is the total proportion of positive tests, including true as well as false positives. Its two components are
p(+) = [ p(+|User) x p(User) ] + [ p(+|Non-User) x p(Non-User) ]
Substituting the values defined above gives
p(User|+) = [ p(+|User) x p(User) ] / p(+) = (0.999 x 0.005) / [(0.999)(0.005) + (1 - 0.99)(1 - 0.005)] = 0.3342
That is, even if an individual tests positive, it is about twice as likely (1 − 33.42% = 66.58%) that s/he is not a User as that s/he is one.
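The calculation can be reproduced with a short Python sketch. The function names `p_positive` and `p_user_given_positive` are hypothetical; `incidence` is the fraction of the tested population who are Users, as in the text.

```python
def p_positive(sensitivity: float, specificity: float, incidence: float) -> float:
    """Total probability of a positive test: true positives plus false positives."""
    true_pos = sensitivity * incidence                 # p(+|User) x p(User)
    false_pos = (1 - specificity) * (1 - incidence)    # p(+|Non-User) x p(Non-User)
    return true_pos + false_pos


def p_user_given_positive(sensitivity: float, specificity: float, incidence: float) -> float:
    """Bayes' Theorem: p(User|+) = p(+|User) x p(User) / p(+)."""
    return sensitivity * incidence / p_positive(sensitivity, specificity, incidence)


# Base values from the example: 99.9% sensitive, 99.0% specific, 0.5% Users.
posterior = p_user_given_positive(0.999, 0.99, 0.005)
print(f"p(+) = {p_positive(0.999, 0.99, 0.005):.6f}")   # p(+) = 0.014945
print(f"p(User|+) = {posterior:.4f}")                   # p(User|+) = 0.3342
```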
Why? Even though the test appears to be highly “accurate” (99.9% sensitivity and 99% specificity), the number of non-Users is very large compared to the number of Users. Under such conditions, the count of false positives exceeds the count of true positives. For example, if 1,000 individuals are tested, we expect 995 non-Users and 5 Users. Among the 995 non-Users, we expect 0.01 x 995 ≈ 10 false positives. Among the 5 Users, we expect 0.999 x 5 ≈ 5 true positives. So, out of about 15 positive tests, only 5 (33%) are genuine. On its own, the test cannot reliably be used to screen the general population for Users.
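The 1,000-person example can be expressed as expected counts. This sketch (the function name `expected_counts` is hypothetical) keeps the unrounded values: with the base values it gives 4.995 expected true positives and 9.95 expected false positives, which the text rounds to 5 and 10.

```python
def expected_counts(n: int, sensitivity: float, specificity: float, incidence: float) -> dict:
    """Expected numbers of Users, non-Users, true and false positives among n tested."""
    users = n * incidence
    non_users = n - users
    true_pos = sensitivity * users              # Users correctly flagged
    false_pos = (1 - specificity) * non_users   # non-Users incorrectly flagged
    return {
        "users": users,
        "non_users": non_users,
        "true_pos": true_pos,
        "false_pos": false_pos,
        "share_genuine": true_pos / (true_pos + false_pos),
    }


counts = expected_counts(1000, 0.999, 0.99, 0.005)
for name, value in counts.items():
    print(f"{name:>13}: {value:.3f}")
```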
What are the effects of improving the “accuracy” of the test? If sensitivity were increased to 100% and specificity remained at 99%, p(User|+) = 33.44%, a minuscule improvement. Alternatively, if sensitivity remains at 99.9% and specificity is increased to 99.5%, then p(User|+) = 50.10%: only about half the positive tests are genuine, and the test remains unreliable.
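A sketch comparing the three scenarios at the base incidence of 0.5%. The function name and the scenario labels are illustrative.

```python
def p_user_given_positive(sensitivity: float, specificity: float, incidence: float) -> float:
    """Bayes' Theorem with p(+) expanded into true and false positives."""
    true_pos = sensitivity * incidence
    false_pos = (1 - specificity) * (1 - incidence)
    return true_pos / (true_pos + false_pos)


# (label, sensitivity, specificity), all at an incidence of 0.005.
scenarios = [
    ("base", 0.999, 0.990),
    ("sensitivity 100%", 1.000, 0.990),
    ("specificity 99.5%", 0.999, 0.995),
]
for label, sens, spec in scenarios:
    print(f"{label:<18} p(User|+) = {p_user_given_positive(sens, spec, 0.005):.2%}")
```

It reproduces the 33.42%, 33.44% and 50.10% figures above. Because Users are rare, reducing false positives (raising specificity) matters far more than catching the few remaining Users (raising sensitivity).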
How can testing be improved? If sensitivity and specificity remain unchanged at 0.999 and 0.99 respectively, but in the population of interest the incidence of Users increases to 0.1, then p(User|+) = 0.91736, and the test is reasonably reliable (though it does not meet a 95% criterion). Alternatively, the test may be applied as a population screen, with any individual who tests positive re-tested. Assuming the two results are independent, the probability that a non-User will test (falsely) positive twice is (0.01)² = 0.0001, whereas the probability that a User will escape detection twice is only (0.001)² = 0.000001.
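A sketch of both remedies. The function and constant names are illustrative. The re-test calculation assumes the two results are independent given a person’s true status (real repeat tests may not be), so the posterior after the first positive serves as the prior for the second test.

```python
def p_user_given_positive(sensitivity: float, specificity: float, incidence: float) -> float:
    """Posterior p(User|+) after one positive test, for a given prior incidence."""
    true_pos = sensitivity * incidence                # p(+|User) x p(User)
    false_pos = (1 - specificity) * (1 - incidence)   # p(+|Non-User) x p(Non-User)
    return true_pos / (true_pos + false_pos)


SENS, SPEC = 0.999, 0.99

# Remedy 1: test a population in which Users are more common (incidence 0.1).
print(f"incidence 0.1: p(User|+) = {p_user_given_positive(SENS, SPEC, 0.1):.5f}")

# Remedy 2: re-test everyone who tests positive (independence assumed, see above).
after_first = p_user_given_positive(SENS, SPEC, 0.005)
after_second = p_user_given_positive(SENS, SPEC, after_first)

# Same result in closed form: sensitivity and the false-positive rate
# (1 - specificity) both enter squared.
closed_form = SENS**2 * 0.005 / (SENS**2 * 0.005 + (1 - SPEC)**2 * (1 - 0.005))

print(f"after one positive:  {after_first:.1%}")
print(f"after two positives: {after_second:.1%}")
```

Under that independence assumption, two positive results raise p(User|+) from about 33% to about 98%.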
HOMEWORK: Build an Excel spreadsheet to calculate p(User|+) for various values of Sensitivity, Specificity, and Incidence. Use the base values above as a starting point. Under what circumstances is the test most “useful”? Explain.