where A and B are events and P(B) ≠ 0.
The basic notion is that if we believe that the
probability of an event of interest (A) is influenced by
another event (B), we can improve on the simple probability
estimate of A by incorporating information about
the B event. Thus we get an improved estimate of the
probability of A.
Suppose a blood test used to
detect the presence of a particular banned sports drug is 99%
sensitive
and 99% specific.
That is, the test will produce 99% true positive results
for drug users and 99% true negative results for
non-drug users. Suppose that 0.5% of athletes are users
of the drug. What is the probability that a randomly
selected athlete who tests positive is a user?
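Intuitively, the sensitivity of the test belongs in the
numerator, P(+ | User) = 0.99. However, we also
know that the test is sometimes non-specific and
returns a false positive, at a rate P(+ | Non-user) = (1.00 - 0.99) = 0.01,
which enters the denominator. Then:

\[
P(\text{User} \mid +)
  = \frac{P(+ \mid \text{User})\,P(\text{User})}
         {P(+ \mid \text{User})\,P(\text{User}) + P(+ \mid \text{Non-user})\,P(\text{Non-user})}
  = \frac{0.99 \times 0.005}{0.99 \times 0.005 + 0.01 \times 0.995}
  \approx 33.2\%
\]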
Even if an individual tests positive, it is more likely than not (1 - 33.2% = 66.8%) that they do not use the drug. Why? Even though the test appears to be highly accurate, the number of non-users is very large compared to the number of users. Then, the count of false positives will be greater than the count of true positives.
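The calculation can be checked numerically. The following is a minimal sketch, not part of the original notes; the function name `posterior_user_given_positive` and its parameter names are illustrative assumptions.

```python
def posterior_user_given_positive(sensitivity, specificity, prevalence):
    """Bayes' theorem for P(user | positive test).

    Hypothetical helper for illustration; all arguments are fractions in [0, 1].
    """
    # P(+ | user) * P(user): probability of a true positive
    true_positive = sensitivity * prevalence
    # P(+ | non-user) * P(non-user): probability of a false positive
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


# Values from the example: 99% sensitive, 99% specific, 0.5% of athletes are users
p = posterior_user_given_positive(0.99, 0.99, 0.005)
print(f"P(user | positive) = {p:.1%}")      # prints 33.2%
print(f"P(non-user | positive) = {1 - p:.1%}")  # prints 66.8%
```

The denominator is the total probability of a positive test, so the result is simply the share of all positives that are true positives.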
To see this with actual
numbers: In a test group of 1,000 individuals, we expect 995
non-users and 5 users. Among the 995 non-users,
0.01 × 995 ≈ 10 false positives are
expected. Among the 5 users,
0.99 × 5 ≈ 5 true positives are
expected. Out of 15 positive results, only 5 (~33%) are
genuine.
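The same head count can be reproduced without rounding. This is a small sketch under the example's assumptions (1,000 athletes, 0.5% users, 99% sensitivity and specificity); the function `expected_counts` is a hypothetical name introduced here.

```python
def expected_counts(group_size, sensitivity, specificity, prevalence):
    """Expected numbers of users, non-users, true positives and false positives."""
    users = group_size * prevalence
    non_users = group_size - users
    true_positives = sensitivity * users
    false_positives = (1.0 - specificity) * non_users
    return users, non_users, true_positives, false_positives


users, non_users, tp, fp = expected_counts(1000, 0.99, 0.99, 0.005)
genuine_share = tp / (tp + fp)

print(f"users: {users:.0f}, non-users: {non_users:.0f}")
print(f"true positives: {tp:.2f}, false positives: {fp:.2f}")
print(f"share of positives that are genuine: {genuine_share:.1%}")
```

Without rounding, the expected counts are 4.95 true positives and 9.95 false positives, which gives the 33.2% figure rather than the rounded 5 out of 15.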
The importance of specificity
in this example can be seen by calculating that even if
sensitivity is improved to 100%, but specificity
remains at 99%, then the probability that a
person who tests positive is a drug user rises only very
slightly, from 33.2% to 33.4%. Alternatively, if sensitivity
remains 99%, but specificity is improved to 99.5%,
then the probability that a person who tests positive is a
drug user rises to about 49.9%.
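These figures can be confirmed with a short numerical check (a check, not a proof). The sketch below assumes the 0.5% prevalence from the example; the function `posterior` and the scenario labels are illustrative names, not from the original notes.

```python
def posterior(sensitivity, specificity, prevalence=0.005):
    """P(user | positive test) from Bayes' theorem (illustrative helper)."""
    true_positive = sensitivity * prevalence
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


# (sensitivity, specificity) for each scenario discussed in the text
scenarios = {
    "baseline": (0.99, 0.99),
    "perfect sensitivity": (1.00, 0.99),
    "improved specificity": (0.99, 0.995),
}

results = {name: posterior(se, sp) for name, (se, sp) in scenarios.items()}
for name, value in results.items():
    print(f"{name}: {value:.1%}")
# prints 33.2%, 33.4% and 49.9% respectively
```

Raising sensitivity barely changes the numerator, whereas halving the false positive rate nearly halves the dominant term in the denominator, which is why specificity matters so much more here.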
HOMEWORK
1) Prove the
statements in the last paragraph about the consequences of
changing specificity & sensitivity.
2) IOC decisions must
allow for athletes who test positive but protest their innocence.
We hear in the news that "The test is being redone."
(A) What is the a priori probability of two
false positives in a row?
(B) What are the implications if all athletes are required
to take two tests?
3) Suppose that education and
better screening in the home country reduce the fraction of
users by one-half (to 0.25%). How will this modify
the Bayes estimate that a positive test identifies a
user?