Is it better to improve sensitivity or specificity?
Here’s a slightly unusual exercise on the topic of Bayes’ Theorem for those of you teaching or studying introductory probability. Imagine that you’re developing a diagnostic test for a disease. The test is very simple: it either comes back positive or negative. You have a choice between slightly increasing either your test’s sensitivity or its specificity. If your goal is to maximize the positive predictive value (PPV) of your test, i.e. the probability that a patient has the disease given that the test comes back positive, which test characteristic should you choose to improve?
An Open Invitation
If you’re still hungry for more Bayes’ Theorem after reading this post, then why not join the Summer of Bayes 2024 online reading group? If you’d like to be added to the mailing list, just send an email to bayes [at] user.sent.as
. Recordings of past sessions along with slides and other materials are available to group members via the Summer of Bayes discussion board. And now back your regularly-scheduled blog content…
Odds aren’t so odd!
While I give you a few minutes to pause and ponder this question, here’s a brief rant on the topic of odds. If you’re anything like me, the first time you encountered odds, you thought to yourself
What is this $*@%^!? Why would anyone want to spoil a perfectly good probability by dividing it by one minus itself?“1
But it’s time to take the red pill and see the world as it really is: the only reason you prefer to think in terms of probabilities rather than odds is because you’ve been brainwashed by the educational system. Of course I exaggerate slightly, but the point is that odds are just as natural as probabilities; we’re just not as accustomed to working with them. In many situations in probability, statistics, and econometrics, it turns out that working with odds (or their logarithm) makes life much simpler, as I will try to convince you with a simple example.
First we need to define odds. Consider some event \(A\) with probability \(p\) of occurring. Then we say that the odds of \(A\) are \(p/(1 - p)\). For example, if \(p = 1/3\) then the event \(A\) is equivalent to drawing a red ball from an urn that contains one red and two blue balls: the probability gives the ratio of red balls to total balls. The odds of \(A\), on the other hand, equal \(1/2\): odds give the ratio of red balls to blue balls. Since probabilities are between 0 and 1, odds are between 0 and \(\infty\). Odds of 0 mean that the event is impossible, while odds of \(\infty\) mean that the event is certain. Odds of 1 mean that the event is just as likely to occur as not to occur.
Now here’s an example that you’ve surely seen before:
One in a hundred women has breast cancer \((B)\). If you have breast cancer, there is a 95% chance that you will test positive \((+)\); if you do not have breast cancer \((B^C)\), there is a 2% chance that you will nonetheless test positive \((+)\). We know nothing about Alice other than the fact that she tested positive. How likely is it that she has breast cancer?
It’s easy enough to solve this problem using Bayes’ Theorem, as long as you have pen and paper handy: \[ \begin{aligned} P(B | +) &= \frac{P(+|B)P(B)}{P(+)} = \frac{P(+|B)P(B)}{P(+|B)P(B) + P(+|B^C)P(B^C)}\\ &= \frac{0.95 \times 0.01}{0.95 \times 0.01 + 0.02 \times 0.99} \approx 0.32. \end{aligned} \] But what if I asked you how the result would change if only one in a thousand women had breast cancer? What if I changed the sensitivity of the test from 95% to 99% or the specificity from 98% to 95%? If you’re anything like me, you would struggle to do these calculations in your head. +hat’s because \(P(B|+)\) is a highly non-linear function of \(P(B)\), \(P(+|B)\), and \(P(+|B^C)\).
In contrast, working with odds makes this problem a snap. The key point is that \(P(B|+)\) and \(P(B^C|+)\) have the same denominator, namely \(P(+)\): \[ P(B | +) = \frac{P(+|B)P(B)}{P(+)}, \quad P(B^C | +) = \frac{P(+|B^C)P(B^C)}{P(+)} \] Notice that \(P(+)\) was the “complicated” term in \(P(B|+)\); the numerator was simple. Since the odds of \(B\) given \((+)\) is defined as the ratio of \(P(B|+)\) to \(P(B^C|+)\), the denominator cancels and we’re left with \[ \text{Odds}(B|+) \equiv \frac{P(B|+)}{P(B^C|+)} = \frac{P(+|B)}{P(+|B^C)} \times \frac{P(B)}{P(B^C)}. \] In other words, the posterior odds of \(B\) equal the likelihood ratio, \(P(+|B)/P(+|B^C)\), multiplied by the prior odds of \(B\), \(P(B)/P(B^C)\): \[ \text{Posterior Odds} = \text{(Likelihood Ratio)} \times \text{(Prior Odds)}. \] Now we can easily solve the original problem in our head. The prior odds are 1/99 while the likelihood ratio is 95/2. Rounding these to 0.01 and 50 respectively, we find that the posterior odds are around 1/2. This means that Alice’s chance of having breast cancer is roughly equivalent to the chance of drawing a red ball from an urn with one red and two blue balls. There’s no need to convert this back to a probability since we can already answer the question: it’s considerably more likely that Alice does not have breast cancer. But if you insist, odds of 1/2 give a probability of 1/3, so in spite of rounding and calculating in our heads we’re within 0.3% of the exact answer!
Repeat after me: odds are on a multiplicative scale. This is their key virtue and the reason why they make it so easy to explore variations on the original problem. If one in a thousand women has breast cancer, the prior odds become 1/999 so we simply divide our previous result by 10, giving posterior odds of around 1/20. If we instead changed the sensitivity from 95% to 99% and the specificity from 98% to 95%, then the likelihood ratio would change from \(95/2 \approx 50\) to \(99/5 \approx 20\).
The Solution
Have I given you enough time to come up with your own solution? Fantastic! In case you hadn’t already guessed, that little digression about odds served an important purpose: my solution will use odds rather than probabilities. Our goal is to increase the positive predictive value (PPV) of the test, namely \[ \text{PPV} \equiv P(\text{Has Disease}|\text{Test Positive}), \] by as much as possible, either by improving the test’s sensitivity \[ \text{Sensitivity} \equiv P(\text{Test Positive} | \text{Has Disease}) \] or its specificity \[ \text{Specificity} \equiv P(\text{Test Negative} | \text{Doesn't Have Disease}). \] To answer this question, we’ll start by substituting these definitions into the odds form of Bayes’ Theorem introduced above, yielding \[ \text{Posterior Odds} = \frac{\text{PPV}}{1 - \text{PPV}} = \frac{\text{Sensitivity}}{1 - \text{Specificity}} \times \text{Prior Odds}. \] This expression makes it clear that increasing either the sensitivity or specificity of the test increases the posterior odds. And because the PPV is a strictly increasing function of the posterior odds, namely \[ \text{PPV} = \frac{\text{Posterior Odds}}{1 + \text{Posterior Odds}}, \] this also increases the PPV. So now the question is: which of these two possibilities gives us the most bang for our buck? A natural idea would be to compare the marginal effect of increasing sensitivity by a small amount to the marginal effect of increasing specificity by the same amount. We can do this by comparing the partial derivatives of the PPV with respect to sensitivity and specificity. But, again, the PPV is an increasing function of the posterior odds, so we can simplify our task by comparing the derivatives of the posterior odds with respect to sensitivity and specificity. By the chain rule, any claim about the relative magnitudes of these derivatives computed for the odds will also hold for the PPV.
But why stop with the odds? We can simplify our task even further by comparing the derivatives of the logarithm of the posterior odds with respect to sensitivity and specificity. This is because the logarithm is, again, an increasing transformation of the odds. Since \[ \log(\text{Posterior Odds}) = \log(\text{Sensitivity}) - \log(1 - \text{Specificity}) + \log(\text{Prior Odds}). \] our required derivatives are \[ \frac{\partial \log(\text{Posterior Odds})}{\partial \text{Sensitivity}} = \frac{1}{\text{Sensitivity}} \quad \text{and} \quad \frac{\partial \log(\text{Posterior Odds})}{\partial \text{Specificity}} = \frac{1}{1 - \text{Specificity}}. \] Now for the punchline: the ratio of the derivative with respect to specificity divided by that with respect to sensitivity is \[ \frac{\partial \log(\text{Posterior Odds})/\partial \text{Specificity}}{\partial \log(\text{Posterior Odds})/\partial \text{Sensitivity}} = \frac{1/(1 - \text{Specificity})}{1/\text{Sensitivity}} = \frac{\text{Sensitivity}}{1 - \text{Specificity}} \] and this is precisely the likelihood ratio from the odds form of Bayes Theorem! Hence, whenever the likelihood ratio is greater than one we’d prefer to increase the test’s specificity; whenever it’s less than one we’d prefer to increase the sensitivity. If the likelihood ratio is equal to one, then it doesn’t matter which we choose.
Case closed, right? Well not quite. We can say a bit more by thinking about what it means for the likelihood ratio to be greater than or less than one. Examining the odds form of Bayes’ Theorem from above, we see that a likelihood ratio less than one means that our posterior probability that a person is sick falls when she tests positive. In other words, this corresponds to a test that is worse than useless: it’s actually misleading. In contrast, a likelihood ratio greater than one means that the test is informative: a positive test result increases our belief that the person is sick. Any real-world diagnostic test will have a likelihood ratio greater than one. Indeed, if we had such an actively mis-leading test, we could easily convert it into an informative one by simply reversing the test’s outcome: if someone tests positive, we tell them they’re negative, and vice versa. This reversal would result in a likelihood ratio greater than one. Therefore, in all cases–whether we start with an informative test or reverse a misleading one–we should prefer to increase the test’s specificity.
Epilogue
Of course, this exercise is predicated upon the assumption that we want to maximize the PPV and that we can freely adjust both the test’s sensitivity and its specificity. In practice, one or more of these assumptions might not hold. Indeed, PPV is not the be all and end all of diagnostic testing. A full accounting would need to consider the relative costs of false positives and false negatives along with the prevalence of the disease. Still, I hope this exercise gives you a flavor of the power of odds for simplifying complex problems in probability and statistics.
I know first-hand that this sentiment is shared by at least one distinguished professor of probability theory, so at least I’m not completely alone in my earlier view of things!↩︎