Research News

Bayes’ Theorem: How Mathematicians Ruined Population Testing for Everyone

  • 30 April, 2020


As governments begin to lift lockdown restrictions, it will be important to scale Covid-19 testing to efficiently identify and isolate people as and when they become infected. Unfortunately, even if we can deliver testing at scale (which means lots of fast, accurate tests, and the resources to deploy them), the mathematics of testing can conspire to undermine even the most robust test system: even very accurate tests can generate an abundance of false positives if the condition being tested for is rare.

The explanation for this comes from Bayes’ Theorem, an important mathematical result dating back to the 1700s, and named after Reverend Thomas Bayes. For our purposes, Bayes’ Theorem provides a framework for reasoning about the results of tests applied to large populations of individuals. Consider a hypothetical Covid-19 test, which the manufacturer tells us is 98% accurate. There are two sides to this accuracy figure: the ability to detect people with the virus is called sensitivity, and the ability to detect people who are clear is called specificity. In this example we take both to be 98%. It turns out that specificity is really important in mass-testing for a condition that is rare in the population. If only 5% of the population have Covid-19 (the base rate), then a test with a specificity of 98% does not play out well, as we will see.

Bayes’ Theorem provides a way to translate this test accuracy information (e.g. 98%) into infection probabilities, so that we can, for example, calculate the likelihood that a person is actually infected with Covid-19 if they receive a positive test result; spoiler, this probability is not 98%. We could use Bayes’ Theorem to do the calculations, but it is easier to see what is going on if we work through the example of 10,000 tests and a base rate of 5%, as summarised in the table below.

In total there are 500 people with Covid-19. Our imperfect test returns a positive result for 98% (490) of these, and a negative result for the remaining 10. For the 9,500 who are not infected, our imperfect test correctly returns a negative result for 98% (9,310) and an incorrect positive result for the remaining 190.

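Table 1. Testing results from a population of 10,000 people with a base infection rate of 5% and a test accuracy (sensitivity and specificity) of 98%.

Actual status    Test positive    Test negative    Total
Infected                   490               10      500
Not infected               190            9,310    9,500
Total                      680            9,320   10,000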
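The counts in this worked example can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `confusion_counts` and its parameter names are illustrative choices, and the numbers are the hypothetical ones used in the text (10,000 people, 5% base rate, 98% sensitivity and specificity).

```python
def confusion_counts(population, base_rate, sensitivity, specificity):
    """Expected test outcomes for a population (hypothetical helper)."""
    infected = population * base_rate
    not_infected = population - infected
    true_pos = infected * sensitivity        # infected, test positive
    false_neg = infected - true_pos          # infected, test negative
    true_neg = not_infected * specificity    # not infected, test negative
    false_pos = not_infected - true_neg      # not infected, test positive
    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "false_pos": false_pos,
        "true_neg": true_neg,
    }


# Figures from the worked example in the text.
counts = confusion_counts(10_000, 0.05, 0.98, 0.98)
for name, value in counts.items():
    print(f"{name}: {value:.0f}")
```

Changing the arguments shows how quickly the balance between true and false positives shifts as the base rate or the accuracy changes.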

If a test returns a positive result, then how likely is it that the individual is actually infected? We can calculate this as the fraction of people with positive tests who are actually infected: the number of infected people who receive a positive result (490) divided by the total number of people who receive a positive result (490 + 190 = 680), or 0.72. In other words, if you receive a positive result from our hypothetical test, then there is only a 72% chance that you are really infected. The remaining 28% of positive results are false positives (people who test positive but who are not infected), which is problematic if a positive test result leads to quarantining. Strictly speaking, this 28% is the share of positive results that are wrong, sometimes called the false discovery rate; it is not the same as the test’s own error rate among uninfected people, which is only 2%. For simplicity, “false positive rate” here refers to the former.
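The same answer falls out of Bayes' Theorem applied directly to probabilities rather than counts. The sketch below assumes the hypothetical figures from the text; the function name `prob_infected_given_positive` is illustrative.

```python
def prob_infected_given_positive(base_rate, sensitivity, specificity):
    """Bayes' Theorem: P(infected | positive test)."""
    # P(positive) = P(positive | infected) * P(infected)
    #             + P(positive | not infected) * P(not infected)
    p_positive = sensitivity * base_rate + (1 - specificity) * (1 - base_rate)
    return sensitivity * base_rate / p_positive


# Hypothetical test from the text: 5% base rate, 98% sensitivity and specificity.
p = prob_infected_given_positive(0.05, 0.98, 0.98)
print(f"P(infected | positive) = {p:.2f}")             # 0.72
print(f"share of positives that are false = {1 - p:.2f}")  # 0.28
```

The numerator corresponds to the 490 true positives and the denominator to all 680 positives, each divided by the population of 10,000.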

All of this is to say that we need to understand that population testing is far from foolproof, even when using tests with very high accuracy levels. How rare a condition is in the population (the base rate) also plays a critical role. In the example above, we assumed a base rate of 5%. We don’t currently have very good information about the base rate of Covid-19, and it could be lower or higher than this. This will change false positive probabilities as shown in the chart below, for base rates up to 25% (0.25). For example, suppose the base rate is only 2%; then using our 98% accurate test will mean a false positive rate of 50%, that is, 50% of people who test positive will not be infected. If the base rate is higher, then things will improve; a 10% base rate suggests a false positive rate of just over 15%, as shown.
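The dependence on the base rate can be explored numerically. The following sketch assumes, as in the text, that sensitivity and specificity are both equal to a single accuracy figure; the function name and the particular base rates in the loop are illustrative.

```python
def false_share_of_positives(base_rate, accuracy):
    """Fraction of positive results that come from uninfected people.

    Assumes sensitivity == specificity == accuracy, as in the text.
    """
    true_pos = accuracy * base_rate
    false_pos = (1 - accuracy) * (1 - base_rate)
    return false_pos / (true_pos + false_pos)


# Illustrative base rates, up to the 25% shown in the chart.
for base_rate in (0.01, 0.02, 0.05, 0.10, 0.25):
    share = false_share_of_positives(base_rate, accuracy=0.98)
    print(f"base rate {base_rate:.0%}: {share:.1%} of positives are false")
```

At a base rate of 2% the true and false positives are equally numerous, which is where the 50% figure comes from.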

It seems unlikely that the base rate is much higher than 10% in the wider population at the moment, so what can be done to lower this false positive rate? One option is to use a more accurate test. For example, the chart below shows the difference in false positive rates, for a base rate of 5% (0.05), when we use a 98% accurate test vs. a 99.5% accurate test. As discussed above, with the less accurate test 28% of those receiving a positive result are actually not infected, but with the more accurate test this falls to about 9%.
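To see where the improvement comes from, it helps to look at the expected number of false positives per 10,000 tests for each test. This is a rough sketch under the text's assumptions (5% base rate, sensitivity and specificity equal to the stated accuracy); `false_positive_summary` is a hypothetical helper.

```python
def false_positive_summary(accuracy, population=10_000, base_rate=0.05):
    """Return (expected false positives, their share of all positives)."""
    infected = population * base_rate
    true_pos = infected * accuracy
    false_pos = (population - infected) * (1 - accuracy)
    return false_pos, false_pos / (true_pos + false_pos)


# Compare the two hypothetical tests discussed in the text.
for accuracy in (0.98, 0.995):
    false_pos, share = false_positive_summary(accuracy)
    print(f"accuracy {accuracy:.1%}: {false_pos:.1f} false positives, "
          f"{share:.1%} of all positives")
```

Raising accuracy from 98% to 99.5% cuts the expected false positives from 190 to 47.5 per 10,000 tests, while the number of true positives barely changes.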

To put all of this into perspective, at the time of writing, the Irish Health Information and Quality Authority (HIQA) has published a comprehensive analysis of diagnostic technologies for the detection of Covid-19, along with test accuracy information (sensitivity and specificity) where available. Many of these tests have accuracy values >0.95, with some >0.99, which may indicate that it is feasible to implement a testing system that is associated with lower numbers of false positives (and false negatives). Moreover, it may be feasible to use less accurate tests to quickly screen people for more accurate (and more time-consuming and/or expensive) tests as part of a multi-stage testing system.
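A multi-stage system can be sketched as two successive Bayesian updates, where the probability of infection after a positive screening test becomes the starting probability for the confirmatory test. This is an illustration only: it assumes the two tests make errors independently of each other, which may not hold for real tests, and the accuracy values (0.95 and 0.99) are round numbers chosen to match the ranges mentioned above rather than figures for any specific product.

```python
def update_on_positive(prior, sensitivity, specificity):
    """P(infected | positive), given a prior probability of infection."""
    p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)
    return sensitivity * prior / p_positive


base_rate = 0.05  # hypothetical base rate from the text

# Stage 1: fast screening test, assumed 95% sensitivity and specificity.
after_screen = update_on_positive(base_rate, 0.95, 0.95)

# Stage 2: slower confirmatory test, assumed 99% sensitivity and specificity,
# applied only to people who screened positive. Assumes independent errors.
after_confirm = update_on_positive(after_screen, 0.99, 0.99)

print(f"after positive screen:       {after_screen:.2f}")
print(f"after positive confirmation: {after_confirm:.2f}")
```

Under these assumptions, a positive screen alone leaves the chance of infection at about one in two, while a second positive result from the more accurate test raises it to about 99%.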

As we seek to emerge from lockdown the focus will increasingly be on testing; we see from the analysis here that testing at scale will introduce a whole new set of challenges.