Why coronavirus exposure tests should be randomized.


Rows of nasal swabs on a red background.
Photo illustration by Slate. Photo by Ben Hasty / MediaNews Group / Reading Eagle via Getty Images.

It is fair to say that all of us would like to see progress against COVID-19 and a return to normalcy. And if you follow the news, you will hear that testing is the key. Test more! Test better! Different types of tests! Home tests, drive-through tests, more testing centers. Antibody tests!

And testing has indeed increased. More and more states are testing sick people, and some have gone further. Tennessee has announced that anyone can now be tested for the coronavirus, regardless of symptoms.

Although diagnosing a sick person does not change their course of treatment, especially in a mild case, it informs self-isolation measures and helps track the spread of the disease. But we still aren’t doing enough of the most basic thing: truly random testing. To see why this is so crucial, it is useful to start with one of the main open questions of the pandemic: What proportion of people have already been exposed?

There are many unanswered questions about COVID-19 – how far it travels in the air, how best to treat it, why certain groups and people are so much more affected than others. But behind all of these lurks a more fundamental question: How widespread is the virus? This is an extremely important question, but it is also very difficult to answer. Why?

Many of our predictions about the path of the virus over the next few months (and of the world, the economy, etc.) are based on epidemic modeling. Many of these models are “SIR” models – “susceptible-infected-recovered” – which trace the dynamics as a population moves from fully susceptible to the virus, to infected, and ultimately to recovered.

The basic structures of the models are mostly similar, but depending on the numbers you plug in, they give very different answers. We’ve seen this in the way forecasts of future hospitalizations and deaths have changed over the past month.

There are many reasons for this, but they mainly boil down to the fact that these models involve exponential growth. This means that small differences compound rapidly over time, so small changes in assumptions about the spread of the disease produce huge differences in projections within a few weeks.
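To make the sensitivity concrete, here is a minimal sketch of a discrete-time SIR model. The transmission rate `beta`, recovery rate `gamma`, population size, and starting case count are illustrative assumptions chosen only to show the effect; they are not estimates for COVID-19.

```python
def infected_after(beta, gamma=0.1, population=1_000_000,
                   initial_infected=100, days=30):
    """Step a simple SIR model one day at a time.

    beta  : assumed new infections caused per infected person per day
            (when everyone else is susceptible)
    gamma : assumed fraction of infected people who recover each day
    Returns the number of currently infected people after `days`.
    """
    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    recovered = 0.0
    for _ in range(days):
        new_infections = beta * susceptible * infected / population
        new_recoveries = gamma * infected
        susceptible -= new_infections
        infected += new_infections - new_recoveries
        recovered += new_recoveries
    return infected


# Two hypothetical transmission rates that differ only slightly.
slower = infected_after(beta=0.25)
faster = infected_after(beta=0.30)
print(f"beta=0.25: about {slower:,.0f} infected after 30 days")
print(f"beta=0.30: about {faster:,.0f} infected after 30 days")
```

Under these assumed inputs, a modest change in `beta` leaves the two runs several-fold apart after a month, which is the kind of divergence described above.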

To improve the models, both to determine which ones are appropriate and to refine the best ones, we need to fit them to data. That means actually knowing what proportion of people are susceptible, infected, or recovered at any given time. Without this information, we are only guessing.

You may be thinking: Surely we know this! Don’t we see information on infections, hospitalizations, and deaths over time? I feel like I’ve seen a lot of graphs on this.

Well, yes. But in the context of COVID-19, it’s not nearly enough. Many COVID-19 infections are mild and nonspecific, which means that people do not realize they have the disease or are not sick enough to see a doctor. A large proportion of people who are infected – perhaps half, perhaps even 75 percent – have no symptoms. Even symptomatic people are often not tested. Case counts are almost meaningless given the variation in testing over time and from place to place, and the fact that even in the best-studied places in the United States, testing is incomplete.

This means that for each case we see, there are at least some we do not see. How many is really unclear. Some people think there are 10 missed cases for each one we see; others think it’s only one or two.

The implications of these two views are extremely different. If 1% of the population has already been infected, 99% of people are still susceptible. On the other hand, if 20% have already been infected, well, that’s another story.
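The arithmetic behind these two views is simple enough to sketch. The confirmed-case count and population below are hypothetical round numbers, not figures from any real place; only the "one or two" versus "10" missed cases per confirmed case comes from the text.

```python
def share_infected(confirmed_cases, missed_per_confirmed, population):
    """Implied share of the population ever infected, given an assumed
    number of undetected cases for every confirmed one."""
    total_cases = confirmed_cases * (1 + missed_per_confirmed)
    return total_cases / population


# Hypothetical region: 10 million people, 100,000 confirmed cases.
for missed in (1, 2, 10):
    share = share_infected(100_000, missed, 10_000_000)
    print(f"{missed} missed per confirmed case -> {share:.0%} infected, "
          f"{1 - share:.0%} still susceptible")
```

The same confirmed-case count is consistent with very different levels of remaining susceptibility, depending entirely on the multiplier we cannot observe.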

One of our main priorities should be to learn this number. And that’s where selection issues come in.

The best way to find out the share of the population that has been exposed to the virus is either to test everyone (the best case, but probably not possible in the United States) or to test a random sample of people. These tests can look for a current active infection or, using antibodies, for a past infection. (Antibody testing has started coming online in the past two weeks and promises to be even more useful than testing for active infection.)

Whichever type of test we use, the best information will come from testing a random sample of people. Because it is random, the sample is representative of everyone, so it tells us what to expect in the general population.
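As a rough illustration of what a random sample buys us, the sketch below turns a sample result into a prevalence estimate with a 95 percent Wilson confidence interval. The sample size and positive count are made-up numbers, and the calculation assumes a truly random sample and a perfectly accurate test.

```python
import math


def wilson_interval(positives, sample_size, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives ~95%)."""
    p = positives / sample_size
    denom = 1 + z**2 / sample_size
    center = (p + z**2 / (2 * sample_size)) / denom
    half_width = (z / denom) * math.sqrt(
        p * (1 - p) / sample_size + z**2 / (4 * sample_size**2)
    )
    return center - half_width, center + half_width


# Hypothetical random sample: 3,000 people tested, 30 positive.
low, high = wilson_interval(positives=30, sample_size=3_000)
print(f"Estimated prevalence: {30 / 3_000:.1%} "
      f"(95% interval roughly {low:.1%} to {high:.1%})")
```

The interval is only meaningful because the sample is random; the same formula applied to a self-selected group would give a tidy range around the wrong number.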

To date there are a few examples of this type of testing in the pandemic – very few. Iceland recently carried out random population testing, which showed that around 1% of the general population was actively infected (half of them asymptomatic). A town in Italy tested everyone at the start of the epidemic (3% active infection, about half asymptomatic). An antibody test (which identifies present and past infections) in a random sample in Germany showed that 15 percent were or had been infected.

The next best option after a random sample may be universal testing in a known population. We had a recent example among pregnant women in New York. A report published earlier this week in the New England Journal of Medicine showed active COVID-19 infection in nearly 15% of women admitted for childbirth at a New York hospital.

This is not as good as a random sample, because pregnant women are different in many ways (gender, age, exposure to medical care) from the general population. Yet it has value – partly because we can understand the sources of bias.

I would say something similar about the plans recently announced by Major League Baseball to test essentially its entire workforce. Yes, this is not a random set of people. But if they really achieve something close to universal testing, we can at least have a very good understanding of how the sample is selected.

Most people agree that random or universal testing is the best approach. But it is also very difficult to execute. Identifying a random sample of people and testing them is much, much harder than testing what we would call a “convenience sample” – people who are easy to find and access. Let’s say you are a governor and want to test your population at random, either with oral swabs or blood tests. You would need to send someone (the National Guard? Nurses?) to people’s homes to ask them to give blood or have a cotton swab stuck deep in their nose. Many might say no. People will complain! Beyond that, it is well known that survey response rates vary with age, race, and ethnicity, which would be a problem for a random sample. It’s a logistical mess.

Given the difficulty, you might be tempted to think: Well, some data is better than no data. I’m going to do something simpler – maybe set up a mobile testing site and encourage people to come – and at least I’ll learn something.

This thinking is really problematic. Simply put: If we don’t understand the biases in our sampling, the resulting data is garbage. A frustrating recent example is a large study by the National Institutes of Health that aims to run antibody tests on 10,000 volunteers to measure the prevalence of undetected infections. Volunteers are being recruited in a variety of ways, such as on Twitter and through other public postings. People are encouraged to email the NIH to sign up, in which case they may receive a home test kit.

Anthony Fauci suggested that this would give us “a clearer picture of the true magnitude of the COVID-19 pandemic in the United States”. But it won’t! It will give a clear picture of the magnitude among people who, for example, scroll through Twitter looking for opportunities to take part in studies like this. Are these people more or less likely to have had COVID-19? I have no idea. You may be drawing more people who know they have been exposed (higher prevalence), or perhaps you are drawing people who are more careful about exposure (lower prevalence). It may be a strange mixture of the two. We just don’t know. We will get a number out of it, and it will be completely uninterpretable.

It’s worse than nothing because people will think they have learned something.
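A small simulation can illustrate why a volunteer sample is uninterpretable when we do not know how people select into it. Everything here is an assumption made for illustration: the true prevalence of 5 percent, the sample sizes, and especially the volunteering rates, which are exactly the quantities nobody knows in practice.

```python
import random


def compare_samples(true_prevalence, volunteer_rate_if_infected,
                    volunteer_rate_if_not, population=200_000, seed=0):
    """Return (random-sample estimate, volunteer-sample estimate)."""
    rng = random.Random(seed)
    # True infection status for everyone in a simulated population.
    infected = [rng.random() < true_prevalence for _ in range(population)]

    # Random sample: every person is equally likely to be tested.
    random_sample = rng.sample(infected, 5_000)
    random_estimate = sum(random_sample) / len(random_sample)

    # Volunteer sample: the chance of signing up depends on infection status.
    volunteers = [
        status for status in infected
        if rng.random() < (volunteer_rate_if_infected if status
                           else volunteer_rate_if_not)
    ]
    volunteer_estimate = sum(volunteers) / len(volunteers)
    return random_estimate, volunteer_estimate


# Scenario A: people who suspect exposure are more eager to volunteer.
rand_a, vol_a = compare_samples(0.05, 0.06, 0.02)
# Scenario B: cautious, unexposed people are more eager to volunteer.
rand_b, vol_b = compare_samples(0.05, 0.02, 0.06)
print(f"A: random {rand_a:.1%}, volunteers {vol_a:.1%}")
print(f"B: random {rand_b:.1%}, volunteers {vol_b:.1%}")
```

In both scenarios the random sample lands near the assumed 5 percent, while the volunteer estimate swings well above or well below it depending on a selection pattern we would have no way of observing.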

I have similar problems with testing blood donors as a measure of prevalence. Yes, it’s convenient. But it won’t tell us anything useful.

What to do? I’m afraid that despite the difficulty, we simply have no choice but to do better sampling when we test. As a person who is trying to get random testing off the ground in various populations, I can attest to the many, many challenges of doing so. But it’s worth it. We need to.

What can you do, other than telling all your friends that random testing is great? By far the most important thing: If someone shows up at your door and tells you that you have been randomly selected for a test, please consent.

A version of this article was first published in Emily Oster’s newsletter, ParentData.

Future Tense is a partnership of Slate, New America, and Arizona State University that examines emerging technologies, public policy, and society.