COVID-19: Dealing with Gaps in the Data

A municipality employee disinfects a street amid concerns over the spread of the novel coronavirus at a downtown area in Yangon, Myanmar. Photo: EPA-EFE/LYNN BO BO

Since the data available on the coronavirus pandemic is patchy and incomplete, it needs to be approached with caution and an awareness of what it can – and cannot – tell us about the deadly virus.

One of the major problems with the COVID-19 pandemic is the speed at which the contagion spreads. This not only makes the treatment of infected people much harder to manage, but also severely hinders our ability to form an up-to-date, thorough and trustworthy picture of the situation in Europe and the rest of the world.

The information we rely on is approximate and often understates the true figures (for example, the number of infected people, or of deaths caused by the pandemic). It’s important to be aware of these limitations and to approach the data with caution, even though it is the best we have under the present circumstances. Of all official data on the global situation, that produced by the European Centre for Disease Prevention and Control (ECDC) is considered among the most reliable. Nevertheless, new and more accurate studies are emerging every day, providing additional data to help understand the pandemic and its course of development.

How many are really infected?

We don’t know. What we do know is the number of confirmed infections – individuals testing positive for the virus – and highly approximate estimates of total infections.

The test for the virus involves taking a sample of saliva or mucus, which is then analysed for traces of the virus’ genetic code. The number of people being tested varies widely from country to country, depending above all on how well equipped a country is to perform large-scale testing (often it’s not the kits that are lacking, but the personnel and laboratories required to analyse huge quantities of swabs). In certain countries, authorities have decided to test only people already showing symptoms associated with COVID-19, or even just those who are already hospitalised. We know, however, that many who have contracted the virus do not show any symptoms, or only start to show symptoms many days after being infected.

The percentage of infected people accounted for in the data varies widely from country to country. This makes it difficult to compare the development of the pandemic across different times and places. For example, Italy has performed around 3500 tests for every million inhabitants, compared to 6100 in South Korea, and 600 in Spain. According to an estimate by the Centre for the Mathematical Modelling of Infectious Diseases, at the London School of Hygiene & Tropical Medicine, Italy and Spain may have recorded only 5 percent of the people actually infected.

How many have really died?

This is also unknown, although the number of deaths can be estimated with more precision than the number of infections.

What we do know is the number of deaths attributed to COVID-19 (unfortunately, the criteria for attribution are not yet internationally standardised). However, we cannot be sure that all deaths caused by the coronavirus have been recorded: in the most heavily hit areas of Italy, indications suggest that tests are not performed on all victims (many of those who die at home or in retirement homes, for example). Moreover, authoritarian regimes such as China and Iran may have an interest in publishing incomplete data in order to downplay the severity of the problem – thus the number of deaths caused by the pandemic may very well be higher than suggested by official counts.

How deadly is COVID-19?

No certainty here either. The relative danger of a disease can be measured by its case fatality rate – the number of deaths as a proportion of those infected – or the mortality rate, which measures the number of deaths as a proportion of the population. A case fatality rate of 4 percent indicates that for every 100 people infected the disease causes an average of four deaths.
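The difference between the two measures is easiest to see side by side. The sketch below is illustrative only: the function names are hypothetical and the figures describe an invented town, not any real outbreak.

```python
def case_fatality_rate(deaths: int, infections: int) -> float:
    """Deaths as a proportion of people infected."""
    if infections <= 0:
        raise ValueError("infections must be positive")
    return deaths / infections


def mortality_rate(deaths: int, population: int) -> float:
    """Deaths as a proportion of the whole population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths / population


# Invented example: 40 deaths among 1,000 infected people in a town of 100,000.
cfr = case_fatality_rate(deaths=40, infections=1_000)
mr = mortality_rate(deaths=40, population=100_000)
print(f"case fatality rate: {cfr:.1%}")
print(f"mortality rate: {mr:.2%}")
```

The same number of deaths yields very different percentages depending on the denominator, so the two rates should never be compared with each other.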

The available estimates of COVID-19’s case fatality rate vary widely according to context. On the one hand, such variations could in fact be tied to local factors: for example, the disease is likely to have a greater impact in regions or countries where the population is older or more prone to respiratory illnesses, such as heavily polluted Northern Italy. On the other hand, such variations may be only apparent, and caused by differences in how data is collected. The case fatality rate compares two figures – deaths and infections – but, as we have seen, these figures are often recorded in different ways, and often contain significant gaps.

In any case, COVID-19’s case fatality rate is an order of magnitude greater than that of more mundane viral illnesses, such as seasonal flu. The latter typically causes the death of fewer than 0.1 percent of people infected, over many months, while COVID-19 is estimated to kill a share of those infected that is at least twenty or thirty times higher, over just a few weeks.

Two useful techniques for comparing data

Apart from the gaps and disparities in data collection, comparisons between regions and countries affected by the coronavirus are complicated by the fact that contagion didn’t start everywhere at the same time. Comparing Hubei province in China – where infection began around a month ago – with a country where contagion has just begun would not be particularly instructive. In order to compare such contexts, we should start with the day when the outbreak was registered in each area, and compare developments from there. For example, 15 days after the virus broke out in Italy, around 800 deaths had been recorded there, while in Spain, 15 days after the virus was detected in its territory, 2000 deaths had been recorded.
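One way to put this first technique into practice is to re-index each area's figures by the number of days since its own outbreak was registered. The sketch below assumes cumulative death counts keyed by calendar date; the function name, the two areas, the dates and the counts are all invented for illustration.

```python
from datetime import date
from typing import Dict


def align_to_outbreak(cumulative_deaths: Dict[date, int], outbreak_day: date) -> Dict[int, int]:
    """Re-key a dated series by days elapsed since the outbreak was registered."""
    return {
        (day - outbreak_day).days: deaths
        for day, deaths in sorted(cumulative_deaths.items())
        if day >= outbreak_day  # ignore anything recorded before the outbreak
    }


# Invented data: area B's outbreak is registered nine days after area A's.
area_a = {date(2020, 3, 1): 0, date(2020, 3, 8): 30, date(2020, 3, 16): 120}
area_b = {date(2020, 3, 10): 0, date(2020, 3, 17): 90, date(2020, 3, 25): 300}

aligned_a = align_to_outbreak(area_a, outbreak_day=date(2020, 3, 1))
aligned_b = align_to_outbreak(area_b, outbreak_day=date(2020, 3, 10))

# Both series can now be compared on the same day of their own outbreak.
print("day 15, area A:", aligned_a[15])
print("day 15, area B:", aligned_b[15])
```

Once both series share the same day index, a figure from day 15 in one area can be set against day 15 in the other, regardless of the calendar dates involved.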

Another way to compare developments in countries with different data collection methods is to compare the rates of contagion in each country – for example, by measuring the number of days it takes for the number of confirmed deaths to double. In Germany, the figure has been doubling every two days, and in Italy every five days. In South Korea it has taken 13 days for the number of confirmed deaths to double, indicating that contagion has slowed down considerably.
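A doubling time can be estimated from just two observations of the cumulative death count. The sketch below assumes steady exponential growth between those two observations, which is a simplification; the function name and the input figures are hypothetical.

```python
import math


def doubling_time(start_count: int, end_count: int, days_elapsed: float) -> float:
    """Estimate the days needed for a count to double, assuming exponential growth."""
    if start_count <= 0 or days_elapsed <= 0:
        raise ValueError("start_count and days_elapsed must be positive")
    if end_count <= start_count:
        raise ValueError("end_count must exceed start_count for a doubling time to exist")
    growth_factor = end_count / start_count
    return days_elapsed * math.log(2) / math.log(growth_factor)


# Hypothetical counts: deaths rise from 100 to 400 over four days.
# The count doubles twice in that period, so the estimate is about two days.
print(round(doubling_time(start_count=100, end_count=400, days_elapsed=4), 2))
```

A longer doubling time means slower growth, so a series whose doubling time keeps lengthening points to an outbreak that is slowing down.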

