World Views | Big Data Won’t Save You From Coronavirus

How often do you see a piece of economic or financial information revised upward by 45%? And how reliable would you consider a data set that’s subject to such adjustments?
This is the problem confronting epidemiologists trying to make sense of the novel coronavirus spreading from China’s Hubei province. On Thursday, the tally there surged by 45%, or 14,840 cases. The revision was largely due to health authorities adding patients diagnosed on the basis of lung scans to a previous count, which was mostly limited to those whose swab tests came back positive.
The medical data emerging from hospitals and clinics around the world are invaluable in determining how this outbreak will evolve — but the picture painted by the information is changing almost as fast as the disease itself, and isn’t always of impeccable provenance. Just as novel infections exploit weaknesses in the body’s immune defenses, epidemics have an unnerving habit of spotting the vulnerabilities of the data-driven society we’ve built for ourselves.
That’s most visible in the contradictory information we’re seeing around how many people have been infected, and what share of them have died. While those figures are essential for getting a handle on the situation, as we’ve argued, they’re subject to errors in sampling and measurement that are compounded in high-pressure, strained circumstances. The physical capacity to do timely testing and diagnosis can’t be taken for granted either.
Early case fatality rates for Severe Acute Respiratory Syndrome were often 40% or higher before settling down to figures in the region of 15% or less. The age of patients, whether they get sick in the community or in a hospital, and doctors’ capacity and experience in offering treatment can all affect those numbers dramatically.
Even the way that coronavirus cases are defined and counted has changed several times, said Professor Raina MacIntyre, head of the University of New South Wales’s Biosecurity Research Program: From “pneumonia of unknown cause” in the early days, through laboratory-confirmed cases once a virus was identified, to the current standard that includes lung scans. That’s a common phenomenon during outbreaks, she said.
Those problems are exacerbated by the fact that China’s government has already shown itself willing to suppress medical information for political reasons. While you’d hope the seriousness of the situation would have changed that instinct, that track record casts a shadow of doubt over everything we know.
While every piece of information is subject to revision and the usual statistical rule of garbage-in, garbage-out, epidemiologists have ways to make better sense of what is going on.
Well-established statistical techniques can be used to clean up messy data. A study this week by Imperial College London used screening of passengers flying to Japan and Germany to estimate that the fatality rate across all cases was about 1%, below the 2.7% seen among confirmed cases in Hubei province but higher than the 0.5% recorded for the rest of the world.
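As a rough sketch of the kind of adjustment involved (the figures below are made-up placeholders, not the study’s inputs, and the calculation is not the Imperial College model), the idea is that screening travellers hints at how many infections the confirmed count misses, and a bigger denominator pulls the fatality rate down:

```python
# Illustrative back-of-envelope only: every number here is a made-up
# placeholder, and this is not the Imperial College study's model.

confirmed_cases = 48_000          # confirmed cases reported in the outbreak area
deaths = 1_300                    # deaths among those confirmed cases
cfr_confirmed = deaths / confirmed_cases

# Screening of outbound travellers hints at infection prevalence in the wider
# population, from which total infections (detected or not) can be estimated.
positives_among_travellers = 10   # hypothetical screening positives
travellers_screened = 1_000       # hypothetical travellers screened
source_population = 11_000_000    # rough population of the outbreak area

prevalence = positives_among_travellers / travellers_screened
estimated_total_infections = prevalence * source_population

# With mild and undetected infections added to the denominator, the fatality
# rate over all infections comes out lower than the confirmed-case rate.
cfr_all_infections = deaths / estimated_total_infections

print(f"confirmed-case fatality rate:          {cfr_confirmed:.1%}")
print(f"estimated all-infection fatality rate: {cfr_all_infections:.1%}")
```

The point is purely directional: once mild and undetected infections are counted in the denominator, the apparent fatality rate falls.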
When studies from different researchers using varying techniques start to converge toward common conclusions, that’s also a strong if not faultless indication that we’re on the right track. The number of new infections caused by each coronavirus case, for instance, has now been estimated at about 2.2 to 2.3 by several separate studies, although that figure itself can change as people quarantine themselves and cut back contact to prevent infection.
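To get a feel for what a number in that range implies, one can plug it into the simplest textbook growth model, in which each case generates that many new cases one serial interval later. The serial interval used below is an assumed illustrative value, not a figure from the article:

```python
import math

# Minimal sketch of a generation-by-generation growth model: each case causes
# r0 new cases one serial interval later, so counts grow as r0 ** (t / interval).
# The serial interval below is an assumed value used only for illustration.

SERIAL_INTERVAL_DAYS = 7.0

def doubling_time_days(r0: float, serial_interval_days: float = SERIAL_INTERVAL_DAYS) -> float:
    """Days for case counts to double under the simple model described above."""
    return serial_interval_days * math.log(2) / math.log(r0)

for r0 in (2.2, 2.3):
    print(f"R0 = {r0}: cases double roughly every {doubling_time_days(r0):.1f} days")

# Quarantine and reduced contact act on the effective reproduction number;
# once it falls below 1, each generation of cases is smaller than the last.
```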
The troubling truth, though, is that in a society that expects to know everything, this most crucial piece of knowledge is still uncertain.
David Fickling, Bloomberg
