Covid-19 Data in the US Is an ‘Information Catastrophe’

The order to reroute CDC hospitalization figures raised accuracy concerns. But that's just one of the problems with how the country collects health data.

Two weeks ago, the Department of Health and Human Services stripped the Centers for Disease Control and Prevention of control of national data on Covid-19 infections in hospitalized patients. Instead of sending the data to the CDC’s public National Healthcare Safety Network (NHSN), the department ordered hospitals to send it to a new data system, run for the agency by a little-known firm in Tennessee.

The change took effect immediately. First, the hospitalization data collected up until July 13 vanished from the CDC’s site. One day later, it was republished—but topped by a note that the NHSN Covid-19 dashboard would no longer be updated.

Fury over the move was immediate. All the major organizations that represent US public health professionals objected vociferously. A quickly written protest letter addressed to Vice President Mike Pence, HHS secretary Alex Azar, and Deborah Birx, the coordinator of the White House’s Coronavirus Task Force, garnered signatures from more than 100 health associations and research groups. The reactions made visible the groups’ concerns that data could be lost or duplicated, and underlined their continual worry that the CDC is being undercut and sidelined. But they had no other effect. The new HHS portal, called HHS Protect, is up and running.

Behind the crisis lies a difficult reality: Covid-19 data in the US—in fact, almost all public health data—is chaotic: not one pipe, but a tangle. If the nation had a single, seamless system for collecting, storing, and analyzing health data, HHS and the Coronavirus Task Force would have had a much harder time prying the CDC’s Covid-19 data loose. Not having a comprehensive system made the HHS move possible, and however well or badly the department handles the data it will now receive, the lack of a comprehensive data system is harming the US coronavirus response.

“Every health system, every public health department, every jurisdiction really has their own ways of going about things,” says Caitlin Rivers, a senior scholar at the Johns Hopkins Center for Health Security. “It's very difficult to get an accurate and timely and geographically resolved picture of what's happening in the US, because there's such a jumble of data.”

Data systems are wonky objects, so it may help to step back and explain a little history. First, there’s a reason why hospitalization data is important: Knowing whether the demand for beds is rising or falling can help illuminate how hard-hit any area is, and whether reopening in that region is safe.

Second, what the NHSN does is important too. It’s a 15-year-old database, organized in 2005 out of several streams of information that were already flowing to the CDC. It receives data from hospitals and other health care facilities about anything that affects the occurrence of infections once someone is admitted. That includes rates of pneumonia from use of ventilators, infections after surgery, and urinary tract infections from catheters, for instance, but also statistics about usage of antibiotics, adherence to hand hygiene, complications from dialysis, occurrence of the ravaging intestinal infection C. difficile, and rates of health care workers getting flu shots. Broadly, it assembles a portrait of the safety of hospitals, nursing homes, and chronic care institutions in the US, and it shares that data with researchers and with other statistical dashboards published by other HHS agencies such as the Centers for Medicare and Medicaid Services.

Because NHSN only collects institutional data, and Covid-19 infections occur both inside institutions such as nursing homes and hospitals, and in the outside world, HHS officials claimed the database was a bad fit for the coronavirus pandemic. But people who have worked with it argue that since the network had already devised channels for receiving all that data from health care systems, it ought to continue to do so—especially since that data isn’t easy to abstract.

“If you are lucky enough to work in a large health care system that has a sophisticated electronic medical record, then possibly you can push one button and have all the data flow up to NHSN,” says Angela Vassallo, an epidemiologist who formerly worked at HHS and is now chief clinical adviser to the infection-prevention firm Covid Smart. “But that’s a rare experience. Most hospitals have an infection preventionist, usually an entire team, responsible for transferring that data by hand.”

There lies the core problem. Despite big efforts back during the Obama administration to funnel all US health care data into one large-bore pipeline, what exists now resembles what you’d find behind the walls of an old house: pipes going everywhere, patched at improbable angles, some of them leaky, and some of them dead ends. To take some examples from the coronavirus response: Covid-19 hospital admissions were measured by the NHSN (before HHS intervened), but cases coming to emergency departments were reported in a different database, and test results were reported first to local or state health departments, and then sent up to the CDC.

Covid-19 data in particular has been so messy that volunteer efforts have sprung up to fix it. These include the COVID Tracking Project—compiled from multiple sources and currently the most comprehensive set of statistics, used by media organizations and apparently by the White House—and Covid Exit Strategy, which uses data from the COVID Tracking Project and the CDC.

Last week, the American Public Health Association, the Johns Hopkins Center, and Resolve to Save Lives, a nonprofit led by former CDC director Tom Frieden, released a comprehensive report on Covid-19 data collection. Pulling no punches, they called the current situation an “information catastrophe.”

The US, they found, does not have national-, state-, county-, or city-level standards for Covid-19 data. Every state maintains some form of coronavirus dashboard (and some have several), but every dashboard is different; no two states present the same data categories, nor visualize them the same way. The data presented by states is “inconsistent, incomplete, and inaccessible,” the group found: Out of 15 key pieces of data that each state should be presenting—things such as new confirmed and probable cases, new tests performed, and percentage of tests that are positive—only 38 percent of the indicators are reported in some way, with limitations, and 60 percent are not reported at all.

“This is not the fault of the states—there was no federal leadership,” Frieden emphasized in an interview with WIRED. “And this is legitimately difficult. But it’s not impossible. It just requires commitment.”

But the problem of incomplete, messy data is older and deeper than this pandemic. Four scholars from the health-policy think tank the Commonwealth Fund called out the broader problem just last week in an essay in The New England Journal of Medicine, naming health data as one of four interlocking health care crises exposed by Covid-19. (The others were reliance on employer-provided health care, financial losses in rural and primary-care practices, and the effect of the pandemic on racial and ethnic minorities.)

“There is no national public health information system—electronic or otherwise—that enables authorities to identify regional variation in the demand for, and supply of, resources critical to managing Covid-19,” they wrote. The fix they recommended: a national public health information system that would record diagnoses in real time, monitor the materials hospitals need, and link hospitals and outpatient care, state and local health departments, and laboratories and manufacturers to maintain real-time reporting on disease occurrence, preventive measures, and equipment production.

They are not the first to say this is needed. In February 2019, the Council of State and Territorial Epidemiologists launched a campaign to get Congress to appropriate $1 billion in new federal funding over 10 years specifically to improve data flows. “The nation’s public health data systems are antiquated, rely on obsolete surveillance methods, and are in dire need of security upgrades,” the group wrote in its launch statement. “Sluggish, manual processes—paper records, spreadsheets, faxes, and phone calls—still in widespread use, have consequences, most notably delayed detection and response to public health threats.”

Defenders of the HHS decision to switch data away from the CDC say that fixing problems like those is what the department was aiming for. ("The CDC's old hospital data-gathering operation once worked well monitoring hospital information across the country, but it's an inadequate system today," HHS assistant secretary for public affairs Michael Caputo told CNN.) Even if that claim is accurate, the middle of a global pandemic is a challenging time to attempt such a fix.

“We were opposed to this, because trying to do this in the middle of a disaster is not the time,” says Georges Benjamin, a physician and executive director of the American Public Health Association, which was a signatory to the letter protesting moving data from the NHSN. “It was just clearly done without a lot of foresight. I don’t think they understand the way data moves into and through the system.”

The past week has shown how correct that concern was. Immediately after the switch, according to CNBC, states were blacked out from receiving data on their own hospitals, because the hospitals were not able to manage the changeover from the CDC to the HHS system. On Tuesday, Ryan Panchadsaram, cofounder of Covid Exit Strategy and former deputy chief technology officer for the US, highlighted on Twitter that data on the HHS dashboard, advertised as updating daily, was five days old. And Tuesday night, the COVID Tracking Project staff warned in a long analysis: “Hospitalization data from states that was highly stable a few weeks ago is currently fragmented, and appears to be a significant undercount.”

When the Covid-19 crisis is over, as everyone hopes it will be someday, the US will still have to wrestle with the questions it raised. One of those will be how the richest country on the planet, with some of the best clinical care in the world, was content with a health information system that left it so uninformed about a disease affecting so many of its citizens. The answer could involve tearing the public-health data system down and building it again from scratch.

“This is a deeply entrenched problem, where there is no single person who has not done their job,” Rivers says. “Our systems are old. They were not updated. We haven’t invested in them. If you’re trying to imagine a system where everyone reports the same information in the same way and we can push a button and have all the information we might want, that will take a complete overhaul of what we have.”

