How a U.S. COVID-19 Data Registry Fuels Global Research

October 15, 2020
Farhana Nakhooda

SVP Health Catalyst, Asia Pacific (APAC)

Larry Lofgreen

Asia Pacific Sales and Solutions Consulting, VP

Pol Margalef, PhD

Strategy and Business Development Consultant, Life Sciences

Praveen Deorani

Senior Data Scientist, Singapore Ministry of Healthcare

Sadiqa Mahmood, DDS, MPH

General Manager & Senior Vice President, Life Sciences Business

Article Summary


In addition to driving COVID-19 understanding within the United States, a national disease registry is informing research beyond U.S. borders. Clinicians with the Singapore Ministry of Healthcare Office for Healthcare Transformation (MOHT) have used Health Catalyst Touchstone® COVID-19 data to develop a machine learning tool that helps predict the likelihood of COVID-19 mortality.
With this national data set that leverages deep aggregated EHR data, the MOHT accessed the research-grade data it needed to build a machine-learning algorithm that predicts risk of death from COVID-19.
The registry-informed prediction model was accurate enough to stand up to comparisons in the published literature and promises to help inform vaccine research and, ultimately, allocation of vaccines within populations.

The COVID-19 outbreak has been a significant U.S. and global concern, given the speed of spread and breadth of health impacts (both known and unknown) on the population level. The virus causes fever, cough, loss of smell, fatigue, and mild to severe respiratory complications, which, if very severe, can lead to patient death. Meanwhile, incomplete, non-transparent, and out-of-date COVID-19 data is one of the main barriers to understanding and managing the virus nationally and abroad, as well as to developing a vaccine. To circumvent the lack of real-world, research-grade evidence, researchers are looking to innovative sources of comprehensive, real-time COVID-19 data.

A national COVID-19 data set that leverages deep aggregated EMR data gives researchers the depth and breadth of understanding they need to manage the virus and develop a vaccine. The Health Catalyst Touchstone® COVID-19 Registry and Insights, for example, includes de-identified data from 80 million patients across the United States and tracking data from three national sources: Johns Hopkins University, the New York Times, and The COVID Tracking Project. With such broad data access, data analysts can leverage data on a national scale to drive population-level insights about surveillance, testing, capacity planning, and treatment response.

National-Level COVID-19 Data Powers Global Research

Touchstone and the national COVID-19 registry also promise to inform research beyond U.S. borders. In the summer of 2020, the Singapore Ministry of Healthcare’s (MOH) Office for Healthcare Transformation (MOHT), in collaboration with Health Catalyst, used Touchstone COVID-19 data to develop a machine learning tool that helps predict the likelihood of COVID-19 mortality, a critical insight for driving care to the highest-risk patients and managing the outbreak on a population level. To validate the accuracy of the predictive tool, Health Catalyst compared its results with results published in the literature and determined that the registry-informed research aligned closely with peer-reviewed publications.

“For a rapidly evolving situation like COVID-19, medical researchers can’t rely solely on clinical trials for guidance,” explains Praveen Deorani, Senior Data Scientist, for the Singapore MOHT. “As a practical alternative to informing medical decisions, a machine learning model can generate and analyze real-world evidence much faster.”

Registry-Driven Analytics Tools Leverage COVID-19 Data for Decision Support

In an effort to assist neighboring countries that may not have the research resources available, the Singapore MOHT sought to provide analytic tools to assist in managing the pandemic. However, Singapore’s population size and the strict control measures implemented in Singapore combined to limit both the nation’s number of COVID-19 cases and the COVID-19 mortality rate, leaving a dearth of data to power predictive tools.

Data scientists with the Singapore MOHT evaluated detailed COVID-19 data from the Touchstone registry to identify patient factors linked to COVID-19 mortality (Figure 1).

Chart - Factors linked to COVID-19 mortality
Figure 1: Factors linked to COVID-19 mortality.

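The Touchstone COVID-19 data set contains de-identified data for 168,632 distinct patients. For comparison purposes, the data set includes patients with COVID-19-related symptoms as well as patients with COVID-19 diagnoses. Of these unique patients, 47,464 exhibited at least one COVID-19-related symptom, and approximately 21 percent of them tested positive for COVID-19. Likewise, the data contains 26,415 patients who tested positive for COVID-19 (61 percent of whom were either asymptomatic or had no symptoms recorded by the treating facility). The COVID-19-related mortality rate among COVID-19-positive patients was approximately 3 percent (789 of 26,415 patients).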
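As a quick arithmetic check on the registry figures reported above (168,632 patients, 47,464 with symptoms, 26,415 positive tests, 789 deaths), the short Python sketch below recomputes the rates. The variable names are illustrative, and the two derived counts are only estimates because they start from the rounded percentages (21 percent and 61 percent) rather than from the underlying patient records.

```python
# Counts reported for the Touchstone COVID-19 data set
total_patients = 168_632
symptomatic_patients = 47_464
positive_patients = 26_415
deaths_among_positive = 789

# Share of all patients with at least one COVID-19-related symptom
symptomatic_share = symptomatic_patients / total_patients

# Mortality rate among patients who tested positive
mortality_rate = deaths_among_positive / positive_patients

# Estimated counts implied by the rounded percentages (assumption:
# the true percentages are close to 21 and 61)
symptomatic_and_positive = round(0.21 * symptomatic_patients)
positive_without_symptoms = round(0.61 * positive_patients)

print(f"Symptomatic share of registry: {symptomatic_share:.1%}")
print(f"Mortality among positives:     {mortality_rate:.2%}")
print(f"Symptomatic and positive (est.):    {symptomatic_and_positive:,}")
print(f"Positive, no symptoms recorded (est.): {positive_without_symptoms:,}")
```

The computed mortality rate rounds to the 3 percent stated in the text.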

The initial analysis effort focused on providing a triage tool for prioritizing care of patients exhibiting COVID-19-related symptoms. As Figure 2 shows, patients who tested positive for COVID-19 had different symptom distributions versus those who did not test positive. However, most patients were either asymptomatic or had no symptoms recorded. The small number of patients exhibiting loss of taste/smell is of particular interest to the MOHT, as this symptom has been seen as a strong indicator of COVID-19 in Singapore.

Chart - Symptom distribution patients with COVID-19
Figure 2: Symptom distribution for patients with COVID-19.

Despite the general lack of symptom data, when the MOHT researchers compared the correlation of symptoms to a positive COVID-19 test, two symptoms stood out: prior viral exposure and loss of taste/smell (the latter confirming what Singapore had determined through their testing regimes). Ultimately, the U.S. symptom data was too sparse to form the basis of a predictive model that could perform better than the literature-based, deterministic test result model that MOHT had already developed (Figure 3).

Chart - MOHT COVID-19 test results prediction model
Figure 3: MOHT COVID-19 test results prediction model.

A Data-Informed Machine Learning Tool Helps Predict Who Is Likely to Die of COVID-19

After the MOHT’s initial analysis efforts, the organization used factors such as age, race, gender, and comorbidities (including hypertension, cancer, and more) to produce a machine learning prediction tool to help clinicians identify COVID-19 patients at the highest risk of death (Figure 4). Some of the MOHT’s most meaningful insights include the following:

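  • Mortality varies considerably by age group and gender and, to some extent, by race.
  • Black or African Americans have higher mortality rates despite a slightly lower age.
  • Mortality depends on comorbidities independently of the age distribution.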
Chart - MOHT COVID-19 mortality prediction 1 of 2
Figure 4: The MOHT COVID-19 mortality prediction tool 1/2.

In contrast to the sparse symptom data, patient demographic and comorbidity data supported a mortality prediction model with an area under the curve (AUC) of 86.7 percent. The AUC is an aggregate measure of performance across all possible classification thresholds. For the comorbidities in the chart above, red indicates existence of the condition, and blue indicates absence of the condition. As the values show, most comorbidities have an obvious impact on mortality risk.
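To make the AUC figure concrete, the sketch below computes an AUC from scratch on a tiny, made-up set of labels and risk scores. It is not the MOHT model or its data. It relies on the standard interpretation of AUC as the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen patient who survived, with ties counted as half. The function name `roc_auc` and the sample values are hypothetical.

```python
def roc_auc(labels, scores):
    """Return the AUC via pairwise comparison of positives and negatives.

    labels: 1 for the outcome of interest (for example, death), 0 otherwise.
    scores: predicted risk for each patient, higher meaning riskier.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive correctly ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Made-up example data, for illustration only
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.05, 0.10, 0.35, 0.40, 0.60, 0.80, 0.20, 0.30]

auc = roc_auc(labels, scores)
print(f"AUC: {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a value near 0.87 means the model ranks a patient who died above a patient who survived in roughly 87 of 100 such pairs.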

However, comorbidity-based prediction is only useful if the analysts know a patient’s comorbidities. Therefore, given the observed impact of age, gender, and race in the comorbidity-based model, the MOHT data scientists created a second model using only those features likely universally available to clinicians: age, gender, race, and history of tobacco use. As Figure 5 shows, this model performed nearly as well as the model with comorbidities (an AUC of 85 percent versus the original AUC of 86.7 percent).

Chart - MOHT COVID-19 mortality prediction 2 of 2
Figure 5: MOHT COVID-19 mortality prediction tool 2/2.

The COVID-19 Mortality Prediction Model Stands up to Peer-Reviewed Literature

To verify the accuracy of the COVID-19 mortality prediction model, the MOHT reviewed published literature to compare the model’s outcomes with other research. The team determined its prediction model results were overwhelmingly consistent with other peer-reviewed studies.

The following lists offer examples of factors the MOHT model uses to predict COVID-19 mortality and some of the published literature that confirms their relationship to COVID-19 mortality:

  • Patient age: Several studies indicate patient age is a reliable predictor for COVID-19 mortality.
  • Race: Race has some mixed results as a predictor of COVID-19 death, but some studies show a correlation:
    • A New York study determines an association between race and COVID-19 mortality.
    • The U.S. Black population has a higher rate of COVID-19 case fatality.
  • Gender: Studies associate gender with COVID-19 mortality:
    • In China, men are more at risk for having the worst COVID-19 outcomes and death.
    • A multivariate regression identifies being male as a risk factor for COVID-19 mortality.
  • Cancer: Patients with COVID-19 and cancer have a greater risk of death:
    • Age, gender, and comorbidities drive the risk of COVID-19 death among patients with cancer.
    • Due to unfavorable prognostic factors, hospitalized patients with cancer and COVID-19 had a high case-fatality rate.

Partnering for Meaningful COVID-19 Understanding

One of the most promising uses of these COVID-19 data-driven prediction models may be in prioritization of viral testing in localities with insufficient resources. The first priority would be the allocation of COVID-19 tests to frontline healthcare workers and individuals in contact with a large number of people, such as cashiers and bus drivers. For the remaining population, the thresholds of risk for COVID-19 (given symptoms) and risk of death from the virus could determine test allocation. Similarly, these data-powered models may support early allocation of vaccines when they become available, as immunization among high-risk individuals maximizes the early impact of a vaccine.
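The prioritization idea described above could be sketched roughly as follows. This is a hypothetical illustration, not a documented MOHT or Health Catalyst procedure. The `Person` fields, the threshold values (0.3 and 0.1), the choice to qualify a person when either risk crosses its threshold, and the ordering rule are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    frontline: bool         # healthcare worker or high-contact job
    infection_risk: float   # hypothetical model output, 0 to 1
    mortality_risk: float   # hypothetical model output, 0 to 1


def allocate_tests(people, tests_available,
                   infection_threshold=0.3, mortality_threshold=0.1):
    """Return the names of people who receive a test, in priority order."""
    # First priority: frontline and high-contact individuals
    frontline = [p for p in people if p.frontline]

    # Remaining population: keep those above either (assumed) risk threshold
    others = [
        p for p in people
        if not p.frontline
        and (p.infection_risk >= infection_threshold
             or p.mortality_risk >= mortality_threshold)
    ]
    # Assumed ordering: highest mortality risk first, then infection risk
    others.sort(key=lambda p: (p.mortality_risk, p.infection_risk),
                reverse=True)

    return [p.name for p in (frontline + others)[:tests_available]]


# Made-up population, for illustration only
people = [
    Person("cashier", True, 0.20, 0.01),
    Person("A", False, 0.50, 0.02),
    Person("B", False, 0.10, 0.25),
    Person("C", False, 0.10, 0.01),
]
allocated = allocate_tests(people, tests_available=3)
print(allocated)
```

In practice, the thresholds would need to be set by public health authorities based on local test supply and model calibration.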

Combining the Touchstone COVID-19 Registry and Insights aggregated data from U.S. healthcare providers with the expertise and experience of Singapore’s MOHT provided capability and insights neither organization could muster alone. The opportunities for global collaborations such as this are endless and create a huge opportunity for the research community at large to leverage real-world evidence to address global health issues and ultimately improve health outcomes.

Additional Reading

Would you like to learn more about this topic? Here are some articles we suggest:

  1. Health Catalyst Launches COVID-19 Patient Data Repository to Speed Vaccine Development
  2. Using COVID-19 Value Sets for Patient Identification
  3. A Sustainable Healthcare Emergency Management Framework: COVID-19 and Beyond
  4. Population Builder Makes Data Available to the Masses
