How a U.S. COVID-19 Data Registry Fuels Global Research

October 21, 2020
Farhana Nakhooda

SVP Health Catalyst, Asia Pacific (APAC)

Larry Lofgreen

Asia Pacific Sales and Solutions Consulting, VP

Pol Margalef, PhD

Strategy and Business Development Consultant, Life Sciences

Praveen Deorani

Senior Data Scientist, Singapore Ministry of Health

Sadiqa Mahmood, DDS, MPH

General Manager & Senior Vice President, Life Sciences Business

In addition to driving COVID-19 understanding within the United States, a national disease registry is informing research beyond U.S. borders. Clinicians with the Singapore Ministry of Health Office for Healthcare Transformation (MOHT) have used Health Catalyst Touchstone® COVID-19 data to develop a machine learning tool that helps predict the likelihood of COVID-19 mortality. With this national data set, which leverages deep aggregated EMR data, the MOHT accessed the research-grade data it needed to build a machine learning algorithm that predicts COVID-19 mortality risk. The registry-informed prediction model was accurate enough to stand up to comparisons in the published literature and promises to help inform vaccine research and, ultimately, allocation of vaccines within populations.

The COVID-19 outbreak has been a significant U.S. and global concern, given the speed of spread and breadth of health impacts (both known and unknown) at the population level. The virus causes fever, cough, loss of smell, fatigue, and mild to severe respiratory complications, which, if very severe, can lead to death. Meanwhile, incomplete, non-transparent, and out-of-date COVID-19 data are among the main barriers, in the United States and abroad, to understanding and managing the virus and developing a vaccine. To close this gap in real-world, research-grade evidence, researchers are seeking innovative sources of comprehensive, real-time COVID-19 data.

A national COVID-19 data set that leverages deep aggregated EMR data delivers the depth and breadth of understanding researchers need to manage the virus and develop a vaccine. The Health Catalyst Touchstone® COVID-19 Registry and Insights, for example, includes de-identified data from 80 million patients across the United States and tracking data from three national sources: Johns Hopkins University, The New York Times, and The COVID Tracking Project. With such broad data access, data analysts can leverage data on a national scale to drive population-level insights about surveillance, testing, capacity planning, and treatment response.

National-Level COVID-19 Data Powers Global Research

The Touchstone national COVID-19 registry also promises to inform research outside the United States. In the summer of 2020, the Singapore Ministry of Health’s (MOH) Office for Healthcare Transformation (MOHT), in collaboration with Health Catalyst, used Touchstone COVID-19 data to develop a machine learning tool that helps predict the likelihood of COVID-19 mortality, a critical insight for directing care to the highest-risk patients and managing the outbreak at the population level. To validate the tool’s accuracy, Health Catalyst compared its results with those published in the literature and determined that the registry-informed findings aligned closely with peer-reviewed publications.

“For a rapidly evolving situation like COVID-19, medical researchers can’t rely solely on clinical trials for guidance,” explains Praveen Deorani, Senior Data Scientist for the Singapore MOHT. “As a practical alternative to informing medical decisions, a machine learning model can generate and analyze real-world evidence much faster.”

Registry-Driven Analytics Tools Leverage COVID-19 Data for Decision Support

To assist neighboring countries that may lack research resources, the Singapore MOHT sought to provide analytic tools for managing the pandemic. However, Singapore’s small population and the strict control measures implemented there limited both the nation’s number of COVID-19 cases and its COVID-19 mortality rate, leaving a dearth of data to power predictive tools.

Data scientists with the Singapore MOHT evaluated detailed COVID-19 data from the Touchstone registry to identify patient factors linked to COVID-19 mortality (Figure 1).

Figure 1: Factors linked to COVID-19 mortality.

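The Touchstone COVID-19 data set contained de-identified data for 168,632 distinct patients. For comparison purposes, the data set included patients with COVID-19-related symptoms and diagnoses. Of these unique patients, 47,464 exhibited at least one COVID-19-related symptom, and about 21 percent of them tested positive for COVID-19. Likewise, the data contained 26,415 patients who tested positive for COVID-19 (61 percent of whom were either asymptomatic or had no symptoms recorded by the treating facility). The COVID-19-related mortality rate among COVID-19-positive patients was approximately 3 percent (789 of 26,415 patients).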
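The cohort figures above can be cross-checked with a few lines of arithmetic. This is a minimal sketch that uses only the counts and percentages reported in this article; the helper name `pct` is hypothetical. It reproduces the roughly 3 percent mortality rate and estimates the number of symptomatic positive patients two ways (about 21 percent of the 47,464 symptomatic patients, and the 39 percent of the 26,415 positive patients who had symptoms recorded). Both estimates land near 10,000, which suggests the two figures describe roughly the same group.

```python
# Cross-check of the Touchstone cohort counts reported in the article.
# The constants come from the text; the helper function is illustrative.

TOTAL_PATIENTS = 168_632              # distinct patients in the data set
SYMPTOMATIC = 47_464                  # at least one COVID-19-related symptom
POSITIVE_SHARE_OF_SYMPTOMATIC = 0.21  # "about 21 percent" tested positive
POSITIVE = 26_415                     # patients who tested positive
NO_SYMPTOMS_RECORDED_SHARE = 0.61     # asymptomatic or no symptoms recorded
DEATHS = 789                          # COVID-19-related deaths among positives


def pct(part: float, whole: float) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Estimate symptomatic positives from each direction of the data.
from_symptomatic = SYMPTOMATIC * POSITIVE_SHARE_OF_SYMPTOMATIC
from_positive = POSITIVE * (1 - NO_SYMPTOMS_RECORDED_SHARE)

print(f"Mortality among positives: {pct(DEATHS, POSITIVE)}%")
print(f"Symptomatic positives (via 21% of symptomatic): {from_symptomatic:,.0f}")
print(f"Symptomatic positives (via 39% of positives): {from_positive:,.0f}")
print(f"Positives as a share of all patients: {pct(POSITIVE, TOTAL_PATIENTS)}%")
```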

The initial analysis effort focused on providing a triage tool for prioritizing care of patients exhibiting COVID-19-related symptoms. As Figure 2 shows, patients who tested positive for COVID-19 had different symptom distributions from those who did not test positive. However, most patients were either asymptomatic or had no symptoms recorded. The small number of patients exhibiting loss of taste/smell is of particular interest to the MOHT, as this symptom has been seen as a strong indicator of COVID-19 in Singapore.

Figure 2: Symptom distribution for patients with COVID-19.

Despite the general lack of symptom data, when the MOHT researchers correlated symptoms with a positive COVID-19 test, two factors stood out: prior viral exposure and loss of taste/smell (the latter confirming what Singapore had determined through its testing regime). Ultimately, the U.S. symptom data were too sparse to support a predictive model that could outperform the literature-based, deterministic test result model that the MOHT had already developed (Figure 3).
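The article does not say which correlation statistic the MOHT used. For two yes/no variables, such as "loss of taste/smell recorded" and "tested positive," one common choice is the phi coefficient computed from a 2-by-2 table of counts. The sketch below assumes that statistic; the function name `phi_coefficient` and the example counts are hypothetical and are not registry data.

```python
import math


def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Phi coefficient for a 2-by-2 table of two binary variables.

    a: symptom present, test positive
    b: symptom present, test negative
    c: symptom absent,  test positive
    d: symptom absent,  test negative
    Returns a value in [-1, 1]; 0 means no association.
    """
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        # A row or column total is zero, so the association is undefined;
        # report 0.0 rather than dividing by zero.
        return 0.0
    return (a * d - b * c) / denominator


# Hypothetical counts for illustration only (not Touchstone data).
print(round(phi_coefficient(a=120, b=30, c=880, d=2970), 3))
```

With sparse symptom documentation, many patients fall into the "symptom absent" rows simply because nothing was recorded, which pulls a statistic like this toward zero and is one reason sparse data make weak predictors.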

Figure 3: MOHT COVID-19 test result prediction model.

A Data-Informed Machine Learning Tool Helps Predict Who Is Likely to Die of COVID-19

After its initial analysis, the MOHT used factors such as age, race, gender, and comorbidities (including hypertension and cancer) to produce a machine learning prediction tool to help clinicians identify COVID-19 patients at the highest risk of death (Figure 4). Some of the MOHT’s most meaningful insights include the following:

  • The mortality rate varies significantly by age group and gender and somewhat with race.
  • Black or African American patients have higher mortality rates despite a slightly lower average age.
  • Mortality depends on comorbidities independently of the age distribution.
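Insights such as "the mortality rate varies significantly by age group and gender" rest on stratified rates: deaths divided by patients within each subgroup. The sketch below shows that calculation on a handful of made-up records. The record layout, the age bands, and the function names are hypothetical and do not reflect the registry’s schema.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical de-identified records: (age, gender, died_of_covid19).
Record = Tuple[int, str, bool]


def age_band(age: int) -> str:
    """Bucket an age into a coarse band (band edges are illustrative)."""
    if age < 50:
        return "<50"
    if age < 70:
        return "50-69"
    return "70+"


def mortality_by_stratum(records: List[Record]) -> Dict[Tuple[str, str], float]:
    """Return deaths / patients for each (age band, gender) stratum."""
    patients: Dict[Tuple[str, str], int] = defaultdict(int)
    deaths: Dict[Tuple[str, str], int] = defaultdict(int)
    for age, gender, died in records:
        key = (age_band(age), gender)
        patients[key] += 1
        deaths[key] += int(died)
    return {key: deaths[key] / patients[key] for key in patients}


# Made-up sample records for illustration only.
sample = [
    (34, "F", False), (41, "M", False), (58, "F", False),
    (63, "M", True), (66, "M", False), (72, "F", True),
    (77, "F", False), (81, "M", True), (85, "M", True),
]
for (band, gender), rate in sorted(mortality_by_stratum(sample).items()):
    print(f"{band:>6} {gender}: {rate:.0%}")
```

Comparing rates across strata like these is descriptive only; separating the effect of comorbidities from age, as the third insight does, requires a model that includes both at once.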

Figure 4: The MOHT COVID-19 mortality prediction tool 1/2.

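Unlike the sparse symptom data, patient demographic and comorbidity data supported a mortality prediction model with an AUC (area under the receiver operating characteristic curve, a composite measure of performance across all possible classification thresholds) of 86.7 percent. For the comorbidities in the chart above, red indicates presence and blue indicates absence. As the values show, most comorbidities have a clear effect on mortality risk.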
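AUC has a convenient interpretation: it is the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen patient who survived, with ties counted as half. An AUC of 0.867 therefore means the model ranks the patient who died higher in about 87 percent of such pairs. The sketch below computes AUC directly from that pairwise definition; the function name and example scores are hypothetical. For large data sets, a library routine such as scikit-learn’s `roc_auc_score` is the practical choice.

```python
from typing import Sequence


def pairwise_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a positive > score of a negative); ties count 0.5.

    labels: 1 for the outcome of interest (e.g., death), 0 otherwise.
    Runs in O(n_pos * n_neg) time, so it suits small illustrative inputs.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical risk scores: 3 of the 4 (died, survived) pairs are ranked correctly.
print(pairwise_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

Because AUC depends only on how patients are ranked, it summarizes performance across every possible cutoff at once, which is why it is described as threshold-independent.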

However, comorbidity-based prediction is useful only if a patient’s comorbidities are known. Therefore, given the observed impact of age, gender, and race in the comorbidity-based model, the MOHT data scientists created a second model using only features likely to be universally available to clinicians: age, gender, race, and history of tobacco use. As Figure 5 shows, this model performed nearly as well as the model with comorbidities (an AUC of 85 percent versus the original AUC of 86.7 percent).
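The article does not specify the algorithm behind either model. Purely to illustrate what a demographics-only model consumes, the sketch below assumes a logistic model over the four inputs named above (age, gender, race, and tobacco history). The function name `demographic_risk`, the placeholder race labels, and every coefficient are hypothetical placeholders, not values from the MOHT model.

```python
import math
from typing import Dict

# HYPOTHETICAL coefficients for illustration only; not fitted to any data.
INTERCEPT = -8.0
AGE_PER_YEAR = 0.07
MALE = 0.4
TOBACCO_HISTORY = 0.3
# Placeholder category labels; a real model would use its own encoding.
RACE_OFFSETS: Dict[str, float] = {"group_a": 0.0, "group_b": 0.2}


def demographic_risk(age: int, male: bool, race: str, tobacco: bool) -> float:
    """Logistic risk score built only from the four widely available inputs."""
    z = (
        INTERCEPT
        + AGE_PER_YEAR * age
        + (MALE if male else 0.0)
        + RACE_OFFSETS.get(race, 0.0)  # unknown category falls back to baseline
        + (TOBACCO_HISTORY if tobacco else 0.0)
    )
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid maps z to a probability in (0, 1)


older = demographic_risk(age=80, male=True, race="group_a", tobacco=True)
younger = demographic_risk(age=40, male=True, race="group_a", tobacco=True)
print(f"older: {older:.3f}, younger: {younger:.3f}")
```

Because every input is typically captured at registration, a score of this form can be computed even when a patient’s comorbidities are unknown, which is the trade-off the second model makes in exchange for a small drop in AUC.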

Figure 5: MOHT COVID-19 mortality prediction tool 2/2.

The COVID-19 Mortality Prediction Model Stands Up to Peer-Reviewed Literature

To verify the accuracy of the COVID-19 mortality prediction model, the MOHT reviewed published literature to compare the model’s outcomes with other research. The team determined its prediction model results were overwhelmingly consistent with other peer-reviewed studies.

The following list offers examples of factors the MOHT model uses to predict COVID-19 mortality and some of the published literature that confirms their relationship to COVID-19 mortality:

  • Patient age: Several studies indicate that patient age is a reliable predictor of COVID-19 mortality.
  • Race: Studies of race as a predictor of COVID-19 death show mixed results, but some show a correlation:
    • A New York study found an association between race and COVID-19 mortality.
    • The U.S. Black population has a higher COVID-19 case fatality rate.
  • Gender: Studies associate gender with COVID-19 mortality:
    • In China, men are at greater risk of the worst COVID-19 outcomes and death.
    • A multivariate regression identifies being male as a risk factor for COVID-19 mortality.
  • Cancer: Patients with COVID-19 and cancer have a greater risk of death:
    • Age, gender, and comorbidities drive the risk of COVID-19 death among patients with cancer.
    • Due to unfavorable prognostic factors, hospitalized patients with cancer and COVID-19 had a high case-fatality rate.

Partnering for Meaningful COVID-19 Understanding

One of the most promising uses of these data-driven COVID-19 prediction models may be in prioritizing viral testing in localities with insufficient resources. The first priority would be allocating COVID-19 tests to frontline healthcare workers and individuals in contact with large numbers of people, such as cashiers and bus drivers. For the remaining population, thresholds for the risk of having COVID-19 (given symptoms) and the risk of dying from the virus could determine test allocation. Similarly, these data-powered models may support early allocation of vaccines when they become available, as immunizing high-risk individuals first maximizes a vaccine’s early impact.
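The testing priorities described above can be sketched as a two-tier queue: frontline and high-contact workers first, then other people whose estimated infection risk or death risk crosses a threshold, cut off at the number of available tests. This is a minimal sketch under stated assumptions: each person is assumed to already carry two model outputs, and the `Person` fields, the threshold values, and the choice to rank the second tier by death risk are hypothetical, not something the article specifies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Person:
    person_id: str
    frontline: bool        # healthcare worker or high-contact job
    infection_risk: float  # hypothetical model estimate of P(COVID-19 | symptoms)
    death_risk: float      # hypothetical model estimate of P(death | COVID-19)


# Hypothetical thresholds; a real program would set these from local capacity.
INFECTION_THRESHOLD = 0.2
DEATH_THRESHOLD = 0.05


def allocate_tests(people: List[Person], tests_available: int) -> List[str]:
    """Return IDs of people who receive a test, in priority order."""
    frontline = [p for p in people if p.frontline]
    others = [
        p for p in people
        if not p.frontline
        and (p.infection_risk >= INFECTION_THRESHOLD
             or p.death_risk >= DEATH_THRESHOLD)
    ]
    # Among non-frontline people, test those at greatest risk of death first,
    # breaking ties by infection risk.
    others.sort(key=lambda p: (p.death_risk, p.infection_risk), reverse=True)
    queue = frontline + others
    return [p.person_id for p in queue[:tests_available]]


people = [
    Person("driver", True, 0.05, 0.01),
    Person("elderly", False, 0.15, 0.20),
    Person("young_symptomatic", False, 0.60, 0.01),
    Person("low_risk", False, 0.02, 0.01),
]
print(allocate_tests(people, tests_available=3))
```

Ranking the second tier by death risk mirrors the article’s emphasis on protecting high-risk individuals; ranking by infection risk instead would favor case finding over protecting the most vulnerable.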

Combining aggregated data from U.S. healthcare providers in the Touchstone COVID-19 Registry and Insights with the expertise and experience of Singapore’s MOHT provided capabilities and insights that neither organization could have achieved alone. Global collaborations such as this one create a significant opportunity for the research community at large to leverage real-world evidence to address global health issues and ultimately improve health outcomes.

Additional Reading

Would you like to learn more about this topic? Here are some articles we suggest:

  1. Health Catalyst Launches COVID-19 Patient Data Repository to Speed Vaccine Development
  2. Using COVID-19 Value Sets for Patient Identification
  3. A Sustainable Healthcare Emergency Management Framework: COVID-19 and Beyond
  4. Population Builder Makes Data Available to the Masses