Healthcare NLP: The Secret to Unstructured Data’s Full Potential

April 2, 2019

Article Summary


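While healthcare data is an ever-growing resource, thanks to widespread EHR adoption and new sources (e.g., patient-generated data), many health systems aren't currently using this information cache to its full potential. Analysts can't extract and analyze a significant portion of healthcare data (e.g., follow-up appointments, vitals, charges, orders, encounters, and symptoms) because it's in unstructured, or text, form, which is bigger and more complex than structured data.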

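Natural language processing (NLP) taps into the potential of unstructured data by using artificial intelligence (AI) to extract and analyze meaningful insights from the estimated 80 percent of health data that exists in text form. While NLP is still an evolving capability, it's showing promise in helping organizations get more from their data.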

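This report is based on a 2018 webinar by Wendy Chapman, PhD, Chair of the Department of Biomedical Informatics at the University of Utah School of Medicine, and Mike Dow, Technical Director at Health Catalyst, entitled “Tapping Into the Potential of Natural Language Processing in Healthcare.”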

As EHR adoption increases and health data abounds, healthcare organizations have more analytics-driven opportunities to improve healthcare delivery and outcomes. Health systems, however, are having difficulty using all the available data to its fullest potential. Text (unstructured) data is a particular challenge, as it’s bigger, more complex, and has more sources and storage locations than structured data. But with the increasing use of natural language processing (NLP), organizations are growing their ability to get more actionable insights from healthcare data.

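NLP uses artificial intelligence (AI) to help analytics systems understand and process unstructured data. The capability holds promise for extracting useful data from EHRs and even for providing speech recognition for EHRs during clinician visits. Currently, however, EHRs frustrate clinicians because they take time away from patient engagement and improving patient care. An American Medical Association poll of physicians showed that around half of all respondents were dissatisfied with their EHR’s ability to improve costs, efficiency, and productivity.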

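This article describes the potential of NLP to maximize the value of EHR and healthcare data, making data a critical and trusted component of improving health outcomes.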

The Promise of Healthcare NLP to Improve Outcomes

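NLP processes unstructured data from different sources (e.g., EMRs, literature, and social media) so that analytics systems can interpret it (Figure 1). Once NLP converts the text into structured data, health systems can use it to classify patients, extract insights, and summarize information.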

Figure 1: NLP classifies, extracts, and summarizes unstructured text

Four areas in which healthcare NLP can improve function—and, ultimately, care—include EHR usability, predictive analytics, phenotyping, and quality improvement:

NLP Improves EHR Data Usability

The typical EHR arranges information by patient encounter, making it difficult to find critical patient information (e.g., social history—a strong predictor of readmissions). NLP can enable an EHR interface that makes patient encounter information easier for clinicians to find.

The interface is organized into sections and lists words associated with concerns patients described during encounters. When a clinician selects one of these words, the interface populates the rest of the page with information related to it. For example, all mentions of fatigue would show on a timeline at the top of the page, and the notes about the word would show in a box at the bottom of the page. The interface makes it easier for clinicians to find buried data and make diagnoses they might have otherwise missed.
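The timeline behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `NOTES` data and the `mentions_timeline` function are hypothetical, and plain substring matching stands in for whatever NLP a real interface would use.

```python
from datetime import date

# Hypothetical encounter notes (date, free text); a real EHR would supply these.
NOTES = [
    (date(2018, 3, 2), "Patient reports fatigue and poor sleep."),
    (date(2018, 1, 15), "Annual physical. No complaints."),
    (date(2018, 5, 20), "Fatigue persists; ordered thyroid panel."),
]

def mentions_timeline(notes, term):
    """Return (date, note) pairs whose text mentions `term`, oldest first."""
    term = term.lower()
    # Case-insensitive substring match: an assumption, not real clinical NLP.
    hits = [(d, text) for d, text in notes if term in text.lower()]
    return sorted(hits, key=lambda pair: pair[0])

timeline = mentions_timeline(NOTES, "fatigue")
for d, text in timeline:
    print(d.isoformat(), "|", text)
```

The sorted dates would feed the timeline at the top of the page, and the matching note text would fill the box at the bottom.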

NLP Enables Predictive Analytics

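One of the more exciting benefits of NLP is its ability to enable predictive analytics that address significant population health concerns. For example, according to recent reports, suicide has been on the rise in the United States, and healthcare professionals are working to understand who is at risk so they can intervene. A 2018 study used NLP to predict suicide attempts by monitoring social media. The results showed clear indicators of a suicide attempt among Twitter users who, before the attempt, posted fewer emojis in their text, used a limited range of emoji types (e.g., blue or broken-heart symbols), or posted more angry or sad tweets (Figure 2). The system had a 70 percent prediction rate with only a 10 percent false positive rate.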

Figure 2: Using NLP to recognize suicide risk in emoji use

NLP Boosts Phenotyping Capabilities

A phenotype is an observable physical or biochemical expression of a specific trait in an organism. These traits may be related to appearance, biochemical processes, or behavior. Phenotyping helps clinicians group or categorize patients to provide a deeper, more focused look into data (e.g., listing patients who share certain traits) and the ability to compare patient cohorts. Currently, most analysts and clinicians use structured data for phenotyping because it’s easy to extract for analysis. NLP gives analysts a tool to extract and analyze unstructured data (e.g., follow-up appointments, vitals, charges, orders, encounters, and symptoms), which some experts estimate makes up 80 percent of all patient data available. Access to unstructured data makes far more information available to create phenotypes for patient groups.

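NLP also allows for richer phenotypes. For example, pathology reports contain a wealth of information, such as the patient's condition, the location of a growth, the stage of a cancer, procedures, medications, and genetic status. Traditional analytics can't access this data from pathology reports, whereas NLP enables analysts to extract this type of data to answer complex, specific questions (e.g., the types of cancerous tissue associated with certain genetic mutations).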

NLP Enables Health System Quality Improvement

The federal government and associated agencies require all hospitals to report certain outcome measures. One required measure is adenoma detection rate (ADR), which is the rate at which doctors find adenomas during a colonoscopy. The current process for reporting is to pay someone to analyze a small sampling of patient charts, read through the pathology reports, and calculate the ADR. NLP automates and accelerates this process, increasing the sample size of patient charts and allowing real-time analysis.
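As a rough illustration of the ADR calculation described above, the sketch below scores a handful of made-up pathology snippets. It assumes one report per colonoscopy and uses simple keyword and negation patterns; the report text, the `has_adenoma` helper, and the patterns are all hypothetical, and a production system would need far more robust language handling.

```python
import re

# Hypothetical pathology report snippets, one per colonoscopy.
REPORTS = [
    "Tubular adenoma, 4 mm, ascending colon.",
    "Hyperplastic polyp. No adenoma identified.",
    "Tubulovillous adenoma with low-grade dysplasia.",
    "Normal colonic mucosa.",
]

ADENOMA = re.compile(r"\badenomas?\b", re.IGNORECASE)
# Very small negation list; an assumption for illustration only.
NEGATED = re.compile(
    r"\b(?:no|negative for|without)\s+(?:evidence of\s+)?adenomas?\b",
    re.IGNORECASE,
)

def has_adenoma(report):
    """True if the report mentions an adenoma outside a simple negation."""
    # Strip negated mentions first, then look for any remaining mention.
    remaining = NEGATED.sub("", report)
    return bool(ADENOMA.search(remaining))

def adenoma_detection_rate(reports):
    """Fraction of colonoscopies in which at least one adenoma was found."""
    if not reports:
        raise ValueError("no reports to score")
    return sum(has_adenoma(r) for r in reports) / len(reports)

adr = adenoma_detection_rate(REPORTS)
print(f"ADR: {adr:.0%}")
```

Because the scoring is automatic, the same function could run over every available report rather than a small hand-read sample.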

A clinician has developed a report card that uses NLP to automatically calculate ADR. Studies show that when physicians can see quantifiable results of their performance, they tend to change their behavior. In this case, physicians who received feedback about their ADR changed their behaviors to improve their detection rate. This is important because for every 1 percent increase in ADR, there is a 3 percent decrease in colon cancer mortality.

While the four areas in which NLP enhances the value of healthcare data show significant promise, NLP has a long way to go to widespread adoption and a large-scale impact on outcomes improvement.

Challenges and Limitations of NLP

Most health systems have not yet begun using NLP to its full potential. This is likely because implementing NLP successfully comes with significant challenges:

Garbage In, Garbage Out

The old saying “garbage in, garbage out” applies to NLP. Good, usable data can only be extracted if the data is easy to identify. When digging out data from EHRs, analysts often find a problem with the way data is entered: people commonly type information in, which increases their tendency to use shortcuts and create templates. NLP looks for sentences, not templates, making it difficult to handle data within templates. Cut-and-pasted text presents another challenge; this shortcut propagates more patient data than is relevant (note bloat), as well as outdated or inaccurate information, throughout health records, making clinician notes less useful.

Modeling for Meaning Can Be Challenging

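NLP operates on text, which is a sequence of words strung together. An NLP system needs to extract semantics and infer context from that text, which is not an easy task. If developers don't model the NLP system well enough to find meaning from the start, the system won't scale well.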

NLP Works on Specific Sublanguages

Sublanguage, a subset of natural language, is another challenge for NLP. Medical language is a sublanguage with a subset of vocabulary and different vocabulary rules from the main language. To extract meaning from sublanguage, NLP systems must understand the rules of that language. Social media, for example, is a sublanguage. It uses abbreviations and emoticons to express meaning (versus using words for the same concepts). With these differences, analysts cannot run an NLP system trained on newspaper text on social media and expect it to extract the meaning.

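Medical language itself has different sublanguages. For example, medical blogs and clinical notes use different language. Because of these differences, health systems shouldn't buy an off-the-shelf NLP system built for one sublanguage and then use it on another. Developers and analysts must tailor NLP systems to the specific language (e.g., healthcare), and the tailoring process takes time.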

NLP Doesn’t Yet Distinguish Linguistic Variation

With linguistic variation, there are many ways to say the same thing (e.g., derivation, in which different forms of words have similar meaning, and synonymy, in which one concept has different words). NLP doesn’t yet distinguish linguistic variation.
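To make the two kinds of variation concrete, here is a minimal sketch in which a hand-built lookup table maps different surface forms to one concept label. The `CONCEPTS` table, its labels, and the `normalize` function are hypothetical; the point is only that exact-match systems miss any form the table does not list.

```python
# Hypothetical, hand-built table mapping surface forms to one concept label.
CONCEPTS = {
    # Synonymy: one concept, different words.
    "heart attack": "myocardial_infarction",
    "myocardial infarction": "myocardial_infarction",
    "mi": "myocardial_infarction",
    # Derivation: different forms of a word with similar meaning.
    "fatigue": "fatigue",
    "fatigued": "fatigue",
    "tired": "fatigue",
}

def normalize(phrase):
    """Map a phrase to its concept label, or None if the table lacks it."""
    return CONCEPTS.get(phrase.strip().lower())

print(normalize("Heart attack") == normalize("MI"))  # same concept
print(normalize("exhausted"))  # unlisted variant is missed
```

Every variant has to be anticipated in advance, which is why linguistic variation remains hard for keyword-driven approaches.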

How Healthcare Organizations Can Use NLP Now

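Even though NLP faces challenges that still need to be addressed, health systems can benefit as the capability evolves, starting with more attainable goals (the low-hanging fruit) and moving toward more sophisticated applications (the high-hanging fruit).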

The Low-Hanging Fruit: Easy for NLP to Identify and Process

There are certain areas where current NLP is already effective:

  • Explicit mentions: when NLP is looking for chest pain, it will process the phrase “chest pain.”
  • Unambiguous vocabulary: the words have one meaning, no matter where they appear.
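Explicit-mention matching of the kind listed above can be sketched with a whole-word, case-insensitive search. The `find_explicit_mentions` function and the sample note are hypothetical and only illustrate the idea.

```python
import re

def find_explicit_mentions(text, phrase):
    """Return the character offsets where `phrase` appears as whole words."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    return [m.start() for m in pattern.finditer(text)]

# Hypothetical clinical note text.
note = "Pt presents with chest pain radiating to left arm. Chest pain began 2 hours ago."
offsets = find_explicit_mentions(note, "chest pain")
print(len(offsets), "explicit mentions found")
```

A note that says “angina” or “pressure in the chest” would return no matches here, which is why this category covers only explicit, unambiguous wording.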

In some areas of healthcare, including predictive analytics and quality improvement, successful NLP applications in the low-hanging fruit category are already a reality. For example, a study assessed using NLP to process radiology reports to look for pulmonary embolism (PE) and postoperative venous thromboembolism (VTE). It found that NLP and unstructured data captured 50 percent more cases than structured data alone would identify.

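Currently, NLP also plays an important role in decision support. For example, NLP can flag patients in the EHR who have a first- or second-degree relative diagnosed with breast or colorectal cancer before age 45. Once the NLP system flags these patients, the patient portal sends an email alerting them to their family history and their increased risk of these cancers, and recommending preventive measures.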

Evolving Towards the High-Hanging Fruit

As NLP evolves and developers meet the current challenges of NLP, health system analysts will more easily access the high-hanging fruit. Getting to the high-hanging fruit requires more advanced capabilities:

  • Inference: If a clinician wants to know if a patient has social support but the phrase “has social support” isn’t in the EHR, next-generation NLP will be able to infer meaning from the context. Instead of relying on the exact phrase of “has social support,” the system will process a phrase like “brother at bedside” and know it means the patient has social support.
  • Ambiguous vocabulary: If an analyst programs the NLP system to look for the phrase “brother at bedside” and it sees the phrase “stood at bedside” or “brother died of heart attack,” it will know that the meaning is not the same even though the phrases share at least one word. With current NLP systems, these two phrases would come back as false positives, as if they meant the same as “brother at bedside.”
  • Semantic roles: Current NLP systems struggle with semantic roles. The phrase “wife helps patient with meds” is very different from “patient helps wife with meds.” However, a keyword-based NLP system cannot differentiate between the two phrases, which is a current limitation of NLP. In the future, NLP systems could be programmed to understand semantic roles (e.g., who’s the subject and who’s the object).
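The semantic-role limitation above is easy to demonstrate. In the sketch below, a bag-of-words comparison (which ignores word order, as keyword matching effectively does) treats the two phrases as identical, while a toy order-aware helper tells them apart. Both `bag_of_words` and `naive_roles` are hypothetical illustrations; `naive_roles` only handles phrases of the form “X helps Y ...” and is not a real semantic role labeler.

```python
from collections import Counter

def bag_of_words(phrase):
    """Count words while ignoring their order."""
    return Counter(phrase.lower().split())

def naive_roles(phrase):
    """Toy role assignment for 'X helps Y ...' phrases only."""
    words = phrase.lower().split()
    i = words.index("helps")  # raises ValueError if the verb is absent
    if i == 0 or i + 1 >= len(words):
        raise ValueError("expected a word on each side of 'helps'")
    return {"helper": words[i - 1], "helped": words[i + 1]}

first = "wife helps patient with meds"
second = "patient helps wife with meds"

# Same words, same counts: the keyword view cannot tell the phrases apart.
print(bag_of_words(first) == bag_of_words(second))

# Using word order recovers who is helping whom.
print(naive_roles(first))
print(naive_roles(second))
```

Real clinical text rarely follows such a fixed pattern, which is why this capability sits with the high-hanging fruit.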

NLP’s Promise to Get More from Healthcare Data

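Despite the challenges NLP faces, the healthcare industry is beginning to embrace its potential to derive critical insights from a wide variety of health data. Healthcare organizations are already using NLP to reach the low-hanging fruit, and major technology entities are applying NLP to health-related tools; Amazon, for example, recently released a user-friendly clinical NLP tool.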

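Many open-source tools are available for free, allowing users to classify text, find phrases, and look for contextual information that provides clues about family history. But to maximize NLP's potential in healthcare, organizations need to look beyond these off-the-shelf solutions to healthcare-specific vendor systems that integrate into existing workflows. This strategic approach will take full advantage of NLP to improve healthcare outcomes.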

Additional Reading

Would you like to learn more about this topic? Here are some articles we suggest:

  1. Healthcare NLP: Four Essentials to Make the Most of Unstructured Data
  2. How Healthcare Text Analytics and Machine Learning Work Together to Improve Patient Outcomes
  3. Four Steps to Effective Opportunity Analysis
