A fast, resource efficient, and reliable rule-based system for COVID-19 symptom identification

Sahoo, Himanshu S.; Silverman, Greg M.; Ingraham, Nicholas E.; Lupei, Monica I.; Puskarich, Michael A.; Finzel, Raymond L.; Sartori, John; Zhang, Rui; Knoll, Benjamin C.; Liu, Sijia; Liu, Hongfang; Melton, Genevieve B.; Tignanelli, Christopher J.; Pakhomov, Serguei V. S.

Published in

Oxford University Press, JAMIA Open, 3(4), 2021

DOI: 10.1093/jamiaopen/ooab070

This paper is made freely available by the publisher.

Abstract

Objective: With COVID-19, there was a need for a rapidly scalable annotation system that facilitated real-time integration with clinical decision support systems (CDS). Current annotation systems suffer from high resource utilization and poor scalability, limiting real-world integration with CDS. A potential solution to mitigate these issues is to use the rule-based gazetteer developed at our institution.

Materials and Methods: The performance, resource utilization, and runtime of the rule-based gazetteer were compared with those of five annotation systems: BioMedICUS, cTAKES, MetaMap, CLAMP, and MedTagger.

Results: The rule-based gazetteer was the fastest, had a low resource footprint, and achieved performance similar to the other annotation systems on weighted microaverage and macroaverage precision, recall, and F1-score.

Discussion: Opportunities to increase its performance include fine-tuning the lexical rules for symptom identification. Additionally, it could run on multiple compute nodes for a faster runtime.

Conclusion: This rule-based gazetteer overcame key technical limitations, facilitating real-time symptomatology identification for COVID-19 and the integration of unstructured data elements into our CDS. It is ideal for large-scale deployment across a wide variety of healthcare settings for surveillance of acute COVID-19 symptoms and integration into prognostic modeling. Such a system is currently being used to monitor the progression of postacute sequelae of COVID-19 (PASC) in COVID-19 survivors.

This study conducted the first in-depth analysis and developed a rule-based gazetteer for COVID-19 symptom extraction with the following key features: low processor and memory utilization, faster runtime, and similar weighted microaverage and macroaverage precision, recall, and F1-score compared with industry-standard annotation systems.
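
The abstract compares systems on micro- and macro-averaged precision, recall, and F1-score. The sketch below is a minimal illustration of that kind of evaluation. It assumes a toy dictionary lookup that maps surface terms to normalized symptom labels. The lexicon, the function names `extract_symptoms` and `score`, and the example notes are all hypothetical. This is not the authors' gazetteer: it ignores negation and the other context that real clinical lexical rules must handle. The abstract also does not define its weighting scheme, so the sketch shows three variants: micro, unweighted macro, and support-weighted macro averages.

```python
import re
from collections import Counter

# Hypothetical lexicon: surface form -> normalized symptom label.
# Illustrative only; not the gazetteer described in the paper.
LEXICON = {
    "fever": "fever",
    "febrile": "fever",
    "cough": "cough",
    "shortness of breath": "dyspnea",
    "dyspnea": "dyspnea",
    "loss of smell": "anosmia",
    "anosmia": "anosmia",
}

# Try longer terms first so multi-word entries win over shorter overlapping ones.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in sorted(LEXICON, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def extract_symptoms(text: str) -> set[str]:
    """Return normalized symptom labels found by exact lexicon lookup (no negation handling)."""
    return {LEXICON[m.group(1).lower()] for m in _PATTERN.finditer(text)}


def _prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 from raw counts, using 0.0 when a ratio is undefined."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def score(gold: list[set[str]], pred: list[set[str]]) -> dict[str, tuple[float, float, float]]:
    """Micro, macro, and support-weighted macro (P, R, F1) over per-document label sets."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        tp.update(p & g)  # predicted and in gold
        fp.update(p - g)  # predicted but not in gold
        fn.update(g - p)  # in gold but missed
    labels = sorted(set(tp) | set(fp) | set(fn))

    # Micro: pool counts over all labels, then compute the ratios once.
    micro = _prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))

    per_label = {lab: _prf(tp[lab], fp[lab], fn[lab]) for lab in labels}
    n = len(labels) or 1
    # Macro: unweighted mean of per-label scores.
    macro = tuple(sum(s[i] for s in per_label.values()) / n for i in range(3))

    # Weighted macro: per-label scores weighted by gold support (tp + fn).
    support = {lab: tp[lab] + fn[lab] for lab in labels}
    total = sum(support.values()) or 1
    weighted = tuple(
        sum(per_label[lab][i] * support[lab] for lab in labels) / total for i in range(3)
    )
    return {"micro": micro, "macro": macro, "weighted": weighted}


if __name__ == "__main__":
    notes = ["Pt reports fever and shortness of breath.", "Denies cough. Febrile overnight."]
    gold = [{"fever", "dyspnea"}, {"fever"}]
    pred = [extract_symptoms(n) for n in notes]
    print(pred)
    print(score(gold, pred))
```

With these two example notes, the unhandled "Denies cough" produces a false positive. Micro F1 is about 0.86, and the unweighted macro average drops to about 0.67 because the spurious "cough" label counts as a full class. The support-weighted average stays at 1.0 because "cough" has no gold support. This illustrates why micro and macro figures should be reported together.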