An open natural language processing (NLP) framework for EHR-based clinical research: a case demonstration using the National COVID Cohort Collaborative (N3C)

Liu, Sijia; Wen, Andrew; Wang, Liwei; He, Huan; Fu, Sunyang; Miller, Robert; Williams, Andrew; Harris, Daniel; Kavuluru, Ramakanth; Liu, Mei; Abu-El-Rub, Noor; Schutte, Dalton; Zhang, Rui; Rouhizadeh, Masoud; Osborne, John D.; He, Yongqun; Topaloglu, Umit; Hong, Stephanie S.; Saltz, Joel H.; Schaffter, Thomas; Pfaff, Emily; Chute, Christopher G.; Duong, Tim; Haendel, Melissa A.; Fuentes, Rafael; Szolovits, Peter; Xu, Hua; Liu, Hongfang

Published in

Oxford University Press, JAMIA: A Scholarly Journal of Informatics in Health and Biomedicine, 30(12), p. 2036-2040, 2023

DOI: 10.1093/jamia/ocad134

Abstract

Despite recent methodological advances in clinical natural language processing (NLP), the adoption of clinical NLP models within the translational research community remains hindered by process heterogeneity and human factor variations. These factors also dramatically increase the difficulty of developing NLP models in multi-site settings, which is necessary for algorithm robustness and generalizability. Here, we report on our experience developing an NLP solution for Coronavirus Disease 2019 (COVID-19) sign and symptom extraction in an open NLP framework from a subset of sites participating in the National COVID Cohort Collaborative (N3C). We then empirically highlight the benefits of multi-site data for both symbolic and statistical methods, as well as the need for federated annotation and evaluation to resolve several pitfalls encountered in the course of these efforts.
