Data extraction for evidence synthesis using a large language model: A proof‐of‐concept study

Gartlehner, Gerald; Kahwati, Leila; Hilscher, Rainer; Thomas, Ian; Kugley, Shannon; Crotty, Karen; Viswanathan, Meera; Nussbaumer‐Streit, Barbara; Booth, Graham; Erskine, Nathaniel; Konet, Amanda; Chew, Robert

Published in

Wiley, Research Synthesis Methods, 2024

DOI: 10.1002/jrsm.1710

Abstract

Data extraction is a crucial, yet labor‐intensive and error‐prone part of evidence synthesis. To date, efforts to harness machine learning for enhancing efficiency of the data extraction process have fallen short of achieving sufficient accuracy and usability. With the release of large language models (LLMs), new possibilities have emerged to increase efficiency and accuracy of data extraction for evidence synthesis. The objective of this proof‐of‐concept study was to assess the performance of an LLM (Claude 2) in extracting data elements from published studies, compared with human data extraction as employed in systematic reviews. Our analysis utilized a convenience sample of 10 English‐language, open‐access publications of randomized controlled trials included in a single systematic review. We selected 16 distinct types of data, posing varying degrees of difficulty (160 data elements across 10 studies). We used the browser version of Claude 2 to upload the portable document format of each publication and then prompted the model for each data element. Across 160 data elements, Claude 2 demonstrated an overall accuracy of 96.3% with a high test–retest reliability (replication 1: 96.9%; replication 2: 95.0% accuracy). Overall, Claude 2 made 6 errors on 160 data items. The most common errors (n = 4) were missed data items. Importantly, Claude 2's ease of use was high; it required no technical expertise or labeled training data for effective operation (i.e., zero‐shot learning). Based on findings of our proof‐of‐concept study, leveraging LLMs has the potential to substantially enhance the efficiency and accuracy of data extraction for evidence syntheses.
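The headline figures in the abstract can be checked with simple arithmetic. The sketch below is an illustration, not the authors' analysis code: it recomputes the number of data elements (16 data types across 10 studies) and the overall accuracy implied by 6 errors on 160 items. The function name `accuracy_percent` and the half-up rounding convention are assumptions made for this example; the abstract does not state how percentages were rounded.

```python
from decimal import Decimal, ROUND_HALF_UP

# Figures taken from the abstract.
N_STUDIES = 10      # publications in the convenience sample
N_DATA_TYPES = 16   # distinct types of data extracted per study
N_ERRORS = 6        # errors reported across all data items


def accuracy_percent(n_items: int, n_errors: int) -> Decimal:
    """Return accuracy as a percentage with one decimal place.

    Rounding half-up is an assumption of this sketch; the abstract
    does not say which rounding convention was used.
    """
    exact = Decimal(n_items - n_errors) * 100 / Decimal(n_items)
    return exact.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


n_items = N_STUDIES * N_DATA_TYPES
print(n_items)                              # total data elements
print(accuracy_percent(n_items, N_ERRORS))  # overall accuracy in percent
```

`Decimal` is used instead of the built-in `round` because the exact value, 96.25%, sits on a rounding boundary, and `round` on floats rounds such ties to the even digit rather than up.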
